hots.plugins.clustering.*

The clustering logic is implemented by several plugins in the hots.plugins.clustering package.

Clustering builder utilities for HOTS.

hots.plugins.clustering.builder.assign_new_containers_to_nearest_cluster(clust_mat: DataFrame, label_col: str = 'cluster') → DataFrame

For every row whose label_col value is -1 (a new container), assign it the cluster of its nearest existing container.

Mutates and returns clust_mat.
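A minimal sketch of the nearest-container rule, assuming every column of clust_mat other than label_col is a numeric feature (for example one column per timestamp) and each row is one container. The helper name assign_unlabeled_to_nearest is hypothetical and is not the HOTS implementation.

```python
import numpy as np
import pandas as pd


def assign_unlabeled_to_nearest(clust_mat: pd.DataFrame, label_col: str = "cluster") -> pd.DataFrame:
    """Hypothetical sketch: give every row labelled -1 the label of its nearest labelled row."""
    features = clust_mat.drop(columns=[label_col]).to_numpy(dtype=float)
    labels = clust_mat[label_col].to_numpy()
    known = labels != -1
    if not known.any():
        return clust_mat  # no existing container to copy a label from
    known_feats = features[known]
    known_labels = labels[known]
    col = clust_mat.columns.get_loc(label_col)
    for pos in np.flatnonzero(~known):
        # Euclidean distance from this new container to every already-labelled container
        dists = np.linalg.norm(known_feats - features[pos], axis=1)
        clust_mat.iloc[pos, col] = known_labels[np.argmin(dists)]  # mutate in place
    return clust_mat


mat = pd.DataFrame(
    {"t0": [0.0, 10.0, 0.5, 9.0], "t1": [0.0, 10.0, 0.2, 9.5], "cluster": [0, 1, -1, -1]},
    index=["c1", "c2", "c3", "c4"],
)
assign_unlabeled_to_nearest(mat)
print(mat["cluster"].tolist())  # [0, 1, 0, 1]
```

Only rows that were labelled before the call serve as neighbours, so two new containers never inherit a label from each other.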

hots.plugins.clustering.builder.build_adjacency_matrix(labels_)

Build the adjacency matrix of a clustering.

Parameters: labels_ (list) – Cluster label assigned to each individual
Returns: Adjacency matrix with one row and one column per individual
Return type: numpy.ndarray
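Assuming the adjacency matrix marks which individuals share a cluster (entry (i, j) is 1 when labels_[i] == labels_[j], diagonal included), it can be built in one step with NumPy broadcasting. This is a sketch of that interpretation; adjacency_from_labels is a hypothetical name.

```python
import numpy as np


def adjacency_from_labels(labels) -> np.ndarray:
    """Hypothetical sketch: A[i, j] = 1 when individuals i and j share a cluster label."""
    labels = np.asarray(labels)
    # Compare every label with every other label via broadcasting: (n, 1) vs (1, n).
    return (labels[:, None] == labels[None, :]).astype(int)


print(adjacency_from_labels([0, 1, 0]))
# [[1 0 1]
#  [0 1 0]
#  [1 0 1]]
```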

hots.plugins.clustering.builder.build_matrix_indiv_attr(df: DataFrame, tick_field: str, indiv_field: str, metrics: list, id_map: dict) → DataFrame

Build a container×time matrix from an individual-level DataFrame.
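A container×time matrix is essentially a pivot of long-format monitoring data. The sketch below pivots a single metric with pandas; unlike build_matrix_indiv_attr it takes one metric instead of a metrics list and ignores id_map. The column names timestamp, container_id and cpu, the mean aggregation and the zero fill are all assumptions, and container_time_matrix is a hypothetical name.

```python
import pandas as pd


def container_time_matrix(df: pd.DataFrame, tick_field: str, indiv_field: str, metric: str) -> pd.DataFrame:
    """Hypothetical sketch: one row per container, one column per timestamp."""
    # Average any duplicate (container, tick) samples, then turn ticks into columns.
    mat = df.pivot_table(index=indiv_field, columns=tick_field, values=metric, aggfunc="mean")
    return mat.fillna(0.0)  # assumption: a missing sample counts as zero usage


usage = pd.DataFrame({
    "timestamp": [0, 1, 0, 1, 0],
    "container_id": ["c1", "c1", "c2", "c2", "c3"],
    "cpu": [0.2, 0.4, 0.9, 0.7, 0.5],
})
print(container_time_matrix(usage, "timestamp", "container_id", "cpu"))
```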

hots.plugins.clustering.builder.build_post_clust_matrices(clust_mat)

Build the post-clustering result DataFrames and matrices from clust_mat.

hots.plugins.clustering.builder.build_pre_clust_matrices(df, tick_field, indiv_field, metrics, id_map, clustering, new_containers: bool = False)

Build the DataFrames and matrices needed before clustering the current period.

hots.plugins.clustering.builder.build_similarity_matrix(mat: DataFrame) → DataFrame

Compute the pairwise Euclidean distance matrix of the input matrix. Despite the function name, the entries are distances: smaller values mean more similar profiles.
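A pairwise Euclidean distance matrix can be sketched with SciPy's pdist and squareform, assuming each row of mat is one container's profile. The name euclidean_distance_matrix is hypothetical.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def euclidean_distance_matrix(mat: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: square matrix of Euclidean distances between the rows of mat."""
    # pdist returns the condensed upper triangle; squareform expands it to (n, n).
    dists = squareform(pdist(mat.to_numpy(dtype=float), metric="euclidean"))
    return pd.DataFrame(dists, index=mat.index, columns=mat.index)


pts = pd.DataFrame({"t0": [0.0, 3.0], "t1": [0.0, 4.0]}, index=["a", "b"])
print(euclidean_distance_matrix(pts))
```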

hots.plugins.clustering.builder.build_var_delta_cluster_matrix(df_clust, cluster_var_matrix, *, zero_diag=True)

Build the variance-of-deltas matrix from the clustering data.

hots.plugins.clustering.builder.change_clustering(mvg_containers, clustering, dict_id_c: dict, tol_open_clust: float | None = None)

Reassign each container in mvg_containers to the closest existing cluster (by Euclidean distance to the cluster mean profile).
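The reassignment rule amounts to an argmin over distances to the cluster mean profiles. This sketch assumes profiles is a (k, T) array whose row k is the mean profile of cluster k, and that the moving containers are given as a dict mapping container id to profile; it does not model tol_open_clust. nearest_profile and reassign are hypothetical names.

```python
import numpy as np


def nearest_profile(container_profile, profiles: np.ndarray) -> int:
    """Hypothetical sketch: index of the cluster whose mean profile is closest."""
    dists = np.linalg.norm(profiles - np.asarray(container_profile, dtype=float), axis=1)
    return int(np.argmin(dists))


def reassign(moving: dict, profiles: np.ndarray) -> dict:
    """Map each moving container id to its nearest cluster index."""
    return {cid: nearest_profile(vec, profiles) for cid, vec in moving.items()}


profiles = np.array([[0.0, 0.0], [10.0, 10.0]])
print(reassign({"c7": [9.0, 8.0], "c9": [1.0, 0.0]}, profiles))  # {'c7': 1, 'c9': 0}
```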

hots.plugins.clustering.builder.cluster_mean_profile(df_clust: DataFrame, cluster_col: str = 'cluster') → ndarray

Compute the mean profile of each cluster.
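With pandas, the mean profile per cluster is a groupby-mean. The sketch assumes every column other than cluster_col is numeric and that row k of the result corresponds to the k-th cluster label in sorted order (groupby sorts its keys by default); mean_profiles is a hypothetical name.

```python
import numpy as np
import pandas as pd


def mean_profiles(df_clust: pd.DataFrame, cluster_col: str = "cluster") -> np.ndarray:
    """Hypothetical sketch: row k is the mean time series of the k-th cluster (sorted by label)."""
    return df_clust.groupby(cluster_col).mean().to_numpy()


frame = pd.DataFrame({"t0": [1.0, 3.0, 10.0], "t1": [2.0, 4.0, 20.0], "cluster": [0, 0, 1]})
print(mean_profiles(frame).tolist())  # [[2.0, 3.0], [10.0, 20.0]]
```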

hots.plugins.clustering.builder.dist_from_mean(df_clust, profiles, cid: str) → float

Return the distance from container cid to the mean profile of its cluster.

hots.plugins.clustering.builder.get_far_container(c1, c2, df_clust: DataFrame, profiles: ndarray) → str

Return c1 if it is farther from its cluster mean profile than c2 is from its own; otherwise return c2.
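Both helpers reduce to Euclidean norms against the mean profiles. This sketch assumes df_clust is indexed by container id and holds a cluster_col column of integer labels 0..k-1, and that profiles[k] is the mean profile of cluster k; the function names are hypothetical. Ties return c2, matching the "otherwise return c2" rule.

```python
import numpy as np
import pandas as pd


def distance_to_own_mean(df_clust: pd.DataFrame, profiles: np.ndarray, cid, cluster_col: str = "cluster") -> float:
    """Hypothetical sketch: Euclidean distance from container cid to its cluster's mean profile."""
    row = df_clust.loc[cid]
    vec = row.drop(cluster_col).to_numpy(dtype=float)
    return float(np.linalg.norm(vec - profiles[int(row[cluster_col])]))


def farther_container(c1, c2, df_clust: pd.DataFrame, profiles: np.ndarray, cluster_col: str = "cluster"):
    """Return c1 if it lies farther from its own cluster mean than c2 does, else c2."""
    d1 = distance_to_own_mean(df_clust, profiles, c1, cluster_col)
    d2 = distance_to_own_mean(df_clust, profiles, c2, cluster_col)
    return c1 if d1 > d2 else c2


frame = pd.DataFrame(
    {"t0": [0.0, 4.0, 10.0], "t1": [0.0, 0.0, 10.0], "cluster": [0, 0, 1]},
    index=["a", "b", "c"],
)
profiles = np.array([[1.0, 0.0], [10.0, 10.0]])
print(farther_container("a", "b", frame, profiles))  # b
```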

hots.plugins.clustering.builder.pairwise_sum_profile_var(profiles: ndarray) → ndarray

Compute, for each pair of clusters, the variance of the sum of their profiles, and return the results as a matrix.
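Entry (i, j) can be read as the variance over time of profiles[i] + profiles[j]. The broadcasting sketch below assumes profiles has shape (k, T) and uses NumPy's default population variance (ddof=0); both are assumptions, and pairwise_sum_var is a hypothetical name. Complementary profiles, where one peaks while the other dips, give a low value.

```python
import numpy as np


def pairwise_sum_var(profiles: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: out[i, j] = variance over time of profiles[i] + profiles[j]."""
    sums = profiles[:, None, :] + profiles[None, :, :]  # shape (k, k, T)
    return sums.var(axis=2)  # population variance along the time axis


profiles = np.array([[1.0, 2.0], [2.0, 1.0]])
print(pairwise_sum_var(profiles).tolist())  # [[1.0, 0.0], [0.0, 1.0]]
```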

Clustering plugin: streaming mini-batch KMeans.

class hots.plugins.clustering.kmeans.StreamKMeans(params: dict[str, Any], instance)

Bases: ClusteringPlugin

StreamKMeans plugin using scikit-learn’s MiniBatchKMeans.

fit(df: DataFrame) → Series[source]: Rebuild and fit a MiniBatchKMeans on the current data, then return labels. This avoids any mismatch in expected feature dimension.

Clustering plugin: agglomerative (hierarchical) clustering.

class hots.plugins.clustering.hierarchical.HierarchicalClustering(parameters: dict, instance)

Bases: ClusteringPlugin

Hierarchical clustering plugin using SciPy linkage.

fit(df: DataFrame) → Series[source]: Fit hierarchical clusters and return zero‐indexed labels.

Clustering plugin: spectral clustering with precomputed affinity.

class hots.plugins.clustering.spectral.SpectralClustering(parameters: dict, instance)

Bases: ClusteringPlugin

Spectral clustering plugin using a precomputed similarity matrix.

fit(df: DataFrame) → Series[source]: Fit spectral model and return cluster labels.

Clustering plugin: custom spectral clustering for HOTS.

class hots.plugins.clustering.custom_spectral.CustomSpectralClustering(parameters: dict, instance)

Bases: ClusteringPlugin

Custom spectral clustering plugin using normalized Laplacian.

fit(df: DataFrame) → Series[source]: Compute labels by eigen-decomposing the normalized Laplacian.