hots.plugins.clustering.*

The clustering logic is implemented by several plugins in the hots.plugins.clustering package.

Clustering builder utilities for HOTS.

hots.plugins.clustering.builder.assign_new_containers_to_nearest_cluster(clust_mat: DataFrame, label_col: str = 'cluster') → DataFrame

For any row with cluster == -1, assign it to the cluster of its nearest existing container.

Mutates and returns clust_mat.
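The behaviour described above can be sketched as follows. This is a minimal illustration, not the HOTS implementation: the helper name `assign_unlabeled_to_nearest` is hypothetical, and it assumes that "nearest" means Euclidean distance over all columns other than the label column.

```python
import numpy as np
import pandas as pd


def assign_unlabeled_to_nearest(clust_mat: pd.DataFrame, label_col: str = "cluster") -> pd.DataFrame:
    """Hypothetical sketch: give each row labeled -1 the label of its nearest labeled row."""
    feature_cols = [c for c in clust_mat.columns if c != label_col]
    new_mask = (clust_mat[label_col] == -1).to_numpy()
    if not new_mask.any() or new_mask.all():
        return clust_mat  # nothing to assign, or no labeled rows to copy from
    features = clust_mat[feature_cols].to_numpy(dtype=float)
    known = features[~new_mask]
    known_labels = clust_mat.loc[~new_mask, label_col].to_numpy()
    # Euclidean distance (assumed metric) from every new row to every labeled row
    dists = np.linalg.norm(features[new_mask][:, None, :] - known[None, :, :], axis=2)
    # Write the labels back in place, mirroring "mutates and returns"
    clust_mat.loc[new_mask, label_col] = known_labels[dists.argmin(axis=1)]
    return clust_mat


demo = pd.DataFrame(
    {"t0": [0.0, 0.1, 5.0, 4.9], "t1": [0.0, 0.2, 5.0, 5.1], "cluster": [0, -1, 1, -1]},
    index=["c0", "c1", "c2", "c3"],
)
result = assign_unlabeled_to_nearest(demo)
```

Here `c1` joins the cluster of `c0` and `c3` joins the cluster of `c2`, and `result` is the same object as `demo`.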

hots.plugins.clustering.builder.build_adjacency_matrix(labels_)

Build the adjacency matrix of a clustering.

Parameters:

labels_ (List) – List of cluster labels assigned to individuals

Returns:

Adjacency matrix

Return type:

np.ndarray
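One common way to build such a matrix is sketched below. The helper name `adjacency_from_labels` is hypothetical, and the convention is an assumption: entry (i, j) is 1 when individuals i and j share a cluster, diagonal included. HOTS may use a different diagonal convention.

```python
import numpy as np


def adjacency_from_labels(labels) -> np.ndarray:
    """Hypothetical sketch: 1 where two individuals share a cluster label, else 0."""
    labels = np.asarray(labels)
    # Broadcasting compares every label with every other label
    return (labels[:, None] == labels[None, :]).astype(int)


adj = adjacency_from_labels([0, 0, 1, 2, 1])
```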

hots.plugins.clustering.builder.build_matrix_indiv_attr(df: DataFrame, tick_field: str, indiv_field: str, metrics: list, id_map: dict) → DataFrame

Build a container × time matrix from an individual-level DataFrame.

hots.plugins.clustering.builder.build_post_clust_matrices(clust_mat)

Build the DataFrames and matrices derived from the clustering result.

hots.plugins.clustering.builder.build_pre_clust_matrices(df, tick_field, indiv_field, metrics, id_map, clustering, new_containers: bool = False)

Build the DataFrames and matrices used for clustering over a period.

hots.plugins.clustering.builder.build_similarity_matrix(mat: DataFrame) → DataFrame

Compute the pairwise Euclidean distance matrix from the input matrix.
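A pairwise Euclidean distance matrix can be computed with NumPy broadcasting, as in the sketch below. The helper name `pairwise_euclidean` is hypothetical, and it assumes that each row of the input is one container profile.

```python
import numpy as np
import pandas as pd


def pairwise_euclidean(mat: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: Euclidean distance between every pair of rows."""
    values = mat.to_numpy(dtype=float)
    diff = values[:, None, :] - values[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Keep the row labels on both axes so entries can be looked up by container id
    return pd.DataFrame(dist, index=mat.index, columns=mat.index)


profiles = pd.DataFrame([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]], index=["c0", "c1", "c2"])
dist = pairwise_euclidean(profiles)
```

The result is symmetric with a zero diagonal; smaller values mean more similar profiles.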

hots.plugins.clustering.builder.build_var_delta_cluster_matrix(df_clust, cluster_var_matrix, *, zero_diag=True)

Build the variance-of-deltas matrix from the clusters.

hots.plugins.clustering.builder.change_clustering(mvg_containers, clustering, dict_id_c: dict, tol_open_clust: float | None = None)

Reassign each container in mvg_containers to the closest existing cluster (by Euclidean distance to the cluster mean profile).
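The reassignment rule can be illustrated with the simplified sketch below. The helper `reassign_to_closest_mean` and its arguments are hypothetical and do not match the real signature: it works directly on a labeled DataFrame, computes the cluster means once before any move, and ignores `tol_open_clust`.

```python
import numpy as np
import pandas as pd


def reassign_to_closest_mean(df_clust: pd.DataFrame, moving_ids, cluster_col: str = "cluster") -> pd.DataFrame:
    """Hypothetical sketch: move each listed container to the cluster with the nearest mean profile."""
    features = df_clust.drop(columns=cluster_col)
    # Mean profile per cluster, computed once from the current labels (a simplification)
    means = features.groupby(df_clust[cluster_col]).mean()
    for cid in moving_ids:
        dists = np.linalg.norm(means.to_numpy() - features.loc[cid].to_numpy(), axis=1)
        df_clust.loc[cid, cluster_col] = means.index[dists.argmin()]
    return df_clust


df_demo = pd.DataFrame(
    {
        "t0": [0.0, 0.2, 5.0, 5.2, 4.8],
        "t1": [0.0, 0.1, 5.0, 4.9, 5.1],
        "cluster": [0, 0, 1, 1, 0],  # c4 starts in the wrong cluster
    },
    index=["c0", "c1", "c2", "c3", "c4"],
)
updated = reassign_to_closest_mean(df_demo, ["c4"])
```

In this example `c4` moves from cluster 0 to cluster 1, whose mean profile is closer.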

hots.plugins.clustering.builder.cluster_mean_profile(df_clust: DataFrame, cluster_col: str = 'cluster') → ndarray

Compute the mean profile of each cluster.
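A minimal sketch of a per-cluster mean profile, assuming every column other than the label column is a numeric time step and that rows of the result are ordered by sorted cluster label. The helper name `mean_profiles` is hypothetical.

```python
import numpy as np
import pandas as pd


def mean_profiles(df_clust: pd.DataFrame, cluster_col: str = "cluster") -> np.ndarray:
    """Hypothetical sketch: one row per cluster, holding the column-wise mean of its members."""
    # groupby sorts the group keys by default, so row k corresponds to the k-th smallest label
    return df_clust.groupby(cluster_col).mean().to_numpy()


df_members = pd.DataFrame(
    {"t0": [1.0, 3.0, 10.0], "t1": [3.0, 5.0, 20.0], "cluster": [0, 0, 1]},
    index=["c0", "c1", "c2"],
)
cluster_means = mean_profiles(df_members)
```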

hots.plugins.clustering.builder.dist_from_mean(df_clust, profiles, cid: str) → float

Return distance from cid to its cluster mean profile.

hots.plugins.clustering.builder.get_far_container(c1, c2, df_clust: DataFrame, profiles: ndarray) → str

Return c1 if it’s farther from its cluster mean than c2 is, else return c2.

hots.plugins.clustering.builder.pairwise_sum_profile_var(profiles: ndarray) → ndarray

Compute a matrix of the variance of the sum of profiles for each pair of clusters.
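The quantity described above can be sketched as follows, assuming `profiles` has one row per cluster and one column per time step, and assuming the population variance (NumPy's default, `ddof=0`). The helper name `pairwise_sum_var` is hypothetical.

```python
import numpy as np


def pairwise_sum_var(profiles: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: entry (i, j) is the variance over time of profiles[i] + profiles[j]."""
    sums = profiles[:, None, :] + profiles[None, :, :]  # shape (k, k, T)
    return sums.var(axis=2)


demo_profiles = np.array(
    [
        [1.0, 2.0, 3.0],
        [3.0, 2.0, 1.0],
        [1.0, 1.0, 1.0],
    ]
)
sum_var = pairwise_sum_var(demo_profiles)
```

Rows 0 and 1 are mirror images, so their sum is constant and its variance is 0. In general Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b), so a low entry indicates two clusters whose profiles tend to offset each other.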

Clustering plugin: mini‐batch KMeans streaming.

class hots.plugins.clustering.kmeans.StreamKMeans(params: dict[str, Any], instance)

Bases: ClusteringPlugin

StreamKMeans plugin using scikit‐learn’s MiniBatchKMeans.

fit(df: DataFrame) → Series

Rebuild and fit a MiniBatchKMeans on the current data, then return labels. This avoids any mismatch in expected feature dimension.
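The rebuild-then-fit pattern can be sketched with scikit-learn as follows. The helper `fit_fresh_minibatch_kmeans` and its parameters are hypothetical and are not the plugin's API; the sketch only shows why constructing a new estimator on each call sidesteps feature-dimension mismatches.

```python
import pandas as pd
from sklearn.cluster import MiniBatchKMeans


def fit_fresh_minibatch_kmeans(df: pd.DataFrame, n_clusters: int, random_state: int = 0) -> pd.Series:
    """Hypothetical sketch: build a new MiniBatchKMeans on every call, then fit and label df."""
    # A new estimator carries no state from earlier calls, so the number of
    # columns in df may differ from one call to the next.
    model = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=random_state)
    labels = model.fit_predict(df.to_numpy(dtype=float))
    return pd.Series(labels, index=df.index, name="cluster")


window_a = pd.DataFrame(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]],
    index=[f"c{i}" for i in range(6)],
)
labels_a = fit_fresh_minibatch_kmeans(window_a, n_clusters=2)

# Same containers with one extra time step: a refit from scratch handles the new width.
window_b = window_a.assign(t2=[0.0, 0.0, 0.1, 10.0, 10.0, 10.1])
labels_b = fit_fresh_minibatch_kmeans(window_b, n_clusters=2)
```

Reusing an already fitted estimator on `window_b` (for example via `partial_fit` or `predict`) would fail, because it expects the two features seen at first fit.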

Clustering plugin: agglomerative (hierarchical) clustering.

class hots.plugins.clustering.hierarchical.HierarchicalClustering(parameters: dict, instance)

Bases: ClusteringPlugin

Hierarchical clustering plugin using SciPy linkage.

fit(df: DataFrame) → Series

Fit hierarchical clusters and return zero‐indexed labels.
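SciPy's `fcluster` returns labels starting at 1, which is why a zero-indexing step is needed. A minimal sketch, assuming Ward linkage and a fixed number of clusters (both are assumptions, as is the helper name `hierarchical_labels`):

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage


def hierarchical_labels(df: pd.DataFrame, n_clusters: int, method: str = "ward") -> pd.Series:
    """Hypothetical sketch: agglomerative clustering with SciPy, returning zero-indexed labels."""
    merge_tree = linkage(df.to_numpy(dtype=float), method=method)
    # fcluster numbers clusters from 1; subtract 1 to start at 0
    one_based = fcluster(merge_tree, t=n_clusters, criterion="maxclust")
    return pd.Series(one_based - 1, index=df.index, name="cluster")


points = pd.DataFrame(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]],
    index=[f"c{i}" for i in range(6)],
)
hier_labels = hierarchical_labels(points, n_clusters=2)
```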

Clustering plugin: spectral clustering with precomputed affinity.

class hots.plugins.clustering.spectral.SpectralClustering(parameters: dict, instance)

Bases: ClusteringPlugin

Spectral clustering plugin using a precomputed similarity matrix.

fit(df: DataFrame) → Series

Fit spectral model and return cluster labels.
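Spectral clustering on a precomputed similarity matrix can be sketched with scikit-learn's own `SpectralClustering` estimator. How HOTS builds its similarity matrix is not stated here; the Gaussian kernel, the `sigma` value, and the helper name `spectral_labels` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import SpectralClustering


def spectral_labels(df: pd.DataFrame, n_clusters: int, sigma: float = 2.0) -> pd.Series:
    """Hypothetical sketch: spectral clustering on a precomputed Gaussian similarity matrix."""
    values = df.to_numpy(dtype=float)
    sq_dists = ((values[:, None, :] - values[None, :, :]) ** 2).sum(axis=2)
    # Assumed kernel: turn squared distances into similarities in (0, 1]
    similarity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed", random_state=0)
    labels = model.fit_predict(similarity)
    return pd.Series(labels, index=df.index, name="cluster")


blobs = pd.DataFrame(
    [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5],
     [5.0, 5.0], [5.5, 5.0], [5.0, 5.5], [5.5, 5.5]],
    index=[f"c{i}" for i in range(8)],
)
spec_labels = spectral_labels(blobs, n_clusters=2)
```

With `affinity="precomputed"`, the estimator expects larger values to mean more similar, so a distance matrix must be converted before it is passed in.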

Clustering plugin: custom spectral clustering for HOTS.

class hots.plugins.clustering.custom_spectral.CustomSpectralClustering(parameters: dict, instance)

Bases: ClusteringPlugin

Custom spectral clustering plugin using normalized Laplacian.

fit(df: DataFrame) → Series

Compute labels by eigen-decomposing the normalized Laplacian.
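A textbook version of this approach is sketched below: form the symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2), take the eigenvectors of its k smallest eigenvalues, normalize each row, and cluster the rows with k-means. The plugin's actual steps may differ; the Gaussian similarity, the final k-means stage, and the helper name `normalized_laplacian_labels` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def normalized_laplacian_labels(df: pd.DataFrame, n_clusters: int, sigma: float = 2.0) -> pd.Series:
    """Hypothetical sketch: spectral clustering via the symmetric normalized Laplacian."""
    values = df.to_numpy(dtype=float)
    sq_dists = ((values[:, None, :] - values[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))  # assumed Gaussian similarity W
    d_inv_sqrt = 1.0 / np.sqrt(weights.sum(axis=1))   # diagonal of D^(-1/2)
    laplacian = np.eye(len(values)) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest eigenvectors
    _, eigvecs = np.linalg.eigh(laplacian)
    embedding = eigvecs[:, :n_clusters]
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.where(norms == 0.0, 1.0, norms)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embedding)
    return pd.Series(labels, index=df.index, name="cluster")


grid = pd.DataFrame(
    [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5],
     [5.0, 5.0], [5.5, 5.0], [5.0, 5.5], [5.5, 5.5]],
    index=[f"c{i}" for i in range(8)],
)
lap_labels = normalized_laplacian_labels(grid, n_clusters=2)
```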