.. _kpis:

===================
HOTS KPIs Reference
===================

This page describes the key performance indicators (KPIs) emitted by the HOTS
evaluation pipeline. KPIs are produced **per step** (typically per streaming
window or tick) as a flat dictionary of scalar values.

Conventions
===========

- A *step* corresponds to one iteration of the scheduling/evaluation loop
  (e.g. one tick or one window).
- KPI keys are grouped by prefix:

  - ``clst__*`` : clustering change (vs. previous step)
  - ``clst_struct__*`` : clustering structure at the current step
  - ``plc__*`` : placement change (vs. previous step)
  - ``plc_delta__*`` : placement change counts (vs. previous step)
  - ``moves__*`` : moves returned / applied by ``adjust()``
  - ``clust_conf__*`` : conflict graph KPIs for the clustering stage
  - ``place_conf__*`` : conflict graph KPIs for the placement stage
  - ``load__*`` : host-load / balance KPIs from the working window

Some KPIs may be missing for some steps: for example, the first step has no
previous state to compare against, and some conflict graphs may have no edge
weights.

------------------------------------------------------------

Step Metadata
=============

These fields are not KPIs strictly speaking, but including them in every
metrics row is strongly recommended to enable proper analysis.

- ``run_id`` (str)
  Unique identifier for the run (e.g. timestamp-based).

- ``loop_nb`` (int)
  Index of the evaluation loop step.

- ``tick`` (int)
  Tick identifier for the step (if using a single-tick model).

- ``tick_start`` (int)
  Start tick of the window (if using windows).

- ``tick_end`` (int)
  End tick of the window (if using windows).

- ``window_duration`` (int or float)
  Window length in ticks or time units.

------------------------------------------------------------

``clst__*`` — Clustering Change KPIs
====================================

These KPIs compare cluster assignments between consecutive steps.

Let:

- ``labels_now[i]`` be the cluster label of individual ``i`` at the current step
- ``labels_prev[i]`` be the cluster label of individual ``i`` at the previous step

Metrics - Clustering Change
^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``clst__changed_count`` (float)
  Number of individuals whose cluster assignment changed compared to the
  previous step.

  Definition::

      sum_i 1(labels_now[i] != labels_prev[i])

- ``clst__changed_ratio`` (float)
  Fraction of individuals whose cluster assignment changed.

  Definition::

      changed_count / n

Interpretation - Clustering Change
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

High values indicate unstable or frequently changing clustering across time.

------------------------------------------------------------

``clst_struct__*`` — Clustering Structure KPIs
==============================================

These KPIs describe the cluster-size distribution at the current step.

Let the cluster sizes be ``{s_1, s_2, ..., s_k}``, where ``k`` is the number of
clusters.

Metrics - Clustering Structure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``clst_struct__n_clusters`` (float)
  Number of distinct clusters (``k``).

- ``clst_struct__singleton_ratio`` (float)
  Fraction of clusters of size 1.

  Definition::

      (# clusters with size 1) / k

- ``clst_struct__size_min`` (float)
  Smallest cluster size.

- ``clst_struct__size_max`` (float)
  Largest cluster size.

- ``clst_struct__size_mean`` (float)
  Mean cluster size.

- ``clst_struct__size_std`` (float)
  Standard deviation of cluster sizes.

- ``clst_struct__size_entropy`` (float)
  Shannon entropy (natural logarithm) of the cluster-size distribution. Higher
  values indicate more balanced cluster sizes.

  Definition::

      -sum_j p_j * ln(p_j),  with p_j = s_j / (s_1 + ... + s_k)

- ``clst_struct__size_gini`` (float)
  Gini coefficient of cluster sizes. ``0`` means perfectly equal sizes; higher
  values indicate greater inequality.

Interpretation - Clustering Structure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- High ``singleton_ratio`` suggests over-fragmentation.
- High ``size_gini`` indicates a few large clusters and many small ones.
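
The ``clst__*`` and ``clst_struct__*`` definitions in the two sections above can
be sketched in plain Python as follows. This is a minimal illustration, not the
HOTS implementation: the function names are hypothetical, labels are assumed to
be dicts mapping individual IDs to cluster labels, ``n`` is assumed to be the
number of individuals present at both steps, raw labels are compared exactly as
in the definition (no relabeling between steps), and ``size_std`` is assumed to
be a population standard deviation.

.. code-block:: python

   import math
   from collections import Counter
   from statistics import mean, pstdev


   def gini(values):
       """Gini coefficient of non-negative values (0 = perfectly equal)."""
       xs = sorted(values)
       n, total = len(xs), sum(xs)
       if n == 0 or total == 0:
           return 0.0
       # Rank-based formula, equivalent to the mean absolute pairwise
       # difference divided by twice the mean.
       weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
       return 2.0 * weighted / (n * total) - (n + 1) / n


   def clustering_change_kpis(labels_now, labels_prev):
       """clst__* KPIs from two {individual_id: cluster_label} dicts (assumed)."""
       common = labels_now.keys() & labels_prev.keys()
       if not common:
           return {}  # e.g. first step: no previous state to compare with
       changed = sum(1 for i in common if labels_now[i] != labels_prev[i])
       return {
           "clst__changed_count": float(changed),
           "clst__changed_ratio": changed / len(common),  # n = len(common)
       }


   def clustering_structure_kpis(labels_now):
       """clst_struct__* KPIs from the cluster sizes at the current step."""
       sizes = list(Counter(labels_now.values()).values())
       k = len(sizes)
       if k == 0:
           return {}
       n = sum(sizes)
       # Shannon entropy (natural log) of p_j = s_j / n, written as p*ln(1/p).
       entropy = sum((s / n) * math.log(n / s) for s in sizes)
       return {
           "clst_struct__n_clusters": float(k),
           "clst_struct__singleton_ratio": sum(1 for s in sizes if s == 1) / k,
           "clst_struct__size_min": float(min(sizes)),
           "clst_struct__size_max": float(max(sizes)),
           "clst_struct__size_mean": float(mean(sizes)),
           "clst_struct__size_std": float(pstdev(sizes)),
           "clst_struct__size_entropy": entropy,
           "clst_struct__size_gini": gini(sizes),
       }


   labels_prev = {"a": 0, "b": 0, "c": 1, "d": 2}
   labels_now = {"a": 0, "b": 1, "c": 1, "d": 2}
   print(clustering_change_kpis(labels_now, labels_prev))
   # {'clst__changed_count': 1.0, 'clst__changed_ratio': 0.25}
   print(clustering_structure_kpis(labels_now))

Here only individual ``b`` changes cluster, so one of four individuals moved and
the ratio is ``0.25``; the current step has cluster sizes ``{1, 2, 1}``, two of
which are singletons.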

------------------------------------------------------------

``plc__*`` — Placement Change KPIs
==================================

These KPIs compare placement (host assignment) between consecutive steps.

Let:

- ``placement_now[id]`` be the assigned host for individual ``id`` at the
  current step
- ``placement_prev[id]`` be the assigned host at the previous step

Metrics - Placement Change
^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``plc__moved_count`` (float)
  Number of individuals whose host assignment changed.

- ``plc__moved_ratio`` (float)
  Fraction of individuals whose host assignment changed.

  Definition::

      moved_count / n_common

Interpretation - Placement Change
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

High values indicate placement churn or instability.

------------------------------------------------------------

``plc_delta__*`` — Placement Delta Counts
=========================================

Simple counts derived from comparing placements between steps.

Metrics - Placement Delta
^^^^^^^^^^^^^^^^^^^^^^^^^

- ``plc_delta__moved_count`` (float)
  Number of individuals whose host assignment changed.

- ``plc_delta__stable_count`` (float)
  Number of individuals whose host assignment remained unchanged.

Interpretation - Placement Delta
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Useful for sanity checks and debugging placement behavior.

------------------------------------------------------------

``moves__*`` — Applied Moves KPIs
=================================

These KPIs summarize the result of ``problem.adjust(...)``.

Metrics - Applied Moves
^^^^^^^^^^^^^^^^^^^^^^^

- ``moves__moves_count`` (float)
  Number of moves applied or returned by ``adjust()``.

- ``moves__moves_ratio`` (float)
  Moves normalized by population size.

  Definition::

      moves_count / n_indiv

Interpretation - Applied Moves
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Comparing these KPIs with ``plc__moved_*`` can reveal differences between
explicit adjustments and final placement changes.
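
A similar sketch for the placement and move KPIs is shown below. The function
names and the move-record tuple format are hypothetical. It assumes placements
are dicts mapping individual IDs to hosts, takes ``n_common`` to be the number
of individuals present at both steps, and treats the result of ``adjust()`` as a
sized sequence of moves; the actual return type is not specified on this page.

.. code-block:: python

   def placement_change_kpis(placement_now, placement_prev):
       """plc__* and plc_delta__* KPIs from two {individual_id: host} dicts."""
       common = placement_now.keys() & placement_prev.keys()
       if not common:
           return {}  # e.g. first step: no previous placement
       moved = sum(1 for i in common if placement_now[i] != placement_prev[i])
       return {
           "plc__moved_count": float(moved),
           "plc__moved_ratio": moved / len(common),  # moved_count / n_common
           "plc_delta__moved_count": float(moved),
           "plc_delta__stable_count": float(len(common) - moved),
       }


   def moves_kpis(moves, n_indiv):
       """moves__* KPIs from the (assumed) sequence of moves from adjust()."""
       count = len(moves)
       return {
           "moves__moves_count": float(count),
           # Reported as 0.0 for an empty population (assumption).
           "moves__moves_ratio": count / n_indiv if n_indiv else 0.0,
       }


   placement_prev = {"c1": "h1", "c2": "h1", "c3": "h2"}
   placement_now = {"c1": "h2", "c2": "h1", "c3": "h2", "c4": "h3"}
   print(placement_change_kpis(placement_now, placement_prev))
   # Hypothetical move record: (individual_id, source_host, target_host).
   print(moves_kpis([("c1", "h1", "h2")], n_indiv=len(placement_now)))

In this sketch ``plc__moved_count`` and ``plc_delta__moved_count`` carry the same
value, while ``moves__moves_count`` is counted from the returned moves rather
than from the placement diff. Note that ``c4`` only exists at the current step,
so it is excluded from ``n_common``.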

------------------------------------------------------------

Conflict Graph KPIs
===================

Conflict graphs represent coupling between individuals, derived from
step-to-step signals (e.g. dual changes). Two sets of KPIs are produced:

- ``clust_conf__*`` : conflict graph from the clustering stage
- ``place_conf__*`` : conflict graph from the placement stage

Let:

- ``n`` be the number of nodes
- ``m`` be the number of edges

Graph Structure Metrics
^^^^^^^^^^^^^^^^^^^^^^^

- ``*_conf__nodes`` (float)
  Number of nodes (``n``).

- ``*_conf__edges`` (float)
  Number of edges (``m``).

- ``*_conf__density`` (float)
  Graph density, defined for ``n > 1``::

      m / (n * (n - 1) / 2)

Connectivity Metrics
^^^^^^^^^^^^^^^^^^^^

- ``*_conf__components`` (float)
  Number of connected components.

- ``*_conf__largest_component_ratio`` (float)
  Size of the largest connected component divided by ``n``.

Degree Statistics
^^^^^^^^^^^^^^^^^

- ``*_conf__deg_mean`` (float)
  Mean node degree.

- ``*_conf__deg_std`` (float)
  Standard deviation of node degrees.

- ``*_conf__deg_p95`` (float)
  95th percentile of node degrees.

- ``*_conf__deg_max`` (float)
  Maximum node degree.

Edge Weight Statistics (if weights exist)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``*_conf__w_sum`` (float)
  Sum of edge weights.

- ``*_conf__w_mean`` (float)
  Mean edge weight.

- ``*_conf__w_p95`` (float)
  95th percentile of edge weights.

- ``*_conf__w_max`` (float)
  Maximum edge weight.

Interpretation - Conflict Graph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

High density, large components, or high-degree nodes indicate strong coupling
between decisions and may correlate with instability or reconfiguration cost.

------------------------------------------------------------

``load__*`` — Host Load / Balance KPIs
======================================

These KPIs summarize how a chosen metric (e.g. CPU usage) is distributed across
hosts during the current working window.

Let:

- ``L_h`` be the aggregated load of host ``h`` over the window

Metrics - Host
^^^^^^^^^^^^^^

- ``load__n_hosts`` (float)
  Number of hosts observed.

- ``load__n_indiv`` (float)
  Number of individuals observed.

- ``load__host_load_mean`` (float)
  Mean of ``{L_h}``.

- ``load__host_load_std`` (float)
  Standard deviation of ``{L_h}``.

- ``load__host_load_cv`` (float)
  Coefficient of variation::

      std / mean    (0 if mean == 0)

- ``load__host_load_p95`` (float)
  95th percentile of ``{L_h}``.

- ``load__host_load_max`` (float)
  Maximum of ``{L_h}``.

- ``load__host_load_gini`` (float)
  Gini coefficient of host loads.

Interpretation - Host
^^^^^^^^^^^^^^^^^^^^^

- High ``cv`` or ``gini`` indicates poor load balance.
- Tracking these KPIs over time helps assess scheduling effectiveness.

------------------------------------------------------------

Practical Notes
===============

- The first step of a run may not contain change-based KPIs (``clst__*``,
  ``plc__*``).
- Edge-weight KPIs may be zero if conflict graphs are unweighted.
- Prefer storing only scalar KPIs in the main metrics table. Store detailed
  event data (e.g. lists of moved containers) separately (e.g. as JSONL).
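
For illustration, the conflict-graph (``*_conf__*``) and host-load (``load__*``)
KPIs described earlier can be computed in plain Python as sketched below. This
is a minimal sketch under assumptions, not the HOTS implementation: all function
names are hypothetical, each undirected conflict edge is assumed to appear once
as a ``(u, v)`` key mapped to its weight, standard deviations are population
standard deviations, the 95th percentile uses linear interpolation between
sorted values, and undefined values (density for ``n <= 1``, statistics of empty
lists) are reported as ``0.0`` rather than omitted.

.. code-block:: python

   import math
   from statistics import mean, pstdev


   def percentile(values, q):
       """q-th percentile with linear interpolation between sorted values."""
       xs = sorted(values)
       if not xs:
           return 0.0
       pos = (len(xs) - 1) * q / 100.0
       lo, hi = math.floor(pos), math.ceil(pos)
       return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


   def gini(values):
       """Gini coefficient of non-negative values (0 = perfectly equal)."""
       xs = sorted(values)
       n, total = len(xs), sum(xs)
       if n == 0 or total == 0:
           return 0.0
       weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
       return 2.0 * weighted / (n * total) - (n + 1) / n


   def summary(prefix, values):
       """<prefix>_mean/_std/_p95/_max; 0.0 for an empty list (assumption)."""
       if not values:
           return {f"{prefix}_{s}": 0.0 for s in ("mean", "std", "p95", "max")}
       return {
           f"{prefix}_mean": float(mean(values)),
           f"{prefix}_std": float(pstdev(values)),
           f"{prefix}_p95": float(percentile(values, 95)),
           f"{prefix}_max": float(max(values)),
       }


   def conflict_graph_kpis(stage, nodes, edges):
       """*_conf__* KPIs for stage "clust" or "place"."""
       # Assumed input: edges maps (u, v) -> weight, each undirected edge listed
       # once, no self-loops, both endpoints present in nodes.
       nodes = list(nodes)
       n, m = len(nodes), len(edges)
       adj = {u: set() for u in nodes}
       for u, v in edges:
           adj[u].add(v)
           adj[v].add(u)
       # Sizes of connected components, via iterative depth-first search.
       seen, comp_sizes = set(), []
       for start in nodes:
           if start in seen:
               continue
           seen.add(start)
           stack, size = [start], 0
           while stack:
               u = stack.pop()
               size += 1
               for v in adj[u] - seen:
                   seen.add(v)
                   stack.append(v)
           comp_sizes.append(size)
       p = f"{stage}_conf__"
       kpis = {
           p + "nodes": float(n),
           p + "edges": float(m),
           p + "density": m / (n * (n - 1) / 2) if n > 1 else 0.0,
           p + "components": float(len(comp_sizes)),
           p + "largest_component_ratio": max(comp_sizes) / n if n else 0.0,
           p + "w_sum": float(sum(edges.values())),
       }
       kpis.update(summary(p + "deg", [len(adj[u]) for u in nodes]))
       kpis.update(summary(p + "w", list(edges.values())))
       return kpis


   def load_kpis(host_loads, n_indiv):
       """load__* KPIs from {host: aggregated load L_h over the window}."""
       loads = list(host_loads.values())
       kpis = {"load__n_hosts": float(len(loads)), "load__n_indiv": float(n_indiv)}
       kpis.update(summary("load__host_load", loads))
       mu = kpis["load__host_load_mean"]
       kpis["load__host_load_cv"] = kpis["load__host_load_std"] / mu if mu else 0.0
       kpis["load__host_load_gini"] = gini(loads)
       return kpis


   graph = {("a", "b"): 2.0, ("b", "c"): 1.0}
   print(conflict_graph_kpis("clust", ["a", "b", "c", "d"], graph))
   print(load_kpis({"h1": 10.0, "h2": 10.0, "h3": 40.0}, n_indiv=7))

In the example graph, nodes ``a``, ``b`` and ``c`` form one component and ``d``
is isolated, giving two components and a largest-component ratio of ``0.75``.
Passing an unweighted graph as weights of ``1.0`` is one possible convention; an
empty ``edges`` dict yields zero edge-weight KPIs in this sketch.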