User manual

Preparing the parameters

The configuration file is the only mandatory argument to run hots. It is a JSON file passed to the CLI with the --config option.

A minimal example using the file connector is the following:

{
  "time_limit": null,
  "clustering": {
    "method": "kmeans",
    "nb_clusters": 3,
    "parameters": {
      "nb_clusters": 3
    }
  },
  "optimization": {
    "backend": "pyomo",
    "parameters": {
      "solver": "glpk",
      "verbose": 0
    }
  },
  "problem": {
    "type": "placement",
    "parameters": {
      "initial_placement": 0,
      "tol": 0.1,
      "tol_move": 0.5
    }
  },
  "connector": {
    "type": "file",
    "parameters": {
      "data_folder": "./tests/data/thesis_ex_10",
      "file_name": "container_usage.csv",
      "host_field": "machine_id",
      "individual_field": "container_id",
      "tick_field": "timestamp",
      "tick_increment": 2,
      "window_duration": 3,
      "sep_time": 3,
      "metrics": ["cpu"],
      "outfile": "./out/moves.log"
    }
  },
  "logging": {
    "level": "INFO",
    "filename": "./out/hots.log",
    "fmt": "%(asctime)s %(levelname)s: %(message)s"
  },
  "reporting": {
    "results_folder": "./out",
    "metrics_file": "./out/metrics.csv",
    "plots_folder": "./out/plots"
  }
}

Configuration sections

The top-level keys of the configuration file map to the fields of hots.config.loader.AppConfig:

  • time_limit: maximum wall-clock time (in seconds) for the application run. If null, hots processes all available data.

  • clustering: configuration of the clustering plugin.

    • method: name of the clustering algorithm to use, e.g. "kmeans", "hierarchical", "spectral".

    • nb_clusters: target number of clusters.

    • parameters: free-form dict of method-specific parameters.

  • optimization: configuration of the optimization backend.

    • backend: optimization backend to use. Currently "pyomo" is supported and maps to hots.plugins.optimization.pyomo_model.PyomoModel.

    • parameters: solver-related parameters such as:

      • solver: solver name (e.g. "glpk").

      • verbose: integer verbosity level.

  • problem: configuration of the business (domain) problem plugin (see hots.plugins.problem.placement).

    • type: problem type. The default implementation is "placement".

    • parameters: problem-specific parameters, for example:

      • initial_placement: whether to compute an initial placement (0 or 1).

      • tol: tolerance used to decide when a node is overloaded.

      • tol_move: tolerance for deciding when to move a container.

  • connector: configuration of the data connector plugin.

    • type: connector type. The built-in types are:

      • "file": read data from CSV files.

      • "kafka": read data from a Kafka topic.

    • parameters: connector-specific parameters. For both connectors, the following keys are usually required:

      • data_folder: folder containing the input data.

      • file_name: CSV file containing container-level metrics.

      • individual_field: column name for container IDs.

      • host_field: column name for host (node) IDs.

      • tick_field: column name for timestamps.

      • tick_increment: step between two timestamps in the stream.

      • window_duration: window size used for sliding-window analysis.

      • sep_time: time separating the analysis period from the running period.

      • metrics: list of metric names to use (e.g. ["cpu"]).

      • outfile: file where proposed moves will be written.

      For the Kafka connector, the following additional keys can be used:

      • bootstrap.servers: Kafka bootstrap servers string.

      • topics: list of topic names used to publish moves.

      • connector_url: optional URL of an external connector service.

  • logging: logging configuration used by hots.utils.logging_config.setup_logging().

    • level: log level ("DEBUG", "INFO", …).

    • filename: log file path.

    • fmt: log message format.

  • reporting: configuration of result files and plots.

    • results_folder: base output folder.

    • metrics_file: CSV file for aggregated metrics.

    • plots_folder: folder where plots will be saved.
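The section list above can be turned into a quick pre-flight check. The following is only a minimal sketch, not the loader used by hots: the helper names `missing_sections` and `load_config` are hypothetical, and the check assumes every top-level key described above is present, whereas hots itself may accept defaults for some of them.

```python
import json

# Top-level sections described in this manual; "time_limit" may be null.
EXPECTED_SECTIONS = (
    "time_limit", "clustering", "optimization",
    "problem", "connector", "logging", "reporting",
)


def missing_sections(config: dict) -> list:
    """Return the expected top-level keys absent from a parsed config."""
    return [key for key in EXPECTED_SECTIONS if key not in config]


def load_config(path: str) -> dict:
    """Parse a JSON configuration file and fail early on missing sections.

    Hypothetical helper: this is not hots.config.loader, only a sketch.
    """
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = missing_sections(config)
    if missing:
        raise ValueError(f"missing configuration sections: {missing}")
    return config
```

Calling `load_config("/path/to/config.json")` before starting hots would surface a forgotten section as a clear error message.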

Additional top-level keys

For convenience, some examples also define data_folder, individual_field, host_field, tick_field and metrics at the top level. These are redundant copies of the values stored inside connector.parameters and are kept for backward compatibility.
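Because these copies are redundant, they can drift apart from the values in connector.parameters. The sketch below shows one way to detect such drift in a parsed configuration. The function name `mismatched_copies` is hypothetical, and the manual does not say which copy hots prefers when the two differ, so the sketch only reports the differences.

```python
# Keys that some examples repeat at the top level (as listed in this manual).
MIRRORED_KEYS = (
    "data_folder", "individual_field", "host_field", "tick_field", "metrics",
)


def mismatched_copies(config: dict) -> dict:
    """Map each mirrored key to (top_level, connector) when the two differ.

    Hypothetical helper: keys absent from the top level are ignored.
    """
    params = config.get("connector", {}).get("parameters", {})
    return {
        key: (config[key], params.get(key))
        for key in MIRRORED_KEYS
        if key in config and config[key] != params.get(key)
    }
```

An empty result means every top-level copy agrees with connector.parameters.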

Preparing data

If you use historical data, provide the inputs as three CSV files located in the same directory:

  • container_usage.csv: describes the resource consumption of the containers

  • node_meta.csv: provides the node capacities (and other additional data)

  • node_usage.csv: describes the resource consumption of the nodes

Each file has the following format:

  • container_usage.csv:

    timestamp  container_id  metric_1  metric_2  machine_id
    t1         c_10          10        50        m_2
    tmax       c_48          6.5       24        m_5

  • node_meta.csv:

    machine_id  metric_1  metric_2
    m_2         30        150
    m_5         24        80

  • node_usage.csv:

    timestamp  machine_id  metric_1  metric_2
    t1         m_2         25        65
    tmax       m_5         17.5      52

Note that the file node_usage.csv is not mandatory: if it does not exist in the directory, it is built from the container_usage.csv data.
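The manual does not describe how node_usage.csv is derived. A natural reading, which is an assumption here and not a statement about the hots implementation, is that each metric is summed over the containers hosted on the same node at the same timestamp. The sketch below illustrates that aggregation with the standard library only. The function name `build_node_usage`, the comma separator and the container c_11 are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_node_usage(container_rows, metrics):
    """Sum container metrics per (timestamp, machine_id) pair.

    Assumption: node usage is the plain sum of its containers' usage.
    """
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0.0))
    for row in container_rows:
        key = (row["timestamp"], row["machine_id"])
        for metric in metrics:
            totals[key][metric] += float(row[metric])
    return [
        {"timestamp": ts, "machine_id": node, **values}
        for (ts, node), values in totals.items()
    ]


# In-memory sample using the column names from the examples above.
# The second container (c_11) is made up for illustration.
sample = io.StringIO(
    "timestamp,container_id,metric_1,metric_2,machine_id\n"
    "t1,c_10,10,50,m_2\n"
    "t1,c_11,15,15,m_2\n"
)
node_usage = build_node_usage(csv.DictReader(sample), ["metric_1", "metric_2"])
print(node_usage)
```

With this sample, the two containers on m_2 at t1 add up to 25 for metric_1 and 65 for metric_2, which matches the shape of a node_usage.csv row.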

Running the app

Once the configuration file is prepared, start hots with:

hots --config /path/to/config.json

The --config (or -c) option is mandatory and must point to a valid JSON configuration file.

You can view the available command-line options with:

hots --help

Typical options include the standard --help and --version flags. Runtime behaviour (number of clusters, window size, problem type, connector, etc.) is controlled by the configuration file rather than by individual CLI flags.
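When hots is launched from a script, it can help to confirm that the configuration file parses as JSON before starting the run. The following is a minimal sketch that relies only on the `hots --config` invocation documented above. The helper names `hots_command` and `run_hots` are hypothetical, and `run_hots` assumes the hots executable is available on the PATH.

```python
import json
import subprocess


def hots_command(config_path: str) -> list:
    """Build the documented command line after checking the JSON parses."""
    with open(config_path, encoding="utf-8") as handle:
        json.load(handle)  # raises json.JSONDecodeError on malformed JSON
    return ["hots", "--config", config_path]


def run_hots(config_path: str) -> int:
    """Launch hots and return its exit code (requires hots on the PATH)."""
    return subprocess.run(hots_command(config_path)).returncode
```

Passing the arguments as a list avoids shell quoting issues when the configuration path contains spaces.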

Output explanation

While HOTS runs, the overall progress is displayed in the terminal, and the following output and log files are created:

  • logs.log: logs of the main process (which loop, which step in the loop…)

  • clustering_logs.log: logs of the clustering computations at each loop

  • optim_logs.log: information on the solving of the optimization models

  • results.log: temporary results at each loop (number of changes, objective value…)

  • global_results.csv: final results for the identified business criteria

  • loop_results.csv: multiple indicators at each loop (clustering criteria, conflict graph information…)

  • node_results.csv: final node-related results (average / minimum / maximum loads)

  • times.csv: intermediate times for each step (preprocessing, then every step of each loop)

  • node_usage_evo.csv: numerical evolution of node consumption, from HOTS launch to HOTS stop

  • node_usage_evo.svg: graphical evolution of node consumption, from HOTS launch to HOTS stop
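After a run, a short script can confirm which of the files listed above were produced. This is only a sketch: the function name `missing_outputs` is hypothetical, and it assumes all the files are written to a single folder (for example the configured results_folder), which this manual does not state explicitly.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_OUTPUTS = (
    "logs.log", "clustering_logs.log", "optim_logs.log", "results.log",
    "global_results.csv", "loop_results.csv", "node_results.csv",
    "times.csv", "node_usage_evo.csv", "node_usage_evo.svg",
)


def missing_outputs(folder) -> list:
    """Return the expected output files not found in the given folder.

    Assumption: every output file lives directly in this one folder.
    """
    base = Path(folder)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]
```

For instance, `missing_outputs("./out")` returns an empty list when every listed file is present in ./out.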