User manual

Preparing the parameters

The configuration file is the only mandatory argument to run hots. It is a JSON file passed to the CLI with the --config option.

A minimal example using the file connector is the following:

{
  "time_limit": null,
  "clustering": {
    "method": "kmeans",
    "nb_clusters": 3,
    "parameters": {
      "nb_clusters": 3
    }
  },
  "optimization": {
    "backend": "pyomo",
    "parameters": {
      "solver": "glpk",
      "verbose": 0
    }
  },
  "problem": {
    "type": "placement",
    "parameters": {
      "initial_placement": 0,
      "tol": 0.1,
      "tol_move": 0.5
    }
  },
  "connector": {
    "type": "file",
    "parameters": {
      "data_folder": "./tests/data/thesis_ex_10",
      "file_name": "container_usage.csv",
      "host_field": "machine_id",
      "individual_field": "container_id",
      "tick_field": "timestamp",
      "tick_increment": 2,
      "window_duration": 3,
      "sep_time": 3,
      "metrics": ["cpu"],
      "outfile": "./out/moves.log"
    }
  },
  "logging": {
    "level": "INFO",
    "filename": "./out/hots.log",
    "fmt": "%(asctime)s %(levelname)s: %(message)s"
  },
  "reporting": {
    "results_folder": "./out",
    "metrics_file": "./out/metrics.csv",
    "plots_folder": "./out/plots"
  }
}
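If you generate configurations from scripts (for example, to try several values of nb_clusters), the standard json module is enough. The sketch below writes a file that mirrors the minimal example above. The helper name write_config and its arguments are hypothetical and not part of hots, which only reads the file passed with --config.

```python
import json
from pathlib import Path


def write_config(path, nb_clusters=3, data_folder="./tests/data/thesis_ex_10"):
    """Write a minimal hots configuration mirroring the documented example.

    write_config is a hypothetical helper, not a hots API.
    """
    config = {
        "time_limit": None,  # serialized as JSON null: process all available data
        "clustering": {
            "method": "kmeans",
            "nb_clusters": nb_clusters,
            "parameters": {"nb_clusters": nb_clusters},
        },
        "optimization": {
            "backend": "pyomo",
            "parameters": {"solver": "glpk", "verbose": 0},
        },
        "problem": {
            "type": "placement",
            "parameters": {"initial_placement": 0, "tol": 0.1, "tol_move": 0.5},
        },
        "connector": {
            "type": "file",
            "parameters": {
                "data_folder": data_folder,
                "file_name": "container_usage.csv",
                "host_field": "machine_id",
                "individual_field": "container_id",
                "tick_field": "timestamp",
                "tick_increment": 2,
                "window_duration": 3,
                "sep_time": 3,
                "metrics": ["cpu"],
                "outfile": "./out/moves.log",
            },
        },
        "logging": {
            "level": "INFO",
            "filename": "./out/hots.log",
            "fmt": "%(asctime)s %(levelname)s: %(message)s",
        },
        "reporting": {
            "results_folder": "./out",
            "metrics_file": "./out/metrics.csv",
            "plots_folder": "./out/plots",
        },
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create e.g. configs/ if needed
    path.write_text(json.dumps(config, indent=2))
    return config
```

For example, write_config("configs/k5.json", nb_clusters=5) produces a file that can then be passed to hots --config configs/k5.json.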

Configuration sections

The top-level keys of the configuration file map to the fields of hots.config.loader.AppConfig:

time_limit: maximum wall-clock time (in seconds) for the application run. If null, hots processes all available data.
clustering: configuration of the clustering plugin.
- method: name of the clustering algorithm to use, e.g. "kmeans", "hierarchical", "spectral".
- nb_clusters: target number of clusters.
- parameters: free-form dict of method-specific parameters.
optimization: configuration of the optimization backend.
- backend: optimization backend to use. Currently "pyomo" is supported and maps to hots.plugins.optimization.pyomo_model.PyomoModel.
- parameters: solver-related parameters such as:
  - solver: solver name (e.g. "glpk").
  - verbose: integer verbosity level.
problem: configuration of the business (domain) problem plugin (see hots.plugins.problem.placement).
- type: problem type. The default implementation is "placement".
- parameters: problem-specific parameters, for example:
  - initial_placement: whether to compute an initial placement (0 or 1).
  - tol: tolerance used to decide when a node is overloaded.
  - tol_move: tolerance for deciding when to move a container.
connector: configuration of the data connector plugin.
- type: connector type. The built-in types are:
  - "file": read data from CSV files.
  - "kafka": read data from a Kafka topic.
- parameters: connector-specific parameters. For both connectors, the following keys are usually required:
  - data_folder: folder containing the input data.
  - file_name: CSV file containing container-level metrics.
  - individual_field: column name for container IDs.
  - host_field: column name for host (node) IDs.
  - tick_field: column name for timestamps.
  - tick_increment: step between two timestamps in the stream.
  - window_duration: window size used for sliding-window analysis.
  - sep_time: time separating the analysis period from the running period.
  - metrics: list of metric names to use (e.g. ["cpu"]).
  - outfile: file where proposed moves will be written.
  For the Kafka connector, the following additional keys can be used:
  - bootstrap.servers: Kafka bootstrap servers string.
  - topics: list of topic names used to publish moves.
  - connector_url: optional URL of an external connector service.
logging: logging configuration used by hots.utils.logging_config.setup_logging().
- level: log level ("DEBUG", "INFO", …).
- filename: log file path.
- fmt: log message format.
reporting: configuration of result files and plots.
- results_folder: base output folder.
- metrics_file: CSV file for aggregated metrics.
- plots_folder: folder where plots will be saved.
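Before launching a long run, it can help to check a configuration against the keys listed above. The following is a minimal pre-flight sketch, not hots' own validation (hots.config.loader.AppConfig may apply defaults or stricter rules). It assumes that every section and every connector key listed here is required, which is stricter than "usually required".

```python
import json

# Sections and connector keys listed in this manual; treating them all as
# required is an assumption of this sketch.
REQUIRED_SECTIONS = ("clustering", "optimization", "problem", "connector",
                     "logging", "reporting")
CONNECTOR_KEYS = ("data_folder", "file_name", "individual_field", "host_field",
                  "tick_field", "tick_increment", "window_duration", "sep_time",
                  "metrics", "outfile")
CONNECTOR_TYPES = ("file", "kafka")


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in config]
    limit = config.get("time_limit")
    if limit is not None and (not isinstance(limit, (int, float)) or limit <= 0):
        problems.append("time_limit must be null or a positive number of seconds")
    connector = config.get("connector", {})
    if connector.get("type") not in CONNECTOR_TYPES:
        problems.append(f"unknown connector type: {connector.get('type')!r}")
    params = connector.get("parameters", {})
    problems += [f"missing connector parameter: {k}" for k in CONNECTOR_KEYS
                 if k not in params]
    if not isinstance(params.get("metrics", []), list):
        problems.append("connector.parameters.metrics must be a list")
    if config.get("optimization", {}).get("backend", "pyomo") != "pyomo":
        problems.append("only the 'pyomo' optimization backend is documented")
    return problems


def check_file(path):
    """Load a JSON configuration file and check it."""
    with open(path) as f:
        return check_config(json.load(f))
```

An empty list from check_file("config.json") only means the file matches this manual's description; the authoritative check remains running hots itself.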

Additional top-level keys

For convenience, some examples also define data_folder, individual_field, host_field, tick_field and metrics at the top level. These are redundant copies of the values stored inside connector.parameters and are kept for backward compatibility.
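Because the top-level copies are redundant, they can silently drift from connector.parameters when only one of them is edited. This manual does not say which copy wins if they differ, so the hypothetical helper below only reports divergent keys rather than resolving them.

```python
# Top-level keys described as redundant copies of connector.parameters.
LEGACY_KEYS = ("data_folder", "individual_field", "host_field", "tick_field", "metrics")


def legacy_mismatches(config):
    """List top-level legacy keys whose value differs from connector.parameters.

    Hypothetical helper: it does not tell which value hots actually uses.
    """
    params = config.get("connector", {}).get("parameters", {})
    return [
        key for key in LEGACY_KEYS
        if key in config and key in params and config[key] != params[key]
    ]
```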

Preparing data

If you use historical data, the inputs are provided through three CSV files located in the same directory:

container_usage.csv: resource consumption of each container
node_meta.csv: capacities of each node (and other additional data)
node_usage.csv: resource consumption of each node

Each file has the following format:

container_usage.csv:

timestamp	container_id	metric_1	metric_2	machine_id
t1	c_10	10	50	m_2
…	…	…	…	…
tmax	c_48	6.5	24	m_5

node_meta.csv:

machine_id	metric_1	metric_2
m_2	30	150
m_5	24	80

node_usage.csv:

timestamp	machine_id	metric_1	metric_2
t1	m_2	25	65
…	…	…	…
tmax	m_5	17.5	52
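A quick way to catch malformed inputs before a run is to compare each file's header with the identifier columns shown above plus the configured metric names. The sketch assumes comma-separated files and the default column names (timestamp, container_id, machine_id); if you changed tick_field, individual_field or host_field, adjust REQUIRED_COLUMNS accordingly. check_data_folder is a hypothetical helper.

```python
import csv
from pathlib import Path

# Identifier columns expected in each file, per the formats above; the
# metric columns are added from the `metrics` argument.
REQUIRED_COLUMNS = {
    "container_usage.csv": {"timestamp", "container_id", "machine_id"},
    "node_meta.csv": {"machine_id"},
    "node_usage.csv": {"timestamp", "machine_id"},
}
OPTIONAL_FILES = {"node_usage.csv"}  # may be rebuilt from container_usage.csv


def check_data_folder(folder, metrics):
    """Return a list of problems with the CSV files in `folder`."""
    problems = []
    for name, id_cols in REQUIRED_COLUMNS.items():
        path = Path(folder) / name
        if not path.exists():
            if name not in OPTIONAL_FILES:
                problems.append(f"{name}: file not found")
            continue
        with path.open(newline="") as f:
            header = next(csv.reader(f), [])  # first row only
        missing = (id_cols | set(metrics)) - set(header)
        if missing:
            problems.append(f"{name}: missing columns {sorted(missing)}")
    return problems
```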

Note that node_usage.csv is not mandatory: if it does not exist in the directory, it is built from the container_usage.csv data.
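To build node_usage.csv yourself (for instance, to inspect it before a run), one natural reading is that a node's consumption at a timestamp is the sum of the consumptions of the containers it hosts at that timestamp. That summation rule is an assumption of this sketch, not a description of hots' internal code; build_node_usage and write_node_usage are hypothetical helpers.

```python
import csv
from collections import defaultdict


def build_node_usage(container_rows, metrics, tick_field="timestamp",
                     host_field="machine_id"):
    """Aggregate container rows into node rows by summing each metric.

    Summing per (timestamp, node) is an assumption about how hots derives
    node_usage.csv.
    """
    totals = defaultdict(lambda: [0.0] * len(metrics))
    for row in container_rows:
        key = (row[tick_field], row[host_field])
        for i, metric in enumerate(metrics):
            totals[key][i] += float(row[metric])
    # Rows come out in order of first appearance of each (timestamp, node).
    return [
        {tick_field: t, host_field: m, **dict(zip(metrics, values))}
        for (t, m), values in totals.items()
    ]


def write_node_usage(container_csv, node_csv, metrics):
    """Read container_usage.csv and write the aggregated node_usage.csv."""
    with open(container_csv, newline="") as f:
        rows = build_node_usage(csv.DictReader(f), metrics)
    with open(node_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "machine_id", *metrics])
        writer.writeheader()
        writer.writerows(rows)
```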

Running the app

Once the configuration file is ready, start hots with:

hots --config /path/to/config.json

The --config (or -c) option is mandatory and must point to a valid JSON configuration file.

You can view the available command-line options with:

hots --help

Typical options include the standard --help and --version flags. All runtime behaviour is controlled by the configuration file rather than individual CLI flags (number of clusters, window size, problem type, connector, etc.).
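When runs are driven from a Python script (for example, a batch of configuration files), the documented command line can be launched with subprocess. This sketch assumes the hots entry point is installed and on PATH; it uses only the --config option described above. hots_command and run_hots are hypothetical helpers.

```python
import shutil
import subprocess


def hots_command(config_path):
    """Build the documented command line: hots --config <path>."""
    return ["hots", "--config", str(config_path)]


def run_hots(config_path):
    """Run hots on one configuration file and return its exit code.

    Assumes the `hots` command is installed and on PATH.
    """
    if shutil.which("hots") is None:
        raise FileNotFoundError("the 'hots' command was not found on PATH")
    return subprocess.run(hots_command(config_path), check=False).returncode
```

Returning the exit code (check=False) lets a batch script record failures and continue with the next configuration instead of stopping at the first error.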

Output explanation

While HOTS runs, the overall progress is displayed in the terminal and the following output and log files are created:

logs.log: logs of the main process (current loop, current step within the loop, …)
clustering_logs.log: logs of the clustering computations at each loop
optim_logs.log: information on the solving of the optimization models
results.log: intermediate results at each loop (number of changes, objective value, …)
global_results.csv: final results for the identified business criteria
loop_results.csv: several indicators at each loop (clustering criteria, conflict graph information, …)
node_results.csv: final node-related results (average / minimum / maximum loads)
times.csv: intermediate times for each step (preprocessing and every step of each loop)
node_usage_evo.csv: evolution of node consumption (numerical values), from HOTS launch until HOTS stop
node_usage_evo.svg: evolution of node consumption (plot), from HOTS launch until HOTS stop
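After a run, a simple sanity check is to confirm that the files listed above were produced. This manual does not state which folder they are written to; the sketch assumes you pass the folder yourself (for example the reporting.results_folder value). missing_outputs is a hypothetical helper.

```python
from pathlib import Path

# Output file names listed in this section.
EXPECTED_OUTPUTS = (
    "logs.log", "clustering_logs.log", "optim_logs.log", "results.log",
    "global_results.csv", "loop_results.csv", "node_results.csv",
    "times.csv", "node_usage_evo.csv", "node_usage_evo.svg",
)


def missing_outputs(results_folder):
    """Return the expected output files that are absent from results_folder."""
    folder = Path(results_folder)
    return [name for name in EXPECTED_OUTPUTS if not (folder / name).is_file()]
```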
