User manual
Preparing the parameters
The configuration file is the only mandatory argument to run hots. It is a JSON
file passed to the CLI with the --config option.
A minimal example using the file connector is the following:
{
    "time_limit": null,
    "clustering": {
        "method": "kmeans",
        "nb_clusters": 3,
        "parameters": {
            "nb_clusters": 3
        }
    },
    "optimization": {
        "backend": "pyomo",
        "parameters": {
            "solver": "glpk",
            "verbose": 0
        }
    },
    "problem": {
        "type": "placement",
        "parameters": {
            "initial_placement": 0,
            "tol": 0.1,
            "tol_move": 0.5
        }
    },
    "connector": {
        "type": "file",
        "parameters": {
            "data_folder": "./tests/data/thesis_ex_10",
            "file_name": "container_usage.csv",
            "host_field": "machine_id",
            "individual_field": "container_id",
            "tick_field": "timestamp",
            "tick_increment": 2,
            "window_duration": 3,
            "sep_time": 3,
            "metrics": ["cpu"],
            "outfile": "./out/moves.log"
        }
    },
    "logging": {
        "level": "INFO",
        "filename": "./out/hots.log",
        "fmt": "%(asctime)s %(levelname)s: %(message)s"
    },
    "reporting": {
        "results_folder": "./out",
        "metrics_file": "./out/metrics.csv",
        "plots_folder": "./out/plots"
    }
}
Configuration sections
The top-level keys of the configuration file map to the fields of
hots.config.loader.AppConfig:
- time_limit: maximum wall-clock time (in seconds) for the application run. If null, hots processes all available data.
- clustering: configuration of the clustering plugin.
  - method: name of the clustering algorithm to use, e.g. "kmeans", "hierarchical", "spectral".
  - nb_clusters: target number of clusters.
  - parameters: free-form dict of method-specific parameters.
- optimization: configuration of the optimization backend.
  - backend: optimization backend to use. Currently "pyomo" is supported and maps to hots.plugins.optimization.pyomo_model.PyomoModel.
  - parameters: solver-related parameters such as:
    - solver: solver name (e.g. "glpk").
    - verbose: integer verbosity level.
- problem: configuration of the business (domain) problem plugin (see hots.plugins.problem.placement).
  - type: problem type. The default implementation is "placement".
  - parameters: problem-specific parameters, for example:
    - initial_placement: whether to compute an initial placement (0 or 1).
    - tol: tolerance used to decide when a node is overloaded.
    - tol_move: tolerance for deciding when to move a container.
- connector: configuration of the data connector plugin.
  - type: connector type. The built-in types are:
    - "file": read data from CSV files.
    - "kafka": read data from a Kafka topic.
  - parameters: connector-specific parameters. For both connectors, the following keys are usually required:
    - data_folder: folder containing the input data.
    - file_name: CSV file containing container-level metrics.
    - individual_field: column name for container IDs.
    - host_field: column name for host (node) IDs.
    - tick_field: column name for timestamps.
    - tick_increment: step between two timestamps in the stream.
    - window_duration: window size used for sliding-window analysis.
    - sep_time: time separating the analysis period from the running period.
    - metrics: list of metric names to use (e.g. ["cpu"]).
    - outfile: file where proposed moves will be written.
    For the Kafka connector, the following additional keys can be used:
    - bootstrap.servers: Kafka bootstrap servers string.
    - topics: list of topic names used to publish moves.
    - connector_url: optional URL of an external connector service.
- logging: logging configuration used by hots.utils.logging_config.setup_logging().
  - level: log level ("DEBUG", "INFO", …).
  - filename: log file path.
  - fmt: log message format.
- reporting: configuration of result files and plots.
  - results_folder: base output folder.
  - metrics_file: CSV file for aggregated metrics.
  - plots_folder: folder where plots will be saved.
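Before launching a run, it can help to check that a configuration file contains the sections and connector keys described above. The following is a minimal sketch using only the Python standard library; it is not the loader used by hots. The names check_config and check_config_file are hypothetical, and treating every listed key as required is an assumption (the manual only says the connector keys are "usually required").

```python
import json

# Top-level sections and connector keys as listed in this manual.
# Treating all of them as required is an assumption of this sketch.
EXPECTED_SECTIONS = ("time_limit", "clustering", "optimization", "problem",
                     "connector", "logging", "reporting")
CONNECTOR_KEYS = ("data_folder", "file_name", "individual_field", "host_field",
                  "tick_field", "tick_increment", "window_duration", "sep_time",
                  "metrics", "outfile")


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = [f"missing top-level key: {key}"
                for key in EXPECTED_SECTIONS if key not in config]
    params = config.get("connector", {}).get("parameters", {})
    problems += [f"missing connector parameter: {key}"
                 for key in CONNECTOR_KEYS if key not in params]
    return problems


def check_config_file(path):
    """Load a JSON configuration file and check it (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return check_config(json.load(handle))
```

An empty list means that every key named in this section is present; the values themselves are not validated.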
Additional top-level keys
For convenience, some examples also define data_folder,
individual_field, host_field, tick_field and
metrics at the top level. These are redundant copies of the values
stored inside connector.parameters and are kept for backward
compatibility.
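Because these top-level keys duplicate values from connector.parameters, the two copies can drift apart when a file is edited by hand. The sketch below reports any copy that differs; mismatched_copies is a hypothetical helper, not part of hots, and which copy hots actually reads when they disagree is not covered here.

```python
# Keys that some examples repeat at the top level, per this manual.
MIRRORED_KEYS = ("data_folder", "individual_field", "host_field",
                 "tick_field", "metrics")


def mismatched_copies(config):
    """Return {key: (top_level_value, connector_value)} for copies that differ."""
    params = config.get("connector", {}).get("parameters", {})
    return {
        key: (config[key], params.get(key))
        for key in MIRRORED_KEYS
        if key in config and config[key] != params.get(key)
    }
```

Keys that are absent from the top level are ignored, since the copies are optional.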
Preparing data
If you use historical data, the inputs are provided through three CSV files located in the same directory:
- container_usage.csv: describes container resource consumption
- node_meta.csv: provides node capacities (and other additional data)
- node_usage.csv: describes node resource consumption
The files have the following formats:
container_usage.csv:
timestamp  container_id  metric_1  metric_2  machine_id
t1         c_10          10        50        m_2
…          …             …         …         …
tmax       c_48          6.5       24        m_5
node_meta.csv:
machine_id  metric_1  metric_2
m_2         30        150
m_5         24        80
node_usage.csv:
timestamp  machine_id  metric_1  metric_2
t1         m_2         25        65
…          …           …         …
tmax       m_5         17.5      52
Note that node_usage.csv is optional: if it does not exist in the directory,
it is built from the container_usage.csv data.
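To illustrate how node usage can be derived from container usage, here is a minimal sketch. It assumes that the usage of a node at a given timestamp is the sum of the usage of the containers it hosts at that timestamp; the manual does not state the exact aggregation hots applies. The function build_node_usage and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_node_usage(container_rows, metrics,
                     tick_field="timestamp", host_field="machine_id"):
    """Sum container metrics per (timestamp, node).

    Assumption: node usage is the sum of the usage of the containers
    hosted on that node at that timestamp.
    """
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0.0))
    for row in container_rows:
        key = (row[tick_field], row[host_field])
        for metric in metrics:
            totals[key][metric] += float(row[metric])
    # One output row per (timestamp, node), in order of first appearance.
    return [
        {tick_field: tick, host_field: host, **values}
        for (tick, host), values in totals.items()
    ]


# Made-up sample in the container_usage.csv layout, with a single metric.
sample = io.StringIO(
    "timestamp,container_id,cpu,machine_id\n"
    "0,c_1,10,m_1\n"
    "0,c_2,5.5,m_1\n"
    "0,c_3,4,m_2\n"
)
node_usage = build_node_usage(csv.DictReader(sample), metrics=["cpu"])
```

Here the two containers on m_1 are merged into a single row with cpu 15.5, and m_2 keeps its single container's value.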
Running the app
Once the configuration file is ready, start hots with:
hots --config /path/to/config.json
The --config (or -c) option is mandatory and must point to a valid
JSON configuration file.
You can view the available command-line options with:
hots --help
Typical options include the standard --help and --version flags.
All runtime behaviour (number of clusters, window size, problem type, connector,
etc.) is controlled by the configuration file rather than by individual CLI
flags.
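If hots needs to be started from a Python script (for example, to chain several runs), the command above can be wrapped with the standard subprocess module. This is only a sketch: it assumes the hots executable is on the PATH, and hots_command and run_hots are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def hots_command(config_path):
    """Build the argument list for `hots --config <path>`."""
    return ["hots", "--config", str(Path(config_path))]


def run_hots(config_path):
    """Run hots; check=True raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(hots_command(config_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the configuration path contains spaces.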
Output explanation
When HOTS runs, its overall progress is displayed in the terminal and the following output and log files are created:
- logs.log: logs of the main process (which loop, which step in the loop…)
- clustering_logs.log: logs of the clustering computations at each loop
- optim_logs.log: information on solving the optimization models
- results.log: temporary results at each loop (number of changes, objective value…)
- global_results.csv: final results for the identified business criteria
- loop_results.csv: multiple indicators at each loop (clustering criteria, conflict graph information…)
- node_results.csv: final node-related results (average / minimum / maximum loads)
- times.csv: intermediate times for each step (preprocessing + all steps for each loop)
- node_usage_evo.csv: numerical evolution of node consumption, from HOTS launch until HOTS stop
- node_usage_evo.svg: graphical evolution of node consumption, from HOTS launch until HOTS stop
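After a run, a quick way to confirm that all of the files listed above were produced is to check for them in the output folder. The sketch below is a hypothetical helper, not part of hots; the folder to inspect is supplied by the caller, since this section does not state where the files are written.

```python
from pathlib import Path

# Output file names as listed in this manual.
EXPECTED_OUTPUTS = (
    "logs.log", "clustering_logs.log", "optim_logs.log", "results.log",
    "global_results.csv", "loop_results.csv", "node_results.csv",
    "times.csv", "node_usage_evo.csv", "node_usage_evo.svg",
)


def missing_outputs(folder):
    """Return the expected output files that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED_OUTPUTS if not (folder / name).is_file()]
```

An empty list means every file named in this section exists in the given folder.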