.. _usermanual:

===========
User manual
===========

Preparing the parameters
========================

The configuration file is the only mandatory argument to run :term:`hots`. It is a JSON
file passed to the CLI with the :code:`--config` option.

A minimal example using the file connector is the following:

.. code-block:: json

   {
     "time_limit": null,
     "clustering": {
       "method": "kmeans",
       "nb_clusters": 3,
       "parameters": {
         "nb_clusters": 3
       }
     },
     "optimization": {
       "backend": "pyomo",
       "parameters": {
         "solver": "glpk",
         "verbose": 0
       }
     },
     "problem": {
       "type": "placement",
       "parameters": {
         "initial_placement": 0,
         "tol": 0.1,
         "tol_move": 0.5
       }
     },
     "connector": {
       "type": "file",
       "parameters": {
         "data_folder": "./tests/data/thesis_ex_10",
         "file_name": "container_usage.csv",
         "host_field": "machine_id",
         "individual_field": "container_id",
         "tick_field": "timestamp",
         "tick_increment": 2,
         "window_duration": 3,
         "sep_time": 3,
         "metrics": ["cpu"],
         "outfile": "./out/moves.log"
       }
     },
     "logging": {
       "level": "INFO",
       "filename": "./out/hots.log",
       "fmt": "%(asctime)s %(levelname)s: %(message)s"
     },
     "reporting": {
       "results_folder": "./out",
       "metrics_file": "./out/metrics.csv",
       "plots_folder": "./out/plots"
     }
   }

Configuration sections
----------------------

The top-level keys of the configuration file map to the fields of
:class:`hots.config.loader.AppConfig`:

- :code:`time_limit`:
  maximum wall-clock time (in seconds) for the application run.
  If :code:`null`, :term:`hots` processes all available data.

- :code:`clustering`:
  configuration of the clustering plugin.

  - :code:`method`: name of the clustering algorithm to use, e.g.
    :code:`"kmeans"`, :code:`"hierarchical"`, :code:`"spectral"`.
  - :code:`nb_clusters`: target number of clusters.
  - :code:`parameters`: free-form dict of method-specific parameters.

- :code:`optimization`:
  configuration of the optimization backend.

  - :code:`backend`: optimization backend to use. Currently :code:`"pyomo"` is
    supported and maps to :class:`hots.plugins.optimization.pyomo_model.PyomoModel`.
  - :code:`parameters`: solver-related parameters such as:

    - :code:`solver`: solver name (e.g. :code:`"glpk"`).
    - :code:`verbose`: integer verbosity level.

- :code:`problem`:
  configuration of the business (domain) problem plugin (see
  :mod:`hots.plugins.problem.placement`).

  - :code:`type`: problem type. The default implementation is :code:`"placement"`.
  - :code:`parameters`: problem-specific parameters, for example:

    - :code:`initial_placement`: whether to compute an initial placement
      (:code:`0` or :code:`1`).
    - :code:`tol`: tolerance used to decide when a node is overloaded.
    - :code:`tol_move`: tolerance for deciding when to move a container.

- :code:`connector`:
  configuration of the data connector plugin.

  - :code:`type`: connector type. The built-in types are:

    - :code:`"file"`: read data from CSV files.
    - :code:`"kafka"`: read data from a Kafka topic.

  - :code:`parameters`: connector-specific parameters. For both connectors,
    the following keys are usually required:

    - :code:`data_folder`: folder containing the input data.
    - :code:`file_name`: CSV file containing container-level metrics.
    - :code:`individual_field`: column name for container IDs.
    - :code:`host_field`: column name for host (node) IDs.
    - :code:`tick_field`: column name for timestamps.
    - :code:`tick_increment`: step between two timestamps in the stream.
    - :code:`window_duration`: window size used for sliding-window analysis.
    - :code:`sep_time`: time separating the analysis period from the running period.
    - :code:`metrics`: list of metric names to use (e.g. :code:`["cpu"]`).
    - :code:`outfile`: file where proposed moves will be written.

    For the Kafka connector, the following additional keys can be used:

    - :code:`bootstrap.servers`: Kafka bootstrap servers string.
    - :code:`topics`: list of topic names used to publish moves.
    - :code:`connector_url`: optional URL of an external connector service.

- :code:`logging`:
  logging configuration used by :func:`hots.utils.logging_config.setup_logging`.

  - :code:`level`: log level (:code:`"DEBUG"`, :code:`"INFO"`, ...).
  - :code:`filename`: log file path.
  - :code:`fmt`: log message format.

- :code:`reporting`:
  configuration of result files and plots.

  - :code:`results_folder`: base output folder.
  - :code:`metrics_file`: CSV file for aggregated metrics.
  - :code:`plots_folder`: folder where plots will be saved.

Additional top-level keys
-------------------------

For convenience, some examples also define :code:`data_folder`,
:code:`individual_field`, :code:`host_field`, :code:`tick_field` and
:code:`metrics` at the top level. These are redundant copies of the values
stored inside :code:`connector.parameters` and are kept for backward
compatibility.

Preparing data
==============

If you use historical data, the inputs are provided through 3 CSV files hosted in the same directory:

- :file:`container_usage.csv` : describes containers resource consumption
- :file:`node_meta.csv` : provides nodes capacities (and other additional data)
- :file:`node_usage.csv` : describes nodes resource consumption

Each file have the following formats :

- :file:`container_usage.csv` :
   .. csv-table::
      :header: "timestamp", "container_id", "metric_1", "metric_2", "machine_id"
      :widths: 15, 15, 15, 15, 15

      "t1", "c_10", 10, 50, "m_2"
      "...", "...", "...", "...", "..."
      "tmax", "c_48", 6.5, 24, "m_5"
- :file:`node_meta.csv` :
   .. csv-table::
      :header: "machine_id", "metric_1", "metric_2"
      :widths: 15, 15, 15

      "m_2", 30, 150
      "m_5", 24, 80
- :file:`container_usage.csv` :
   .. csv-table::
      :header: "timestamp", "machine_id", "metric_1", "metric_2"
      :widths: 15, 15, 15, 15

      "t1", "m_2", 25, 65
      "...", "...", "...", "..."
      "tmax", "m_5", 17.5, 52

Note that the file :file:`node_usage.csv` is not mandatory : if it does not exist in
the directory, it will be built using :file:`container_usage.csv` data.

Running the app
===============

Once the configuration file is prepared, :term:`hots` is started by:

.. code:: console

   hots --config /path/to/config.json

The :code:`--config` (or :code:`-c`) option is mandatory and must point to a valid
JSON configuration file.

You can view the available command-line options with:

.. code:: console

   hots --help

Typical options include the standard :code:`--help` and :code:`--version` flags.
All runtime behaviour is controlled by the configuration file rather than
individual CLI flags (number of clusters, window size, problem type, connector,
etc.).

Output explanation
===================

With the execution of HOTS, the global process is displayed in the terminal and
the following output and logs files are created:

* :file:`logs.log`: logs on main process (which loop, which step in the loop...)
* :file:`clustering_logs.log`: logs on clustering computes at each loop
* :file:`optim_logs.log`: information on optimization models solving
* :file:`results.log`: temporary results at each loop (number of changes, objective value...)
* :file:`global_results.csv`: final results for identified business criteria 
* :file:`loop_results.csv`: multiple indicators at each loop (clustering criteria, conflict graph information...)
* :file:`node_results.csv`: final nodes related results (average / minimum / maximum loads)
* :file:`times.csv`: intermediate times for each step (preprocess + all steps for each loop)
* :file:`node_usage_evo.csv`: numerical nodes consumption evolution, since HOTS launch until HOTS stop
* :file:`node_usage_evo.svg`: graphical nodes consumption evolution, since HOTS launch until HOTS stop