User manual

Preparing data

Input data are provided as three CSV files located in the same directory:

  • container_usage.csv : describes container resource consumption

  • node_meta.csv : provides node capacities (along with other additional data)

  • node_usage.csv : describes node resource consumption

Each file has the following format:

  • container_usage.csv:

    timestamp   container_id   metric_1   metric_2   machine_id
    t1          c_10           10         50         m_2
    tmax        c_48           6.5        24         m_5

  • node_meta.csv:

    machine_id   metric_1   metric_2
    m_2          30         150
    m_5          24         80

  • node_usage.csv:

    timestamp   machine_id   metric_1   metric_2
    t1          m_2          25         65
    tmax        m_5          17.5       52
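As a minimal sketch of reading one of these files in Python, the helper below (hypothetical, not part of hots) parses a usage file with the standard csv module and converts the metric columns to floats. The metric names passed in are assumed to match the CSV header, such as the placeholder names metric_1 and metric_2 used in the tables above.

```python
import csv


def load_usage(lines, metrics):
    """Parse a usage CSV into a list of dicts (hypothetical helper).

    `lines` is any iterable of CSV text lines, such as an open file;
    `metrics` lists the metric column names to convert to float.
    """
    reader = csv.DictReader(lines)
    # Fail early if a requested metric is absent from the header row.
    missing = [m for m in metrics if m not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing metric columns: {missing}")
    records = []
    for row in reader:
        for m in metrics:
            row[m] = float(row[m])
        records.append(row)
    return records
```

For example, opening container_usage.csv with `newline=""` and passing the file object together with `["metric_1", "metric_2"]` returns one dict per row, with the ID and timestamp columns left as strings.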

Note that node_usage.csv is optional: if it is not present in the directory, it is built from the container_usage.csv data.
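The fallback construction of node_usage.csv can be illustrated with a short sketch. It assumes, since this manual does not say so explicitly, that a node's consumption at a given timestamp is the sum of the consumption of the containers hosted on it at that timestamp. build_node_usage is a hypothetical helper, and its default field names are simply those of the tables above.

```python
from collections import defaultdict


def build_node_usage(container_rows, metrics,
                     tick_field="timestamp", host_field="machine_id"):
    """Aggregate container usage rows into per-node usage rows.

    Assumption: node consumption = sum of its containers' consumption
    at the same timestamp.
    """
    # One running total per (timestamp, node) pair, initialised to zero.
    totals = defaultdict(lambda: {m: 0.0 for m in metrics})
    for row in container_rows:
        key = (row[tick_field], row[host_field])
        for m in metrics:
            totals[key][m] += float(row[m])
    # Flatten back into one dict per (timestamp, node), in first-seen order.
    return [
        {tick_field: tick, host_field: host, **values}
        for (tick, host), values in totals.items()
    ]
```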

Preparing the parameters

Parameters are read from a JSON file with the following format:

{
   "analysis": {
      "window_duration": "default",
      "step": 1
   },
   "clustering": {
      "algo": "hierarchical",
      "nb_clusters": 4
   },
   "heuristic": {
      "algo": "distant_pairwise"
   }
}
There are two ways to specify which parameter file to use:
  • with the --params (or -p) option

  • by including a file named params.json in the data folder

All available parameters are listed below with a short description:

  • analysis: parameters dealing with the analysis period

    • window_duration: window size for the loop process

    • sep_time: time separating the analysis period from the running period

  • clustering: parameters dealing with the initial clustering problem

    • algo: algorithm used for the initial clustering (one of kmeans, hierarchical or spectral)

    • nb_clusters: number of clusters to use

  • data: parameters dealing with the input data

    • individuals_file: name of the file containing container consumption

    • hosts_meta_file: name of the file containing node information

    • individual_field: name of the field holding container IDs

    • host_field: name of the field holding node IDs

    • tick_field: name of the field holding timestamps

    • metrics: resources to take into account from the data

  • heuristic: parameters dealing with the placement heuristic during the analysis period

    • algo: heuristic algorithm used to obtain the initial placement solution (one of distant_pairwise, ffd or spread)

  • optimization: parameters dealing with solving the optimization models

    • model: path to the file describing the models to use (see Pyomo use for more information)

    • solver: solver used to solve the optimization problems

  • loop: parameters dealing with the loop process

    • mode: loop triggering mode (one of event, sequential or hybrid)

    • tick: number of data points to advance in time before triggering a new loop

    • constraints_dual: list of constraints whose dual variables are compared when evaluating solutions

    • tol_dual_clust: tolerance threshold for dual variable comparison when evaluating the clustering

    • tol_move_clust: maximum number of moves allowed when updating the clustering

    • tol_dual_place: tolerance threshold for dual variable comparison when evaluating the placement

    • tol_move_place: maximum number of moves allowed when updating the placement

    • tol_step: tolerance increment factor applied at each loop

  • plot: parameters dealing with graph display

    • renderer: rendering method used by matplotlib

  • allocation: parameters dealing with the adjustment of resources allocated to containers

    • enable: enable or disable the dynamic adjustment of container resources

    • constraints: constraints used for dynamic resource adjustment

      • load_threshold: maximum node load threshold

      • max_amplitude: maximum amplitude of node resource consumption

    • objective: objectives of the allocation problem

      • open_nodes: number of nodes in use

      • target_load_CPU: node load (CPU)

  • placement: parameters dealing with the container placement problem

    • enable: enable or disable the container placement problem
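Three of the parameters above accept a fixed set of values. A small check such as the following can catch typos in a parameter file before launching a run. It is a sketch, not the validation hots itself performs; check_params and ALLOWED_VALUES are illustrative names, and the allowed values are copied from the descriptions above.

```python
# Allowed values taken from the parameter descriptions above.
ALLOWED_VALUES = {
    ("clustering", "algo"): {"kmeans", "hierarchical", "spectral"},
    ("heuristic", "algo"): {"distant_pairwise", "ffd", "spread"},
    ("loop", "mode"): {"event", "sequential", "hybrid"},
}


def check_params(params):
    """Return error messages for choice parameters with unknown values.

    Missing sections or keys are ignored; only values that are present
    and outside the allowed set are reported.
    """
    errors = []
    for (section, key), allowed in ALLOWED_VALUES.items():
        value = params.get(section, {}).get(key)
        if value is not None and value not in allowed:
            errors.append(f"{section}.{key}={value!r} not in {sorted(allowed)}")
    return errors
```

Passing the dict obtained by loading the JSON example at the top of this section (for instance with json.load) returns an empty list.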

An example parameter file is provided in ~/tests/params_default.json. If no parameter file is given, this example file is used by default.
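The lookup order for the parameter file can be sketched as below. The manual lists the three sources but does not state which one wins when both --params and a params.json file are present; this sketch assumes the explicit option takes precedence. resolve_params_file is a hypothetical helper, and the default file path is passed in as an argument rather than hard-coded.

```python
from pathlib import Path


def resolve_params_file(data_dir, cli_params, default_file):
    """Choose which parameter file to load (hypothetical helper).

    Assumed precedence: explicit --params/-p path, then params.json in
    the data folder, then the default example file.
    """
    if cli_params:
        return Path(cli_params)
    candidate = Path(data_dir) / "params.json"
    if candidate.is_file():
        return candidate
    return Path(default_file)
```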

Running the app

With the input files described above placed in an arbitrary directory, say ~/path/to/data/, run the command:

hots ~/path/to/data/

The hots command accepts the following options:

  • -k : number of clusters used for clustering

  • -t, --tau : window size for the loop process

  • -m, --method : global method used for the placement problem

  • -c, --cluster_method : method used to update the clustering

  • -p, --param : path to a specific parameter file

  • -o, --output : specific directory for output files

  • -ec, --tolclust : value of epsilonC (used to build the conflict graph for clustering)

  • -ea, --tolplace : value of epsilonA (used to build the conflict graph for placement)

  • --help : display these options and exit

Some options overlap with entries in the parameter file (e.g. k and tau); in that case, the value given on the command line takes precedence.
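The precedence rule can be sketched as a merge of the parsed JSON parameters with the command-line values. The mapping of -k to clustering.nb_clusters and of -t/--tau to analysis.window_duration is inferred from the option and parameter descriptions, and apply_cli_overrides is a hypothetical helper, not part of hots.

```python
import copy


def apply_cli_overrides(params, k=None, tau=None):
    """Return a copy of `params` in which CLI values replace file values.

    Assumed mapping: -k -> clustering.nb_clusters,
    -t/--tau -> analysis.window_duration. None means "not given".
    """
    # Deep copy so the original parameter dict is left untouched.
    merged = copy.deepcopy(params)
    if k is not None:
        merged.setdefault("clustering", {})["nb_clusters"] = k
    if tau is not None:
        merged.setdefault("analysis", {})["window_duration"] = tau
    return merged
```

Options left unset on the command line keep the values from the parameter file.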

Reading the results

When the application starts, all of the initial data is displayed:

  • the container resource usage

  • the node resource usage (based on the initial allocation)

The separation time between the two phases is shown as a red line.
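The split between the two phases can be sketched as follows, assuming numeric timestamps and that the point at sep_time itself belongs to the analysis period (the manual does not specify this boundary). split_at_sep_time is a hypothetical helper.

```python
def split_at_sep_time(rows, sep_time, tick_field="timestamp"):
    """Split records into the analysis period and the running period.

    Assumption: rows with timestamp <= sep_time belong to the analysis
    period; all later rows belong to the running period.
    """
    analysis = [r for r in rows if r[tick_field] <= sep_time]
    running = [r for r in rows if r[tick_field] > sep_time]
    return analysis, running
```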

Next, the first part of the methodology is performed: clustering is run on the first time period, and the allocation produced by the heuristic is applied. The clustering results and the new node resource usage (based on the new allocation) are then displayed.

Finally, for the second phase, the clustering results and the container and node consumption are plotted and updated over time.
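One way to picture how the second phase advances is a sliding window over the timestamps after sep_time. This sketch assumes that each loop looks at the last window_duration points and moves forward by tick points, as suggested by the window_duration and tick parameters; the actual hots loop, including the event and hybrid triggering modes, may behave differently. loop_windows is a hypothetical helper.

```python
def loop_windows(timestamps, sep_time, window_duration, tick):
    """Yield the timestamps seen by each loop iteration of the second phase.

    Assumptions: `timestamps` is sorted, each loop covers the last
    `window_duration` points, and the loop advances by `tick` points.
    """
    # Index of the first point after the separation time.
    start = next((i for i, t in enumerate(timestamps) if t > sep_time),
                 len(timestamps))
    end = start + tick
    while end <= len(timestamps):
        # The window may reach back into the analysis period.
        yield timestamps[max(0, end - window_duration):end]
        end += tick
```

With timestamps 0 to 9, sep_time = 3, window_duration = 4 and tick = 2, the loop sees [2, 3, 4, 5], then [4, 5, 6, 7], then [6, 7, 8, 9].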