User manual
Preparing data
Input data are provided in 3 CSV files located in the same directory:
- container_usage.csv: describes container resource consumption
- node_meta.csv: provides node capacities (and other additional data)
- node_usage.csv: describes node resource consumption
Each file has the following format:
container_usage.csv:
timestamp  container_id  metric_1  metric_2  machine_id
t1         c_10          10        50        m_2
…          …             …         …         …
tmax       c_48          6.5       24        m_5
node_meta.csv:
machine_id  metric_1  metric_2
m_2         30        150
m_5         24        80
node_usage.csv:
timestamp  machine_id  metric_1  metric_2
t1         m_2         25        65
…          …           …         …
tmax       m_5         17.5      52
Note that the file node_usage.csv is not mandatory: if it does not exist in
the directory, it is built from the container_usage.csv data.
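As an illustration of how node usage could be derived from container usage, here is a minimal sketch. It assumes that a node's consumption at a given timestamp is the sum of the consumption of the containers it hosts; the manual does not state the aggregation actually used. The function name build_node_usage and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_node_usage(container_rows, metrics):
    """Sum container metrics per (timestamp, machine_id) pair.

    Assumption: node usage is the plain sum of its containers' usage.
    """
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0.0))
    for row in container_rows:
        key = (row["timestamp"], row["machine_id"])
        for metric in metrics:
            totals[key][metric] += float(row[metric])
    # One output row per (timestamp, machine_id), in first-seen order.
    return [
        {"timestamp": ts, "machine_id": machine, **values}
        for (ts, machine), values in totals.items()
    ]


# Made-up sample following the container_usage.csv layout.
sample = io.StringIO(
    "timestamp,container_id,metric_1,metric_2,machine_id\n"
    "1,c_10,10,50,m_2\n"
    "1,c_11,15,15,m_2\n"
    "1,c_48,6.5,24,m_5\n"
)
node_usage = build_node_usage(csv.DictReader(sample), ["metric_1", "metric_2"])
for row in node_usage:
    print(row)
```

The column names follow the tables above; if the field names are changed through the data parameters, the keys used in the sketch would need to change accordingly.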
Preparing the parameters
The parameters are read from a JSON file, which has the following format:
{
    "analysis": {
        "window_duration": "default",
        "step": 1
    },
    "clustering": {
        "algo": "hierarchical",
        "nb_clusters": 4
    },
    "heuristic": {
        "algo": "distant_pairwise"
    }
}
There are 2 ways to specify the parameter file to use:
- with the option --params (or -p)
- by including a file named params.json in the data folder
Here are all the possible parameters, each with a short description:
- analysis: parameters dealing with the analysis period
    - window_duration: window size for the loop process
    - sep_time: time dividing data between the analysis and running periods
- clustering: parameters dealing with the first clustering problem
    - algo: algorithm to use for the first clustering (one of kmeans, hierarchical and spectral)
    - nb_clusters: number of clusters to use
- data: parameters dealing with data
    - individuals_file: filename for container consumption
    - hosts_meta_file: filename for node information
    - individual_field: field name for container IDs in data
    - host_field: field name for node IDs in data
    - tick_field: field name for timestamps in data
    - metrics: resources to take into account from data
- heuristic: parameters dealing with the placement heuristic during the analysis period
    - algo: heuristic algorithm used to obtain the first placement solution (one of distant_pairwise, ffd and spread)
- optimization: parameters dealing with solving the optimization models
    - model: path to the file describing the models to use (see Pyomo use for more information)
    - solver: the solver to use for solving problems
- loop: parameters dealing with the loop process
    - mode: triggering loop mode (one of event, sequential and hybrid)
    - tick: number of datapoints used to progress in time before triggering a new loop
    - constraints_dual: list of constraints used for dual variables comparison during solution evaluation
    - tol_dual_clust: tolerance threshold for dual variables comparison during clustering evaluation
    - tol_move_clust: maximum allowed moves for a clustering update
    - tol_dual_place: tolerance threshold for dual variables comparison during placement evaluation
    - tol_move_place: maximum allowed moves for a placement update
    - tol_step: tolerance increment factor for each loop
- plot: parameters dealing with graph display
    - renderer: rendering method used by matplotlib
- allocation: parameters dealing with the adjustment of resources allocated to containers
    - enable: enable or disable the dynamic adjustment of container resources
    - constraints: constraints used for dynamic resource adjustment
    - load_threshold: maximum node load threshold
    - max_amplitude: maximum node resource consumption amplitude
- objective: allocation problem objectives
    - open_nodes: number of used nodes
    - target_load_CPU: node load (CPU)
- placement: parameters dealing with the container placement problem
    - enable: enable or disable the container placement problem
An example parameter file can be found at ~/tests/params_default.json.
Note that if no parameter file is provided, this example parameter file will be used.
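The lookup described in this section can be sketched as follows. This is only an illustration, not the application's actual code: the function names resolve_params_file and load_params are hypothetical, and the precedence of the command-line option over a params.json file in the data folder is an assumption, since the manual does not say which one wins when both are present.

```python
import json
import tempfile
from pathlib import Path


def resolve_params_file(data_dir, cli_path=None, default_path=None):
    """Pick the parameter file: CLI option, then params.json, then default.

    Assumption: the CLI option takes precedence over params.json.
    """
    if cli_path is not None:
        return Path(cli_path)
    candidate = Path(data_dir) / "params.json"
    if candidate.is_file():
        return candidate
    return Path(default_path) if default_path is not None else None


def load_params(path):
    """Parse the JSON parameter file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration in a throwaway directory standing in for the data folder.
with tempfile.TemporaryDirectory() as data_dir:
    params_path = Path(data_dir) / "params.json"
    params_path.write_text(
        '{"clustering": {"algo": "hierarchical", "nb_clusters": 4}}',
        encoding="utf-8",
    )
    chosen = resolve_params_file(data_dir)
    params = load_params(chosen)
    found_in_data_dir = chosen == params_path

print(params["clustering"]["nb_clusters"])
```

In a real run, default_path would point to the example file mentioned above.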
Running the app
With the 3 files described above in an arbitrary directory, say ~/path/to/data/,
run the command:
hots ~/path/to/data/
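Before launching the command, it can be useful to check that the data directory contains the expected files. The sketch below is a hypothetical helper, not part of hots. It assumes the default file names from the "Preparing data" section and treats node_usage.csv as optional, as stated there.

```python
from pathlib import Path

# Default file names from the "Preparing data" section (assumed unchanged).
REQUIRED_FILES = ("container_usage.csv", "node_meta.csv")
OPTIONAL_FILES = ("node_usage.csv",)


def missing_inputs(data_dir):
    """Return the required input files that are absent from data_dir."""
    data_dir = Path(data_dir).expanduser()
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("~/path/to/data/")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All required input files are present.")
```

If the file names are overridden through the data parameters (individuals_file, hosts_meta_file), the tuples above would need to be adapted.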
The hots command accepts the following options:
- -k: number of clusters used in clustering
- -t, --tau: window size for the loop process
- -m, --method: global method used for placement problem
- -c, --cluster_method: method used to update the clustering
- -p, --param: specific parameters file
- -o, --output: specific directory for output
- -ec, --tolclust: value for epsilonC (building the conflict graph for clustering)
- -ea, --tolplace: value for epsilonA (building the conflict graph for placement)
- --help: display these options and exit
Note that some options can be redundant with the parameter file (e.g. k and tau):
in this case the value from the CLI is used.
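The override rule can be pictured with the following sketch, which merges command-line values over values read from the parameter file. It is not the application's actual code. The mapping of -k to clustering.nb_clusters and of --tau to analysis.window_duration is an assumption inferred from the matching descriptions in this manual, and the names build_parser and apply_cli_overrides are hypothetical.

```python
import argparse
import copy


def build_parser():
    """Illustrative parser exposing only the two redundant options."""
    parser = argparse.ArgumentParser(prog="hots-sketch")
    parser.add_argument("-k", type=int, default=None)
    parser.add_argument("-t", "--tau", type=int, default=None)
    return parser


def apply_cli_overrides(params, args):
    """Return a copy of params where values given on the CLI win."""
    merged = copy.deepcopy(params)
    if args.k is not None:
        # Assumed mapping: -k overrides clustering.nb_clusters.
        merged.setdefault("clustering", {})["nb_clusters"] = args.k
    if args.tau is not None:
        # Assumed mapping: --tau overrides analysis.window_duration.
        merged.setdefault("analysis", {})["window_duration"] = args.tau
    return merged


file_params = {
    "analysis": {"window_duration": "default", "step": 1},
    "clustering": {"algo": "hierarchical", "nb_clusters": 4},
}
args = build_parser().parse_args(["-k", "6"])
merged = apply_cli_overrides(file_params, args)
print(merged["clustering"]["nb_clusters"])
```

Options that are not given on the command line are left at the value from the parameter file, which is why the defaults in the sketch are None rather than concrete numbers.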
Reading the results
When the application is launched, all of the initial data is displayed:
- the container resource usage
- the node resource usage (based on initial allocation)
The separation time (between the two phases) is plotted as a red line.
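To make the role of the separation time concrete, the sketch below splits usage rows into the two phases. It is purely illustrative. It assumes numeric timestamps and that a row whose timestamp equals the separation time belongs to the first (analysis) phase; the manual does not specify the boundary. The function name split_at_sep_time and the sample rows are hypothetical.

```python
def split_at_sep_time(rows, sep_time, tick_field="timestamp"):
    """Split rows into (analysis, running) periods around sep_time.

    Assumption: rows at exactly sep_time fall in the analysis period.
    """
    analysis = [row for row in rows if row[tick_field] <= sep_time]
    running = [row for row in rows if row[tick_field] > sep_time]
    return analysis, running


# Made-up rows with integer timestamps.
rows = [
    {"timestamp": 1, "container_id": "c_10", "metric_1": 10.0},
    {"timestamp": 2, "container_id": "c_10", "metric_1": 12.0},
    {"timestamp": 3, "container_id": "c_10", "metric_1": 9.0},
    {"timestamp": 4, "container_id": "c_10", "metric_1": 11.0},
]
analysis_rows, running_rows = split_at_sep_time(rows, sep_time=2)
print(len(analysis_rows), len(running_rows))
```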
Then the first part of the methodology is performed (clustering on the first time period), and the allocation resulting from the heuristic is applied. The clustering results and the new node resource usage (based on the new allocation) are displayed.
Finally, clustering results and container and node consumption are plotted and updated over time during the second phase.