Streaming data
This document describes how streaming data is integrated into hots in the refactored architecture.
Overview
Streaming is handled by connector plugins implementing the
hots.core.interfaces.ConnectorPlugin interface. A connector is
responsible for:
loading the initial data used during the analysis phase,
providing new data for each step of the streaming loop,
applying the moves decided by the optimization/heuristics layer (for instance by sending them to Kafka or writing them to a file).
The connector to use is selected via the connector section of the
configuration file (see User manual).
Connector plugin interface
The hots.core.interfaces.ConnectorPlugin class defines the methods that
every connector must implement:
load_initial(): load the initial batch of data and return the individual-level dataframe, the host-level dataframe, and optional metadata.
next_batch(): return the next batch of individual-level data for the current streaming window, or None when no more data is available.
apply_moves(): apply a list of move dictionaries produced by the placement plugin. For example, a move can be represented as:
{"container_name": "c_001", "old_host": "h_01", "new_host": "h_03"}
The connector decides how to persist or forward these moves (to a file, a Kafka topic, an external orchestrator, etc.).
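As a rough illustration of the three methods above, the interface can be pictured as an abstract base class. This is only a sketch: the real hots.core.interfaces.ConnectorPlugin may declare different signatures, and the type hints, the Move alias, and the NullConnector class are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple

# Assumed shape of a move, taken from the example dictionary in the text.
Move = Dict[str, str]


class ConnectorPlugin(ABC):
    """Illustrative stand-in, NOT the actual hots class."""

    @abstractmethod
    def load_initial(self) -> Tuple[Any, Any, Optional[dict]]:
        """Return (individual-level data, host-level data, optional metadata)."""

    @abstractmethod
    def next_batch(self) -> Optional[Any]:
        """Return the next window of individual-level data, or None when done."""

    @abstractmethod
    def apply_moves(self, moves: List[Move]) -> None:
        """Persist or forward the moves produced by the placement plugin."""


class NullConnector(ConnectorPlugin):
    """Hypothetical connector with no data source; it only counts moves."""

    def __init__(self) -> None:
        self.moves_seen = 0

    def load_initial(self) -> Tuple[Any, Any, Optional[dict]]:
        # Empty placeholders instead of real dataframes.
        return [], [], None

    def next_batch(self) -> Optional[Any]:
        # No stream behind this connector, so it is exhausted immediately.
        return None

    def apply_moves(self, moves: List[Move]) -> None:
        self.moves_seen += len(moves)
```

Declaring the methods as abstract means that a subclass which forgets one of them cannot be instantiated, so the omission surfaces at start-up rather than in the middle of the streaming loop.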
Built-in connectors
File connector
hots.plugins.connector.file_connector implements the
hots.plugins.connector.file_connector.FileConnector class, which:
reads historical data from CSV files located in data_folder,
exposes them as pandas.DataFrame objects to the rest of the application,
simulates a streaming process by returning successive time windows in next_batch(),
writes computed moves to a log file, using the outfile path provided in connector.parameters.
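The idea of simulating a stream from historical data can be sketched with a small helper that slices a DataFrame into consecutive time windows. This is not the FileConnector implementation: the WindowedFrame class, the column names (timestamp, container_id, cpu), and the window length are hypothetical, and an in-memory CSV string stands in for the files in data_folder.

```python
import io

import pandas as pd

# Hypothetical historical data; real files and columns may differ.
CSV_TEXT = """timestamp,container_id,cpu
0,c_001,0.5
1,c_001,0.7
2,c_002,0.2
3,c_001,0.6
4,c_002,0.3
"""


class WindowedFrame:
    """Return successive fixed-size time windows of a DataFrame."""

    def __init__(self, df, time_col="timestamp", window=2):
        self._df = df
        self._time_col = time_col
        self._window = window
        # Start at the earliest timestamp; None marks an empty dataset.
        self._cursor = df[time_col].min() if len(df) else None
        self._last = df[time_col].max() if len(df) else None

    def next_batch(self):
        # Signal exhaustion once the cursor has passed the last timestamp.
        if self._cursor is None or self._cursor > self._last:
            return None
        times = self._df[self._time_col]
        in_window = (times >= self._cursor) & (times < self._cursor + self._window)
        self._cursor += self._window
        return self._df[in_window]


history = pd.read_csv(io.StringIO(CSV_TEXT))
stream = WindowedFrame(history, window=2)
```

Each call to stream.next_batch() returns the rows of the next two-tick window, and None once the data is exhausted, which matches the next_batch() contract described for the connector interface.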
Kafka connector
hots.plugins.connector.kafka_connector implements the
hots.plugins.connector.kafka_connector.KafkaConnector class, which:
consumes container usage data from one or several Kafka topics,
keeps track of offsets for robust consumption,
converts messages into pandas.DataFrame rows compatible with the rest of the pipeline,
publishes move messages back to Kafka in apply_moves().
The Kafka connection details (bootstrap servers, topic names, etc.) are
configured through connector.parameters and an optional top-level
kafka section of the configuration file.
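The message conversion steps listed above can be sketched without a Kafka client. The example assumes that messages are UTF-8 encoded JSON objects, which the source does not state, and the field names timestamp, container_id, and cpu are hypothetical. Only the move dictionary keys come from the example given earlier in this document.

```python
import json
from typing import Dict, List


def decode_usage_messages(raw_messages: List[bytes]) -> List[Dict]:
    """Turn raw message payloads into row dictionaries.

    Assumption: each payload is a JSON object with the hypothetical
    fields 'timestamp', 'container_id' and 'cpu'.
    """
    rows = []
    for raw in raw_messages:
        payload = json.loads(raw.decode("utf-8"))
        rows.append(
            {
                "timestamp": int(payload["timestamp"]),
                "container_id": str(payload["container_id"]),
                "cpu": float(payload["cpu"]),
            }
        )
    return rows


def encode_move(move: Dict[str, str]) -> bytes:
    """Serialize one move dictionary into a message payload."""
    # sort_keys gives a stable byte representation for identical moves.
    return json.dumps(move, sort_keys=True).encode("utf-8")


raw = [b'{"timestamp": 10, "container_id": "c_001", "cpu": "0.5"}']
rows = decode_usage_messages(raw)
payload = encode_move(
    {"container_name": "c_001", "old_host": "h_01", "new_host": "h_03"}
)
```

A list of row dictionaries like rows can be passed directly to the pandas.DataFrame constructor, which is one way to obtain rows compatible with the rest of the pipeline.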
Integration in the main loop
The main application class hots.core.app.App interacts only with the
connector interface. This means you can implement your own connector (for
instance, to integrate with another streaming platform or a REST API) by
subclassing ConnectorPlugin and configuring connector.type
accordingly, without changing the rest of the code base.
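The decoupling described above can be illustrated with a simplified driver loop that only calls the three connector methods. This is not the code of hots.core.app.App: run_stream, decide_moves, and InMemoryConnector are hypothetical names, and in hots a custom connector would subclass ConnectorPlugin rather than rely on duck typing.

```python
from typing import Callable, Dict, List, Optional

Batch = List[Dict[str, str]]
Move = Dict[str, str]


class InMemoryConnector:
    """Hypothetical custom connector serving pre-built batches."""

    def __init__(self, batches: List[Batch]) -> None:
        self._batches = list(batches)
        self.applied: List[Move] = []

    def load_initial(self):
        # Placeholders for (individual-level data, host-level data, metadata).
        return [], [], None

    def next_batch(self) -> Optional[Batch]:
        return self._batches.pop(0) if self._batches else None

    def apply_moves(self, moves: List[Move]) -> None:
        self.applied.extend(moves)


def run_stream(connector, decide_moves: Callable[[Batch], List[Move]]) -> int:
    """Drive a connector until it is exhausted; return the number of steps."""
    connector.load_initial()
    steps = 0
    while True:
        batch = connector.next_batch()
        if batch is None:  # the connector signals the end of the stream
            break
        connector.apply_moves(decide_moves(batch))
        steps += 1
    return steps


def move_everything_to_h_03(batch: Batch) -> List[Move]:
    """Toy placeholder for the optimization/heuristics layer."""
    return [
        {"container_name": row["container_name"],
         "old_host": row["host"],
         "new_host": "h_03"}
        for row in batch
    ]


connector = InMemoryConnector(
    [[{"container_name": "c_001", "host": "h_01"}],
     [{"container_name": "c_002", "host": "h_02"}]]
)
steps = run_stream(connector, move_everything_to_h_03)
```

Because run_stream touches nothing beyond load_initial(), next_batch(), and apply_moves(), replacing InMemoryConnector with a file-based or Kafka-based connector would not require any change to the loop itself.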