Streaming data

This document describes how streaming data is integrated into hots in the refactored architecture.

Overview

Streaming is handled by connector plugins implementing the hots.core.interfaces.ConnectorPlugin interface. A connector is responsible for:

loading the initial data used during the analysis phase,
providing new data for each step of the streaming loop,
applying the moves decided by the optimization/heuristics layer (for instance by sending them to Kafka or writing them to a file).

The connector to use is selected via the connector section of the configuration file (see User manual).

Connector plugin interface

The hots.core.interfaces.ConnectorPlugin class defines the methods that every connector must implement:

load_initial(): load the initial batch of data and return the individual-level dataframe, the host-level dataframe, and optional metadata.
next_batch(): return the next batch of individual-level data for the current streaming window or None when no more data is available.
apply_moves(): apply a list of move dictionaries produced by the placement plugin. For example, a move can be represented as:
```
{"container_name": "c_001", "old_host": "h_01", "new_host": "h_03"}
```
The connector decides how to persist or forward these moves (to a file, a Kafka topic, an external orchestrator, etc.).

Built-in connectors

File connector

hots.plugins.connector.file_connector implements the hots.plugins.connector.file_connector.FileConnector class, which:

reads historical data from CSV files located in data_folder,
exposes them as pandas.DataFrame objects to the rest of the application,
simulates a streaming process by returning successive time windows in next_batch(),
writes computed moves to a log file, using the outfile path provided in connector.parameters.

Kafka connector

hots.plugins.connector.kafka_connector implements the hots.plugins.connector.kafka_connector.KafkaConnector class, which:

consumes container usage data from one or several Kafka topics,
keeps track of offsets for robust consumption,
converts messages into pandas.DataFrame rows compatible with the rest of the pipeline,
publishes move messages back to Kafka in apply_moves().

The Kafka connection details (bootstrap servers, topic names, etc.) are configured through the connector.parameters and optional top-level kafka section of the configuration file.

Integration in the main loop

The main application class hots.core.app.App interacts only with the connector interface. This means you can implement your own connector (for instance, to integrate with another streaming platform or REST API) by subclassing ConnectorPlugin and configuring connector.type accordingly, without changing the rest of the code base.