=============
Configuration
=============

This page documents the configuration options available in ICOS-FL.

Configuration Files
-------------------

ICOS-FL uses several configuration files:

1. **pyproject.toml**: Main project configuration
2. **otel-config.yaml**: OpenTelemetry collector configuration
3. **docker-compose.yml**: Container configuration
4. **bridgeConfig.py**: Bridge configuration script

pyproject.toml Configuration
----------------------------

The ``pyproject.toml`` file is the primary configuration file for ICOS-FL. It is divided into the following sections.

Build System
~~~~~~~~~~~~

.. code-block:: toml

   [build-system]
   requires = ["hatchling"]
   build-backend = "hatchling.build"

Project Metadata
~~~~~~~~~~~~~~~~

.. code-block:: toml

   [project]
   name = "icos-fl"
   version = "0.1.0"
   description = "ICOS-FL: Flower-powered FL framework for real-time resource monitoring (LSTM) & predictions."
   license = "MIT"
   readme = "README.md"
   requires-python = ">=3.10"
   authors = [
     { name = "Anastasios Kaltakis", email = "anastasioskaltakis@gmail.com" },
   ]
   dependencies = [
     "flwr[simulation]>=1.17.0",
     "torch==2.5.1",
     "wandb==0.19.8",
     "pandas>=2.2.3",
     "scikit-learn>=1.6.1",
     "dataclay==4.0.0",
   ]

Flower Application Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: toml

   [tool.flwr.app]
   publisher = "Anastasios Kaltakis"

   [tool.flwr.app.components]
   serverapp = "icos_fl.server.server:app"
   clientapp = "icos_fl.client.client:app"

   [tool.flwr.app.config]
   # Server configuration
   num-server-rounds = 10
   fraction-fit = 1.0
   fraction-evaluate = 1.0
   min-fit-clients = 2
   min-evaluate-clients = 2
   min-available-clients = 2
   server-device = "cpu"

   # LSTM model configuration
   hidden-layer-size = 10
   time-step = 10
   num-layers = 1

   # Resource metric to monitor and predict
   metric = "cpu_usage"
   batch-size = 64
   train-test-split = 0.8
   local-epochs = 100
   learning-rate = 0.001

   use-wandb = false
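
The values under ``[tool.flwr.app.config]`` are plain TOML key/value pairs with hyphenated keys. As a minimal sketch (not the ICOS-FL loader itself), they can be inspected with the standard-library ``tomllib`` module, which requires Python 3.11 or later; on Python 3.10 the third-party ``tomli`` package provides the same interface. The helper name ``load_app_config`` is hypothetical.

.. code-block:: python

   import tomllib

   # Excerpt of the [tool.flwr.app.config] table shown above.
   PYPROJECT_EXCERPT = """
   [tool.flwr.app.config]
   num-server-rounds = 10
   fraction-fit = 1.0
   hidden-layer-size = 10
   metric = "cpu_usage"
   use-wandb = false
   """


   def load_app_config(toml_text: str) -> dict:
       """Return the [tool.flwr.app.config] table as a dict (hypothetical helper)."""
       data = tomllib.loads(toml_text)
       return data["tool"]["flwr"]["app"]["config"]


   config = load_app_config(PYPROJECT_EXCERPT)
   print(config["num-server-rounds"])  # TOML integers become Python ints
   print(config["use-wandb"])  # TOML booleans become Python bools

Inside a Flower ``ServerApp`` or ``ClientApp``, the same keys are typically available through ``context.run_config``.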

Federation Configuration
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: toml

   [tool.flwr.federations]
   default = "local-deployment"

   [tool.flwr.federations.local-deployment]
   address = "127.0.0.1:9093"
   insecure = true

   [tool.flwr.federations.remote-deployment]
   address = "127.0.0.1:9093"
   insecure = true

OpenTelemetry Configuration
---------------------------

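The ``otel-config.yaml`` file configures the OpenTelemetry Collector to scrape Scaphandre's Prometheus endpoint every 3 seconds, batch the metrics for up to 180 seconds, and export them over OTLP to ``127.0.0.1:4317``: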

.. code-block:: yaml

   receivers:
     prometheus:
       config:
         scrape_configs:
           - job_name: 'scaphandre'
             scrape_interval: 3s  # Scrape metrics every 3 seconds
             static_configs:
               - targets: ['127.0.0.1:8080']

   processors:
     batch:
       timeout: 180s  # Batch metrics for 3 minutes

   exporters:
     otlp:
       endpoint: 127.0.0.1:4317
       tls:
         insecure: true

   service:
     pipelines:
       metrics:
         receivers: [prometheus]
         processors: [batch]
         exporters: [otlp]

Docker Compose Configuration
----------------------------

The ``docker-compose.yml`` file defines the container services. The excerpt below is abridged; ``# ...`` marks omitted settings:

.. code-block:: yaml

   services:
     redis:
       image: redis:latest
       restart: unless-stopped

     scaphandre:
       image: docker.io/hubblo/scaphandre
       command: prometheus -p 8080 -a 0.0.0.0
       privileged: true
       # ...

     proxy:
       build: .
       ports:
         - 8676:8676
       depends_on:
         - metadata-service
         - backend
       environment:
         - DATACLAY_PROXY_MDS_HOST=metadata-service
         - DATACLAY_KV_HOST=redis
       command: python -m dataclay.proxy
       # ...

Bridge Configuration
--------------------

The bridge configuration is set through the ``bridgeConfig.py`` script:

.. code-block:: python

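   # Excerpt from bridgeConfig.py: ResourceConfiguration, scaphandre_rules, and the
   # bridge configuration object bc are defined earlier in the script.

   # Create a ResourceConfiguration for Scaphandre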
   rc_scaphandre = ResourceConfiguration("scaphandre-metrics", scaphandre_rules)

   # Add the specific metrics you want to track
   rc_scaphandre.add_metric("scaph_host_power_microwatts")
   rc_scaphandre.add_metric("scaph_host_load_avg_one")
   rc_scaphandre.add_metric("scaph_host_memory_total_bytes")
   rc_scaphandre.add_metric("scaph_host_memory_available_bytes")

   # Add this configuration to the bridge
   bc.set_res_config(rc_scaphandre)

Configuration Parameters
------------------------

Server Configuration
~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :align: left

   * - Parameter
     - Default
     - Description
   * - num-server-rounds
     - 10
     - Number of federated learning rounds
   * - fraction-fit
     - 1.0
     - Fraction of available clients sampled for training in each round
   * - fraction-evaluate
     - 1.0
     - Fraction of available clients sampled for evaluation in each round
   * - min-fit-clients
     - 2
     - Minimum number of clients sampled for training in each round
   * - min-evaluate-clients
     - 2
     - Minimum number of clients sampled for evaluation in each round
   * - min-available-clients
     - 2
     - Minimum number of connected clients required before a round starts
   * - server-device
     - "cpu"
     - Device to use for server-side operations
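
How ``fraction-fit`` and ``min-fit-clients`` interact depends on the server strategy, which this page does not name. The sketch below assumes FedAvg-style sampling, where the number of training clients is the larger of ``int(available * fraction)`` and the configured minimum. The function names are illustrative and not part of ICOS-FL.

.. code-block:: python

   def num_fit_clients(available: int, fraction_fit: float, min_fit_clients: int) -> int:
       """Clients sampled for training under FedAvg-style rules (assumed strategy)."""
       return max(int(available * fraction_fit), min_fit_clients)


   def check_minimums(min_fit: int, min_evaluate: int, min_available: int) -> bool:
       """True when min-available-clients covers both per-round minimums."""
       return min_available >= max(min_fit, min_evaluate)


   # Defaults from the table above, with 5 connected clients.
   all_clients = num_fit_clients(available=5, fraction_fit=1.0, min_fit_clients=2)
   # Halving the fraction gives int(5 * 0.5) = 2, which meets the minimum of 2.
   half_clients = num_fit_clients(available=5, fraction_fit=0.5, min_fit_clients=2)
   defaults_ok = check_minimums(min_fit=2, min_evaluate=2, min_available=2)

   print(all_clients, half_clients, defaults_ok)

Under this assumption, lowering ``fraction-fit`` never drops the number of training clients below ``min-fit-clients``.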

LSTM Model Configuration
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :align: left

   * - Parameter
     - Default
     - Description
   * - hidden-layer-size
     - 10
     - Size of the LSTM hidden layer
   * - time-step
     - 10
     - Number of time steps in input sequence
   * - num-layers
     - 1
     - Number of LSTM layers
   * - learning-rate
     - 0.001
     - Learning rate for model optimization
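
``time-step`` sets how many consecutive readings form one input sequence. The following is a minimal sketch of that windowing, assuming one-step-ahead prediction (the target is the reading immediately after each window). ``make_windows`` is a hypothetical helper, not the ICOS-FL implementation.

.. code-block:: python

   def make_windows(series: list[float], time_step: int) -> list[tuple[list[float], float]]:
       """Pair each run of `time_step` consecutive values with the value that follows it."""
       return [
           (series[i : i + time_step], series[i + time_step])
           for i in range(len(series) - time_step)
       ]


   readings = [0.10, 0.20, 0.30, 0.40, 0.50]
   windows = make_windows(readings, time_step=3)
   for inputs, target in windows:
       print(inputs, "->", target)

A series of ``n`` readings yields ``n - time-step`` windows, so a larger ``time-step`` needs more collected data before training can start.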

Training Configuration
~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :align: left

   * - Parameter
     - Default
     - Description
   * - batch-size
     - 64
     - Batch size for training
   * - train-test-split
     - 0.8
     - Fraction of the data used for training; the remainder is held out for evaluation
   * - local-epochs
     - 100
     - Number of local training epochs per round
   * - metric
     - "cpu_usage"
     - Metric to predict (cpu_usage, memory_usage, power_consumption)
   * - use-wandb
     - false
     - Whether to use Weights & Biases for logging
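
As an illustration of ``train-test-split``, the sketch below divides time-ordered samples at the configured fraction. It assumes a chronological split without shuffling, which is common for time series but is not confirmed by this page; ``chronological_split`` is a hypothetical helper.

.. code-block:: python

   def chronological_split(rows: list, train_fraction: float) -> tuple[list, list]:
       """Split rows in order: the first `train_fraction` for training, the rest held out."""
       cut = int(len(rows) * train_fraction)
       return rows[:cut], rows[cut:]


   rows = list(range(300))  # stand-in for 300 time-ordered samples
   train, held_out = chronological_split(rows, train_fraction=0.8)
   print(len(train), len(held_out))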

Time Series Data Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :align: left

   * - Parameter
     - Default
     - Description
   * - max_rows
     - 300
     - Maximum rows in the sliding window
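   * - scrape_interval
     - 3s
     - Interval between metric scrapes (``scrape_interval`` in ``otel-config.yaml``)
   * - batch_timeout
     - 180s
     - Maximum time metrics are buffered before export (``timeout`` of the ``batch`` processor in ``otel-config.yaml``)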
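
These three values determine how much history the sliding window holds. The arithmetic below is a rough sketch, assuming each scrape contributes one row and that batches are flushed by the timeout rather than by size; the variable names are illustrative.

.. code-block:: python

   SCRAPE_INTERVAL_S = 3  # scrape_interval
   BATCH_TIMEOUT_S = 180  # batch_timeout
   MAX_ROWS = 300  # max_rows

   # Assumption: each scrape contributes one row to the window.
   samples_per_batch = BATCH_TIMEOUT_S // SCRAPE_INTERVAL_S
   window_minutes = MAX_ROWS * SCRAPE_INTERVAL_S / 60
   batches_to_fill_window = MAX_ROWS / samples_per_batch

   print(samples_per_batch)  # scrapes collected before a batch is exported
   print(window_minutes)  # minutes of history held by a full window
   print(batches_to_fill_window)  # exported batches needed to fill the window

With the defaults, one batch carries 60 scrapes and a full window spans 15 minutes, or five batches.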

Environment Variables
---------------------

ICOS-FL reads the following environment variables:

.. list-table::
   :header-rows: 1
   :align: left

   * - Variable
     - Description
   * - DATACLAY_PROXY_HOST
     - Host address for the dataClay proxy
   * - DATACLAY_PROXY_PORT
     - Port for the dataClay proxy
   * - BRIDGE_CONFIGURATION_ALIAS
     - Alias for the bridge configuration
   * - TIMESERIES_ALIAS
     - Alias for the TimeSeriesData object
   * - LOG_LEVEL
     - Logging level (DEBUG, INFO, WARNING, ERROR)
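
A minimal sketch of reading these variables with the standard library follows. The fallback values are assumptions for illustration only: ``8676`` matches the proxy port published in ``docker-compose.yml``, while the host and log-level fallbacks are not documented defaults.

.. code-block:: python

   import logging
   import os

   # Fallback values are assumptions; 8676 is the proxy port from docker-compose.yml.
   proxy_host = os.environ.get("DATACLAY_PROXY_HOST", "127.0.0.1")
   proxy_port = int(os.environ.get("DATACLAY_PROXY_PORT", "8676"))

   # Map LOG_LEVEL to a logging constant, falling back to INFO when the
   # variable is unset or not one of the documented names.
   level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
   if level_name not in {"DEBUG", "INFO", "WARNING", "ERROR"}:
       level_name = "INFO"
   logging.basicConfig(level=getattr(logging, level_name))

   print(f"proxy at {proxy_host}:{proxy_port}, log level {level_name}")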

Example Configuration
---------------------

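The following example shows the settings to change in ``[tool.flwr.app.config]`` to predict memory usage with a larger model. Settings that are not shown, such as ``fraction-fit`` and ``train-test-split``, stay as in the listing under `Flower Application Configuration`_: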

.. code-block:: toml

   [tool.flwr.app.config]
   # Server configuration
   num-server-rounds = 20
   min-fit-clients = 3
   min-evaluate-clients = 3
   min-available-clients = 3

   # LSTM model configuration
   hidden-layer-size = 20
   time-step = 15
   num-layers = 2

   # Training configuration
   metric = "memory_usage"
   batch-size = 32
   local-epochs = 150
   learning-rate = 0.0005

   # Logging
   use-wandb = true
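
To see which documented defaults the example leaves unchanged, the overrides can be layered on top of the defaults from the parameter tables. This is an illustrative sketch only; the dictionaries are copied from this page and the merge is not how ICOS-FL or Flower loads configuration.

.. code-block:: python

   # Defaults as listed in the parameter tables on this page.
   DEFAULTS = {
       "num-server-rounds": 10,
       "fraction-fit": 1.0,
       "fraction-evaluate": 1.0,
       "min-fit-clients": 2,
       "min-evaluate-clients": 2,
       "min-available-clients": 2,
       "server-device": "cpu",
       "hidden-layer-size": 10,
       "time-step": 10,
       "num-layers": 1,
       "metric": "cpu_usage",
       "batch-size": 64,
       "train-test-split": 0.8,
       "local-epochs": 100,
       "learning-rate": 0.001,
       "use-wandb": False,
   }

   # Values changed by the memory-usage example above.
   OVERRIDES = {
       "num-server-rounds": 20,
       "min-fit-clients": 3,
       "min-evaluate-clients": 3,
       "min-available-clients": 3,
       "hidden-layer-size": 20,
       "time-step": 15,
       "num-layers": 2,
       "metric": "memory_usage",
       "batch-size": 32,
       "local-epochs": 150,
       "learning-rate": 0.0005,
       "use-wandb": True,
   }

   # Later keys win, so overrides replace defaults and the rest carry over.
   effective = {**DEFAULTS, **OVERRIDES}
   untouched = sorted(set(DEFAULTS) - set(OVERRIDES))
   print(untouched)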