Overview

What is ICOS-FL?

ICOS-FL is a federated learning framework built on Flower for real-time resource monitoring and prediction across distributed nodes. It enables organizations to train machine learning models on system metrics data without centralizing sensitive information.

The framework uses LSTM (Long Short-Term Memory) neural networks to predict resource utilization patterns based on historical data collected through OpenTelemetry and stored in DataClay.
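To illustrate what "predicting from historical data" usually means for an LSTM, the sketch below slices a metric series into fixed-length input windows paired with the value to forecast. It is a minimal illustration, not ICOS-FL's actual preprocessing: the function name `make_windows`, the `window` and `horizon` parameters, and the sample CPU readings are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Split a metric series into (past window, future value) training pairs.

    Hypothetical helper: ICOS-FL's real preprocessing may differ.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # Stop early enough that each window still has a target `horizon` steps ahead.
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Made-up CPU utilization samples (percent), for illustration only.
cpu_percent = [12.0, 15.0, 14.0, 30.0, 55.0, 52.0]
X, y = make_windows(cpu_percent, window=3)
```

Each row of `X` holds three consecutive readings and the matching entry of `y` is the reading that follows them, which is the shape of supervised data a sequence model is typically trained on.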

Architecture

ICOS-FL consists of several components that work together:

  • SuperLink: The central server component that orchestrates the federated learning process

  • SuperNodes: Client components running on each node that collect metrics and train local models

  • DataClay: A distributed object store that holds the collected time-series metrics

  • OTLP Bridge: Connects OpenTelemetry metrics to DataClay for storage

  • LSTM Models: Neural networks trained to predict resource usage patterns
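The interplay between the SuperLink and the SuperNodes can be sketched as one aggregation round: each node trains locally and reports its model weights, and the server combines them. The example below assumes a FedAvg-style weighted average, which is a common choice in federated learning; the source does not state which aggregation strategy ICOS-FL configures. It is written in plain Python rather than against the Flower API, and the function name and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighting each by its local sample count.

    Assumption: FedAvg-style aggregation; not taken from the ICOS-FL source.
    """
    if not updates:
        raise ValueError("no client updates to aggregate")
    total_samples = sum(num_samples for _, num_samples in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")
    size = len(updates[0][0])
    averaged = [0.0] * size
    for weights, num_samples in updates:
        if len(weights) != size:
            raise ValueError("all clients must report the same number of weights")
        for i, value in enumerate(weights):
            # Nodes that trained on more samples contribute proportionally more.
            averaged[i] += value * num_samples / total_samples
    return averaged


# Two hypothetical nodes: (flattened model weights, number of local samples).
node_a = ([1.0, 2.0], 100)
node_b = ([3.0, 6.0], 300)
global_weights = federated_average([node_a, node_b])
```

Only the weight vectors and sample counts cross the network in this sketch; the raw metrics each node trained on stay where they were collected.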

Key Features

  • Privacy-Preserving Learning: Train models without sharing raw system metrics

  • Resource Prediction: Forecast CPU, memory, and power usage

  • Scalable Architecture: Support for multiple nodes in a federated topology

  • Real-time Monitoring: Track system metrics with minimal overhead

  • Docker Integration: Easy deployment with containerized components

Use Cases

ICOS-FL is designed for scenarios where organizations need to:

  • Predict Resource Spikes: Anticipate CPU, memory, or power consumption surges

  • Optimize Resource Allocation: Plan capacity based on predicted usage patterns

  • Detect Anomalies: Identify unusual system behavior based on historical patterns

  • Maintain Data Privacy: Keep sensitive system information within organizational boundaries
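As a rough illustration of the anomaly-detection use case, one simple approach compares each observed value with the model's forecast and flags points whose error is unusually large. The sketch below is an assumption about how such a check could look, not a description of ICOS-FL's method; the function name, the threshold `k`, and the sample readings are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def flag_anomalies(
    observed: Sequence[float], predicted: Sequence[float], k: float = 2.0
) -> List[int]:
    """Return indices where the forecast error deviates more than k standard
    deviations from the mean error. Hypothetical rule, for illustration only."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    center = mean(residuals)
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every error is identical, so nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - center) > k * spread]


# Made-up CPU readings (percent) and made-up forecasts for the same time steps.
observed = [20.0, 22.0, 21.0, 23.0, 80.0, 22.0, 21.0, 20.0]
predicted = [21.0, 21.0, 22.0, 22.0, 23.0, 22.0, 21.0, 21.0]
anomalies = flag_anomalies(observed, predicted)
```

With these sample values only the reading of 80 at index 4 is flagged, since its forecast error is far outside the spread of the others. A fixed multiple of the standard deviation is a crude threshold; a real deployment would likely tune it per metric.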

The framework is particularly useful for:

  • Edge Computing Environments: Where data sovereignty is important

  • Multi-datacenter Deployments: For aggregating insights without centralizing data

  • Privacy-Sensitive Organizations: That need insights without exposing raw metrics