Overview
What is ICOS-FL?
ICOS-FL is a federated learning framework built on Flower for real-time resource monitoring and prediction across distributed nodes. It enables organizations to train machine learning models on system metrics data without centralizing sensitive information.
The framework uses LSTM (Long Short-Term Memory) neural networks to predict resource utilization patterns based on historical data collected through OpenTelemetry and stored in DataClay.
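Training an LSTM on historical metrics usually means turning a metric series into fixed-length input windows paired with a future value to predict. The sketch below illustrates that general idea only. The function name `make_windows`, the `window` and `horizon` parameters, and the sample values are hypothetical and are not taken from ICOS-FL.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Split a metric series into (input window, future target) pairs.

    Hypothetical helper for illustration; not part of ICOS-FL.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # The last usable start index leaves room for the window plus the horizon.
    for start in range(len(series) - window - horizon + 1):
        inputs.append(list(series[start:start + window]))
        targets.append(series[start + window + horizon - 1])
    return inputs, targets


# Made-up CPU utilization samples (percent), one per collection interval.
cpu_percent = [10.0, 12.0, 15.0, 40.0, 80.0, 35.0]
X, y = make_windows(cpu_percent, window=3)
```

Each row of `X` holds three consecutive samples and the matching entry of `y` is the sample that follows them, which is the shape of data a sequence model would be trained on.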
Architecture
ICOS-FL consists of several components that work together:
SuperLink: The central server component that orchestrates the federated learning process
SuperNodes: Client components running on each node that collect metrics and train local models
DataClay: A distributed object store that handles time series data
OTLP Bridge: Connects OpenTelemetry metrics to DataClay for storage
LSTM Models: Neural networks trained to predict resource usage patterns
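The interaction between the SuperLink and the SuperNodes can be pictured as weighted averaging of locally trained model parameters, so that only weights leave each node and raw metrics stay where they were collected. The sketch below assumes FedAvg-style aggregation. The source does not state which aggregation strategy ICOS-FL configures, and `federated_average`, the flattened weight vectors, and the sample counts are hypothetical illustrations rather than the Flower API.

```python
from typing import List, Sequence, Tuple

Weights = List[float]  # one flattened parameter vector per local model


def federated_average(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count.

    Hypothetical illustration of FedAvg-style aggregation; ICOS-FL may
    use a different strategy.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training samples reported")
    size = len(updates[0][0])
    averaged = [0.0] * size
    for weights, n in updates:
        if len(weights) != size:
            raise ValueError("all clients must report the same number of weights")
        for i, w in enumerate(weights):
            # A client with more training samples contributes proportionally more.
            averaged[i] += w * n / total
    return averaged


# Two made-up node updates: (local weights, number of local training samples).
node_updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = federated_average(node_updates)
```

In this example the second node trained on three times as many samples, so the aggregated weights sit closer to its values.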
Key Features
Privacy-Preserving Learning: Train models without sharing raw system metrics
Resource Prediction: Forecast CPU, memory, and power usage
Scalable Architecture: Support for multiple nodes in a federated topology
Real-time Monitoring: Track system metrics with minimal overhead
Docker Integration: Easy deployment with containerized components
Use Cases
ICOS-FL is designed for scenarios where organizations need to:
Predict Resource Spikes: Anticipate CPU, memory, or power consumption surges
Optimize Resource Allocation: Plan capacity based on predicted usage patterns
Detect Anomalies: Identify unusual system behavior based on historical patterns
Maintain Data Privacy: Keep sensitive system information within organizational boundaries
The framework is particularly useful for:
Edge Computing Environments: Where data sovereignty is important
Multi-datacenter Deployments: For aggregating insights without centralizing data
Privacy-Sensitive Organizations: That need insights without exposing raw metrics