Deployment Architecture
This page describes how ICOS-FL components are organized and deployed across machines.
Deployment Patterns
ICOS-FL supports several deployment patterns:
Development Mode: All components on a single machine
Federated Mode: Components distributed across multiple machines
Hybrid Mode: Mix of centralized and distributed components
Figure: ICOS-FL deployment patterns
Single-Machine Deployment
In development mode, all components run on a single machine:
┌─────────────────────────── Single Machine ───────────────────────────┐
│                                                                      │
│  ┌─────────┐  ┌───────────┐  ┌───────────┐  ┌─────────────────────┐  │
│  │  Redis  │  │ DataClay  │  │ DataClay  │  │   DataClay Proxy    │  │
│  └─────────┘  │ Metadata  │  │ Backend   │  └─────────────────────┘  │
│               └───────────┘  └───────────┘                           │
│                                                                      │
│  ┌──────────┐ ┌───────────┐  ┌───────────┐  ┌─────────────────────┐  │
│  │Scaphandre│ │   OTEL    │  │   OTLP    │  │    Bridge Config    │  │
│  │          │ │ Collector │  │  Bridge   │  └─────────────────────┘  │
│  └──────────┘ └───────────┘  └───────────┘                           │
│                                                                      │
│  ┌─────────────────────┐    ┌─────────────────────────────────────┐  │
│  │ SuperLink (Server)  │    │         SuperNode (Client)          │  │
│  └─────────────────────┘    └─────────────────────────────────────┘  │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
All components are deployed as Docker containers, managed by Docker Compose.
Federated Deployment
In a federated deployment, components are distributed across multiple machines:
┌─────────────────────── Controller Machine ───────────────────────┐
│                                                                  │
│  ┌─────────┐  ┌───────────┐  ┌───────────┐  ┌─────────────────┐  │
│  │  Redis  │  │ DataClay  │  │ DataClay  │  │ DataClay Proxy  │  │
│  └─────────┘  │ Metadata  │  │ Backend   │  └─────────────────┘  │
│               └───────────┘  └───────────┘                       │
│                                                                  │
│  ┌──────────┐ ┌───────────┐  ┌───────────┐  ┌─────────────────┐  │
│  │Scaphandre│ │   OTEL    │  │   OTLP    │  │  Bridge Config  │  │
│  │          │ │ Collector │  │  Bridge   │  └─────────────────┘  │
│  └──────────┘ └───────────┘  └───────────┘                       │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                     SuperLink (Server)                     │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 │ Network Communication
                                 │
┌────────────────────────────────┼─────────────────────────────────┐
│                                │                                 │
│  ┌─────────────────────────────▼──────────────────────────────┐  │
│  │                       Node Machine 1                       │  │
│  │                                                            │  │
│  │  ┌─────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  │  │
│  │  │  Redis  │  │ DataClay  │  │ DataClay  │  │ DataClay  │  │  │
│  │  │         │  │ Metadata  │  │ Backend   │  │   Proxy   │  │  │
│  │  └─────────┘  └───────────┘  └───────────┘  └───────────┘  │  │
│  │                                                            │  │
│  │  ┌──────────┐ ┌───────────┐  ┌───────────┐  ┌───────────┐  │  │
│  │  │Scaphandre│ │   OTEL    │  │   OTLP    │  │  Bridge   │  │  │
│  │  │          │ │ Collector │  │  Bridge   │  │  Config   │  │  │
│  │  └──────────┘ └───────────┘  └───────────┘  └───────────┘  │  │
│  │                                                            │  │
│  │  ┌──────────────────────────────────────────────────────┐  │  │
│  │  │                  SuperNode (Client)                  │  │  │
│  │  └──────────────────────────────────────────────────────┘  │  │
│  │                                                            │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                       Node Machine 2                       │  │
│  │             (Same structure as Node Machine 1)             │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
Key aspects of the federated deployment:
Controller Machine: Hosts the SuperLink server
Node Machines: Each runs a SuperNode client
Local Data Collection: Each node collects and processes its own metrics
Federation: SuperLink coordinates learning across nodes
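Read together with the diagram, these points suggest that a controller and a node run the same supporting stack and differ only in the Flower component. The following is a minimal sketch of that layout, using the container names from the Docker Container Architecture table. The `role` strings, the `SHARED_STACK` constant, and the function name are made up for illustration and are not part of ICOS-FL.

```python
# Containers that appear on every machine in the federated diagram.
SHARED_STACK = [
    "redis", "metadata-service", "backend", "proxy",
    "scaphandre", "otel-collector", "bridge", "bridge-config",
]


def containers_for(role: str, node_id: int = 1) -> list:
    """Return the containers expected on a machine with the given (hypothetical) role."""
    if role == "controller":
        # The controller machine hosts the SuperLink server.
        return SHARED_STACK + ["superlink"]
    if role == "node":
        # Each node machine runs one SuperNode client.
        return SHARED_STACK + [f"supernode-{node_id}"]
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    print(containers_for("controller"))
    print(containers_for("node", node_id=2))
```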
Component Communication
ICOS-FL components communicate through several protocols:
| Components | Protocol | Description |
|---|---|---|
| Scaphandre → OTEL Collector | HTTP | Metrics scraping (Prometheus format) |
| OTEL Collector → OTLP Bridge | gRPC | Metrics streaming (OpenTelemetry protocol) |
| OTLP Bridge → DataClay | Custom Protocol | DataClay’s internal communication |
| SuperNode → SuperLink | gRPC | Flower’s federation protocol |
| Consumer → DataClay | Custom Protocol | DataClay client API |
Docker Container Architecture
Each component runs in its own Docker container:
| Container | Responsibility | Dependencies |
|---|---|---|
| redis | DataClay backend storage | None |
| metadata-service | DataClay metadata management | redis |
| backend | DataClay execution environment | redis |
| proxy | DataClay client access point | metadata-service, backend |
| scaphandre | Hardware metrics collection | Host access (privileged) |
| otel-collector | Metrics processing and forwarding | scaphandre |
| bridge | Connect OTLP to DataClay | proxy, otel-collector |
| bridge-config | Configure bridge settings | proxy |
| superlink | Federated learning server | proxy |
| supernode-X | Federated learning clients | proxy, superlink |
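The dependency column can be read as a directed graph. As a rough illustration (not part of ICOS-FL itself), the sketch below transcribes the table into a Python dictionary and uses the standard-library `graphlib` module to derive one valid startup order. The name `supernode-1` stands in for the `supernode-X` pattern, and `scaphandre` is modeled with no container dependencies because its requirement is privileged host access rather than another container.

```python
from graphlib import TopologicalSorter

# Container -> containers it depends on, transcribed from the table above.
# "supernode-1" is a stand-in for the "supernode-X" naming pattern.
DEPENDENCIES = {
    "redis": [],
    "metadata-service": ["redis"],
    "backend": ["redis"],
    "proxy": ["metadata-service", "backend"],
    "scaphandre": [],  # needs privileged host access, not another container
    "otel-collector": ["scaphandre"],
    "bridge": ["proxy", "otel-collector"],
    "bridge-config": ["proxy"],
    "superlink": ["proxy"],
    "supernode-1": ["proxy", "superlink"],
}


def startup_order(dependencies):
    """Return one order in which every container comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for name in startup_order(DEPENDENCIES):
        print(name)
```

Several orders satisfy the same constraints, so the exact output may vary. In Docker Compose, relationships like these are typically declared with `depends_on`.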
Network Configuration
ICOS-FL uses three kinds of network:
Docker Network: Internal communication between containers
Host Network: For components requiring direct access to host interfaces
External Network: For federation across machines
Port allocations:
| Port | Protocol | Usage |
|---|---|---|
| 8676 | TCP | DataClay Proxy |
| 8080 | HTTP | Scaphandre metrics endpoint |
| 4317 | gRPC | OpenTelemetry collector |
| 9091 | gRPC | SuperLink ServerAppIo |
| 9092 | gRPC | SuperLink Fleet (federation) |
| 9093 | gRPC | SuperLink Exec |
| 9094+ | gRPC | SuperNode ClientAppIo (one per node) |
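The `9094+` entry suggests that each SuperNode gets its own ClientAppIo port, counting up from 9094. The helper below is a hypothetical sketch of that numbering. It assumes nodes are indexed from 1 and ports are assigned consecutively, which the table does not state explicitly; the dictionary keys are illustrative labels, not ICOS-FL identifiers.

```python
# Fixed ports from the table above (keys are illustrative labels).
FIXED_PORTS = {
    "dataclay-proxy": 8676,
    "scaphandre-metrics": 8080,
    "otel-collector": 4317,
    "superlink-serverappio": 9091,
    "superlink-fleet": 9092,
    "superlink-exec": 9093,
}

CLIENTAPPIO_BASE_PORT = 9094  # first SuperNode ClientAppIo port ("9094+")


def clientappio_port(node_index: int) -> int:
    """ClientAppIo port for a 1-based node index (assumes consecutive assignment)."""
    if node_index < 1:
        raise ValueError("node_index must be >= 1")
    return CLIENTAPPIO_BASE_PORT + node_index - 1


def port_plan(num_nodes: int) -> dict:
    """Combine the fixed ports with one ClientAppIo port per node and check for clashes."""
    plan = dict(FIXED_PORTS)
    for i in range(1, num_nodes + 1):
        plan[f"supernode-{i}-clientappio"] = clientappio_port(i)
    if len(set(plan.values())) != len(plan):
        raise ValueError("port collision in plan")
    return plan


if __name__ == "__main__":
    for label, port in sorted(port_plan(2).items(), key=lambda item: item[1]):
        print(f"{port}\t{label}")
```

A plan like this is mainly useful in development mode, where several SuperNodes share one host and their ports must not collide.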
Resource Requirements
Minimum requirements for each component:
| Component | CPU | Memory | Notes |
|---|---|---|---|
| DataClay Stack | 1 core | 1 GB | Scales with data volume |
| Metrics Collection | 0.5 core | 512 MB | Minimal overhead |
| SuperLink | 1 core | 1 GB | Scales with number of clients |
| SuperNode | 1 core | 1 GB | Scales with model complexity |
| LSTM Training | 2+ cores | 2 GB+ | GPU beneficial but optional |
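To estimate a machine's minimum footprint, the per-component figures can be added up. The sketch below does this for a controller and a node. It assumes, following the federated diagram, that both run the DataClay stack and metrics collection, and that LSTM training runs on the node in addition to the SuperNode at its lower bound of 2 cores and 2 GB. These groupings are an interpretation of the table rather than measured values, and 1 GB is treated as 1024 MB.

```python
# Minimum (CPU cores, memory in MB) per component, from the table above.
# "2+ cores" and "2 GB+" are taken at their lower bounds.
MINIMUMS = {
    "dataclay-stack": (1.0, 1024),
    "metrics-collection": (0.5, 512),
    "superlink": (1.0, 1024),
    "supernode": (1.0, 1024),
    "lstm-training": (2.0, 2048),
}

# Assumed groupings per machine role (an interpretation of the diagram).
CONTROLLER_MACHINE = ["dataclay-stack", "metrics-collection", "superlink"]
NODE_MACHINE = ["dataclay-stack", "metrics-collection", "supernode", "lstm-training"]


def minimum_footprint(components):
    """Sum the minimum cores and memory (MB) over the given components."""
    cores = sum(MINIMUMS[name][0] for name in components)
    memory_mb = sum(MINIMUMS[name][1] for name in components)
    return cores, memory_mb


if __name__ == "__main__":
    for label, group in (("controller", CONTROLLER_MACHINE), ("node", NODE_MACHINE)):
        cores, memory_mb = minimum_footprint(group)
        print(f"{label}: {cores} cores, {memory_mb} MB")
```

Under these assumptions a controller needs at least 2.5 cores and 2560 MB, and a node at least 4.5 cores and 4608 MB, before any headroom for growth in data volume, client count, or model complexity.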
Deployment Considerations
When deploying ICOS-FL in production, consider:
Data Persistence: Configure volume mounts for Redis and DataClay
Resource Allocation: Set appropriate resource limits for containers
Network Security: Use firewalls to restrict access to internal services
Monitoring: Implement health checks and container monitoring
High Availability: Use container orchestration for critical components
Backup Strategy: Schedule regular backups of DataClay data and model checkpoints
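As one way to approach the monitoring item, a basic liveness probe can test whether a service's TCP port accepts connections. The sketch below uses only the Python standard library. The function name is hypothetical, and a real deployment would more likely rely on the health-check mechanism of its container runtime or orchestrator.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # 9092 is the SuperLink Fleet port from the port table; the host is a placeholder.
    print(port_is_open("localhost", 9092))
```

An open port only shows that something is listening. It does not confirm that the service behind it is healthy, so a check like this complements application-level checks rather than replacing them.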