Docker Setup & Deployment¶
This guide explains how to deploy ICOS-FL using Docker, focusing on the Flower-based federated learning infrastructure.
Prerequisites¶
Docker Engine (20.10+)
Docker Compose (2.0+)
4GB+ of available RAM
Network access between nodes (for distributed deployment)
Understanding Docker Components¶
ICOS-FL uses Docker containers for its federated learning infrastructure:
SuperLink Container: Central coordinator for federated learning
SuperNode Containers: Client nodes that train models on local data
Each component is defined in the docker/ directory:
docker/superlink.Dockerfile: Defines the SuperLink container
docker/supernode.Dockerfile: Defines the SuperNode container
docker/simulation.yml: Docker Compose file for local deployment
Simulation Deployment¶
For development or testing, you can run all components on a single machine:
Deploy the SuperLink (central coordinator):
docker compose -f docker/simulation.yml up -d superlink
Deploy SuperNodes (client nodes):
docker compose -f docker/simulation.yml up -d supernode-1 supernode-2
Verify the containers are running:
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
This setup creates a simulated federated learning environment on your local machine. The containers use host network mode, so they can reach local services such as DataClay through localhost.
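When scripting the steps above, it can help to wait until the SuperLink's Fleet API port (9092) accepts connections before starting the SuperNodes. The following is a minimal sketch, assuming `nc` is installed; `wait_for_port` is a hypothetical helper name, not part of ICOS-FL or Flower.

```shell
#!/bin/sh
# Hypothetical helper (not part of ICOS-FL): poll a TCP port until it
# accepts connections or the timeout expires.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
wait_for_port() {
    host=$1
    port=$2
    timeout=${3:-30}
    elapsed=0
    # nc -z only probes the port; it sends no data.
    until nc -z "$host" "$port" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $host:$port" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$host:$port is reachable"
}

# Example (commented out so that sourcing this file has no side effects):
# wait_for_port localhost 9092 60 &&
#     docker compose -f docker/simulation.yml up -d supernode-1 supernode-2
```

The helper returns a non-zero status on timeout, so it can gate the SuperNode deployment with `&&` as shown in the commented example.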
Production Deployment¶
In a production environment, you’ll deploy components across multiple machines:
Controller Machine Setup:
Deploy the SuperLink container on your controller machine:
# On controller machine
cd /path/to/icos-fl
docker compose -f docker/simulation.yml up -d superlink
Node Machine Setup:
On each node machine, edit the docker/simulation.yml file to update the SuperLink address:
# Example for supernode-1
command:
  - --insecure
  - --superlink
  - controller.ip.address:9092 # Replace with actual controller IP
  - --clientappio-api-address
  - "0.0.0.0:9094"
Then deploy the SuperNode:
# On each node machine
cd /path/to/icos-fl
docker compose -f docker/simulation.yml up -d supernode-1 # Or supernode-2, etc.
Start Federated Learning:
The following command can be run from any machine, not just the controller. Just make sure to configure the controller’s IP address in your pyproject.toml file under the [tool.flwr.federations.remote-deployment] section:
[tool.flwr.federations.remote-deployment]
address = "controller.ip.address:9093" # Replace with actual controller IP
insecure = true
Then start the federated learning process:
cd /path/to/icos-fl
flwr run . remote-deployment --stream
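As an alternative to hand-editing docker/simulation.yml on every node machine, the SuperLink address can be supplied through a small Compose override file. The sketch below assumes that simulation.yml declares its services under a top-level `services:` key and that the service is named supernode-1; `write_supernode_override` and the output path are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a Compose override file that points
# supernode-1 at the given controller address.
# Usage: write_supernode_override CONTROLLER_ADDRESS OUTPUT_FILE
write_supernode_override() {
    controller=$1
    outfile=$2
    # Assumption: simulation.yml uses a top-level "services:" key.
    cat > "$outfile" <<EOF
services:
  supernode-1:
    command:
      - --insecure
      - --superlink
      - ${controller}:9092
      - --clientappio-api-address
      - "0.0.0.0:9094"
EOF
}

# Example (commented out so that sourcing this file has no side effects):
# write_supernode_override 192.0.2.10 /tmp/supernode-override.yml
# docker compose -f docker/simulation.yml -f /tmp/supernode-override.yml up -d supernode-1
```

When several `-f` files are given, Compose merges them in order, and a `command` in a later file replaces the earlier one rather than being appended to it, which is why the override repeats the full argument list.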
Container Configuration¶
The containers are configured through the Docker Compose file and command-line parameters:
SuperLink Container¶
The SuperLink container accepts these parameters:
superlink:
  build:
    context: ..
    dockerfile: docker/superlink.Dockerfile
  network_mode: "host" # Uses host network mode
  volumes:
    - ..:/app/model:rw # To save model checkpoints
  command:
    - --insecure
Key parameters:
--insecure: Disables TLS (remove for production)
network_mode: "host": Uses host networking for optimal local service access
Default ports:
  9091: ServerAppIO API
  9092: Fleet API
  9093: Exec API
SuperNode Container¶
The SuperNode container accepts these parameters:
supernode-1:
  build:
    context: ..
    dockerfile: docker/supernode.Dockerfile
  network_mode: "host"
  command:
    - --insecure
    - --superlink
    - localhost:9092 # In production, use the controller's IP
    - --clientappio-api-address
    - "0.0.0.0:9094"
Key parameters:
--insecure: Disables TLS (remove for production)
--superlink: Address of the SuperLink to connect to
--clientappio-api-address: Address for ClientApp communication
network_mode: "host": Uses host networking for optimal local service access
Building Custom Images¶
The project already includes custom Docker images that extend the base Flower images with project-specific dependencies:
SuperLink Image:
If you need to modify the SuperLink image:
docker build -f docker/superlink.Dockerfile -t custom-icos-superlink:latest .
SuperNode Image:
If you need to modify the SuperNode image:
docker build -f docker/supernode.Dockerfile -t custom-icos-supernode:latest .
Container Resource Limits¶
For production deployments, add resource limits to your containers:
# In docker-compose.yml or simulation.yml
superlink:
  # ... other configuration ...
  deploy:
    resources:
      limits:
        cpus: '2.0'
        memory: 2G
      reservations:
        cpus: '1.0'
        memory: 1G
supernode-1:
  # ... other configuration ...
  deploy:
    resources:
      limits:
        cpus: '1.0'
        memory: 1G
      reservations:
        cpus: '0.5'
        memory: 512M
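To confirm that the limits took effect on a running container, the raw values can be read with `docker inspect` and converted to readable units. This is a sketch only: `format_limits` is a hypothetical helper, and it assumes the `HostConfig.Memory` (bytes) and `HostConfig.NanoCpus` (billionths of a CPU) fields as they appear in the `docker inspect` JSON output, where 0 means no limit.

```shell
#!/bin/sh
# Hypothetical helper: convert raw limit values into readable units.
# Usage: format_limits MEMORY_BYTES NANO_CPUS
format_limits() {
    awk -v mem="$1" -v cpus="$2" 'BEGIN {
        # A value of 0 means that no limit is set.
        mem_txt = (mem == 0) ? "unlimited" : sprintf("%dMiB", mem / 1048576)
        cpu_txt = (cpus == 0) ? "unlimited" : sprintf("%.1f", cpus / 1000000000)
        printf "memory=%s cpus=%s\n", mem_txt, cpu_txt
    }'
}

# Example (commented out; CONTAINER_NAME is a placeholder):
# format_limits $(docker inspect \
#     --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' CONTAINER_NAME)
```

With the SuperLink limits shown above (2G of memory, 2.0 CPUs), the helper would be expected to report 2048MiB and 2.0.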
Persisting Data¶
The simulation.yml file already includes volume mounts for persisting model data:
superlink:
  # ... other configuration ...
  volumes:
    - ..:/app/model:rw # Mount the project directory for model storage
Security Considerations¶
For production deployments:
Enable TLS:
Remove the --insecure flag
Add SSL certificates and configuration:
superlink:
  # ... other configuration ...
  command:
    - --ssl-certfile=/path/to/cert.pem
    - --ssl-keyfile=/path/to/key.pem
    - --ssl-ca-certfile=/path/to/ca.pem
  volumes:
    - /path/to/certs:/path/to:ro
Network Isolation:
When not using network_mode: "host", create dedicated networks
Restrict port exposure to the minimum necessary
Authentication:
Enable authentication for SuperLink and SuperNodes:
superlink:
  # ... other configuration ...
  command:
    # ... other parameters ...
    - --auth-list-public-keys=/path/to/keys.csv
  volumes:
    - /path/to/keys.csv:/path/to/keys.csv:ro
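For trying out the Enable TLS step before real certificates are available, a throwaway certificate authority and server certificate can be generated with OpenSSL. The sketch below is for testing only; `make_test_certs`, the output directory, and the server name are hypothetical, and production deployments should use certificates issued by a trusted CA.

```shell
#!/bin/sh
# Hypothetical helper: create a throwaway CA and a server certificate
# for testing TLS. Not suitable for production use.
# Usage: make_test_certs OUTPUT_DIR SERVER_DNS_NAME
make_test_certs() {
    out=$1
    name=$2
    mkdir -p "$out" || return 1
    # 1. Self-signed CA: ca.key (private key) and ca.pem (certificate).
    openssl req -x509 -newkey rsa:2048 -nodes -sha256 -days 365 \
        -subj "/CN=icos-fl-test-ca" \
        -keyout "$out/ca.key" -out "$out/ca.pem" 2>/dev/null || return 1
    # 2. Server private key (key.pem) and certificate signing request.
    openssl req -newkey rsa:2048 -nodes -sha256 \
        -subj "/CN=$name" \
        -keyout "$out/key.pem" -out "$out/server.csr" 2>/dev/null || return 1
    # 3. Sign the request with the CA. The subjectAltName must match the
    #    address clients use; write IP:<address> instead of DNS:<name>
    #    if nodes connect to the controller by IP address.
    printf 'subjectAltName=DNS:%s\n' "$name" > "$out/san.ext"
    openssl x509 -req -in "$out/server.csr" -sha256 -days 365 \
        -CA "$out/ca.pem" -CAkey "$out/ca.key" -CAcreateserial \
        -extfile "$out/san.ext" -out "$out/cert.pem" 2>/dev/null || return 1
}

# Example (commented out so that sourcing this file has no side effects):
# make_test_certs ./certs controller.example
# openssl verify -CAfile ./certs/ca.pem ./certs/cert.pem
```

The resulting cert.pem, key.pem, and ca.pem correspond to the --ssl-certfile, --ssl-keyfile, and --ssl-ca-certfile options shown above.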
Troubleshooting¶
Container Fails to Start:
Check logs for errors:
docker compose -f docker/simulation.yml logs superlink
Network Connectivity Issues:
Ensure ports are accessible:
nc -zv localhost 9092
Resource Constraints:
Monitor live memory usage to see whether containers are approaching their limits and risk being killed for running out of memory (OOM):
docker stats
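For a container that has already stopped, its recorded state is more informative than live statistics. The sketch below interprets two fields from the `docker inspect` output, `State.ExitCode` and `State.OOMKilled`; `explain_container_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: interpret a stopped container's exit code and
# OOMKilled flag.
# Usage: explain_container_state EXIT_CODE OOM_KILLED
explain_container_state() {
    exit_code=$1
    oom_killed=$2
    if [ "$oom_killed" = "true" ]; then
        echo "killed by the OOM killer: raise the memory limit or reduce usage"
    elif [ "$exit_code" -eq 137 ]; then
        # 137 = 128 + 9, meaning the process received SIGKILL.
        echo "terminated by SIGKILL (exit 137): possibly OOM or a manual kill"
    elif [ "$exit_code" -eq 0 ]; then
        echo "exited cleanly"
    else
        echo "exited with error code $exit_code: check the container logs"
    fi
}

# Example (commented out; CONTAINER_NAME is a placeholder):
# explain_container_state $(docker inspect \
#     --format '{{.State.ExitCode}} {{.State.OOMKilled}}' CONTAINER_NAME)
```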
Cleanup¶
To stop and remove all containers:
docker compose -f docker/simulation.yml down