Docker Setup & Deployment

This guide explains how to deploy ICOS-FL using Docker, focusing on the Flower-based federated learning infrastructure.

Prerequisites

  • Docker Engine (20.10+)

  • Docker Compose (2.0+)

  • 4GB+ of available RAM

  • Network access between nodes (for distributed deployment)
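The version requirements above can be checked from a shell before deploying. The following is a minimal sketch and not part of ICOS-FL: the helper names version_ge and check_prereqs are made up for illustration, and version_ge assumes a `sort` that supports `-V` (GNU coreutils and BusyBox do).

```shell
# version_ge A B: succeeds when version A is at least version B.
# Relies on `sort -V` placing the smaller version first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs: compares the installed Docker Engine and Docker Compose
# versions against the minimums listed above (20.10 and 2.0).
check_prereqs() {
  local engine compose
  engine="$(docker version --format '{{.Server.Version}}')" || return 1
  compose="$(docker compose version --short)" || return 1
  compose="${compose#v}"   # strip a leading "v" in case one is printed
  version_ge "$engine" 20.10 || {
    echo "Docker Engine $engine is older than 20.10" >&2
    return 1
  }
  version_ge "$compose" 2.0 || {
    echo "Docker Compose $compose is older than 2.0" >&2
    return 1
  }
  echo "Docker Engine $engine and Docker Compose $compose meet the requirements"
}
```

After sourcing the snippet, run it with:

    check_prereqs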

Understanding Docker Components

ICOS-FL uses Docker containers for its federated learning infrastructure:

  • SuperLink Container: Central coordinator for federated learning

  • SuperNode Containers: Client nodes that train models on local data

Each component is defined in the docker/ directory:

  • docker/superlink.Dockerfile: Defines the SuperLink container

  • docker/supernode.Dockerfile: Defines the SuperNode container

  • docker/simulation.yml: Docker Compose file for local deployment

Simulation Deployment

For development or testing, you can run all components on a single machine:

  1. Deploy the SuperLink (central coordinator):

    docker compose -f docker/simulation.yml up -d superlink
    
  2. Deploy SuperNodes (client nodes):

    docker compose -f docker/simulation.yml up -d supernode-1 supernode-2
    
  3. Verify the containers are running:

    docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
    

This setup creates a simulated federated learning environment on your local machine. The containers use host network mode, so they can reach local services such as DataClay through localhost.
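In scripts, the manual `docker ps` check from step 3 can be replaced by polling Compose until the expected services report a running container. This is only a sketch: the function name wait_for_services and the variable COMPOSE_FILE_PATH are hypothetical, and it assumes a Docker Compose v2 release whose `ps` subcommand supports `--services` together with `--status`. A running container is not necessarily ready to accept connections, so treat this as a coarse check.

```shell
COMPOSE_FILE_PATH="${COMPOSE_FILE_PATH:-docker/simulation.yml}"

# wait_for_services TIMEOUT SERVICE...: polls once per second until every
# listed service has a running container, or fails after TIMEOUT seconds.
wait_for_services() {
  local timeout="$1" elapsed=0 running svc missing
  shift
  while :; do
    running="$(docker compose -f "$COMPOSE_FILE_PATH" ps --services --status running)"
    missing=""
    for svc in "$@"; do
      printf '%s\n' "$running" | grep -qxF -- "$svc" || missing="$missing $svc"
    done
    [ -z "$missing" ] && return 0
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "not running after ${timeout}s:$missing" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

For the simulation above, a call might look like this:

    wait_for_services 30 superlink supernode-1 supernode-2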

Production Deployment

In a production environment, you’ll deploy components across multiple machines:

  1. Controller Machine Setup:

    Deploy the SuperLink container on your controller machine:

    # On controller machine
    cd /path/to/icos-fl
    docker compose -f docker/simulation.yml up -d superlink
    
  2. Node Machine Setup:

    On each node machine, edit docker/simulation.yml so that the --superlink argument points to the controller:

    # Example for supernode-1
    command:
      - --insecure
      - --superlink
      - controller.ip.address:9092  # Replace with actual controller IP
      - --clientappio-api-address
      - "0.0.0.0:9094"
    

    Then deploy the SuperNode:

    # On each node machine
    cd /path/to/icos-fl
    docker compose -f docker/simulation.yml up -d supernode-1  # Or supernode-2, etc.
    
  3. Start Federated Learning:

    You can start a run from any machine that can reach the controller, not only from the controller itself. First, set the controller’s address in the [tool.flwr.federations.remote-deployment] section of your pyproject.toml file. Note that flwr run connects to port 9093, whereas the SuperNodes connect to port 9092:

    [tool.flwr.federations.remote-deployment]
    address = "controller.ip.address:9093"  # Replace with actual controller IP
    insecure = true
    

    Then start the federated learning process:

    cd /path/to/icos-fl
    flwr run . remote-deployment --stream
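Editing docker/simulation.yml by hand on every node machine (step 2) is easy to get wrong. One alternative, sketched here under assumptions, is to leave that file untouched and generate a small Compose override file per node. The helper name write_supernode_override and the file name docker/production.override.yml are hypothetical. The sketch assumes the services in simulation.yml sit under a top-level `services:` key, and it relies on Compose replacing a service's `command` with the one from a later `-f` file when several files are merged.

```shell
# write_supernode_override SERVICE CONTROLLER_HOST [OUTFILE]
# Writes a Compose override that points SERVICE at the SuperLink running on
# CONTROLLER_HOST, reusing the arguments shown in step 2.
write_supernode_override() {
  local service="$1" controller="$2" out="${3:-docker/production.override.yml}"
  cat > "$out" <<EOF
services:
  $service:
    command:
      - --insecure
      - --superlink
      - "$controller:9092"
      - --clientappio-api-address
      - "0.0.0.0:9094"
EOF
}
```

On a node machine this could be used as follows (192.0.2.10 is a placeholder address):

    write_supernode_override supernode-1 192.0.2.10
    docker compose -f docker/simulation.yml -f docker/production.override.yml up -d supernode-1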
    

Container Configuration

The containers are configured through the Docker Compose file and command-line parameters:

SuperNode Container

A SuperNode service is defined in the Compose file as follows:

supernode-1:
  build:
    context: ..
    dockerfile: docker/supernode.Dockerfile
  network_mode: "host"
  command:
    - --insecure
    - --superlink
    - localhost:9092  # In production, use the controller's IP
    - --clientappio-api-address
    - "0.0.0.0:9094"

Key parameters:

  • --insecure: Disables TLS. Remove it in production and configure certificates instead (see Security Considerations)

  • --superlink: Address of the SuperLink to connect to

  • --clientappio-api-address: Address for ClientApp communication

  • network_mode: "host": Shares the host's network stack, so the container can reach services on the host through localhost. Published port mappings are ignored in this mode
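Because --insecure has to be removed for production, a deployment script can refuse to continue while the flag is still present in the rendered configuration. The following sketch uses `docker compose config`, which prints the fully resolved Compose file. The function name assert_tls_enabled is hypothetical, and the check is a plain text match, so it would also trigger on a comment-free occurrence of the string anywhere in the rendered output.

```shell
# assert_tls_enabled [COMPOSE ARGS...]: fails if any service in the rendered
# configuration still passes --insecure.
assert_tls_enabled() {
  local rendered
  rendered="$(docker compose "$@" config)" || return 1
  if printf '%s\n' "$rendered" | grep -q -- '--insecure'; then
    echo "--insecure is still set; TLS is disabled" >&2
    return 1
  fi
}
```

A possible use before a production rollout:

    assert_tls_enabled -f docker/simulation.yml && docker compose -f docker/simulation.yml up -d superlink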

Building Custom Images

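The project includes Dockerfiles for custom images that extend the base Flower images with project-specific dependencies. Run the build commands from the project root, because the build context is the current directory (.):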

  1. SuperLink Image:

    After modifying docker/superlink.Dockerfile, rebuild the image:

    docker build -f docker/superlink.Dockerfile -t custom-icos-superlink:latest .
    
  2. SuperNode Image:

    After modifying docker/supernode.Dockerfile, rebuild the image:

    docker build -f docker/supernode.Dockerfile -t custom-icos-supernode:latest .
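The two build commands above differ only in the component name, so they can be wrapped in a loop. This is a convenience sketch: the function name build_images and the optional tag argument are not part of the project, and it assumes it is run from the project root like the commands above.

```shell
# build_images [TAG]: builds the SuperLink and SuperNode images with the same
# tag (default: latest), stopping at the first failed build.
build_images() {
  local tag="${1:-latest}" component
  for component in superlink supernode; do
    docker build \
      -f "docker/$component.Dockerfile" \
      -t "custom-icos-$component:$tag" \
      . || return 1
  done
}
```

Tagging with the current commit instead of latest makes it easier to tell which code an image contains:

    build_images "$(git rev-parse --short HEAD)"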
    

Container Resource Limits

For production deployments, add resource limits to your containers:

# In docker/simulation.yml
superlink:
  # ... other configuration ...
  deploy:
    resources:
      limits:
        cpus: '2.0'
        memory: 2G
      reservations:
        cpus: '1.0'
        memory: 1G

supernode-1:
  # ... other configuration ...
  deploy:
    resources:
      limits:
        cpus: '1.0'
        memory: 1G
      reservations:
        cpus: '0.5'
        memory: 512M
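After restarting the services, it is worth confirming that Docker actually applied the limits. The sketch below reads them back with `docker inspect`. HostConfig.Memory is reported in bytes and HostConfig.NanoCpus in billionths of a CPU, and a value of 0 means no limit is set. The function name show_limits is hypothetical, and the sketch assumes one container per service.

```shell
# show_limits SERVICE: prints the memory and CPU limits applied to the
# service's container, converted to MiB and whole CPUs.
show_limits() {
  local service="$1" cid limits mem nano
  cid="$(docker compose -f docker/simulation.yml ps -q "$service")" || return 1
  [ -n "$cid" ] || { echo "no container for $service" >&2; return 1; }
  limits="$(docker inspect \
    --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' "$cid")" || return 1
  mem="${limits% *}"    # first field: bytes
  nano="${limits#* }"   # second field: nano-CPUs
  awk -v m="$mem" -v n="$nano" \
    'BEGIN { printf "memory=%dMiB cpus=%.1f\n", m / 1048576, n / 1e9 }'
}
```

With the superlink limits from the example above, the expected result of `show_limits superlink` is memory=2048MiB cpus=2.0.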

Persisting Data

The simulation.yml file already includes volume mounts for persisting model data:

superlink:
  # ... other configuration ...
  volumes:
    - ..:/app/model:rw  # Mount the project directory for model storage

Security Considerations

For production deployments:

  1. Enable TLS:

    • Remove the --insecure flag

    • Add SSL certificates and configuration:

    superlink:
      # ... other configuration ...
      command:
        - --ssl-certfile=/path/to/cert.pem
        - --ssl-keyfile=/path/to/key.pem
        - --ssl-ca-certfile=/path/to/ca.pem
      volumes:
        - /path/to/certs:/path/to:ro
    
  2. Network Isolation:

    • When not using network_mode: "host", create dedicated networks

    • Restrict port exposure to the minimum necessary

  3. Authentication:

    • Enable authentication for SuperLink and SuperNodes:

    superlink:
      # ... other configuration ...
      command:
        # ... other parameters ...
        - --auth-list-public-keys=/path/to/keys.csv
      volumes:
        - /path/to/keys.csv:/path/to/keys.csv:ro
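To try out the TLS configuration from item 1 without real certificates, a throwaway certificate chain can be generated with OpenSSL. The sketch below is for testing only and is not the project's own tooling: the function name make_test_certs and the certificate subject names are made up, and it assumes the `openssl` command-line tool is installed. It produces the ca.pem, cert.pem and key.pem files that the --ssl-ca-certfile, --ssl-certfile and --ssl-keyfile options expect.

```shell
# make_test_certs OUTDIR HOST: creates a throwaway CA plus a server
# certificate for HOST (a DNS name or IPv4 address). For testing only.
make_test_certs() {
  local out="$1" host="$2" san
  mkdir -p "$out" || return 1

  # 1. Self-signed certificate authority (ca.key, ca.pem).
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=icos-fl-test-ca" \
    -keyout "$out/ca.key" -out "$out/ca.pem" || return 1

  # 2. Server key and certificate signing request (key.pem, server.csr).
  openssl req -newkey rsa:2048 -nodes \
    -subj "/CN=$host" \
    -keyout "$out/key.pem" -out "$out/server.csr" || return 1

  # 3. Subject Alternative Name: IP entry for IPv4 literals, DNS otherwise.
  case "$host" in
    *[!0-9.]*) san="DNS:$host" ;;
    *)         san="IP:$host" ;;
  esac
  printf 'subjectAltName=%s\n' "$san" > "$out/san.ext"

  # 4. Sign the request with the CA (cert.pem).
  openssl x509 -req -in "$out/server.csr" -days 365 \
    -CA "$out/ca.pem" -CAkey "$out/ca.key" -CAcreateserial \
    -extfile "$out/san.ext" -out "$out/cert.pem" || return 1
}
```

The HOST argument should match the address the SuperNodes use to reach the SuperLink. The output directory can then be mounted read-only, as in the volumes entry of the TLS example:

    make_test_certs ./certs 192.0.2.10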
    

Troubleshooting

  1. Container Fails to Start:

    Check logs for errors:

    docker compose -f docker/simulation.yml logs superlink
    
  2. Network Connectivity Issues:

    Check that the SuperLink ports are reachable (replace localhost with the controller's IP when testing from a node machine):

    nc -zv localhost 9092
    
  3. Resource Constraints:

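    Monitor memory usage to see whether containers are approaching their limits. A container killed by the kernel's OOM (Out of Memory) killer typically exits with code 137: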

    docker stats
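The connectivity and memory checks above can be gathered into small helpers. This is a sketch with hypothetical function names. It assumes an `nc` that supports `-z` and `-w`, probes the two SuperLink ports used in this guide (9092 and 9093), and reads the OOMKilled and ExitCode fields that `docker inspect` reports for a container.

```shell
# port_open HOST PORT: succeeds when a TCP connection can be opened
# within 3 seconds.
port_open() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# check_superlink_ports [HOST]: probes ports 9092 and 9093 on HOST
# (default: localhost) and fails if either is unreachable.
check_superlink_ports() {
  local host="${1:-localhost}" port status=0
  for port in 9092 9093; do
    if port_open "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
      status=1
    fi
  done
  return "$status"
}

# oom_report SERVICE: prints whether the service's container (running or
# stopped) was killed by the OOM killer, along with its exit code.
oom_report() {
  local cid
  cid="$(docker compose -f docker/simulation.yml ps -a -q "$1")" || return 1
  [ -n "$cid" ] || { echo "no container for $1" >&2; return 1; }
  docker inspect \
    --format '{{.Name}} oom_killed={{.State.OOMKilled}} exit_code={{.State.ExitCode}}' \
    "$cid"
}
```

For example, from a node machine (192.0.2.10 is a placeholder for the controller's IP):

    check_superlink_ports 192.0.2.10
    oom_report supernode-1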
    

Cleanup

To stop and remove all containers:

docker compose -f docker/simulation.yml down