LSTM Fundamentals

This page explains the Long Short-Term Memory (LSTM) neural network architecture used in ICOS-FL.

Introduction to LSTM Networks

Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) specifically designed to address the vanishing gradient problem that affects standard RNNs.

[Figure: LSTM Cell Architecture]

LSTMs are particularly well-suited for time series prediction because they can:

  1. Capture Long-term Dependencies: Remember patterns over extended sequences

  2. Handle Variable-length Sequences: Process sequences of different lengths

  3. Maintain Context: Carry forward relevant information while forgetting irrelevant details

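  4. Mitigate Gradient Problems: The gated, additive cell-state update reduces vanishing gradients; exploding gradients can still occur and are usually handled separately, for example with gradient clipping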

LSTM Architecture

Core Components of an LSTM Cell

An LSTM cell consists of several key components:

  1. Cell State (c_t): The “memory” of the LSTM, passing information through the sequence

  2. Hidden State (h_t): The output for the current time step

  3. Forget Gate (f_t): Controls what information to discard from the cell state

  4. Input Gate (i_t): Controls what new information to add to the cell state

  5. Output Gate (o_t): Controls what information from the cell state to output

Mathematical Formulation

The LSTM cell performs these operations:

\[\begin{split}f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(c_t)\end{split}\]

Where:

  • \(\sigma\) is the sigmoid activation function

  • \(\odot\) is element-wise multiplication

  • \([h_{t-1}, x_t]\) is the concatenation of the previous hidden state and the current input

  • \(\tilde{c}_t\) is the candidate cell state

  • \(W_f, W_i, W_c, W_o\) are weight matrices

  • \(b_f, b_i, b_c, b_o\) are bias vectors

  • \(x_t\) is the input at time step t

  • \(h_{t-1}\) is the hidden state from the previous time step

  • \(c_{t-1}\) is the cell state from the previous time step
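
The equations above can be traced by hand with a small pure-Python sketch. It is only an illustration of the formulas, not the code ICOS-FL runs (ICOS-FL relies on PyTorch's nn.LSTM); the helper names lstm_step and affine and the dictionary layout of the weights are assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W @ v + b for plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(
    x_t: Vector,
    h_prev: Vector,
    c_prev: Vector,
    W: Dict[str, Matrix],
    b: Dict[str, Vector],
) -> Tuple[Vector, Vector]:
    """One LSTM time step following the equations above.

    W and b are keyed by "f", "i", "c", "o" (hypothetical layout); each W[k]
    has shape [hidden_size][hidden_size + input_size].
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(W["f"], z, b["f"])]        # forget gate
    i_t = [sigmoid(a) for a in affine(W["i"], z, b["i"])]        # input gate
    c_tilde = [math.tanh(a) for a in affine(W["c"], z, b["c"])]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(a) for a in affine(W["o"], z, b["o"])]        # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# With all-zero weights every gate equals sigmoid(0) = 0.5 and the candidate
# state is tanh(0) = 0, so the cell state is simply halved.
hidden, inputs = 2, 1
zeros_W = {k: [[0.0] * (hidden + inputs) for _ in range(hidden)] for k in "fico"}
zeros_b = {k: [0.0] * hidden for k in "fico"}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], zeros_W, zeros_b)
print(c_t)  # [0.5, -1.0]
```

The zero-weight case is a convenient sanity check because each term of the cell-state update can be verified by inspection.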

LSTM Implementation in ICOS-FL

ICOS-FL uses PyTorch’s LSTM implementation:

import torch
from torch import nn


class LSTMModel(nn.Module):
    def __init__(
        self,
        hidden_layer_size: int,
        time_step: int,
        num_layers: int,
        output_size: int = 1,
    ) -> None:
        super().__init__()

        self.hidden_layer_size = hidden_layer_size
        self.time_step = time_step
        self.num_layers = num_layers

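        # LSTM layer; the first positional argument is input_size, so each
        # window of time_step values is fed to the LSTM as one feature vector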
        self.lstm = nn.LSTM(time_step, hidden_layer_size, num_layers, batch_first=True)

        # Linear layer to produce output prediction
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq: torch.Tensor) -> torch.Tensor:
        lstm_out, _ = self.lstm(input_seq)
        predictions = self.linear(lstm_out[:, -1, :])
        return predictions

Key hyperparameters include:

  1. hidden_layer_size: Number of units in the LSTM hidden layer

  2. time_step: Number of consecutive data points in each input window (passed to nn.LSTM as input_size)

  3. num_layers: Number of stacked LSTM layers

  4. output_size: Dimension of the output prediction (default: 1)

Input and Output Dimensions

ICOS-FL’s LSTM model expects inputs and produces outputs with these dimensions:

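  • Input: [batch_size, 1, time_step]

      - batch_size: Number of sequences in a batch

      - 1: Sequence length as seen by nn.LSTM (with batch_first=True, the second dimension is the sequence axis), because each window is fed as a single step

      - time_step: Number of values in each window, supplied as the features of that single step (the input_size of nn.LSTM)

  • Output: [batch_size, output_size]

      - batch_size: Same as input

      - output_size: Prediction dimension (typically 1 for single-value forecasting)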

The model takes the top LSTM layer's output at the last sequence position (lstm_out[:, -1, :]), which equals that layer's final hidden state, and passes it through a linear layer to produce the prediction.
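
The shapes can be checked with a short PyTorch sketch that mirrors the layers of LSTMModel. The hyperparameter values below are hypothetical and chosen only for illustration; they are not ICOS-FL defaults.

```python
import torch
from torch import nn

# Hypothetical hyperparameter values chosen only for illustration.
batch_size, time_step, hidden_layer_size, num_layers, output_size = 4, 10, 32, 2, 1

lstm = nn.LSTM(time_step, hidden_layer_size, num_layers, batch_first=True)
linear = nn.Linear(hidden_layer_size, output_size)

# [batch, seq_len=1, input_size=time_step]
x = torch.zeros(batch_size, 1, time_step)
lstm_out, (h_n, c_n) = lstm(x)
predictions = linear(lstm_out[:, -1, :])

print(lstm_out.shape)     # torch.Size([4, 1, 32])
print(h_n.shape)          # torch.Size([2, 4, 32]): one final hidden state per layer
print(predictions.shape)  # torch.Size([4, 1])
```

For a unidirectional LSTM, lstm_out[:, -1, :] and h_n[-1] hold the same values, so either could feed the linear layer.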

Why LSTM for Time Series Prediction

LSTMs suit time series prediction on system metrics because they can:

  1. Identify Temporal Patterns: Recognize recurring patterns in system metrics

  2. Remember Past States: Maintain information about previous system conditions

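  3. Handle Variable Sampling: Work with irregularly sampled data, typically after resampling to a regular interval, since a standard LSTM has no explicit notion of elapsed time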

  4. Detect Anomalies: Identify unusual patterns in resource usage

  5. Forecast Future Values: Predict upcoming resource demands

In ICOS-FL, these capabilities are applied to system metrics like CPU usage, memory consumption, and power usage.

Batching and Sequence Creation

ICOS-FL processes time series data into LSTM-compatible batches:

  1. Sliding Window: Creates overlapping sequences from the time series

  2. Sequence Formatting: Each sequence consists of time_step consecutive data points

  3. Target Creation: The target is the next value after the sequence

  4. Batching: Multiple sequences are grouped into batches

This is handled by the TimeSeriesDataset class:

from typing import Tuple

import pandas as pd
import torch
from torch.utils.data import Dataset


class TimeSeriesDataset(Dataset):
    def __init__(
        self,
        df: pd.DataFrame,
        start_index: int,
        population: int,
        time_step: int,
        metric: str,
        device: torch.device,
    ) -> None:
        # Initialize dataset
        # ...

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor]:
        # Extract a window of time_step consecutive values starting at index
        sequence_values = self.data.iloc[index : index + self.time_step].values
        # Cast to float32 to match the model weights and add a leading axis,
        # giving a tensor of shape [1, time_step]
        input_tensor = torch.tensor(sequence_values).float().unsqueeze(0)

        # Target is the next value after the sequence
        target_value = self.data.iloc[index + self.time_step]
        target_tensor = torch.tensor(target_value).float().unsqueeze(0)

        return input_tensor, target_tensor
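
To make the indexing in __getitem__ concrete, here is a minimal pure-Python sketch of the same sliding-window idea on a tiny series. The function make_windows and the sample readings are made up for illustration, and the sketch ignores the start_index and population arguments, whose handling is not shown above.

```python
from typing import List, Tuple


def make_windows(series: List[float], time_step: int) -> List[Tuple[List[float], float]]:
    """Return (input window, next-value target) pairs for every valid start index."""
    pairs = []
    # The last valid start index leaves one value after the window for the target.
    for index in range(len(series) - time_step):
        window = series[index : index + time_step]
        target = series[index + time_step]
        pairs.append((window, target))
    return pairs


cpu_usage = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]  # made-up readings
pairs = make_windows(cpu_usage, time_step=3)
for window, target in pairs:
    print(window, "->", target)
# [10.0, 12.0, 11.0] -> 15.0
# [12.0, 11.0, 15.0] -> 14.0
# [11.0, 15.0, 14.0] -> 18.0
```

A series of N points yields N - time_step samples, because each sample needs time_step inputs plus one following value as the target.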

Training Process

The LSTM training process in ICOS-FL follows these steps:

  1. Normalization: Scale input data to zero mean and unit variance

  2. Sequence Creation: Generate overlapping sequences from the time series

  3. Batch Formation: Group sequences into batches

  4. Forward Pass: Process batches through the LSTM network

  5. Loss Calculation: Compute Mean Squared Error between predictions and targets

  6. Backward Pass: Propagate the error gradient backward

  7. Parameter Update: Update model weights using an optimization algorithm

  8. Repeat: Continue for the specified number of epochs

This training process occurs locally at each node in the federated learning setup.
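
The steps above can be sketched as a compact local training loop. This is an illustrative sketch rather than the ICOS-FL training code: the synthetic sine series, the Adam optimizer, the learning rate, the batch size, and the layer sizes are all assumptions, and the model is built inline with the same layer layout as LSTMModel so the example is self-contained.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Made-up metric series; real data would come from the node's local measurements.
series = torch.sin(torch.linspace(0, 12, 200)) * 20 + 50

# 1. Normalization to zero mean and unit variance.
series = (series - series.mean()) / series.std()

# 2. Sequence creation with a sliding window (time_step is a hypothetical value).
time_step = 10
inputs = torch.stack([series[i : i + time_step] for i in range(len(series) - time_step)])
inputs = inputs.unsqueeze(1)               # [num_samples, 1, time_step]
targets = series[time_step:].unsqueeze(1)  # [num_samples, 1]

# Small stand-in model with the same layer layout as LSTMModel.
lstm = nn.LSTM(time_step, 16, 1, batch_first=True)
linear = nn.Linear(16, 1)
params = list(lstm.parameters()) + list(linear.parameters())

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(params, lr=1e-2)  # optimizer choice is an assumption

batch_size = 32
epoch_losses = []
for epoch in range(5):                               # 8. repeat for several epochs
    total = 0.0
    for start in range(0, len(inputs), batch_size):  # 3. batch formation
        x = inputs[start : start + batch_size]
        y = targets[start : start + batch_size]
        lstm_out, _ = lstm(x)                        # 4. forward pass
        pred = linear(lstm_out[:, -1, :])
        loss = loss_fn(pred, y)                      # 5. mean squared error
        optimizer.zero_grad()
        loss.backward()                              # 6. backward pass
        optimizer.step()                             # 7. parameter update
        total += loss.item() * len(x)
    epoch_losses.append(total / len(inputs))

print(epoch_losses)
```

In a federated setup, a loop of this kind would run on each node's own data, and only the resulting weights would leave the node.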

LSTM Advantages for ICOS-FL

LSTM networks provide several advantages in the ICOS-FL context:

  1. Compact Model Size: LSTM models are relatively small, which keeps the communication overhead of exchanging weights low

  2. Incremental Learning: Can be updated with new data without complete retraining

  3. Adaptive Memory: Focus on relevant patterns in varying system loads

  4. Robust to Noise: Gating can learn to down-weight noisy measurements in system metrics

  5. Temporal Modeling: Capture complex temporal dependencies in resource usage
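
To give the "compact model size" point a rough scale, the sketch below counts the trainable parameters of an nn.LSTM stack followed by an nn.Linear layer, as in LSTMModel. The function name and the hyperparameter values are hypothetical; actual ICOS-FL configurations may differ.

```python
def lstm_model_param_count(
    time_step: int, hidden: int, num_layers: int, output_size: int = 1
) -> int:
    """Count trainable parameters of an nn.LSTM stack followed by nn.Linear."""
    total = 0
    for layer in range(num_layers):
        in_features = time_step if layer == 0 else hidden
        # Four gates, each with input weights, recurrent weights, and two bias
        # vectors (PyTorch keeps separate input and recurrent biases).
        total += 4 * (in_features * hidden + hidden * hidden + 2 * hidden)
    total += hidden * output_size + output_size  # final linear layer
    return total


# Hypothetical hyperparameters, not ICOS-FL defaults.
n_params = lstm_model_param_count(time_step=10, hidden=32, num_layers=2)
print(n_params)             # 14113
print(n_params * 4 / 1024)  # size in KiB at 32-bit floats, about 55 KiB
```

At this scale a full set of weights is on the order of tens of kilobytes per exchange, although the figure grows roughly quadratically with the hidden size.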

Limitations and Considerations

When using LSTMs in ICOS-FL, consider these limitations:

  1. Training Data Requirements: LSTMs need sufficient historical data

  2. Hyperparameter Sensitivity: Performance depends on proper hyperparameter tuning

  3. Computational Cost: Training can be resource-intensive

  4. Interpretability: Internal representations are not easily interpretable

  5. Sequence Length Tradeoff: Longer sequences capture more context but require more resources

ICOS-FL addresses these limitations through careful configuration and the federated learning approach.