LSTM Fundamentals¶

This page explains the Long Short-Term Memory (LSTM) neural network architecture used in ICOS-FL.

Introduction to LSTM Networks¶

Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) specifically designed to address the vanishing gradient problem that affects standard RNNs.

Figure: LSTM Cell Architecture

LSTMs are particularly well-suited for time series prediction because they can:

Capture Long-term Dependencies: Remember patterns over extended sequences
Handle Variable-length Sequences: Process sequences of different lengths
Maintain Context: Carry forward relevant information while forgetting irrelevant details
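Mitigate Gradient Problems: Reduce vanishing gradients through the gating mechanism and the additive cell-state update; exploding gradients can still occur and are usually handled with gradient clipping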

LSTM Architecture¶

Core Components of an LSTM Cell¶

An LSTM cell consists of several key components:

Cell State (c_t): The “memory” of the LSTM, passing information through the sequence
Hidden State (h_t): The output for the current time step
Forget Gate (f_t): Controls what information to discard from the cell state
Input Gate (i_t): Controls what new information to add to the cell state
Output Gate (o_t): Controls what information from the cell state to output

Mathematical Formulation¶

The LSTM cell performs these operations:

\[\begin{split}f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(c_t)\end{split}\]

Where:

\(\sigma\) is the sigmoid activation function
\(\odot\) is element-wise multiplication
\(W_f, W_i, W_c, W_o\) are weight matrices
\(b_f, b_i, b_c, b_o\) are bias vectors
\(x_t\) is the input at time step t
\(h_{t-1}\) is the hidden state from the previous time step
\(c_{t-1}\) is the cell state from the previous time step
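
The equations above can be traced by hand with a small sketch. The following is a minimal, standard-library-only illustration of one time step, not the code ICOS-FL runs (ICOS-FL relies on PyTorch's nn.LSTM). The helper names sigmoid, affine, and lstm_step, the dictionary layout keyed by gate name, and the toy sizes are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W . v + b for a row-major matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(
    x_t: Vector,
    h_prev: Vector,
    c_prev: Vector,
    W: Dict[str, Matrix],
    b: Dict[str, Vector],
) -> Tuple[Vector, Vector]:
    """One LSTM time step; W and b are keyed by gate name: 'f', 'i', 'c', 'o'."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(W["f"], z, b["f"])]        # forget gate
    i_t = [sigmoid(a) for a in affine(W["i"], z, b["i"])]        # input gate
    c_tilde = [math.tanh(a) for a in affine(W["c"], z, b["c"])]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(a) for a in affine(W["o"], z, b["o"])]        # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy example (hypothetical sizes): hidden size 2, input size 1, all weights zero.
hidden, inputs = 2, 1
W = {g: [[0.0] * (hidden + inputs) for _ in range(hidden)] for g in "fico"}
b = {g: [0.0] * hidden for g in "fico"}
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [2.0, -2.0], W, b)
print(c_t)  # [1.0, -1.0]
```

With all weights set to zero every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is exactly half of the previous one. This makes the role of the forget gate in \(c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\) easy to check.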

LSTM Implementation in ICOS-FL¶

ICOS-FL uses PyTorch’s LSTM implementation:

import torch
from torch import nn


class LSTMModel(nn.Module):
    def __init__(
        self,
        hidden_layer_size: int,
        time_step: int,
        num_layers: int,
        output_size: int = 1,
    ) -> None:
        super().__init__()

        self.hidden_layer_size = hidden_layer_size
        self.time_step = time_step
        self.num_layers = num_layers

        # LSTM layer; input_size is time_step, so each sample is a single
        # step whose features are the time_step values of the window
        self.lstm = nn.LSTM(time_step, hidden_layer_size, num_layers, batch_first=True)

        # Linear layer to produce output prediction
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq: torch.Tensor) -> torch.Tensor:
        # lstm_out has shape [batch_size, seq_len, hidden_layer_size]
        lstm_out, _ = self.lstm(input_seq)
        # Use the output at the last sequence position for the prediction
        predictions = self.linear(lstm_out[:, -1, :])
        return predictions

Key hyperparameters include:

hidden_layer_size: Number of units in the LSTM hidden layer
time_step: Length of the input window (also passed to nn.LSTM as its input_size)
num_layers: Number of stacked LSTM layers
output_size: Dimension of the output prediction (default: 1)

Input and Output Dimensions¶

ICOS-FL’s LSTM model expects inputs and produces outputs with these dimensions:

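Input: [batch_size, 1, time_step]
    batch_size: Number of sequences in a batch
    1: Sequence length seen by nn.LSTM; with batch_first=True the second axis is the sequence axis, so each sample is a single step
    time_step: Number of features in that step, i.e. the time_step consecutive values of the window (the model sets input_size=time_step)
Output: [batch_size, output_size]
    batch_size: Same as input
    output_size: Prediction dimension (typically 1 for single-value forecasting)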

The model extracts the final hidden state (lstm_out[:, -1, :]) and passes it through a linear layer to produce the prediction.
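
The shapes can be checked with a short sketch. It builds an nn.LSTM and nn.Linear pair with the same constructor arguments as the model, rather than importing the ICOS-FL class, and the sizes (batch of 4, window of 10, 16 hidden units) are hypothetical values picked only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes chosen only for illustration.
batch_size, time_step, hidden_layer_size, output_size = 4, 10, 16, 1

lstm = nn.LSTM(time_step, hidden_layer_size, num_layers=1, batch_first=True)
linear = nn.Linear(hidden_layer_size, output_size)

# With batch_first=True, nn.LSTM reads (batch, seq_len, input_size):
# here seq_len is 1 and the whole window is the feature vector.
input_seq = torch.randn(batch_size, 1, time_step)

lstm_out, (h_n, c_n) = lstm(input_seq)
predictions = linear(lstm_out[:, -1, :])

print(lstm_out.shape)     # torch.Size([4, 1, 16])
print(predictions.shape)  # torch.Size([4, 1])
```

For a unidirectional LSTM, lstm_out[:, -1, :] matches h_n[-1], the final hidden state of the top layer.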

Why LSTM for Time Series Prediction¶

LSTMs are well suited to time series prediction because they can:

Identify Temporal Patterns: Recognize recurring patterns in system metrics
Remember Past States: Maintain information about previous system conditions
Handle Variable Sampling: Work with irregularly sampled data once it is resampled to a regular interval or the time gaps are supplied as input features
Detect Anomalies: Identify unusual patterns in resource usage
Forecast Future Values: Predict upcoming resource demands

In ICOS-FL, these capabilities are applied to system metrics like CPU usage, memory consumption, and power usage.

Batching and Sequence Creation¶

ICOS-FL processes time series data into LSTM-compatible batches:

Sliding Window: Creates overlapping sequences from the time series
Sequence Formatting: Each sequence consists of time_step consecutive data points
Target Creation: The target is the next value after the sequence
Batching: Multiple sequences are grouped into batches

This is handled by the TimeSeriesDataset class:

from typing import Tuple

import pandas as pd
import torch
from torch.utils.data import Dataset


class TimeSeriesDataset(Dataset):
    def __init__(
        self,
        df: pd.DataFrame,
        start_index: int,
        population: int,
        time_step: int,
        metric: str,
        device: torch.device,
    ) -> None:
        # Initialize dataset
        # ...
        ...

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor]:
        # Extract sequence and target
        sequence_values = self.data.iloc[index : index + self.time_step].values
        # Cast to float32 to match the target; resulting shape is [1, time_step]
        input_tensor = torch.tensor(sequence_values).float().unsqueeze(0)

        # Target is the next value after the sequence
        target_value = self.data.iloc[index + self.time_step]
        target_tensor = torch.tensor(target_value).float().unsqueeze(0)

        return input_tensor, target_tensor
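
The sliding-window and target-creation steps can also be shown without pandas or PyTorch. This is a minimal sketch on a plain list; make_windows is a hypothetical helper written for this example and is not part of ICOS-FL.

```python
from typing import List, Tuple


def make_windows(series: List[float], time_step: int) -> List[Tuple[List[float], float]]:
    """Return (window, target) pairs; the target is the value right after the window."""
    pairs = []
    # The last valid start index leaves one value after the window for the target.
    for start in range(len(series) - time_step):
        window = series[start : start + time_step]
        target = series[start + time_step]
        pairs.append((window, target))
    return pairs


series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
pairs = make_windows(series, time_step=3)
for window, target in pairs:
    print(window, "->", target)
```

A series of length N yields N - time_step pairs, because each window needs one more value after it to serve as the target.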

Training Process¶

The LSTM training process in ICOS-FL follows these steps:

Normalization: Scale input data to zero mean and unit variance
Sequence Creation: Generate overlapping sequences from the time series
Batch Formation: Group sequences into batches
Forward Pass: Process batches through the LSTM network
Loss Calculation: Compute Mean Squared Error between predictions and targets
Backward Pass: Propagate the error gradient backward
Parameter Update: Update model weights using an optimization algorithm
Repeat: Continue for the specified number of epochs

This training process occurs locally at each node in the federated learning setup.
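
The steps above can be sketched as a compact local training loop. This is an illustration under stated assumptions, not the ICOS-FL training code: the synthetic sine series, the single full batch, the Adam optimizer, the learning rate, and the epoch count are all hypothetical choices, since the optimizer and its settings are not specified here.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical settings; none of these values come from ICOS-FL.
time_step, hidden_size, epochs, lr = 8, 16, 5, 0.01

# Normalization: zero mean, unit variance on a synthetic series.
raw = torch.tensor([math.sin(0.3 * k) for k in range(200)])
series = (raw - raw.mean()) / raw.std()

# Sequence creation and one full batch of shape [N, 1, time_step].
windows = series.unfold(0, time_step, 1)[:-1]  # drop the last window: no target follows it
inputs = windows.unsqueeze(1)
targets = series[time_step:].unsqueeze(1)      # next value after each window

lstm = nn.LSTM(time_step, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, 1)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=lr)    # optimizer choice is an assumption
loss_fn = nn.MSELoss()

losses = []
for epoch in range(epochs):
    optimizer.zero_grad()
    lstm_out, _ = lstm(inputs)                 # forward pass
    predictions = head(lstm_out[:, -1, :])
    loss = loss_fn(predictions, targets)       # Mean Squared Error
    loss.backward()                            # backward pass
    optimizer.step()                           # parameter update
    losses.append(loss.item())

print(losses)
```

In practice the sequences would be split into mini-batches (for example with a DataLoader) rather than processed as one full batch.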

LSTM Advantages for ICOS-FL¶

LSTM networks provide several advantages in the ICOS-FL context:

Compact Model Size: LSTM models are relatively small, which keeps the communication overhead of exchanging model parameters low
Incremental Learning: Can be updated with new data without complete retraining
Adaptive Memory: Focus on relevant patterns in varying system loads
Robust to Noise: Filter out noise in system metrics measurements
Temporal Modeling: Capture complex temporal dependencies in resource usage
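
To make the model-size point concrete, the parameter count of an nn.LSTM followed by an nn.Linear can be computed directly from the hyperparameters. The sketch below assumes PyTorch's parameter layout (four gates per layer, each with input weights, recurrent weights, and two bias vectors). The function name and the example hyperparameter values are hypothetical and are not ICOS-FL defaults.

```python
def lstm_model_param_count(
    time_step: int, hidden_layer_size: int, num_layers: int, output_size: int = 1
) -> int:
    """Count trainable parameters of nn.LSTM(time_step, hidden, layers) + nn.Linear."""
    total = 0
    for layer in range(num_layers):
        input_size = time_step if layer == 0 else hidden_layer_size
        # Four gates, each with input weights, recurrent weights, and two bias
        # vectors (PyTorch keeps separate bias_ih and bias_hh).
        total += 4 * hidden_layer_size * (input_size + hidden_layer_size + 2)
    total += hidden_layer_size * output_size + output_size  # linear head
    return total


n_params = lstm_model_param_count(time_step=10, hidden_layer_size=32, num_layers=1)
print(n_params)               # 5665
print(n_params * 4, "bytes")  # 22660 bytes as float32
```

At these example sizes a full set of float32 weights is roughly 22 KB, which gives a sense of the payload a node would exchange per round.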

Limitations and Considerations¶

When using LSTMs in ICOS-FL, consider these limitations:

Training Data Requirements: LSTMs need sufficient historical data
Hyperparameter Sensitivity: Performance depends on proper hyperparameter tuning
Computational Cost: Training can be resource-intensive
Interpretability: Internal representations are not easily interpretable
Sequence Length Tradeoff: Longer sequences capture more context but require more resources

ICOS-FL addresses these limitations through careful configuration and the federated learning approach.