=================
LSTM Fundamentals
=================

This page explains the Long Short-Term Memory (LSTM) neural network architecture used in ICOS-FL.

Introduction to LSTM Networks
-----------------------------

Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) specifically designed to address the vanishing gradient problem that affects standard RNNs.

.. figure:: ../../_static/images/lstm_cell.svg
   :alt: LSTM Cell Structure
   :align: center

   LSTM Cell Architecture

LSTMs are particularly well-suited for time series prediction because they can:

1. **Capture Long-term Dependencies**: Remember patterns over extended sequences
2. **Handle Variable-length Sequences**: Process sequences of different lengths
3. **Maintain Context**: Carry forward relevant information while forgetting irrelevant details
4. **Mitigate Gradient Problems**: Reduce the vanishing gradients that affect standard RNNs through a specialized gating mechanism

LSTM Architecture
-----------------

Core Components of an LSTM Cell
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An LSTM cell consists of several key components:

1. **Cell State (c_t)**: The "memory" of the LSTM, passing information through the sequence
2. **Hidden State (h_t)**: The output for the current time step
3. **Forget Gate (f_t)**: Controls what information to discard from the cell state
4. **Input Gate (i_t)**: Controls what new information to add to the cell state
5. **Output Gate (o_t)**: Controls what information from the cell state to output

Mathematical Formulation
~~~~~~~~~~~~~~~~~~~~~~~~

The LSTM cell performs these operations:

.. math::

   f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
   i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
   \tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \\
   c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
   o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
   h_t &= o_t \odot \tanh(c_t)

Where:

- :math:`\sigma` is the sigmoid activation function
- :math:`\odot` is element-wise multiplication
- :math:`[h_{t-1}, x_t]` is the concatenation of the previous hidden state and the current input
- :math:`\tilde{c}_t` is the candidate cell state
- :math:`W_f, W_i, W_c, W_o` are weight matrices
- :math:`b_f, b_i, b_c, b_o` are bias vectors
- :math:`x_t` is the input at time step t
- :math:`h_{t-1}` is the hidden state from the previous time step
- :math:`c_{t-1}` is the cell state from the previous time step

LSTM Implementation in ICOS-FL
------------------------------

ICOS-FL uses PyTorch's LSTM implementation:

.. code-block:: python

   import torch
   from torch import nn


   class LSTMModel(nn.Module):
       def __init__(
           self,
           hidden_layer_size: int,
           time_step: int,
           num_layers: int,
           output_size: int = 1,
       ) -> None:
           super().__init__()
           self.hidden_layer_size = hidden_layer_size
           self.time_step = time_step
           self.num_layers = num_layers

           # LSTM layer; time_step is passed as the LSTM input_size
           self.lstm = nn.LSTM(time_step, hidden_layer_size, num_layers, batch_first=True)

           # Linear layer to produce output prediction
           self.linear = nn.Linear(hidden_layer_size, output_size)

       def forward(self, input_seq: torch.Tensor) -> torch.Tensor:
           lstm_out, _ = self.lstm(input_seq)
           # Use the output at the last sequence position
           predictions = self.linear(lstm_out[:, -1, :])
           return predictions

Key hyperparameters include:

1. **hidden_layer_size**: Number of units in the LSTM hidden layer
2. **time_step**: Number of consecutive data points in each input window (passed to ``nn.LSTM`` as its ``input_size``)
3. **num_layers**: Number of stacked LSTM layers
4. **output_size**: Dimension of the output prediction (default: 1)

Input and Output Dimensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~

ICOS-FL's LSTM model expects inputs and produces outputs with these dimensions:

- **Input**: ``[batch_size, 1, time_step]``

  - batch_size: Number of sequences in a batch
  - 1: Sequence length as seen by ``nn.LSTM`` (with ``batch_first=True`` the second axis is the sequence axis), so each window is consumed in a single recurrent step
  - time_step: Number of consecutive data points in each window, presented to the LSTM as the feature dimension

- **Output**: ``[batch_size, output_size]``

  - batch_size: Same as input
  - output_size: Prediction dimension (typically 1 for single-value forecasting)

The model takes the top-layer LSTM output at the last sequence position (``lstm_out[:, -1, :]``) and passes it through a linear layer to produce the prediction.

Why LSTM for Time Series Prediction
-----------------------------------

LSTMs are well-suited to time series prediction because they can:

1. **Identify Temporal Patterns**: Recognize recurring patterns in system metrics
2. **Remember Past States**: Maintain information about previous system conditions
3. **Handle Variable Sampling**: Work with irregularly sampled data
4. **Detect Anomalies**: Identify unusual patterns in resource usage
5. **Forecast Future Values**: Predict upcoming resource demands

In ICOS-FL, these capabilities are applied to system metrics like CPU usage, memory consumption, and power usage.

Batching and Sequence Creation
------------------------------

ICOS-FL processes time series data into LSTM-compatible batches:

1. **Sliding Window**: Creates overlapping sequences from the time series
2. **Sequence Formatting**: Each sequence consists of ``time_step`` consecutive data points
3. **Target Creation**: The target is the next value after the sequence
4. **Batching**: Multiple sequences are grouped into batches

This is handled by the ``TimeSeriesDataset`` class:

.. code-block:: python

   from typing import Tuple

   import pandas as pd
   import torch
   from torch.utils.data import Dataset


   class TimeSeriesDataset(Dataset):
       def __init__(
           self,
           df: pd.DataFrame,
           start_index: int,
           population: int,
           time_step: int,
           metric: str,
           device: torch.device,
       ) -> None:
           # Initialize dataset
           # ...

       def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor]:
           # Extract sequence and target
           sequence_values = self.data.iloc[index : index + self.time_step].values
           # Add a leading axis so the window matches the model's input layout
           input_tensor = torch.tensor(sequence_values).unsqueeze(0)

           # Target is the next value after the sequence
           target_value = self.data.iloc[index + self.time_step]
           target_tensor = torch.tensor(target_value).float().unsqueeze(0)

           return input_tensor, target_tensor

Training Process
----------------

The LSTM training process in ICOS-FL follows these steps:

1. **Normalization**: Scale input data to zero mean and unit variance
2. **Sequence Creation**: Generate overlapping sequences from the time series
3. **Batch Formation**: Group sequences into batches
4. **Forward Pass**: Process batches through the LSTM network
5. **Loss Calculation**: Compute Mean Squared Error between predictions and targets
6. **Backward Pass**: Propagate the error gradient backward
7. **Parameter Update**: Update model weights using an optimization algorithm
8. **Repeat**: Continue for the specified number of epochs

This training process occurs locally at each node in the federated learning setup.

LSTM Advantages for ICOS-FL
---------------------------

LSTM networks provide several advantages in the ICOS-FL context:

1. **Compact Model Size**: LSTM models are relatively small, minimizing communication overhead
2. **Incremental Learning**: Can be updated with new data without complete retraining
3. **Adaptive Memory**: Focus on relevant patterns in varying system loads
4. **Robust to Noise**: Filter out noise in system metric measurements
5. **Temporal Modeling**: Capture complex temporal dependencies in resource usage

Limitations and Considerations
------------------------------

When using LSTMs in ICOS-FL, consider these limitations:

1. **Training Data Requirements**: LSTMs need sufficient historical data
2. **Hyperparameter Sensitivity**: Performance depends on proper hyperparameter tuning
3. **Computational Cost**: Training can be resource-intensive
4. **Interpretability**: Internal representations are not easily interpretable
5. **Sequence Length Tradeoff**: Longer sequences capture more context but require more resources

ICOS-FL addresses these limitations through careful configuration and the federated learning approach.
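
As an illustration of the sequence-length tradeoff, the following is a minimal sketch of z-score normalization followed by sliding-window creation, written in plain Python so it runs without PyTorch. It is not the ICOS-FL implementation: ``normalize``, ``make_windows``, and the ``cpu_usage`` readings are hypothetical names and values introduced only for this example. It shows that, for a fixed amount of history, a larger ``time_step`` yields fewer (and larger) training windows.

.. code-block:: python

   from statistics import mean, pstdev
   from typing import List, Tuple


   def normalize(series: List[float]) -> List[float]:
       """Scale a series to zero mean and unit variance (z-score)."""
       mu = mean(series)
       sigma = pstdev(series)
       if sigma == 0:
           raise ValueError("series is constant; cannot scale to unit variance")
       return [(value - mu) / sigma for value in series]


   def make_windows(
       series: List[float], time_step: int
   ) -> Tuple[List[List[float]], List[float]]:
       """Return overlapping windows of length time_step and their next-value targets."""
       count = len(series) - time_step  # windows that still have a following value
       if time_step <= 0 or count <= 0:
           raise ValueError("time_step must be positive and shorter than the series")
       inputs = [series[i : i + time_step] for i in range(count)]
       targets = [series[i + time_step] for i in range(count)]
       return inputs, targets


   # Hypothetical CPU-usage readings (percent); not real ICOS-FL data.
   cpu_usage = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0, 24.0, 26.0]
   scaled = normalize(cpu_usage)

   short_inputs, short_targets = make_windows(scaled, time_step=3)
   long_inputs, long_targets = make_windows(scaled, time_step=6)

   # Eight readings give 5 windows of length 3 but only 2 windows of length 6.
   print(len(short_inputs), len(long_inputs))  # prints: 5 2

Each window would still need a leading axis added (as ``unsqueeze(0)`` does in ``TimeSeriesDataset``) before being batched into the ``[batch_size, 1, time_step]`` layout the model expects.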