LSTM Fundamentals¶
This page explains the Long Short-Term Memory (LSTM) neural network architecture used in ICOS-FL.
Introduction to LSTM Networks¶
Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) specifically designed to address the vanishing gradient problem that affects standard RNNs.
LSTM Cell Architecture¶
LSTMs are particularly well-suited for time series prediction because they can:
Capture Long-term Dependencies: Remember patterns over extended sequences
Handle Variable-length Sequences: Process sequences of different lengths
Maintain Context: Carry forward relevant information while forgetting irrelevant details
Mitigate Gradient Problems: The gating mechanism reduces vanishing gradients; exploding gradients can still occur and are usually handled with gradient clipping
LSTM Architecture¶
Core Components of an LSTM Cell¶
An LSTM cell consists of several key components:
Cell State (c_t): The “memory” of the LSTM, passing information through the sequence
Hidden State (h_t): The output for the current time step
Forget Gate (f_t): Controls what information to discard from the cell state
Input Gate (i_t): Controls what new information to add to the cell state
Output Gate (o_t): Controls what information from the cell state to output
Mathematical Formulation¶
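The LSTM cell performs these operations (standard formulation, written with the symbols defined below):
\[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \]
\[ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \]
\[ \tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \]
\[ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \]
\[ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \]
\[ h_t = o_t \odot \tanh(c_t) \]
Where:
- \(\sigma\) is the sigmoid activation function
- \(\odot\) is element-wise multiplication
- \(W_f, W_i, W_c, W_o\) are weight matrices
- \(b_f, b_i, b_c, b_o\) are bias vectors
- \(x_t\) is the input at time step t
- \(h_{t-1}\) is the hidden state from the previous time step
- \(c_{t-1}\) is the cell state from the previous time step
- \(\tilde{c}_t\) is the candidate cell state and \([h_{t-1}, x_t]\) is the concatenation of the previous hidden state and the current input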
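Using the symbols defined above, one time step of the standard LSTM formulation can be sketched in NumPy. This is an illustration only, not the code ICOS-FL runs; the function name `lstm_cell_step`, the dictionary layout of the weights, and the sizes are assumptions made for the example.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (hypothetical helper for illustration).

    W maps a gate name to a matrix of shape (hidden, hidden + input);
    b maps a gate name to a vector of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t


# Arbitrary sizes and random weights, chosen only for the demonstration.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = {g: rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size)) for g in "fico"}
b = {g: np.zeros(hidden_size) for g in "fico"}

# Run the cell over a sequence of five inputs, carrying h and c forward.
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_cell_step(x_t, h, c, W, b)
```

Because the cell state is updated additively (`f_t * c_prev + i_t * c_tilde`), information can pass through many steps without being squashed at every step, which is the property the gates are designed to provide.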
LSTM Implementation in ICOS-FL¶
ICOS-FL uses PyTorch’s LSTM implementation:
```python
import torch
from torch import nn


class LSTMModel(nn.Module):
    def __init__(
        self,
        hidden_layer_size: int,
        time_step: int,
        num_layers: int,
        output_size: int = 1,
    ) -> None:
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        self.time_step = time_step
        self.num_layers = num_layers
        # LSTM layer; time_step is passed as input_size (features per step)
        self.lstm = nn.LSTM(time_step, hidden_layer_size, num_layers, batch_first=True)
        # Linear layer to produce output prediction
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq: torch.Tensor) -> torch.Tensor:
        # input_seq has shape (batch_size, 1, time_step)
        lstm_out, _ = self.lstm(input_seq)
        # Use the LSTM output at the last sequence position
        predictions = self.linear(lstm_out[:, -1, :])
        return predictions
```
Key hyperparameters include:
hidden_layer_size: Number of units in the LSTM hidden layer
time_step: Length of input sequences
num_layers: Number of stacked LSTM layers
output_size: Dimension of the output prediction (default: 1)
Input and Output Dimensions¶
ICOS-FL’s LSTM model expects inputs and produces outputs with these dimensions:
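Input: [batch_size, 1, time_step]
- batch_size: Number of sequences in a batch
- 1: Singleton dimension added by unsqueeze(0) in the dataset. Because the layer is created with batch_first=True and time_step as its input size, nn.LSTM reads this axis as a sequence of length 1
- time_step: Number of consecutive values in each window, presented to nn.LSTM as the feature vector of that single step
Output: [batch_size, output_size]
- batch_size: Same as input
- output_size: Prediction dimension (typically 1 for single-value forecasting)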
The model takes the LSTM output at the last sequence position (lstm_out[:, -1, :]), which is the final hidden state of the top LSTM layer, and passes it through a linear layer to produce the prediction.
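The shapes described above can be checked with a short sketch that builds the same two layers directly. The batch size and hyperparameter values below are arbitrary assumptions, not ICOS-FL defaults.

```python
import torch
from torch import nn

# Arbitrary example values (assumptions for this sketch).
batch_size, time_step, hidden_layer_size, num_layers = 8, 10, 16, 2

lstm = nn.LSTM(time_step, hidden_layer_size, num_layers, batch_first=True)
linear = nn.Linear(hidden_layer_size, 1)

x = torch.randn(batch_size, 1, time_step)   # [batch_size, 1, time_step]
lstm_out, (h_n, c_n) = lstm(x)
predictions = linear(lstm_out[:, -1, :])

print(lstm_out.shape)     # torch.Size([8, 1, 16])
print(h_n.shape)          # torch.Size([2, 8, 16])
print(predictions.shape)  # torch.Size([8, 1])

# The last-position output equals the final hidden state of the top layer.
same = torch.allclose(lstm_out[:, -1, :], h_n[-1])
```

With `batch_first=True`, `nn.LSTM` interprets the input as (batch, sequence, features), so the middle axis of size 1 is the sequence axis here.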
Why LSTM for Time Series Prediction¶
LSTMs are well suited to time series prediction of system metrics because they can:
Identify Temporal Patterns: Recognize recurring patterns in system metrics
Remember Past States: Maintain information about previous system conditions
Tolerate Variable Sampling: Can be applied to irregularly sampled data, typically after resampling or by supplying the time gaps as an extra input
Detect Anomalies: Identify unusual patterns in resource usage
Forecast Future Values: Predict upcoming resource demands
In ICOS-FL, these capabilities are applied to system metrics like CPU usage, memory consumption, and power usage.
Batching and Sequence Creation¶
ICOS-FL processes time series data into LSTM-compatible batches:
Sliding Window: Creates overlapping sequences from the time series
Sequence Formatting: Each sequence consists of time_step consecutive data points
Target Creation: The target is the next value after the sequence
Batching: Multiple sequences are grouped into batches
This is handled by the TimeSeriesDataset class:
```python
from typing import Tuple

import pandas as pd
import torch
from torch.utils.data import Dataset


class TimeSeriesDataset(Dataset):
    def __init__(
        self,
        df: pd.DataFrame,
        start_index: int,
        population: int,
        time_step: int,
        metric: str,
        device: torch.device,
    ) -> None:
        # Initialize dataset
        # ...
        ...

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor]:
        # Extract sequence and target
        sequence_values = self.data.iloc[index : index + self.time_step].values
        # Add a leading dimension: (time_step,) -> (1, time_step) for a 1-D series
        input_tensor = torch.tensor(sequence_values).unsqueeze(0)
        # Target is the next value after the sequence
        target_value = self.data.iloc[index + self.time_step]
        target_tensor = torch.tensor(target_value).float().unsqueeze(0)
        return input_tensor, target_tensor
```
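The sliding-window and target-creation steps can also be illustrated without pandas or PyTorch. The helper `make_windows` and the sample readings are hypothetical and exist only for this sketch.

```python
from typing import List, Tuple


def make_windows(series: List[float], time_step: int) -> List[Tuple[List[float], float]]:
    """Return (window, target) pairs: each window holds time_step values,
    and the target is the value that immediately follows the window."""
    return [
        (series[i : i + time_step], series[i + time_step])
        for i in range(len(series) - time_step)
    ]


cpu_usage = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]  # made-up readings
pairs = make_windows(cpu_usage, time_step=3)
for window, target in pairs:
    print(window, "->", target)
# [10.0, 12.0, 15.0] -> 11.0
# [12.0, 15.0, 11.0] -> 9.0
# [15.0, 11.0, 9.0] -> 14.0
```

A series of N points yields N - time_step pairs, so longer windows leave fewer training examples from the same history.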
Training Process¶
The LSTM training process in ICOS-FL follows these steps:
Normalization: Scale input data to zero mean and unit variance
Sequence Creation: Generate overlapping sequences from the time series
Batch Formation: Group sequences into batches
Forward Pass: Process batches through the LSTM network
Loss Calculation: Compute Mean Squared Error between predictions and targets
Backward Pass: Propagate the error gradient backward
Parameter Update: Update model weights using an optimization algorithm
Repeat: Continue for the specified number of epochs
This training process occurs locally at each node in the federated learning setup.
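The steps above can be sketched as a minimal local training loop. Everything specific here is an assumption for illustration: the synthetic sine-wave data, the `TinyLSTM` class, the Adam optimizer, the learning rate, the batch size, and the epoch count are not taken from ICOS-FL.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for a system metric such as CPU usage (made-up data).
raw = torch.sin(torch.linspace(0, 20, 200)) + 0.05 * torch.randn(200)

# Normalization: zero mean, unit variance.
series = (raw - raw.mean()) / raw.std()

# Sequence creation: overlapping windows of time_step inputs plus one target.
time_step = 10
windows = series.unfold(0, time_step + 1, 1)      # (190, time_step + 1)
inputs = windows[:, :time_step].unsqueeze(1)      # (190, 1, time_step)
targets = windows[:, time_step:]                  # (190, 1)

# Batch formation.
loader = DataLoader(TensorDataset(inputs, targets), batch_size=32, shuffle=True)


class TinyLSTM(nn.Module):
    """Hypothetical single-layer model mirroring the structure shown earlier."""

    def __init__(self, time_step: int, hidden: int) -> None:
        super().__init__()
        self.lstm = nn.LSTM(time_step, hidden, batch_first=True)
        self.linear = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)
        return self.linear(out[:, -1, :])


model = TinyLSTM(time_step, hidden=16)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # optimizer choice is an assumption

epoch_losses = []
for epoch in range(20):
    running = 0.0
    for batch_inputs, batch_targets in loader:
        optimizer.zero_grad()
        predictions = model(batch_inputs)               # forward pass
        loss = criterion(predictions, batch_targets)    # MSE loss
        loss.backward()                                 # backward pass
        optimizer.step()                                # parameter update
        running += loss.item() * batch_inputs.size(0)
    epoch_losses.append(running / len(loader.dataset))

print(f"first epoch MSE: {epoch_losses[0]:.4f}, last epoch MSE: {epoch_losses[-1]:.4f}")
```

In a federated setup, a loop of this kind would run on each node's own data, with only the resulting model weights leaving the node.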
LSTM Advantages for ICOS-FL¶
LSTM networks provide several advantages in the ICOS-FL context:
Compact Model Size: LSTM models are relatively small, minimizing communication overhead
Incremental Learning: Can be updated with new data without complete retraining
Adaptive Memory: Focus on relevant patterns in varying system loads
Robust to Noise: Filter out noise in system metrics measurements
Temporal Modeling: Capture complex temporal dependencies in resource usage
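To make the model-size point concrete, the parameter count of an LSTM followed by a linear layer can be computed by hand. The sketch below follows PyTorch's parameter layout (four gates, each with input weights, recurrent weights, and two bias vectors per layer); the function name and the hyperparameter values are assumptions for the example, not ICOS-FL defaults.

```python
def lstm_model_param_count(
    input_size: int, hidden_size: int, num_layers: int, output_size: int = 1
) -> int:
    """Count trainable parameters of a stacked LSTM plus a linear output layer."""
    total = 0
    for layer in range(num_layers):
        # Layers after the first receive the previous layer's hidden state as input.
        layer_input = input_size if layer == 0 else hidden_size
        # 4 gates x (input weights + recurrent weights + two bias vectors)
        total += 4 * (layer_input * hidden_size + hidden_size * hidden_size + 2 * hidden_size)
    # Linear layer: weights plus bias
    total += hidden_size * output_size + output_size
    return total


# Hypothetical configuration: time_step=10 used as input size, 64 hidden units, 2 layers.
n_params = lstm_model_param_count(input_size=10, hidden_size=64, num_layers=2)
size_kib = n_params * 4 / 1024  # 4 bytes per float32 parameter
print(n_params, f"{size_kib:.1f} KiB")  # 52801 206.3 KiB
```

At this scale a full set of weights is a few hundred kilobytes, which gives a rough sense of the payload exchanged per node in each federated round.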
Limitations and Considerations¶
When using LSTMs in ICOS-FL, consider these limitations:
Training Data Requirements: LSTMs need sufficient historical data
Hyperparameter Sensitivity: Performance depends on proper hyperparameter tuning
Computational Cost: Training can be resource-intensive
Interpretability: Internal representations are not easily interpretable
Sequence Length Tradeoff: Longer sequences capture more context but require more resources
ICOS-FL addresses these limitations through careful configuration and the federated learning approach.