Style Guide¶

This page describes the coding style and conventions used in ICOS-FL.

Code Formatting¶

ICOS-FL uses the following tools for code formatting, linting, and type checking:

Black: Code formatter
isort: Import sorter
Ruff: Linter
mypy: Type checker

These tools are configured in pyproject.toml and enforced via pre-commit hooks.

Black Configuration¶

[tool.black]
line-length = 99
target-version = ["py310", "py311", "py312"]

isort Configuration¶

[tool.isort]
profile = "black"
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
line_length = 99
known_first_party = ["icos_fl"]

Ruff Configuration¶

[tool.ruff]
target-version = "py310"
output-format = "full"
line-length = 99
fix = true

[tool.ruff.lint]
select = [
  "E", "F", "W",  # pycodestyle (E, W) and Pyflakes (F)
  "C",            # mccabe (C90) and flake8-comprehensions (C4)
  "I",            # isort
  "N",            # pep8-naming
  "D",            # pydocstyle
  "ANN",          # flake8-annotations
  # ...
]

mypy Configuration¶

[tool.mypy]
warn_return_any = true
warn_unused_configs = true
# ...

Naming Conventions¶

Follow these naming conventions:

Variables and Functions¶

Use snake_case for variable and function names
Use descriptive names that indicate purpose
Avoid single-letter names except for loop indices

# Good
time_series_data = fetch_data()
normalized_data = normalize_data(time_series_data)

# Bad
ts = fetch_data()
nd = normalize_data(ts)

Classes¶

Use PascalCase for class names
Use noun phrases that describe what the class represents

# Good
class TimeSeriesDataset:
    pass

class LSTMModel:
    pass

# Bad
class process_data:
    pass

class data:
    pass

Constants¶

Use UPPER_CASE for constants
Constants should be defined at the module level

# Good
MAX_SEQUENCE_LENGTH = 100
DEFAULT_BATCH_SIZE = 64

# Bad
maxSequenceLength = 100
default_batch_size = 64

Modules and Packages¶

Use lowercase for module and package names
Use short, descriptive names
Avoid underscores in module names

# Good
from icos_fl.utils.processor import Processor

# Bad
from icos_fl.Utils.data_processor import Processor

Type Variables¶

Use PascalCase for type variables
Use single-letter names for simple type variables
Use descriptive names for complex type variables

# Good
T = TypeVar('T')
TensorData = TypeVar('TensorData', bound=torch.Tensor)

# Bad
t = TypeVar('t')
tensor_data = TypeVar('tensor_data', bound=torch.Tensor)
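
A type variable only becomes useful once it appears in a function signature. The following is a minimal sketch of both naming styles in use; `first_or_default`, `clamp`, and `Number` are hypothetical names chosen for illustration and are not part of ICOS-FL.

```python
from typing import Optional, Sequence, TypeVar

# Single-letter name for a simple, unconstrained type variable.
T = TypeVar("T")

# Descriptive PascalCase name for a type variable constrained to int or float.
Number = TypeVar("Number", int, float)


def first_or_default(items: Sequence[T], default: Optional[T] = None) -> Optional[T]:
    """Return the first element of ``items``, or ``default`` if it is empty."""
    return items[0] if items else default


def clamp(value: Number, lower: Number, upper: Number) -> Number:
    """Restrict ``value`` to the closed interval [lower, upper]."""
    return max(lower, min(value, upper))
```

Because `T` links the argument type to the return type, a type checker can infer that `first_or_default(["a", "b"])` yields an optional string rather than an untyped value.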

Docstring Style¶

ICOS-FL uses Google-style docstrings:

import pandas as pd


def fetch_data(timeout: int = 60) -> pd.DataFrame:
    """Fetch time series data from DataClay.

    Args:
        timeout: Maximum time to wait for data, in seconds.

    Returns:
        DataFrame containing the processed time series data.

    Raises:
        TimeoutError: If no data is available within the timeout period.
        DataClayException: If there is an error connecting to DataClay.

    Example:
        >>> df = fetch_data(timeout=30)
        >>> print(df.shape)
        (300, 4)
    """

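The `Example:` section of a Google-style docstring can double as a test when it is written in doctest format. The sketch below assumes a hypothetical helper, `normalize`, that is not part of ICOS-FL; it shows all four sections on a function small enough to check with the standard `doctest` module.

```python
import doctest
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Scale values linearly to the range [0, 1].

    Args:
        values: Non-empty list of numbers to scale.

    Returns:
        A new list in which the minimum maps to 0.0 and the maximum to 1.0.

    Raises:
        ValueError: If ``values`` is empty or all of its elements are equal.

    Example:
        >>> normalize([2.0, 4.0, 6.0])
        [0.0, 0.5, 1.0]
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must contain at least two distinct numbers")
    return [(v - low) / (high - low) for v in values]


if __name__ == "__main__":
    # Run the examples embedded in the docstrings of this module.
    doctest.testmod()
```
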
Class Docstrings¶

class TimeSeriesDataset(Dataset):
    """Dataset for time series prediction with sliding window approach.

    Creates sequences of consecutive time steps as inputs and
    uses the next value as the prediction target.

    Attributes:
        data: The time series data
        time_step: Number of time steps in each sequence
        metric: The target metric column
        device: PyTorch device to place tensors on
    """

Module Docstrings¶

"""Utilities for processing time series data.

This module provides classes and functions for processing time series data
for LSTM model training, including normalization, sequence creation, and
DataLoader generation.
"""

File Layout¶

Follow this order for file contents:

Module docstring
Imports (grouped as described in the Import Conventions section)
Constants
Global variables
Classes
Functions
Main execution block (if applicable)

Import Conventions¶

Organize imports in the following groups, separated by a blank line:

Standard library imports
Third-party imports
Local application imports

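Within each group, place plain import statements before from-imports and sort each kind alphabetically, which is the order isort produces by default.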

# Standard library imports
import os
import sys
import time
from typing import Dict, List, Optional, Tuple

# Third-party imports
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from dataclay import Client, DataClayObject

# Local application imports
from icos_fl.models.lstm import LSTMModel
from icos_fl.utils.colors import BGRN, WHT, paint

Avoid wildcard imports:

# Good
from torch.nn import LSTM, Linear, MSELoss

# Bad
from torch.nn import *

Line Length¶

Maximum line length is 99 characters. For long lines:

Use parentheses for line continuation in expressions
Use backslashes only when necessary (prefer parentheses)
Break lines before operators

# Good
long_expression = (
    first_variable + second_variable + third_variable
    + fourth_variable + fifth_variable
)

# Also good
def long_function_name(
    arg1: str,
    arg2: int,
    arg3: Optional[Dict[str, Any]] = None,
) -> None:
    pass

Whitespace¶

Follow these whitespace conventions:

Use 4 spaces for indentation (no tabs)
Add two blank lines before top-level classes and functions
Add one blank line before method definitions
Add blank lines to separate logical sections of code
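
As a rough illustration of these rules, the sketch below uses hypothetical names (`circle_area`, `Rectangle`). It shows two blank lines around top-level definitions, one blank line between methods, and a single blank line separating the validation step from the construction step inside `scale`.

```python
"""Hypothetical module showing the blank-line conventions."""

import math


def circle_area(radius: float) -> float:
    """Return the area of a circle."""
    return math.pi * radius**2


class Rectangle:
    """Axis-aligned rectangle."""

    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        """Return the area of the rectangle."""
        return self.width * self.height

    def scale(self, factor: float) -> "Rectangle":
        """Return a scaled copy, rejecting non-positive factors."""
        if factor <= 0:
            raise ValueError(f"factor must be positive, got {factor}")

        return Rectangle(self.width * factor, self.height * factor)
```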

Comments¶

Use comments sparingly and only when needed
Prefer docstrings for public functions and classes
Use comments to explain complex logic or non-obvious decisions
Keep comments up-to-date with code changes

# Good: explains the purpose of complex code
# Use recursive approach for multi-step prediction to simulate real forecasting
for _ in range(forecast_steps):
    next_pred = model(curr_seq)
    predictions.append(next_pred.item())
    curr_seq = torch.cat((curr_seq[:, 1:], next_pred.unsqueeze(0).unsqueeze(0)), dim=1)

# Bad: states the obvious
# Increment the counter
counter += 1

Type Annotations¶

Use type annotations for all function parameters and return values
Use Optional for parameters that can be None
Use Union for parameters that can have multiple types
Use parameterized container types (e.g., List[int] instead of a bare list)
Use TypeVar for generic types

def create_data_loaders(
    df: pd.DataFrame,
    time_step: Optional[int] = None,
    batch_size: Optional[int] = None,
    train_ratio: Optional[float] = None,
) -> Tuple[DataLoader, DataLoader, TimeSeriesDataset, TimeSeriesDataset]:
    """Create data loaders for training and validation."""
    # ...
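
The function above shows `Optional` and `Tuple`; a `Union` can be combined with them in the same way. The following sketch is illustrative only: `Numeric` and `summarize` are hypothetical names, not ICOS-FL APIs.

```python
from typing import Dict, List, Optional, Union

# Alias for values that may be either an int or a float.
Numeric = Union[int, float]


def summarize(
    values: List[Numeric],
    label: Optional[str] = None,
) -> Dict[str, Union[str, float]]:
    """Return the mean of ``values`` together with a display label."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "label": label if label is not None else "unnamed",
        "mean": sum(values) / len(values),
    }
```

Comparing against `None` explicitly, rather than relying on truthiness, keeps an empty string distinct from a missing label.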

Exception Handling¶

Be specific about exceptions you catch
Use multiple except blocks for different exceptions
Provide context in exception messages
Avoid bare except clauses and overly broad handlers such as except Exception

# Good
try:
    data = self.time_series_data.get_dataframe()
except DataClayException as e:
    raise DataClayException(f"Failed to retrieve time series data: {e}") from e
except TimeoutError as e:
    raise TimeoutError(f"Timed out waiting for data: {e}") from e

# Bad
try:
    data = self.time_series_data.get_dataframe()
except Exception as e:
    raise Exception(f"Error: {e}")
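
The same pattern can be tried without a DataClay connection. The sketch below uses only the standard library; `ConfigError` and `load_config` are hypothetical names introduced for illustration. Each failure mode gets its own `except` block, and `from e` preserves the original traceback.

```python
import json
from typing import Any, Dict


class ConfigError(Exception):
    """Raised when a configuration file cannot be loaded."""


def load_config(path: str) -> Dict[str, Any]:
    """Load a JSON object from ``path``, adding context to any failure."""
    try:
        with open(path, encoding="utf-8") as handle:
            config = json.load(handle)
    except FileNotFoundError as e:
        raise ConfigError(f"Configuration file not found: {path}") from e
    except json.JSONDecodeError as e:
        raise ConfigError(f"Invalid JSON in {path} (line {e.lineno}): {e.msg}") from e

    if not isinstance(config, dict):
        raise ConfigError(f"Expected a JSON object in {path}, got {type(config).__name__}")
    return config
```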

String Formatting¶

Use f-strings for string interpolation
Use triple quotes for multi-line strings
Use r-strings for raw strings (especially regex)

# Good
message = f"Processing {len(df)} rows of data for metric: {metric}"

help_text = """
This is a multi-line string.
It explains the purpose of this function.
"""

pattern = r"^\d{4}-\d{2}-\d{2}$"  # YYYY-MM-DD

# Bad
message = "Processing " + str(len(df)) + " rows of data for metric: " + metric
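
F-strings also accept format specifications, which removes the need for manual padding or rounding. A short sketch follows, with hypothetical helper names (`describe_loss`, `is_iso_date`), that combines them with a raw-string regular expression:

```python
import re

# Raw string so that the backslashes reach the regex engine unchanged.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # YYYY-MM-DD


def describe_loss(epoch: int, loss: float) -> str:
    """Format a log line with a zero-padded epoch and four decimal places."""
    return f"epoch {epoch:03d} | loss {loss:.4f}"


def is_iso_date(text: str) -> bool:
    """Return True if ``text`` has the shape YYYY-MM-DD."""
    return DATE_PATTERN.match(text) is not None
```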

Object-Oriented Programming¶

Follow the Single Responsibility Principle
Prefer composition over inheritance
Make attributes private when appropriate
Use properties for computed attributes
Initialize all instance variables in __init__

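class Processor:
    """Prepare time series data for model training.

    Attributes:
        time_step: Number of time steps in each sequence
        metric: The target metric column
        batch_size: Number of samples per batch
        train_ratio: Fraction of the data used for training
    """

    def __init__(
        self,
        time_step: int,
        metric: str,
        batch_size: int = 64,
        train_ratio: float = 0.8,
    ) -> None:
        """Initialize all instance variables."""
        self.time_step = time_step
        self.metric = metric
        self.batch_size = batch_size
        self.train_ratio = train_ratio

    @property
    def sequence_length(self) -> int:
        """Return the length of sequences created by this processor."""
        return self.time_step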
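
Composition and private attributes can be illustrated in the same spirit. In the sketch below, `MinMaxScaler` and `Pipeline` are hypothetical classes, not ICOS-FL code: the pipeline holds a scaler rather than inheriting from one, internal state uses a leading underscore, and `span` is a property computed on demand. Validation of constant input is omitted for brevity.

```python
from typing import List


class MinMaxScaler:
    """Scale values to [0, 1] using bounds learned from data."""

    def __init__(self) -> None:
        # Private state: callers go through fit() instead of setting these.
        self._low = 0.0
        self._high = 1.0

    def fit(self, values: List[float]) -> None:
        """Learn the minimum and maximum of ``values``."""
        self._low, self._high = min(values), max(values)

    @property
    def span(self) -> float:
        """Return the width of the fitted range (computed, not stored)."""
        return self._high - self._low

    def transform(self, values: List[float]) -> List[float]:
        """Scale ``values`` using the fitted bounds."""
        return [(v - self._low) / self.span for v in values]


class Pipeline:
    """Own a scaler instead of inheriting from it (composition)."""

    def __init__(self, scaler: MinMaxScaler) -> None:
        self._scaler = scaler

    def run(self, values: List[float]) -> List[float]:
        """Fit the scaler on ``values`` and return the scaled result."""
        self._scaler.fit(values)
        return self._scaler.transform(values)
```

Because `Pipeline` receives its scaler through the constructor, a different scaling strategy can be swapped in without touching the pipeline class.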

Performance Considerations¶

Use built-in functions and methods when possible
Use vectorized operations for NumPy and pandas
Avoid unnecessary computation in loops
Use generators for large data processing
Profile code before optimizing, and avoid premature optimization

# Good: vectorized operations
normalized_data = (df - df.mean()) / df.std()

# Bad: inefficient loop
normalized_data = pd.DataFrame(index=df.index)
for col in df.columns:
    mean = df[col].mean()
    std = df[col].std()
    normalized_data[col] = [(x - mean) / std for x in df[col]]
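
For the generator recommendation, a minimal standard-library sketch is shown below. `sliding_windows` is a hypothetical helper: it yields one window at a time, so only `size` items are held in memory regardless of how long the input stream is.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def sliding_windows(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield consecutive windows of ``size`` items, one at a time."""
    if size < 1:
        raise ValueError(f"size must be at least 1, got {size}")

    # A bounded deque drops the oldest item automatically when it is full.
    window: Deque[float] = deque(maxlen=size)
    for value in values:
        window.append(value)
        if len(window) == size:
            yield list(window)
```

A caller can consume the windows lazily, for example with `for window in sliding_windows(stream, 10): ...`, without first building a list of every window.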