Style Guide

This page describes the coding style and conventions used in ICOS-FL.

Code Formatting

ICOS-FL uses the following tools for code formatting:

  • Black: Code formatter

  • isort: Import sorter

  • Ruff: Linter

  • mypy: Type checker

These tools are configured in pyproject.toml and enforced via pre-commit hooks.

Black Configuration

[tool.black]
line-length = 99
target-version = ["py310", "py311", "py312"]

isort Configuration

[tool.isort]
profile = "black"
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
line_length = 99
known_first_party = ["icos_fl"]

Ruff Configuration

[tool.ruff]
target-version = "py310"
output-format = "full"
line-length = 99
fix = true

[tool.ruff.lint]
select = [
  "E", "F", "W",  # pycodestyle (E, W) and Pyflakes (F)
  "C",            # flake8-comprehensions (C4) and mccabe (C90)
  "I",            # isort
  "N",            # pep8-naming
  "D",            # flake8-docstrings
  "ANN",          # flake8-annotations
  # ...
]

mypy Configuration

[tool.mypy]
warn_return_any = true
warn_unused_configs = true
# ...

Naming Conventions

Follow these naming conventions:

Variables and Functions

  • Use snake_case for variable and function names

  • Use descriptive names that indicate purpose

  • Avoid single-letter names except for loop indices

# Good
time_series_data = fetch_data()
normalized_data = normalize_data(time_series_data)

# Bad
ts = fetch_data()
nd = normalize_data(ts)

Classes

  • Use PascalCase for class names

  • Use noun phrases that describe what the class represents

# Good
class TimeSeriesDataset:
    pass

class LSTMModel:
    pass

# Bad
class process_data:
    pass

class data:
    pass

Constants

  • Use UPPER_CASE for constants

  • Constants should be defined at the module level

# Good
MAX_SEQUENCE_LENGTH = 100
DEFAULT_BATCH_SIZE = 64

# Bad
maxSequenceLength = 100
default_batch_size = 64

Modules and Packages

  • Use lowercase for module and package names

  • Use short, descriptive names

  • Avoid underscores in module names where a single word is descriptive enough

# Good
from icos_fl.utils.processor import Processor

# Bad
from icos_fl.Utils.data_processor import Processor

Type Variables

  • Use PascalCase for type variables

  • Use single-letter names for simple type variables

  • Use descriptive names for complex type variables

from typing import TypeVar

import torch

# Good
T = TypeVar("T")
TensorData = TypeVar("TensorData", bound=torch.Tensor)

# Bad
t = TypeVar("t")
tensor_data = TypeVar("tensor_data", bound=torch.Tensor)

Docstring Style

ICOS-FL uses Google-style docstrings:

def fetch_data(timeout: int = 60) -> pd.DataFrame:
    """Fetch time series data from DataClay.

    Args:
        timeout: Maximum time to wait for data in seconds

    Returns:
        DataFrame containing the processed time series data

    Raises:
        TimeoutError: If no data is available within the timeout period
        DataClayException: If there is an error connecting to DataClay

    Example:
        >>> df = fetch_data(timeout=30)
        >>> print(df.shape)
        (300, 4)
    """

Class Docstrings

class TimeSeriesDataset(Dataset):
    """Dataset for time series prediction with sliding window approach.

    Creates sequences of consecutive time steps as inputs and
    uses the next value as the prediction target.

    Attributes:
        data: The time series data
        time_step: Number of time steps in each sequence
        metric: The target metric column
        device: PyTorch device to place tensors on
    """

Module Docstrings

"""Utilities for processing time series data.

This module provides classes and functions for processing time series data
for LSTM model training, including normalization, sequence creation, and
DataLoader generation.
"""

File Layout

Follow this order for file contents:

  1. Module docstring

  2. Imports (grouped as described in Import Conventions)

  3. Constants

  4. Global variables

  5. Classes

  6. Functions

  7. Main execution block (if applicable)
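A minimal sketch of a module that follows this order is shown below. The module, MovingAverage, smooth and DEFAULT_WINDOW_SIZE are hypothetical illustrations, not part of ICOS-FL, and only standard-library imports are used so the example runs on its own.

```python
"""Hypothetical example module showing the recommended file layout."""

# Imports: standard library, then third-party, then local (only stdlib here)
import logging
import statistics
from typing import List

# Constants
DEFAULT_WINDOW_SIZE = 3

# Global variables (keep these to a minimum)
logger = logging.getLogger(__name__)


# Classes
class MovingAverage:
    """Compute a simple moving average over a fixed-size window."""

    def __init__(self, window_size: int = DEFAULT_WINDOW_SIZE) -> None:
        self.window_size = window_size

    def compute(self, values: List[float]) -> List[float]:
        """Return the mean of every full window in values."""
        return [
            statistics.fmean(values[i : i + self.window_size])
            for i in range(len(values) - self.window_size + 1)
        ]


# Functions
def smooth(values: List[float]) -> List[float]:
    """Smooth values using the default window size."""
    logger.debug("Smoothing %d values", len(values))
    return MovingAverage().compute(values)


# Main execution block
if __name__ == "__main__":
    print(smooth([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A module-level logger is used as the "global variable" here because it is a common, low-risk case; mutable global state is best avoided.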

Import Conventions

Organize imports in the following groups, separated by a blank line:

  1. Standard library imports

  2. Third-party imports

  3. Local application imports

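Within each group, place plain import statements before from ... import statements and sort each alphabetically. isort applies this ordering automatically.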

# Standard library imports
import os
import sys
import time
from typing import Dict, List, Optional, Tuple

# Third-party imports
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from dataclay import Client, DataClayObject

# Local application imports
from icos_fl.models.lstm import LSTMModel
from icos_fl.utils.colors import BGRN, WHT, paint

Avoid wildcard imports:

# Good
from torch.nn import LSTM, Linear, MSELoss

# Bad
from torch.nn import *

Line Length

Maximum line length is 99 characters. For long lines:

  • Use parentheses for line continuation in expressions

  • Use backslashes only when necessary (prefer parentheses)

  • Break lines before operators

# Good
long_expression = (
    first_variable + second_variable + third_variable
    + fourth_variable + fifth_variable
)

# Also good
def long_function_name(
    arg1: str,
    arg2: int,
    arg3: Optional[Dict[str, Any]] = None,
) -> None:
    pass

Whitespace

Follow these whitespace conventions:

  • Use 4 spaces for indentation (no tabs)

  • Add two blank lines before top-level classes and functions

  • Add one blank line before method definitions

  • Add blank lines to separate logical sections of code
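The following sketch applies these whitespace rules to a small, hypothetical module (Circle and total_area are illustrative names only):

```python
"""Hypothetical module illustrating the blank-line conventions."""

import math
from typing import List


# Two blank lines before each top-level class or function
class Circle:
    """A circle defined by its radius."""

    def __init__(self, radius: float) -> None:
        self.radius = radius

    # One blank line before each method definition
    def area(self) -> float:
        """Return the area of the circle."""
        return math.pi * self.radius**2


def total_area(circles: List[Circle]) -> float:
    """Return the combined area of all circles."""
    areas = [circle.area() for circle in circles]

    # A blank line marks the start of a new logical step
    return sum(areas)
```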

Comments

  • Use comments sparingly and only when needed

  • Prefer docstrings for public functions and classes

  • Use comments to explain complex logic or non-obvious decisions

  • Keep comments up-to-date with code changes

# Good: explains the purpose of complex code
# Use recursive approach for multi-step prediction to simulate real forecasting
for _ in range(forecast_steps):
    next_pred = model(curr_seq)
    predictions.append(next_pred.item())
    curr_seq = torch.cat((curr_seq[:, 1:], next_pred.unsqueeze(0).unsqueeze(0)), dim=1)

# Bad: states the obvious
# Increment the counter
counter += 1

Type Annotations

  • Use type annotations for all function parameters and return values

  • Use Optional for parameters that can be None

  • Use Union for parameters that can have multiple types

  • Parameterize container types (e.g., List[int] instead of a bare list)

  • Use TypeVar for generic types

def create_data_loaders(
    df: pd.DataFrame,
    time_step: Optional[int] = None,
    batch_size: Optional[int] = None,
    train_ratio: Optional[float] = None,
) -> Tuple[DataLoader, DataLoader, TimeSeriesDataset, TimeSeriesDataset]:
    """Create data loaders for training and validation."""
    # ...
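The remaining rules (Optional, Union, parameterized containers and TypeVar) can be seen together in this small sketch. The functions first_or_default, parse_threshold and count_by_metric are hypothetical helpers written for illustration, not ICOS-FL APIs.

```python
from typing import Dict, List, Optional, Sequence, TypeVar, Union

# Generic type variable: the return type follows the element type of the input
T = TypeVar("T")


def first_or_default(items: Sequence[T], default: Optional[T] = None) -> Optional[T]:
    """Return the first element of items, or default if items is empty."""
    return items[0] if items else default


def parse_threshold(value: Union[str, float]) -> float:
    """Accept a threshold given either as a string or as a number."""
    return float(value)


def count_by_metric(metrics: List[str]) -> Dict[str, int]:
    """Count how often each metric name occurs."""
    counts: Dict[str, int] = {}
    for metric in metrics:
        counts[metric] = counts.get(metric, 0) + 1
    return counts
```

With T, a type checker such as mypy infers that first_or_default([1, 2]) returns Optional[int], which a plain Any annotation would not express.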

Exception Handling

  • Be specific about exceptions you catch

  • Use multiple except blocks for different exceptions

  • Provide context in exception messages

  • Avoid bare except statements

# Good
try:
    data = self.time_series_data.get_dataframe()
except DataClayException as e:
    raise DataClayException(f"Failed to retrieve time series data: {e}") from e
except TimeoutError as e:
    raise TimeoutError(f"Timed out waiting for data: {e}") from e

# Bad
try:
    data = self.time_series_data.get_dataframe()
except Exception as e:
    raise Exception(f"Error: {e}")
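The "Good" pattern above depends on DataClay. A self-contained sketch of the same rules follows, using a hypothetical MetricNotFoundError and parse_metric (neither is part of ICOS-FL): specific exceptions, one except block per failure mode, a contextual message, and chaining with from e.

```python
from typing import Dict


class MetricNotFoundError(LookupError):
    """Raised when a requested metric is missing from a record (hypothetical)."""


def parse_metric(record: Dict[str, str], metric: str) -> float:
    """Return metric from a raw string record as a float.

    Raises:
        MetricNotFoundError: If metric is not a key of record
        ValueError: If the stored value cannot be parsed as a float
    """
    try:
        return float(record[metric])
    except KeyError as e:
        # "from e" chains the original exception so its traceback is kept
        raise MetricNotFoundError(
            f"Metric {metric!r} not found; available metrics: {sorted(record)}"
        ) from e
    except ValueError as e:
        raise ValueError(f"Metric {metric!r} has non-numeric value {record[metric]!r}") from e
```

Subclassing LookupError rather than KeyError keeps callers' generic lookup handling working while avoiding the extra quotes that KeyError adds around its message.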

String Formatting

  • Use f-strings for string interpolation

  • Use triple quotes for multi-line strings

  • Use raw strings (r"...") for regular expressions and other literals containing backslashes

# Good
message = f"Processing {len(df)} rows of data for metric: {metric}"

description = """
This is a multi-line string.
Triple quotes preserve the line breaks as written.
"""

pattern = r"^\d{4}-\d{2}-\d{2}$"  # YYYY-MM-DD

# Bad
message = "Processing " + str(len(df)) + " rows of data for metric: " + metric
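A sketch of putting the raw-string date pattern to use is shown below; DATE_PATTERN and is_iso_date are hypothetical names. Two details are worth knowing: in Python, $ also matches just before a trailing newline, so fullmatch() is the stricter choice, and \d matches any Unicode digit unless re.ASCII is set.

```python
import re

# Raw string: the backslashes in \d reach the regex engine unchanged.
# re.ASCII restricts \d to 0-9; without it, other Unicode digits also match.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)  # YYYY-MM-DD


def is_iso_date(text: str) -> bool:
    """Return True if text has the YYYY-MM-DD shape (not a calendar check)."""
    # fullmatch() anchors at both ends and, unlike "$", rejects a trailing newline
    return DATE_PATTERN.fullmatch(text) is not None
```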

Object-Oriented Programming

  • Follow the Single Responsibility Principle

  • Prefer composition over inheritance

  • Prefix internal (non-public) attributes with a single leading underscore

  • Use properties for computed attributes

  • Initialize all instance variables in __init__

class Processor:
    def __init__(
        self,
        time_step: int,
        metric: str,
        batch_size: int = 64,
        train_ratio: float = 0.8,
    ) -> None:
        self.time_step = time_step
        self.metric = metric
        self.batch_size = batch_size
        self.train_ratio = train_ratio

    @property
    def sequence_length(self) -> int:
        """Return the length of sequences created by this processor."""
        return self.time_step
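The sequence_length property above simply exposes an existing attribute. A property that is actually computed, together with an underscore-prefixed internal attribute, might look like this sketch; SplitPlan is a hypothetical class, not the ICOS-FL Processor.

```python
from typing import Sequence


class SplitPlan:
    """Hypothetical helper describing a train/validation split of a series."""

    def __init__(self, values: Sequence[float], train_ratio: float = 0.8) -> None:
        if not 0.0 < train_ratio < 1.0:
            raise ValueError(f"train_ratio must be in (0, 1), got {train_ratio}")
        # Leading underscore: internal state, not part of the public interface
        self._values = list(values)
        self.train_ratio = train_ratio

    @property
    def train_size(self) -> int:
        """Number of leading values used for training (computed on access)."""
        return int(len(self._values) * self.train_ratio)

    @property
    def validation_size(self) -> int:
        """Number of remaining values used for validation."""
        return len(self._values) - self.train_size
```

Because train_size is recomputed on every access, it stays consistent if train_ratio is changed after construction.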

Performance Considerations

  • Use built-in functions and methods when possible

  • Use vectorized operations for NumPy and pandas

  • Avoid unnecessary computation in loops

  • Use generators for large data processing

  • Profile code before optimizing it, and avoid premature optimization

# Good: vectorized operations
normalized_data = (df - df.mean()) / df.std()

# Bad: inefficient loop
normalized_data = pd.DataFrame(index=df.index)
for col in df.columns:
    mean = df[col].mean()
    std = df[col].std()
    normalized_data[col] = [(x - mean) / std for x in df[col]]
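For the generator guideline, a minimal sketch follows: a hypothetical sliding_windows helper that yields one window at a time instead of building every sequence in memory. It uses collections.deque with maxlen, so the oldest value is dropped in constant time.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_windows(values: Iterable[float], size: int) -> Iterator[Tuple[float, ...]]:
    """Yield each run of size consecutive values, holding one window in memory."""
    if size < 1:
        raise ValueError(f"size must be positive, got {size}")
    window: Deque[float] = deque(maxlen=size)  # oldest value is discarded automatically
    for value in values:
        window.append(value)
        if len(window) == size:
            yield tuple(window)
```

Because it accepts any iterable, the helper can consume a file reader or another generator lazily, for example sum(max(w) for w in sliding_windows(range(1_000_000), 10)), without materializing a million-element list of windows.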