===========
Style Guide
===========

This page describes the coding style and conventions used in ICOS-FL.

Code Formatting
---------------

ICOS-FL uses the following tools for code formatting, linting, and type checking:

* **Black**: Code formatter
* **isort**: Import sorter
* **Ruff**: Linter
* **mypy**: Type checker

These tools are configured in ``pyproject.toml`` and enforced via pre-commit hooks.

Black Configuration
~~~~~~~~~~~~~~~~~~~

.. code-block:: toml

    [tool.black]
    line-length = 99
    target-version = ["py310", "py311", "py312"]

isort Configuration
~~~~~~~~~~~~~~~~~~~

.. code-block:: toml

    [tool.isort]
    profile = "black"
    multi_line_output = 3
    include_trailing_comma = true
    force_grid_wrap = 0
    use_parentheses = true
    line_length = 99
    known_first_party = ["icos_fl"]

Ruff Configuration
~~~~~~~~~~~~~~~~~~

.. code-block:: toml

    [tool.ruff]
    target-version = "py310"
    output-format = "full"
    line-length = 99
    fix = true

    [tool.ruff.lint]
    select = [
        "E", "F", "W",  # flake8
        "C",  # mccabe
        "I",  # isort
        "N",  # pep8-naming
        "D",  # flake8-docstrings
        "ANN",  # flake8-annotations
        # ...
    ]

mypy Configuration
~~~~~~~~~~~~~~~~~~

.. code-block:: toml

    [tool.mypy]
    warn_return_any = true
    warn_unused_configs = true
    # ...

Naming Conventions
------------------

Follow these naming conventions:

Variables and Functions
~~~~~~~~~~~~~~~~~~~~~~~

* Use ``snake_case`` for variable and function names
* Use descriptive names that indicate purpose
* Avoid single-letter names except for loop indices

.. code-block:: python

    # Good
    time_series_data = fetch_data()
    normalized_data = normalize_data(time_series_data)

    # Bad
    ts = fetch_data()
    nd = normalize_data(ts)

Classes
~~~~~~~

* Use ``PascalCase`` for class names
* Use noun phrases that describe what the class represents

.. code-block:: python

    # Good
    class TimeSeriesDataset:
        pass


    class LSTMModel:
        pass


    # Bad
    class process_data:
        pass


    class data:
        pass

Constants
~~~~~~~~~

* Use ``UPPER_CASE`` for constants
* Define constants at the module level

.. code-block:: python

    # Good
    MAX_SEQUENCE_LENGTH = 100
    DEFAULT_BATCH_SIZE = 64

    # Bad
    maxSequenceLength = 100
    default_batch_size = 64

Modules and Packages
~~~~~~~~~~~~~~~~~~~~

* Use lowercase for module and package names
* Use short, descriptive names
* Avoid underscores in module names where possible

.. code-block:: python

    # Good
    from icos_fl.utils.processor import Processor

    # Bad
    from icos_fl.Utils.data_processor import Processor

Type Variables
~~~~~~~~~~~~~~

* Use ``PascalCase`` for type variables
* Use single-letter names for simple type variables
* Use descriptive names for complex type variables

.. code-block:: python

    from typing import TypeVar

    import torch

    # Good
    T = TypeVar("T")
    TensorData = TypeVar("TensorData", bound=torch.Tensor)

    # Bad
    t = TypeVar("t")
    tensor_data = TypeVar("tensor_data", bound=torch.Tensor)

Docstring Style
---------------

ICOS-FL uses Google-style docstrings:

.. code-block:: python

    def fetch_data(self, timeout: int = 60) -> pd.DataFrame:
        """Fetch time series data from DataClay.

        Args:
            timeout: Maximum time to wait for data in seconds

        Returns:
            DataFrame containing the processed time series data

        Raises:
            TimeoutError: If no data is available within the timeout period
            DataClayException: If there is an error connecting to DataClay

        Example:
            >>> fetcher = Fetcher()
            >>> df = fetcher.fetch_data(timeout=30)
            >>> print(df.shape)
            (300, 4)
        """

Class Docstrings
~~~~~~~~~~~~~~~~

.. code-block:: python

    class TimeSeriesDataset(Dataset):
        """Dataset for time series prediction with sliding window approach.

        Creates sequences of consecutive time steps as inputs and uses the next
        value as the prediction target.

        Attributes:
            data: The time series data
            time_step: Number of time steps in each sequence
            metric: The target metric column
            device: PyTorch device to place tensors on
        """

Module Docstrings
~~~~~~~~~~~~~~~~~

.. code-block:: python

    """Utilities for processing time series data.

    This module provides classes and functions for processing time series data
    for LSTM model training, including normalization, sequence creation, and
    DataLoader generation.
    """

File Layout
-----------

Follow this order for file contents:

1. Module docstring
2. Imports (grouped as described in the Import Conventions section)
3. Constants
4. Global variables
5. Classes
6. Functions
7. Main execution block (if applicable)

Import Conventions
------------------

Organize imports in the following groups, separated by a blank line:

1. Standard library imports
2. Third-party imports
3. Local application imports

Within each group, place plain ``import`` statements before ``from ... import``
statements and sort each kind alphabetically.

.. code-block:: python

    # Standard library imports
    import os
    import sys
    import time
    from typing import Dict, List, Optional, Tuple

    # Third-party imports
    import numpy as np
    import pandas as pd
    import torch
    import torch.nn as nn
    from dataclay import Client, DataClayObject

    # Local application imports
    from icos_fl.models.lstm import LSTMModel
    from icos_fl.utils.colors import BGRN, WHT, paint

Avoid wildcard imports:

.. code-block:: python

    # Good
    from torch.nn import LSTM, Linear, MSELoss

    # Bad
    from torch.nn import *

Line Length
-----------

Maximum line length is 99 characters. For long lines:

- Use parentheses for line continuation in expressions
- Use backslashes only when necessary (prefer parentheses)
- Break lines before operators

.. code-block:: python

    # Good
    long_expression = (
        first_variable
        + second_variable
        + third_variable
        + fourth_variable
        + fifth_variable
    )

    # Also good
    def long_function_name(
        arg1: str,
        arg2: int,
        arg3: Optional[Dict[str, Any]] = None,
    ) -> None:
        pass

Whitespace
----------

Follow these whitespace conventions:

* Use 4 spaces for indentation (no tabs)
* Add two blank lines before top-level classes and functions
* Add one blank line before method definitions
* Add blank lines to separate logical sections of code

Comments
--------

* Use comments sparingly and only when needed
* Prefer docstrings for public functions and classes
* Use comments to explain complex logic or non-obvious decisions
* Keep comments up to date with code changes

.. code-block:: python

    # Good: explains the purpose of complex code
    # Use recursive approach for multi-step prediction to simulate real forecasting
    for _ in range(forecast_steps):
        next_pred = model(curr_seq)
        predictions.append(next_pred.item())
        curr_seq = torch.cat((curr_seq[:, 1:], next_pred.unsqueeze(0).unsqueeze(0)), dim=1)

    # Bad: states the obvious
    # Increment the counter
    counter += 1

Type Annotations
----------------

* Use type annotations for all function parameters and return values
* Use ``Optional`` for parameters that can be ``None``
* Use ``Union`` for parameters that can have multiple types
* Use parameterized types (e.g., ``List[int]`` instead of a bare ``list``)
* Use ``TypeVar`` for generic types

.. code-block:: python

    def create_data_loaders(
        df: pd.DataFrame,
        time_step: Optional[int] = None,
        batch_size: Optional[int] = None,
        train_ratio: Optional[float] = None,
    ) -> Tuple[DataLoader, DataLoader, TimeSeriesDataset, TimeSeriesDataset]:
        """Create data loaders for training and validation."""
        # ...

Exception Handling
------------------

* Be specific about exceptions you catch
* Use multiple ``except`` blocks for different exceptions
* Provide context in exception messages
* Avoid bare ``except`` statements

.. code-block:: python

    # Good
    try:
        data = self.time_series_data.get_dataframe()
    except DataClayException as e:
        raise DataClayException(f"Failed to retrieve time series data: {e}") from e
    except TimeoutError as e:
        raise TimeoutError(f"Timed out waiting for data: {e}") from e

    # Bad
    try:
        data = self.time_series_data.get_dataframe()
    except Exception as e:
        raise Exception(f"Error: {e}")

String Formatting
-----------------

* Use f-strings for string interpolation
* Use triple quotes for multi-line strings
* Use r-strings for raw strings (especially regex)

.. code-block:: python

    # Good
    message = f"Processing {len(df)} rows of data for metric: {metric}"
    docstring = """
    This is a multi-line docstring.
    It explains the purpose of this function.
    """
    pattern = r"^\d{4}-\d{2}-\d{2}$"  # YYYY-MM-DD

    # Bad
    message = "Processing " + str(len(df)) + " rows of data for metric: " + metric

Object-Oriented Programming
---------------------------

* Follow the Single Responsibility Principle
* Prefer composition over inheritance
* Make attributes private when appropriate
* Use properties for computed attributes
* Initialize all instance variables in ``__init__``

.. code-block:: python

    class Processor:
        def __init__(
            self,
            time_step: int,
            metric: str,
            batch_size: int = 64,
            train_ratio: float = 0.8,
        ) -> None:
            self.time_step = time_step
            self.metric = metric
            self.batch_size = batch_size
            self.train_ratio = train_ratio

        @property
        def sequence_length(self) -> int:
            """Return the length of sequences created by this processor."""
            return self.time_step

Performance Considerations
--------------------------

* Use built-in functions and methods when possible
* Use vectorized operations for NumPy and pandas
* Avoid unnecessary computation in loops
* Use generators for large data processing
* Profile code before optimizing it

.. code-block:: python

    # Good: vectorized operations
    normalized_data = (df - df.mean()) / df.std()

    # Bad: inefficient loop
    normalized_data = pd.DataFrame(index=df.index)
    for col in df.columns:
        mean = df[col].mean()
        std = df[col].std()
        normalized_data[col] = [(x - mean) / std for x in df[col]]
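
The advice to use generators for large data processing can be sketched as
follows. This is a minimal illustration rather than code from ICOS-FL: the
helper name ``iter_windows`` and the sample readings are hypothetical, and
plain Python lists stand in for whatever container the real data uses.

.. code-block:: python

    from typing import Iterator, List, Sequence


    def iter_windows(values: Sequence[float], time_step: int) -> Iterator[List[float]]:
        """Yield each sliding window of ``time_step`` consecutive values lazily."""
        # Hypothetical helper: produces one window per iteration instead of
        # building the full list of windows up front.
        for start in range(len(values) - time_step + 1):
            yield list(values[start : start + time_step])


    # Only one window is materialized at a time while iterating.
    readings = [0.1, 0.4, 0.3, 0.8, 0.6]
    for window in iter_windows(readings, time_step=3):
        print(window)

Because the windows are produced on demand, memory use stays proportional to
one window rather than to the whole set of sequences. If every window is needed
at once (for example to build a tensor), wrapping the call in ``list(...)``
gives up that benefit.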