.. _data_formats: Data Formats ============ Market Data Files ----------------- Market data is organized as **one file per ticker**. The filename must match the ticker symbol exactly: - ``AAPL.csv`` or ``AAPL.parquet`` - ``SPY.csv`` or ``SPY.parquet`` This applies to portfolio tickers **and** the benchmark ticker. Required columns ~~~~~~~~~~~~~~~~ Each file must contain a date column and at least three price columns. The engine uses them for three distinct roles: .. list-table:: :widths: 25 75 :header-rows: 1 * - Role - What the engine uses it for * - **Date** - Trading dates for the time series * - **Commission price** - Share count for commission fee calculation (typically unadjusted VWAP) * - **Trade execution price** - Price at which shares are bought and sold (typically split/dividend-adjusted VWAP) * - **Mark-to-market price** - Daily portfolio valuation between rebalancing dates (typically split/dividend-adjusted close) Column names are fully configurable — you map them to these roles in the configuration file. Additional columns in the file are ignored. .. note:: All three price roles can point to the same column if your data only has one price. The three-role separation is a recommendation that reflects how institutional trading works, not a hard constraint. Example CSV ~~~~~~~~~~~ .. code-block:: text date,vwap,vwap_adjusted,close_adjusted 2020-01-02,74.06,74.06,73.87 2020-01-03,74.34,74.34,73.07 2020-01-06,73.19,73.19,73.75 In this example the mappings would be: - ``commission_price_column`` → ``vwap`` - ``trade_execution_price_column`` → ``vwap_adjusted`` - ``mark_to_market_price_column`` → ``close_adjusted`` - ``date_column`` → ``date`` Supported formats ~~~~~~~~~~~~~~~~~ .. list-table:: :widths: 20 30 50 :header-rows: 1 * - Format - File naming - Notes * - **CSV** - ``{TICKER}.csv`` - Headers required; dates as ``YYYY-MM-DD`` * - **Parquet** - ``{TICKER}.parquet`` - Efficient for large datasets Portfolio Files --------------- Portfolio files define asset allocation weights at each rebalancing date. The engine auto-detects horizontal or vertical orientation based on the first column name. Horizontal format (Ticker × Dates) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First column must be named **``Ticker``**. Subsequent columns are rebalancing dates; values are decimal weights. .. code-block:: text Ticker | 2020-01-02 | 2022-01-03 | 2024-01-02 AAPL | 0.40 | 0.35 | 0.30 MSFT | 0.35 | 0.35 | 0.40 GOOGL | 0.25 | 0.30 | 0.30 Vertical format (Dates × Tickers) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First column must be named **``date``**. Subsequent columns are ticker symbols. .. code-block:: text date | AAPL | MSFT | GOOGL 2020-01-02 | 0.40 | 0.35 | 0.25 2022-01-03 | 0.35 | 0.35 | 0.30 2024-01-02 | 0.30 | 0.40 | 0.30 Rules ~~~~~ - Weights are decimals and must sum to **1.0 or less** per rebalancing date — the remainder is held as cash - Rebalancing dates must exist within the market data date range - No null values allowed - Supported file formats: ``.csv``, ``.xlsx``