Data Formats#
Market Data Files#
Market data is organized as one file per ticker. The filename must match the ticker symbol exactly:
AAPL.csv or AAPL.parquet
SPY.csv or SPY.parquet
This applies to portfolio tickers and the benchmark ticker.
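The naming rule can be checked before a run. The following is a minimal sketch, not the engine's own lookup: the helper name `missing_tickers` and the idea of a single data directory are assumptions made for illustration. It compares against the directory listing so that the match stays exact even on case-insensitive filesystems.

```python
from pathlib import Path

# Extensions named in this section; the engine's own list may differ.
SUPPORTED_EXTENSIONS = (".csv", ".parquet")


def missing_tickers(data_dir: Path, tickers: list[str]) -> list[str]:
    """Return the tickers that have no exactly named file in data_dir.

    Hypothetical helper: a ticker counts as present only if a file called
    <ticker>.csv or <ticker>.parquet appears in the directory listing.
    """
    names = {p.name for p in data_dir.iterdir() if p.is_file()}
    return [
        ticker
        for ticker in tickers
        if not any(f"{ticker}{ext}" in names for ext in SUPPORTED_EXTENSIONS)
    ]
```

Calling `missing_tickers(Path("data"), ["AAPL", "MSFT", "SPY"])` with the portfolio tickers plus the benchmark ticker returns the symbols that still need a file.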
Required columns#
Each file must contain a date column and at least one price column; three are recommended. The engine reads the date column and uses the price columns in three distinct roles:
| Role | What the engine uses it for |
|---|---|
| Date | Trading dates for the time series |
| Commission price | Share count for commission fee calculation (typically unadjusted VWAP) |
| Trade execution price | Price at which shares are bought and sold (typically split/dividend-adjusted VWAP) |
| Mark-to-market price | Daily portfolio valuation between rebalancing dates (typically split/dividend-adjusted close) |
Column names are fully configurable — you map them to these roles in the configuration file. Additional columns in the file are ignored.
Note
All three price roles can point to the same column if your data only has one price. The three-role separation is a recommendation that reflects how institutional trading works, not a hard constraint.
Example CSV#
date,vwap,vwap_adjusted,close_adjusted
2020-01-02,74.06,74.06,73.87
2020-01-03,74.34,74.34,73.07
2020-01-06,73.19,73.19,73.75
In this example the mappings would be:
commission_price_column → vwap
trade_execution_price_column → vwap_adjusted
mark_to_market_price_column → close_adjusted
date_column → date
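To make the role mapping concrete, here is a minimal sketch that reads the example CSV through such a mapping. The four mapping keys are the ones listed above; the function `load_price_rows`, the output key names, and the assumption that dates are ISO formatted (as in the example) are illustrative and do not describe the engine's actual loader or configuration syntax.

```python
import csv
import io
from datetime import date

# Role-to-column mapping from the example above.
COLUMN_MAP = {
    "date_column": "date",
    "commission_price_column": "vwap",
    "trade_execution_price_column": "vwap_adjusted",
    "mark_to_market_price_column": "close_adjusted",
}

SAMPLE_CSV = """date,vwap,vwap_adjusted,close_adjusted
2020-01-02,74.06,74.06,73.87
2020-01-03,74.34,74.34,73.07
2020-01-06,73.19,73.19,73.75
"""


def load_price_rows(text: str, column_map: dict[str, str]) -> list[dict]:
    """Parse CSV text into one dict per trading day, keyed by role.

    Hypothetical helper: columns not named in column_map are ignored,
    and a mapped column that is absent from the header raises ValueError.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = set(column_map.values()) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for record in reader:
        rows.append({
            # Assumes ISO dates (YYYY-MM-DD), as in the example file.
            "date": date.fromisoformat(record[column_map["date_column"]]),
            "commission_price": float(record[column_map["commission_price_column"]]),
            "trade_execution_price": float(record[column_map["trade_execution_price_column"]]),
            "mark_to_market_price": float(record[column_map["mark_to_market_price_column"]]),
        })
    return rows


rows = load_price_rows(SAMPLE_CSV, COLUMN_MAP)
```

Pointing all three price keys at the same column, for example `"vwap"`, works with the same function, which mirrors the single-price case described in the note above.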
Supported formats#
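| Format | File naming | Notes |
|---|---|---|
| CSV | Ticker symbol plus .csv (e.g. AAPL.csv) | Headers required; dates as in the example above (e.g. 2020-01-02) |
| Parquet | Ticker symbol plus .parquet (e.g. AAPL.parquet) | Efficient for large datasets |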
Portfolio Files#
Portfolio files define asset allocation weights at each rebalancing date. The engine auto-detects horizontal or vertical orientation based on the first column name.
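The detection step can be pictured as a check on the first header cell. This is a minimal sketch under assumptions: it treats the file as CSV text, uses the first-column names `Ticker` and `date` required by the two format sections that follow, and matches them exactly. Whether the engine is case-sensitive here is not stated, and `detect_orientation` is a hypothetical name.

```python
import csv
import io


def detect_orientation(csv_text: str) -> str:
    """Return "horizontal" or "vertical" based on the first header cell.

    Hypothetical helper: exact, case-sensitive matching is an assumption.
    """
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if not header:
        raise ValueError("portfolio file has no header row")
    first = header[0].strip()
    if first == "Ticker":
        return "horizontal"  # rows are tickers, columns are rebalancing dates
    if first == "date":
        return "vertical"  # rows are rebalancing dates, columns are tickers
    raise ValueError(f"unrecognized first column name: {first!r}")
```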
Horizontal format (Ticker × Dates)#
First column must be named ``Ticker``. Subsequent columns are rebalancing dates; values are decimal weights.
Ticker | 2020-01-02 | 2022-01-03 | 2024-01-02
AAPL | 0.40 | 0.35 | 0.30
MSFT | 0.35 | 0.35 | 0.40
GOOGL | 0.25 | 0.30 | 0.30
Vertical format (Dates × Tickers)#
First column must be named ``date``. Subsequent columns are ticker symbols.
date | AAPL | MSFT | GOOGL
2020-01-02 | 0.40 | 0.35 | 0.25
2022-01-03 | 0.35 | 0.35 | 0.30
2024-01-02 | 0.30 | 0.40 | 0.30
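Both orientations carry the same information. The sketch below parses each layout into one common structure, a dict of weights per rebalancing date, and confirms that the two example tables above agree. It assumes comma-separated text (the tables above are shown with `|` for readability), and the function names are hypothetical rather than part of the engine.

```python
import csv
import io

HORIZONTAL = """Ticker,2020-01-02,2022-01-03,2024-01-02
AAPL,0.40,0.35,0.30
MSFT,0.35,0.35,0.40
GOOGL,0.25,0.30,0.30
"""

VERTICAL = """date,AAPL,MSFT,GOOGL
2020-01-02,0.40,0.35,0.25
2022-01-03,0.35,0.35,0.30
2024-01-02,0.30,0.40,0.30
"""


def weights_from_horizontal(text: str) -> dict[str, dict[str, float]]:
    """Ticker x Dates layout -> {date: {ticker: weight}}."""
    rows = list(csv.reader(io.StringIO(text)))
    dates = rows[0][1:]  # header cells after "Ticker"
    weights: dict[str, dict[str, float]] = {d: {} for d in dates}
    for row in rows[1:]:
        ticker = row[0]
        for rebalance_date, value in zip(dates, row[1:]):
            weights[rebalance_date][ticker] = float(value)
    return weights


def weights_from_vertical(text: str) -> dict[str, dict[str, float]]:
    """Dates x Tickers layout -> {date: {ticker: weight}}."""
    rows = list(csv.reader(io.StringIO(text)))
    tickers = rows[0][1:]  # header cells after "date"
    return {
        row[0]: {t: float(v) for t, v in zip(tickers, row[1:])}
        for row in rows[1:]
    }


# The two example tables describe the same allocations.
assert weights_from_horizontal(HORIZONTAL) == weights_from_vertical(VERTICAL)
```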
Rules#
Weights are decimals and must sum to 1.0 or less per rebalancing date — the remainder is held as cash
Rebalancing dates must exist within the market data date range
No null values allowed
Supported file formats: .csv, .xlsx
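The rules above can be sketched as a pre-flight check. This is an illustration under stated assumptions, not the engine's validator: `validate_weights` is a hypothetical helper, weights are assumed to be already parsed into a dict per rebalancing date, "within the market data date range" is read as falling between the first and last market dates, and a small tolerance absorbs floating-point rounding in the sum.

```python
import math
from datetime import date


def validate_weights(
    weights: dict[date, dict[str, float]],
    first_market_date: date,
    last_market_date: date,
    tolerance: float = 1e-9,
) -> list[str]:
    """Return a list of rule violations; an empty list means the file passes."""
    errors = []
    for rebalance_date, allocation in weights.items():
        # Rule: rebalancing dates must lie inside the market data range.
        if not (first_market_date <= rebalance_date <= last_market_date):
            errors.append(f"{rebalance_date}: outside market data date range")
        # Rule: no null values (None or NaN).
        values = list(allocation.values())
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
            errors.append(f"{rebalance_date}: null weight found")
            continue  # the sum is meaningless if a weight is missing
        # Rule: weights sum to 1.0 or less; the remainder is held as cash.
        total = sum(values)
        if total > 1.0 + tolerance:
            errors.append(f"{rebalance_date}: weights sum to {total:.4f}, above 1.0")
    return errors
```

A date whose weights sum to 0.90 passes this check; under the first rule the remaining 0.10 is treated as cash.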