Metrics Module

Financial metrics and statistical functions for portfolio analysis.

Provides functions for computing risk/return metrics (Sharpe, Sortino, VaR, CVaR), annual return analysis, drawdown calculations, and PyArrow-based return computations. All functions are designed to work with pandas Series/DataFrames or PyArrow arrays.

active_returns_plot(portfolio_value: Series, benchmark_value: Series, benchmark_name: str)

Plot the yearly active return (portfolio minus benchmark).

The function resamples both the portfolio and the benchmark series to year-end values, computes annual percentage returns, obtains the active return (portfolio - benchmark), and renders a bar chart by year.

Parameters:
  • portfolio_value (pd.Series) – Time series of portfolio value indexed by date (any frequency).

  • benchmark_value (pd.Series) – Time series of benchmark value indexed by date (any frequency).

  • benchmark_name (str) – Display name for the benchmark in the chart title.

Returns:

A Matplotlib figure containing the bar chart of active returns.

Return type:

matplotlib.figure.Figure

Notes

  • Indexes are converted to DatetimeIndex and resampled to year-end ('YE').

  • Returns are simple percentage changes year-over-year.
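
The resampling steps above can be sketched as follows. This is a minimal illustration, not the module's code: the helper name yearly_active_returns is hypothetical, and it groups by calendar year instead of calling resample('YE') so that it also runs on older pandas versions. Plotting is omitted.

```python
import pandas as pd


def yearly_active_returns(portfolio_value: pd.Series,
                          benchmark_value: pd.Series) -> pd.Series:
    """Yearly active return: portfolio return minus benchmark return."""

    def yearly_returns(values: pd.Series) -> pd.Series:
        values = values.copy()
        values.index = pd.to_datetime(values.index)  # ensure a DatetimeIndex
        values = values.sort_index()
        # Last observation of each calendar year (year-end value).
        year_end = values.groupby(values.index.year).last()
        # Simple year-over-year percentage change; the first year has no prior value.
        return year_end.pct_change().dropna()

    return yearly_returns(portfolio_value) - yearly_returns(benchmark_value)


dates = pd.to_datetime(["2021-12-31", "2022-06-30", "2022-12-30", "2023-12-29"])
portfolio = pd.Series([100.0, 105.0, 110.0, 121.0], index=dates)
benchmark = pd.Series([100.0, 102.0, 105.0, 110.25], index=dates)
active = yearly_active_returns(portfolio, benchmark)
```

Here the portfolio gains 10% and the benchmark 5% in both 2022 and 2023, so the active return is about 0.05 for each year.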

analyze_annual_returns(df: DataFrame, portfolio_column: str, strategy_name: str, method: Literal['simple', 'log'] = 'simple', min_observations: int = 2) → DataFrame

Analyze annual returns of a portfolio from a time-indexed DataFrame.

Calculates the annual return based on the first and last available portfolio value of each year, returning the start/end dates and returns for each year.

Parameters:
  • df (pd.DataFrame) – DataFrame with a DatetimeIndex containing portfolio values.

  • portfolio_column (str) – Name of the column containing the portfolio values.

  • strategy_name (str) – Label used to name output columns (e.g., strategy or benchmark name).

  • method ({'simple', 'log'}, default 'simple') – Method to compute returns:
      - 'simple': (final - initial) / initial
      - 'log': log(final / initial) (requires strictly positive values)

  • min_observations (int, default 2) – Minimum number of valid observations required per year to calculate returns.

Returns:

DataFrame with columns:
  • 'year' : int
  • 'start_date' : str (YYYY-MM-DD)
  • 'end_date' : str (YYYY-MM-DD)
  • f'{strategy_name}_return' : float
  • f'{strategy_name}_start_value' : float
  • f'{strategy_name}_end_value' : float

Return type:

pd.DataFrame

Raises:
  • TypeError – If the index is not a DatetimeIndex.

  • ValueError – If the DataFrame is empty, if the specified column doesn't exist, or if an invalid method is provided.

Notes

  • Returns use the first and last valid value per year (NaNs are skipped).

  • The output return is not annualized; it is the raw return from the first to the last valid value within each year.

  • Years with fewer than min_observations are excluded from results.
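
The rules in these notes can be sketched with plain pandas. This is an illustration under stated assumptions, not the module's implementation: the helper name first_last_annual_returns and the single 'return' output column are hypothetical simplifications of the documented output.

```python
import numpy as np
import pandas as pd


def first_last_annual_returns(values: pd.Series, method: str = "simple",
                              min_observations: int = 2) -> pd.DataFrame:
    """Per-year return from the first to the last valid value of each year."""
    valid = values.dropna().sort_index()  # NaNs are skipped
    rows = []
    for year, chunk in valid.groupby(valid.index.year):
        if len(chunk) < min_observations:
            continue  # too few observations: the year is excluded
        start, end = float(chunk.iloc[0]), float(chunk.iloc[-1])
        ret = np.log(end / start) if method == "log" else (end - start) / start
        rows.append({
            "year": int(year),
            "start_date": chunk.index[0].strftime("%Y-%m-%d"),
            "end_date": chunk.index[-1].strftime("%Y-%m-%d"),
            "return": float(ret),
        })
    return pd.DataFrame(rows, columns=["year", "start_date", "end_date", "return"])


dates = pd.to_datetime(["2022-01-03", "2022-06-30", "2022-12-30", "2023-01-02"])
values = pd.Series([100.0, np.nan, 110.0, 111.0], index=dates)
annual = first_last_annual_returns(values)
```

In this sample, 2022 yields a return of about 0.10 (100 to 110, with the NaN skipped), while 2023 has a single observation and is dropped.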

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'portfolio': [100, 105, 110, 115, 120]
... }, index=pd.date_range('2023-01-01', periods=5, freq='ME'))
>>> result = analyze_annual_returns(df, 'portfolio', 'MyStrategy')
>>> print(result)

annual_returns_plot(portfolio_value: Series, benchmark_value: Series, portfolio_name: str, benchmark_name: str)

Plot annual returns for portfolio and benchmark, side by side.

The function resamples both series to year-end values, computes year-over-year returns, and renders a grouped bar chart for portfolio vs. benchmark.

Parameters:
  • portfolio_value (pd.Series) – Time series of portfolio value indexed by date (any frequency).

  • benchmark_value (pd.Series) – Time series of benchmark value indexed by date (any frequency).

  • portfolio_name (str) – Label used in the legend for the portfolio.

  • benchmark_name (str) – Label used in the legend for the benchmark.

Returns:

A Matplotlib figure containing the grouped bar chart.

Return type:

matplotlib.figure.Figure

annualize_rets(r: Series, periods_per_year: int = 252)

Annualize a compounded return series.

Parameters:
  • r (pd.Series) – Series of simple periodic returns (e.g., daily).

  • periods_per_year (int, default 252) – Number of observations per year (e.g., 252 for daily).

Returns:

Annualized return computed as: (1 + r).prod() ** (periods_per_year / n_periods) - 1.

Return type:

float
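
As a worked instance of the formula above, the following sketch annualizes twelve monthly returns of 1%. The helper name annualize_rets_sketch is hypothetical; it only restates the documented expression.

```python
import pandas as pd


def annualize_rets_sketch(r: pd.Series, periods_per_year: int = 252) -> float:
    """(1 + r).prod() ** (periods_per_year / n_periods) - 1"""
    n_periods = r.shape[0]
    return float((1 + r).prod() ** (periods_per_year / n_periods) - 1)


monthly = pd.Series([0.01] * 12)
annual = annualize_rets_sketch(monthly, periods_per_year=12)
```

With exactly one year of data the exponent is 1, so the result equals the compounded return 1.01 ** 12 - 1, roughly 12.68%.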

annualize_vol(r: Series, periods_per_year: int = 252) → float

Annualize the volatility of a return series.

Parameters:
  • r (pd.Series) – Series of simple periodic returns (e.g., daily).

  • periods_per_year (int, default 252) – Number of observations per year (e.g., 252 for daily).

Returns:

Annualized volatility computed as r.std() * sqrt(periods_per_year).

Return type:

float

calculate_annualized_return(initial_value: float, final_value: float, years: float) → float

Compute CAGR (compound annual growth rate).

Parameters:
  • initial_value (float) – Initial capital or portfolio value.

  • final_value (float) – Final portfolio value.

  • years (float) – Number of years (fraction allowed).

Returns:

Annualized return as a decimal (e.g., 0.10 == 10%).

Return type:

float

Raises:

ValueError – If initial_value <= 0 or years <= 0.
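
A minimal sketch of the standard CAGR formula, (final / initial) ** (1 / years) - 1, with the validation described above. The helper name cagr_sketch is hypothetical, and the formula is the textbook definition rather than a confirmed copy of the module's code.

```python
def cagr_sketch(initial_value: float, final_value: float, years: float) -> float:
    """Compound annual growth rate as a decimal."""
    if initial_value <= 0 or years <= 0:
        raise ValueError("initial_value and years must be positive")
    return (final_value / initial_value) ** (1 / years) - 1


growth = cagr_sketch(100.0, 121.0, 2.0)
```

Growing from 100 to 121 over two years corresponds to about 10% per year.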

calculate_daily_returns(price_array: Array) → Array

Compute daily returns using percentage change logic for a PyArrow array.

Returns are computed as (P_t - P_{t-1}) / P_{t-1} using PyArrow compute functions, with a Python fallback if an exception occurs.

Parameters:

price_array (pa.Array) – PyArrow array of price values in chronological order.

Returns:

Array of daily returns with length len(price_array) - 1 (empty if fewer than 2 observations).

Return type:

pa.Array

calculate_pct_change(price_array: Array) → Array

Compute percentage change (returns) for a price array.

Equivalent to pandas.Series.pct_change().iloc[1:]; the result has length len(price_array) - 1.

Parameters:

price_array (pa.Array) – Array of prices (e.g., decimal128, float64).

Returns:

Array of returns with one fewer element than the input.

Return type:

pa.Array

Notes

  • Uses PyArrow compute kernels; on error, falls back to Python lists.

calculate_return(start_value: float, end_value: float, method: str) → float

Calculate return based on the specified method.

Parameters:
  • start_value (float) – Initial portfolio value.

  • end_value (float) – Final portfolio value.

  • method (str) – ‘simple’ or ‘log’.

Returns:

Calculated return or np.nan if calculation is invalid.

Return type:

float

calculate_returns_table(price_table: Table, date_column_name: str) → Table

Calculate percentage returns for all ticker columns in a price table.

Parameters:
  • price_table (pa.Table) – Table with a date column and one column per ticker (prices).

  • date_column_name (str) – Name of the date column. If missing, the first column is assumed to be the date column.

Returns:

Table with the date column (first row removed to match returns length) and one return column per ticker.

Return type:

pa.Table

Notes

  • For each ticker, returns are computed via calculate_pct_change().

  • The date column is aligned by removing its first row.

calculate_yearly_returns(series: Series, strategy_name: str, method: str, min_observations: int, *, annualize: bool = True, trading_days_per_year: int = 252) → list[dict]

Calculate returns for each year in the series.

Parameters:
  • series (pd.Series) – Time series of portfolio values with DatetimeIndex.

  • strategy_name (str) – Name for the strategy columns.

  • method (str) – Return calculation method (‘simple’ or ‘log’).

  • min_observations (int) – Minimum observations required per year.

  • annualize (bool, default True) – If True, returns are annualized (CAGR) based on business days. If False, returns are raw period returns.

  • trading_days_per_year (int, default 252) – Number of trading days per year for annualization.

Returns:

List of dictionaries containing yearly return data.

Return type:

list[dict]

compound(r: Series)

Compound a series of periodic returns.

Parameters:

r (pd.Series) – Series of simple periodic returns.

Returns:

Compounded total return computed as expm1(log1p(r).sum()).

Return type:

float
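
The log-space expression above is algebraically the same as (1 + r).prod() - 1. The sketch below checks that on a small sample; the helper name compound_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def compound_sketch(r: pd.Series) -> float:
    """Total compounded return via expm1(log1p(r).sum())."""
    return float(np.expm1(np.log1p(r).sum()))


r = pd.Series([0.10, -0.05, 0.02])
total = compound_sketch(r)
direct = float((1 + r).prod() - 1)  # same quantity computed by direct multiplication
```

Summing log1p values and applying expm1 tends to be more accurate than direct multiplication when the individual returns are very small.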

compute_portfolio_statistics(initial_capital: float, initial_portfolio_value: float, final_portfolio_value: float, returns: Series, total_portfolio_series: Series, years: float, risk_free_rate: float = 0.0, var_level: float | int = 5.0, periods_per_year: int = 252, additional_metrics: dict[str, float] | None = None) → dict[str, float]

Compute key performance statistics for a backtested portfolio.

Returns raw numerical values for programmatic use and Excel formatting.

Parameters:
  • initial_capital (float) – Starting capital allocated to the portfolio.

  • initial_portfolio_value (float) – Initial portfolio value.

  • final_portfolio_value (float) – Final portfolio value at the end of the backtest.

  • returns (pd.Series) – Series of portfolio periodic returns (e.g., daily).

  • total_portfolio_series (pd.Series) – Time series of total portfolio value.

  • years (float) – Length of the backtest in years (fraction allowed).

  • risk_free_rate (float, default 0.0) – Annual risk-free rate used in Sharpe/Sortino calculations.

  • var_level (float or int, default 5.0) – Tail probability (in percent) for VaR/CVaR (e.g., 5 means 5%).

  • periods_per_year (int, default 252) – Number of observations per year (e.g., 252 for daily).

  • additional_metrics (dict[str, float] or None, optional) – Dictionary of extra metrics to include in the output.

Returns:

Dictionary containing performance, risk, and distribution metrics as raw numerical values (not formatted strings).

Return type:

dict[str, float]

Notes

  • Returns raw float values for all metrics.

  • VaR/CVaR keys use the format “Modified VaR (5%)” and “Historic CVaR (5%)”.

  • All ratio names include the “Portfolio” prefix for consistency.

create_empty_result_dataframe(strategy_name: str) → DataFrame

Create an empty DataFrame with the correct annual-returns column structure.

Parameters:

strategy_name (str) – Strategy label used as column-name prefix.

Returns:

Empty DataFrame with columns: 'year', 'start_date', 'end_date', '{strategy_name}_return', '{strategy_name}_start_value', '{strategy_name}_end_value'.

Return type:

pd.DataFrame

cumprod_manual_fallback(returns_array: Array) → Array

Manual cumulative product for (1 + returns) preserving decimal precision.

Parameters:

returns_array (pa.Array) – Array of returns (decimal or numeric).

Returns:

Array of cumulative products with preserved precision.

Return type:

pa.Array

Notes

  • Required for decimal types because PyArrow lacks cumulative_prod support for decimal128.
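
The idea of a manual, precision-preserving cumulative product can be sketched with the standard library alone. This is not the module's code: it works on a Python list rather than a pa.Array, the helper name cumprod_one_plus is hypothetical, and the null handling (a null stays null and does not reset the running product) is an assumption.

```python
from decimal import Decimal
from typing import List, Optional


def cumprod_one_plus(returns: List[Optional[Decimal]]) -> List[Optional[Decimal]]:
    """Running product of (1 + r) using exact Decimal arithmetic."""
    running = Decimal("1")
    out: List[Optional[Decimal]] = []
    for r in returns:
        if r is None:
            out.append(None)  # assumption: nulls propagate in place
            continue
        running *= Decimal("1") + r
        out.append(running)
    return out


growth = cumprod_one_plus(
    [Decimal("0.10"), Decimal("0.10"), None, Decimal("-0.50")]
)
```

Because every step is a Decimal multiplication, 1.10 * 1.10 is exactly 1.21, with no binary floating-point rounding.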

cvar_historic(r: Series, level: int = 5)

Historical Conditional Value-at-Risk (CVaR) at a given tail level.

Parameters:
  • r (pd.Series or pd.DataFrame) – Returns series or DataFrame. For DataFrames, CVaR is computed separately for each column.

  • level (int, default 5) – Tail probability in percent (e.g., 5 indicates the 5% left tail).

Returns:

CVaR estimate (positive number) for a Series, or a Series of CVaRs when r is a DataFrame.

Return type:

float or pd.Series

Notes

  • Uses historical returns below (or equal to) the historical VaR threshold.

  • Returns NaN if insufficient data.
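
A minimal sketch of historical VaR and CVaR for a single Series, following the description above. The helper names are hypothetical, and the use of np.percentile with its default linear interpolation is an assumption about how the threshold is obtained.

```python
import numpy as np
import pandas as pd


def var_historic_sketch(r: pd.Series, level: float = 5) -> float:
    """Historical VaR: the negated level-th percentile of returns."""
    return float(-np.percentile(r.dropna(), level))


def cvar_historic_sketch(r: pd.Series, level: float = 5) -> float:
    """Historical CVaR: negated mean of returns at or below the VaR threshold."""
    clean = r.dropna()
    threshold = -var_historic_sketch(clean, level)
    tail = clean[clean <= threshold]
    return float(-tail.mean())


r = pd.Series([-0.10, -0.04, -0.02, 0.00, 0.01, 0.01, 0.02, 0.02, 0.03, 0.05])
var_20 = var_historic_sketch(r, level=20)
cvar_20 = cvar_historic_sketch(r, level=20)
```

For this sample the 20th percentile interpolates to -0.024, so VaR is 0.024; the tail contains -0.10 and -0.04, giving a CVaR of 0.07. CVaR is never smaller than VaR at the same level.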

decimal_cumprod_pyarrow(returns_array: Array) → Array

Cumulative product of (1 + returns) with decimal-aware fallback.

Parameters:

returns_array (pa.Array) – PyArrow array containing returns (decimal or numeric).

Returns:

Cumulative product of (1 + returns) with preserved precision.

Return type:

pa.Array

downside_deviation(r: Series, target_return: float = 0.0, periods_per_year: int = 252) → float

Annualized downside deviation relative to a target return.

Parameters:
  • r (pd.Series) – Return series.

  • target_return (float, default 0.0) – Per-period target return, also called the minimum acceptable return (MAR).

  • periods_per_year (int, default 252) – Number of periods per year.

Returns:

Annualized downside deviation.

Return type:

float
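
The exact formula is not stated here, so the sketch below uses one common definition as an assumption: the root mean square of shortfalls below the target, averaged over all observations, then scaled by sqrt(periods_per_year). The helper name downside_deviation_sketch is hypothetical and the module may differ in detail (for example in the denominator).

```python
import numpy as np
import pandas as pd


def downside_deviation_sketch(r: pd.Series, target_return: float = 0.0,
                              periods_per_year: int = 252) -> float:
    """Annualized root-mean-square shortfall below the target (assumed definition)."""
    shortfall = np.minimum(r - target_return, 0.0)  # zero when the target is met
    per_period = np.sqrt((shortfall ** 2).mean())   # averaged over all periods
    return float(per_period * np.sqrt(periods_per_year))


quarterly = pd.Series([0.02, -0.01, -0.03, 0.01])
dd = downside_deviation_sketch(quarterly, target_return=0.0, periods_per_year=4)
```

Only the two negative quarters contribute: the mean squared shortfall is (0.0001 + 0.0009) / 4 = 0.00025, and annualizing multiplies its square root by 2.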

drawdown(return_series: Series) → DataFrame

Compute wealth index, previous peaks, and drawdown from returns.

Parameters:

return_series (pd.Series) – Series of periodic returns.

Returns:

DataFrame with columns:
  • 'Wealth' : wealth index starting at 1000
  • 'Previous Peak' : running maximum of the wealth index
  • 'Drawdown' : (wealth - peak) / peak

Return type:

pd.DataFrame
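
The three documented columns can be reproduced with a few pandas calls. This is a sketch of the described computation; the helper name drawdown_sketch is hypothetical.

```python
import pandas as pd


def drawdown_sketch(return_series: pd.Series) -> pd.DataFrame:
    """Wealth index (base 1000), running peak, and drawdown from the peak."""
    wealth = 1000 * (1 + return_series).cumprod()
    previous_peak = wealth.cummax()
    return pd.DataFrame({
        "Wealth": wealth,
        "Previous Peak": previous_peak,
        "Drawdown": (wealth - previous_peak) / previous_peak,
    })


dd = drawdown_sketch(pd.Series([0.10, -0.20, 0.05]))
max_drawdown = dd["Drawdown"].min()
```

Wealth moves 1100, 880, 924 while the peak stays at 1100, so the deepest drawdown is about -20% and the last row shows -16%.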

ensure_datetime_index(df: Series) → Series

Ensure a Series is indexed by a DatetimeIndex.

Parameters:

df (pd.Series) – Series with any index type.

Returns:

Same series with index converted to DatetimeIndex.

Return type:

pd.Series

format_results_dataframe(results: list[dict], strategy_name: str) → DataFrame

Format the results list into a properly typed DataFrame.

Parameters:
  • results (list[dict]) – Raw results from yearly calculations.

  • strategy_name (str) – Name for the strategy columns.

Returns:

Formatted DataFrame with proper types and rounded values.

Return type:

pd.DataFrame

index_price_construction(daily_portfolio_returns: Series, base: int = 1000)

Construct an index-like price series from returns and a base value.

Parameters:
  • daily_portfolio_returns (pd.Series) – Series of periodic returns (e.g., daily).

  • base (int, default 1000) – Starting index value.

Returns:

Index-like price series: base * (1 + r).cumprod() (sorted by index).

Return type:

pd.Series
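
A short sketch of the documented expression. The helper name index_price_sketch is hypothetical, and sorting the returns before compounding (rather than sorting the result afterwards) is an assumption about what "sorted by index" means.

```python
import pandas as pd


def index_price_sketch(r: pd.Series, base: int = 1000) -> pd.Series:
    """base * (1 + r).cumprod(), compounding in chronological order."""
    return base * (1 + r.sort_index()).cumprod()


dates = pd.to_datetime(["2024-01-03", "2024-01-02", "2024-01-04"])
r = pd.Series([0.10, 0.05, -0.02], index=dates)  # deliberately out of order
index_prices = index_price_sketch(r)
```

After sorting, the index evolves 1050, 1155, 1131.9, starting from the base of 1000.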

kurtosis(r: Series) → Series

Kurtosis of a return series (population definition).

Parameters:

r (pd.Series) – Return series.

Returns:

Kurtosis computed as E[(X - mu)^4] / sigma^4 (not excess).

Return type:

float

manual_returns_calculation(current_prices: Array, previous_prices: Array) → Array

Manual fallback for returns: (P_t - P_{t-1}) / P_{t-1}.

Parameters:
  • current_prices (pa.Array) – Prices at time t.

  • previous_prices (pa.Array) – Prices at time t-1.

Returns:

Returns array; None where inputs are null or previous price is zero.

Return type:

pa.Array
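
The null and zero-price rule can be sketched on plain Python lists. This is an illustration only: the real function takes pa.Array inputs, and the helper name manual_returns is hypothetical.

```python
from decimal import Decimal
from typing import List, Optional


def manual_returns(current: List[Optional[Decimal]],
                   previous: List[Optional[Decimal]]) -> List[Optional[Decimal]]:
    """(P_t - P_{t-1}) / P_{t-1}, with None for nulls or a zero previous price."""
    out: List[Optional[Decimal]] = []
    for cur, prev in zip(current, previous):
        if cur is None or prev is None or prev == 0:
            out.append(None)
        else:
            out.append((cur - prev) / prev)
    return out


prices = [Decimal("100"), Decimal("110"), None, Decimal("0"), Decimal("5")]
# Pair each price with the one before it.
returns = manual_returns(prices[1:], prices[:-1])
```

Only the first pair (110 after 100) produces a value, 0.1; the other three pairs involve a null or a zero previous price and yield None.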

merge_strategy_benchmark_returns(strategy_returns: DataFrame, benchmark_returns: DataFrame, strategy_name: str, benchmark_name: str) → DataFrame

Merge strategy and benchmark annual returns into a single DataFrame.

This helper merges the output of two analyze_annual_returns calls and handles the date columns correctly. It also validates that the columns match the naming convention implied by the strategy and benchmark names.

Parameters:
  • strategy_returns (pd.DataFrame) – Annual returns DataFrame for the strategy.

  • benchmark_returns (pd.DataFrame) – Annual returns DataFrame for the benchmark.

  • strategy_name (str) – Name of the strategy (must match column prefix in strategy_returns).

  • benchmark_name (str) – Name of the benchmark (must match column prefix in benchmark_returns).

Returns:

Merged DataFrame with columns from both strategy and benchmark. Keeps dates from the strategy DataFrame.

Return type:

pd.DataFrame

Raises:

ValueError – If expected columns based on strategy_name or benchmark_name are not found.

Examples

>>> strategy_df = analyze_annual_returns(df1, 'portfolio', 'MyStrategy')
>>> benchmark_df = analyze_annual_returns(df2, 'bench_value', 'SP500')
>>> merged = merge_strategy_benchmark_returns(
...     strategy_df, benchmark_df, 'MyStrategy', 'SP500'
... )

multiply_by_capital_manual(cumprod_values: Array, initial_capital_decimal: float | int) → Array

Multiply cumulative values by initial capital using Decimal precision.

Parameters:
  • cumprod_values (pa.Array) – Array of cumulative product values (e.g., from (1 + r).cumprod()).

  • initial_capital_decimal (float or int) – Initial capital to multiply by.

Returns:

Array of cumulative values scaled by initial capital, preserving precision.

Return type:

pa.Array

Notes

  • Converts inputs to decimal.Decimal to avoid accumulating floating-point error.

  • Null inputs yield null outputs at the same positions.

pct_change_pyarrow(column: str, output_column_name: str, table: Table, periods: int = 1) → Table

Append a percentage-change column to a PyArrow table.

Parameters:
  • column (str) – Name of the input price column.

  • output_column_name (str) – Name of the output percentage-change column to append.

  • table (pa.Table) – Input table containing the column.

  • periods (int, default 1) – Lag to compute percent change (e.g., 1 for 1-period change).

Returns:

Table with the new percentage-change column appended.

Return type:

pa.Table

Notes

  • If the column is a ChunkedArray, its chunks are combined into a single contiguous Array.

  • If there are not enough rows, the appended column will be all nulls.

sharpe_ratio(r: Series, riskfree_rate: float, periods_per_year: int = 252) → float

Annualized Sharpe ratio of a return series.

Parameters:
  • r (pd.Series) – Series of simple periodic returns.

  • riskfree_rate (float) – Annual risk-free rate.

  • periods_per_year (int, default 252) – Number of observations per year.

Returns:

Annualized Sharpe ratio computed from annualized excess return divided by annualized volatility.

Return type:

float
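
A minimal sketch of "annualized excess return divided by annualized volatility". The helper name sharpe_ratio_sketch is hypothetical, and the conversion of the annual risk-free rate to a per-period rate by geometric de-annualization is an assumption; the module may convert it differently.

```python
import numpy as np
import pandas as pd


def sharpe_ratio_sketch(r: pd.Series, riskfree_rate: float,
                        periods_per_year: int = 252) -> float:
    """Annualized excess return over annualized volatility."""
    # Assumption: de-annualize the risk-free rate geometrically.
    rf_per_period = (1 + riskfree_rate) ** (1 / periods_per_year) - 1
    excess = r - rf_per_period
    ann_excess = (1 + excess).prod() ** (periods_per_year / len(excess)) - 1
    ann_vol = r.std() * np.sqrt(periods_per_year)
    return float(ann_excess / ann_vol)


monthly = pd.Series([0.01, -0.005, 0.02, 0.0, 0.015])
sharpe_zero_rf = sharpe_ratio_sketch(monthly, 0.0, periods_per_year=12)
sharpe_with_rf = sharpe_ratio_sketch(monthly, 0.03, periods_per_year=12)
```

With a zero risk-free rate the ratio reduces to the annualized return divided by the annualized volatility; raising the risk-free rate lowers it.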

skewness(r: Series) → Series

Skewness of a return series (population definition).

Parameters:

r (pd.Series) – Return series.

Returns:

Skewness computed as E[(X - mu)^3] / sigma^3.

Return type:

float
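
Both skewness and kurtosis are standardized central moments under the population definition (ddof=0). The sketch below computes them with one hypothetical helper, standardized_moment, which is not part of this module.

```python
import pandas as pd


def standardized_moment(r: pd.Series, order: int) -> float:
    """E[(X - mu)^order] / sigma^order with population sigma (ddof=0)."""
    demeaned = r - r.mean()
    sigma = r.std(ddof=0)
    return float((demeaned ** order).mean() / sigma ** order)


r = pd.Series([-0.02, 0.00, 0.00, 0.02])
skew = standardized_moment(r, 3)  # third moment: skewness
kurt = standardized_moment(r, 4)  # fourth moment: kurtosis (not excess)
```

The sample is symmetric, so its skewness is 0; its kurtosis is 2.0, below the value of 3 for a normal distribution.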

sortino_ratio(r: Series, riskfree_rate: float = 0.0, target_return: float | None = None, periods_per_year: int = 252) → float

Annualized Sortino ratio of a return series.

Parameters:
  • r (pd.Series) – Series of periodic returns (simple).

  • riskfree_rate (float, default 0.0) – Annual risk-free rate.

  • target_return (float or None, optional) – Per-period target return. If None, uses per-period risk-free rate.

  • periods_per_year (int, default 252) – Number of observations per year.

Returns:

Annualized Sortino ratio, or NaN on insufficient data.

Return type:

float

Notes

  • Downside deviation uses only negative excess returns relative to the target.

  • Uses arithmetic mean for annualized excess return with compounding adjustment.

sortino_ratio_simple(r: Series, riskfree_rate: float = 0.0, periods_per_year: int = 252, *, use_log_returns: bool = False) → float

Simplified Sortino ratio (annualized).

Parameters:
  • r (pd.Series) – Series of periodic returns.

  • riskfree_rate (float, default 0.0) – Annual risk-free rate.

  • periods_per_year (int, default 252) – Number of observations per year.

  • use_log_returns (bool, default False) – If True, convert to log returns prior to computation.

Returns:

Annualized Sortino ratio (np.inf if positive excess and no downside; NaN if insufficient data).

Return type:

float

Notes

  • Requires at least 30 observations.

  • Downside deviation is computed using returns below the risk-free rate (per period).

validate_inputs(df: DataFrame, portfolio_column: str, method: str) → None

Validate input parameters for annual-return analysis.

Parameters:
  • df (pd.DataFrame) – DataFrame that must be non-empty with a DatetimeIndex.

  • portfolio_column (str) – Column name that must exist in df.

  • method (str) – Return calculation method; must be 'simple' or 'log'.

Raises:
  • ValueError – If df is empty, portfolio_column is missing, or method is not one of the accepted values.

  • TypeError – If the index of df is not a DatetimeIndex.

var_gaussian(r: Series, level: int, *, modified: bool)

Gaussian (parametric) Value-at-Risk with optional Cornish-Fisher adjustment.

Parameters:
  • r (pd.Series) – Returns series.

  • level (int) – Tail probability in percent (0 < level < 100).

  • modified (bool) – If True, applies the Cornish-Fisher expansion, using sample skewness and excess kurtosis to adjust the z-score.

Returns:

Parametric VaR estimate (positive number), or NaN if insufficient/invalid data.

Return type:

float

Notes

  • Uses population standard deviation (ddof=0).

  • When modified=True, the z-score is adjusted via the Cornish-Fisher expansion: z_cf = z + (z^2 - 1) s / 6 + (z^3 - 3z) k / 24 - (2z^3 - 5z) s^2 / 36, where s is skewness and k is excess kurtosis.
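
The Cornish-Fisher adjustment in the note above can be sketched as follows. This is an illustration under assumptions: the helper name var_gaussian_sketch is hypothetical, the normal quantile comes from the standard library's statistics.NormalDist rather than whatever the module uses, and the final step VaR = -(mean + z * sigma) is the usual parametric form, not a confirmed detail of this module.

```python
from statistics import NormalDist

import pandas as pd


def var_gaussian_sketch(r: pd.Series, level: float = 5,
                        modified: bool = False) -> float:
    """Parametric VaR, optionally with a Cornish-Fisher adjusted z-score."""
    z = NormalDist().inv_cdf(level / 100)  # left-tail quantile, negative for level < 50
    sigma = r.std(ddof=0)                  # population standard deviation
    if modified:
        demeaned = r - r.mean()
        s = (demeaned ** 3).mean() / sigma ** 3      # skewness
        k = (demeaned ** 4).mean() / sigma ** 4 - 3  # excess kurtosis
        z = (z
             + (z ** 2 - 1) * s / 6
             + (z ** 3 - 3 * z) * k / 24
             - (2 * z ** 3 - 5 * z) * s ** 2 / 36)
    return float(-(r.mean() + z * sigma))


r = pd.Series([-0.03, -0.01, 0.0, 0.01, 0.03])
plain_var = var_gaussian_sketch(r, level=5, modified=False)
modified_var = var_gaussian_sketch(r, level=5, modified=True)
```

The sample has zero mean and a population standard deviation of 0.02, so the plain 5% VaR is about 1.645 * 0.02. Skewness is zero here, so only the kurtosis term changes the z-score in the modified case.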

var_historic(r: Series, level: int = 5)

Historical Value-at-Risk at a given tail probability.

Parameters:
  • r (pd.Series or pd.DataFrame) – Returns series or DataFrame. For DataFrames, VaR is computed separately for each column.

  • level (int, default 5) – Tail probability in percent (e.g., 5 indicates the 5% left tail).

Returns:

VaR estimate (positive number) for a Series, or a Series of VaRs when r is a DataFrame.

Return type:

float or pd.Series

Notes

  • Returns NaN if data are insufficient or constant.

wealth_index(returns: Series, initial_capital: int)

Compute a wealth index from returns and an initial capital.

Parameters:
  • returns (pd.Series) – Series of periodic returns.

  • initial_capital (int) – Starting capital.

Returns:

Wealth index series computed as initial_capital * (1 + returns).cumprod().

Return type:

pd.Series