Backtest Statistics Categories

Backtest statistics are essential for evaluating the efficacy of investment strategies. These metrics fall into different categories:

  • General Features: Includes metrics like Time range, Average AUM, Capacity, and Leverage.
  • Performance Metrics: Such as PnL, annualized rate of return, hit ratio, etc.
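The function below derives the timing of bets from a series of target positions (signature only):

Python:

def bet_timing(
    target_positions: pd.Series
):
    ...

Julia:

function betTiming(
    targetPositions::TimeArray
)
end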

Time-Weighted Rate of Return (TWRR)

TWRR is a method for calculating returns that adjusts for external cash flows. Over a single sub-period it is the mark-to-market profit or loss divided by the market value of the assets under management, with the following notation:

  • $r_{i,t}$: TWRR for portfolio $i$ between times $[t-1, t]$.

  • $\pi_{i,t}$: Mark-to-market profit or loss for portfolio $i$ at time $t$.

  • $K_{i,t}$: Market value of assets managed by portfolio $i$ over sub-period $t$.

$$r_{i,t} = \frac{\pi_{i,t}}{K_{i,t}}$$
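The function below estimates the average holding period from a series of target positions (signature only):

Python:

def holding_period(
    target_positions: pd.Series
):
    ...

Julia:

function holdingPeriod(
    targetPositions::TimeArray
)
end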

Performance Statistics

Performance statistics that are not risk-adjusted include:

  • PnL: Total dollars earned.

  • PnL from Long Positions: Earnings from only long holdings.

  • Annualized Rate of Return: Includes all forms of earnings and expenses.

  • Hit Ratio: Percentage of profitable bets.
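A minimal sketch of three of these statistics, assuming each bet is recorded with its side and its realized dollar PnL. The `Bet` record and the function names are hypothetical, and the hit ratio is returned as a fraction rather than a percentage.

```python
from typing import List, NamedTuple


class Bet(NamedTuple):
    side: int    # +1 for a long bet, -1 for a short bet
    pnl: float   # realized dollar profit or loss of the bet


def total_pnl(bets: List[Bet]) -> float:
    """Total dollars earned across all bets."""
    return sum(b.pnl for b in bets)


def long_pnl(bets: List[Bet]) -> float:
    """Dollars earned by long bets only."""
    return sum(b.pnl for b in bets if b.side > 0)


def hit_ratio(bets: List[Bet]) -> float:
    """Fraction of bets that ended with a positive PnL."""
    if not bets:
        raise ValueError("hit ratio is undefined without bets")
    return sum(1 for b in bets if b.pnl > 0) / len(bets)


# Illustrative bets only.
bets = [Bet(1, 120.0), Bet(-1, -40.0), Bet(1, -30.0), Bet(-1, 50.0)]
```

Comparing `long_pnl` with `total_pnl` shows whether a long-short strategy depends mostly on its long side.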

Runs and Risk Metrics in Investment Strategies

Returns from an investment strategy often arrive in "runs," uninterrupted sequences of returns of the same sign, either positive or negative. Runs concentrate performance in a few stretches of time, so understanding how concentrated returns are, and how that concentration shows up in risk measures like drawdowns and time under water, is essential for assessing a strategy's viability.

Returns Concentration

Consider a time series of bet returns, $r_t$, of length $T$. We can split these returns into positive and negative subsets, $r^+$ and $r^-$. Two weight series, $w^+$ and $w^-$, can be defined as:

$$w^+ = \frac{r^+}{\sum r^+} \quad \text{and} \quad w^- = \frac{r^-}{\sum r^-}$$

We define the Herfindahl-Hirschman Index (HHI)-based concentration of positive returns ($h^+$) and negative returns ($h^-$) as:

$$h^+ = \frac{\sum (w^+)^2 - 1/\|w^+\|}{1 - 1/\|w^+\|}$$

$$h^- = \frac{\sum (w^-)^2 - 1/\|w^-\|}{1 - 1/\|w^-\|}$$
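The concentration formulas above can be sketched in plain Python, reading $\|w\|$ as the number of elements in $w$. The function names are hypothetical, zero returns are left out of both subsets (an assumption), and `nan` is returned when a subset has fewer than two elements because the denominator vanishes.

```python
from typing import Sequence, Tuple


def hhi(values: Sequence[float]) -> float:
    """Normalized HHI: 0 when all values are equal, 1 when a single value dominates."""
    n = len(values)
    total = sum(values)
    if n < 2 or total == 0:
        return float("nan")
    weights = [v / total for v in values]
    raw = sum(w * w for w in weights)
    return (raw - 1.0 / n) / (1.0 - 1.0 / n)


def returns_concentration(returns: Sequence[float]) -> Tuple[float, float]:
    """Return (h_plus, h_minus) for a series of bet returns."""
    positive = [r for r in returns if r > 0]
    negative = [r for r in returns if r < 0]
    return hhi(positive), hhi(negative)


# Illustrative returns: equal-sized gains, one loss three times larger than the other.
h_plus, h_minus = returns_concentration([0.02, 0.02, 0.02, -0.01, -0.03])
```

Equal-sized gains give an `h_plus` of zero, while the uneven losses push `h_minus` above zero.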

Desirable strategy characteristics include:

  • High Sharpe ratio
  • Many bets per year
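  • High hit ratio (relatively few losing bets, i.e. low $\|w^-\|$)
  • Low $h^+$ (gains not concentrated in a few bets)
  • Low $h^-$ (losses not concentrated in a few bets)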

HHI Concentration Functions

Python:

def hhi_concentration(
    returns: pd.Series
):
    ...

Julia:

function HHIConcentration(
    returns::TimeArray
)
end

These functionalities are available in both Python and Julia in the RiskLabAI library.

Drawdown and Time Under Water

Drawdown (DD) is the maximum loss suffered between two consecutive high-water marks (HWMs), while Time under Water (TuW) is the time elapsed between an HWM and the moment the PnL surpasses it.
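To make the two definitions concrete, here is a minimal sketch that scans a series of cumulative PnL (or portfolio values) once. It is not the RiskLabAI implementation: the function name is hypothetical, TuW is counted in observations rather than calendar time, and an episode that has not recovered by the end of the series is left out.

```python
from typing import List, Sequence, Tuple


def drawdowns_and_time_under_water(
    series: Sequence[float], dollars: bool = False
) -> List[Tuple[float, int]]:
    """Return one (drawdown, time_under_water) pair per recovered underwater episode."""
    episodes: List[Tuple[float, int]] = []
    if len(series) == 0:
        return episodes
    hwm, hwm_pos, trough = series[0], 0, series[0]
    for pos in range(1, len(series)):
        value = series[pos]
        if value > hwm:
            # A new high-water mark closes the current episode, if there was a dip.
            if trough < hwm:
                dd = hwm - trough if dollars else 1.0 - trough / hwm
                episodes.append((dd, pos - hwm_pos))
            hwm, hwm_pos, trough = value, pos, value
        else:
            trough = min(trough, value)
    return episodes


# Illustrative cumulative PnL path.
path = [100.0, 110.0, 105.0, 99.0, 112.0, 108.0, 120.0]
in_dollars = drawdowns_and_time_under_water(path, dollars=True)
in_percent = drawdowns_and_time_under_water(path)
```

With `dollars=False` the drawdown is expressed as a fraction of the high-water mark, which only makes sense when the series stays positive.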

DD and TuW Functions

Python:

def compute_drawdowns_time_under_water(
    series: pd.Series,
    dollars: bool = False
):
    ...

Julia:

function computeDrawDownsTimeUnderWater(
    series::TimeArray,
    dollars::Bool=false
)
end

These functionalities are available in both Python and Julia in the RiskLabAI library.

Runs Statistics for Performance Evaluation

Key Metrics:

  • HHI index for both positive and negative returns.
  • HHI index on the time between bets, that is, how concentrated the bets are in time.
  • 95th percentile of Drawdown (DD) and Time under Water (TuW).

These metrics are useful to understand the concentration of portfolio returns and the risk involved.
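One way to read the second metric is to count bets per calendar month and apply the same normalized HHI to those counts. The sketch below takes that reading as an assumption. The helper names are hypothetical, and months without bets between the first and last bet are included with a count of zero.

```python
from collections import Counter
from datetime import date
from typing import Sequence


def hhi(values: Sequence[float]) -> float:
    """Normalized HHI of non-negative values (nan if fewer than two values)."""
    n, total = len(values), sum(values)
    if n < 2 or total == 0:
        return float("nan")
    return (sum((v / total) ** 2 for v in values) - 1.0 / n) / (1.0 - 1.0 / n)


def monthly_bet_concentration(bet_dates: Sequence[date]) -> float:
    """HHI of the number of bets per calendar month, empty months included."""
    first, last = min(bet_dates), max(bet_dates)
    n_months = (last.year - first.year) * 12 + (last.month - first.month) + 1
    counts = Counter(
        (d.year - first.year) * 12 + (d.month - first.month) for d in bet_dates
    )
    per_month = [counts.get(m, 0) for m in range(n_months)]
    return hhi(per_month)


# Illustrative dates: three bets in January, none in February, one in March.
clustered = monthly_bet_concentration(
    [date(2023, 1, 5), date(2023, 1, 12), date(2023, 1, 20), date(2023, 3, 3)]
)
```

A value near zero means bets are spread evenly over time, while a value near one means they cluster in a single month.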

Code for Calculating Runs Statistics

Python:

def getHHI(ret: pd.Series) -> float:
    ...

Julia:

function getHHI(ret::DataFrame)::Float64
    ...
end

Implementation Failure Metrics

Key Metrics to prevent investment plans from failing:

  • Broker fees per turnover
  • Average slippage per turnover
  • Dollar performance per turnover
  • Return on execution costs

These metrics help you understand how your portfolio could be affected by hidden costs.

Efficiency Metrics

Sharpe Ratio (SR)

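This ratio measures performance as the average excess return (over the risk-free rate) divided by the standard deviation of those excess returns:

$$\text{SR} = \frac{\mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the excess returns.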
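A minimal sketch of the ratio, using the sample standard deviation. The function name and the `periods_per_year` parameter are hypothetical, and the square-root scaling used for annualization assumes returns are independent and identically distributed.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(excess_returns: Sequence[float], periods_per_year: int = 1) -> float:
    """Mean over sample standard deviation, scaled by sqrt(periods_per_year)."""
    sigma = stdev(excess_returns)
    if sigma == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean(excess_returns) / sigma * sqrt(periods_per_year)


# Illustrative daily excess returns.
daily = [0.01, 0.03, -0.01, 0.02, 0.0]
sr_daily = sharpe_ratio(daily)                          # per-period Sharpe ratio
sr_annual = sharpe_ratio(daily, periods_per_year=252)   # annualized under IID returns
```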

Probabilistic Sharpe Ratio (PSR)

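PSR estimates the probability that the observed Sharpe ratio $\widehat{SR}$ exceeds a benchmark $SR^{*}$, correcting for the inflation caused by short track records and by non-normal returns:

$$\widehat{PSR}[SR^{*}] = Z\left[\frac{(\widehat{SR}-SR^{*})\sqrt{T-1}}{\sqrt{1-\hat{\gamma}_{3}\widehat{SR}+\frac{\hat{\gamma}_{4}-1}{4}\widehat{SR}^{2}}}\right]$$

where $Z[\cdot]$ is the standard normal CDF, $T$ is the number of observed returns, $\hat{\gamma}_{3}$ is the skewness of the returns, and $\hat{\gamma}_{4}$ is their kurtosis.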
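The PSR formula can be evaluated with the standard library alone. In this sketch the function and parameter names are hypothetical, and both Sharpe ratios are assumed to be expressed at the same frequency as the observations (not annualized).

```python
from math import sqrt
from statistics import NormalDist


def probabilistic_sharpe_ratio(
    sr_hat: float,
    sr_benchmark: float,
    n_obs: int,
    skew: float = 0.0,
    kurt: float = 3.0,
) -> float:
    """Probability that the true Sharpe ratio exceeds sr_benchmark."""
    if n_obs < 2:
        raise ValueError("at least two observations are required")
    denominator = sqrt(1.0 - skew * sr_hat + (kurt - 1.0) / 4.0 * sr_hat ** 2)
    z_score = (sr_hat - sr_benchmark) * sqrt(n_obs - 1) / denominator
    return NormalDist().cdf(z_score)


# Illustrative inputs: per-period SR of 0.1 over 101 observations, benchmark of zero.
psr_normal = probabilistic_sharpe_ratio(0.1, 0.0, 101)
psr_skewed = probabilistic_sharpe_ratio(0.1, 0.0, 101, skew=-1.0, kurt=6.0)
```

For a positive observed Sharpe ratio, negative skewness and fat tails enlarge the denominator and therefore lower the PSR.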

Deflated Sharpe Ratio (DSR)

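DSR is a PSR in which the benchmark $SR^{*}$ is raised to account for the number of trials performed to obtain the Sharpe ratio:

$$SR^{*} = \sqrt{V[\{\widehat{SR}_{n}\}]}\left((1-\gamma)Z^{-1}\left[1-\frac{1}{N}\right]+\gamma Z^{-1}\left[1-\frac{1}{N}e^{-1}\right]\right)$$

where $V[\{\widehat{SR}_{n}\}]$ is the variance of the Sharpe ratio estimates across trials, $N$ is the number of independent trials, $Z^{-1}$ is the inverse of the standard normal CDF, $e$ is Euler's number, and $\gamma$ is the Euler-Mascheroni constant (approximately 0.5772), not to be confused with the moments $\hat{\gamma}_{3}$ and $\hat{\gamma}_{4}$ used in the PSR.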
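A minimal sketch of the benchmark $SR^{*}$ above, which can then be plugged into the PSR to obtain the DSR. The function and parameter names are hypothetical, and the trials are assumed to be independent.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def deflated_benchmark_sharpe(sr_variance: float, n_trials: int) -> float:
    """Sharpe ratio expected from the best of n_trials unskilled strategies."""
    if n_trials < 2:
        raise ValueError("the formula needs at least two trials")
    normal = NormalDist()
    return sqrt(sr_variance) * (
        (1.0 - EULER_MASCHERONI) * normal.inv_cdf(1.0 - 1.0 / n_trials)
        + EULER_MASCHERONI * normal.inv_cdf(1.0 - 1.0 / (n_trials * e))
    )


# Illustrative inputs: unit variance across trials, 10 versus 100 trials.
sr_star_10 = deflated_benchmark_sharpe(1.0, 10)
sr_star_100 = deflated_benchmark_sharpe(1.0, 100)
```

The benchmark rises with the number of trials, so a Sharpe ratio that looks significant after one backtest may not clear the bar after a hundred.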

Other Efficiency Metrics

  • Annualized Sharpe Ratio
  • Information Ratio
  • Probabilistic Sharpe Ratio (PSR)
  • Deflated Sharpe Ratio (DSR)

Classification Scores

Metrics for evaluating the performance of machine learning algorithms in trading strategies include:

  • Accuracy:

    $$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$$
  • Precision:

    $$\text{Precision} = \frac{TP}{TP+FP}$$
  • Recall:

    $$\text{Recall} = \frac{TP}{TP+FN}$$
  • F1 Score:

    $$F1 = 2\frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

These metrics help you gauge how accurately your machine learning model is performing in real trading scenarios.

Python:

def calculate_metrics(TP: int, TN: int, FP: int, FN: int) -> Tuple[float, float, float, float]:
    ...

Julia:

function calculate_metrics(TP::Int, TN::Int, FP::Int, FN::Int)::Tuple{Float64, Float64, Float64, Float64}
    ...
end
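The bodies above are elided in the source. As an illustration of the four formulas only, here is a minimal sketch under a hypothetical name, where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives. Returning `nan` for an empty denominator is a choice made for this sketch.

```python
from typing import Tuple

NAN = float("nan")


def classification_scores(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else NAN
    precision = tp / (tp + fp) if (tp + fp) else NAN
    recall = tp / (tp + fn) if (tp + fn) else NAN
    # Any nan above makes the comparison false, so f1 falls back to nan as well.
    if precision + recall > 0:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = NAN
    return accuracy, precision, recall, f1


# Illustrative counts only.
accuracy, precision, recall, f1 = classification_scores(tp=40, tn=30, fp=10, fn=20)
```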

References

  1. López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
  2. López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.