Why Backtesting is Not Enough
- Author: Tails Azimuth
Overfitting can be especially problematic when we rely solely on backtesting for validation. To improve model performance and interpretation, one must look beyond simple backtesting and consider other analyses like feature importance.
The Importance of Features
Features are the variables or columns in our data that the machine learning algorithm uses for making predictions. Knowing which features are important can help in both understanding how the model is making predictions and in improving the model's performance. This brings us to the subject of feature importance methods.
Dealing with Substitution Effects
In machine learning, a "substitution effect" dilutes the importance of interchangeable features: when two features carry roughly the same information, they share the credit between them, and each one looks less important than the information it contains. This is the machine-learning counterpart of "multi-collinearity" in statistics. One way to handle it is to perform Principal Component Analysis (PCA) on the features before the feature importance analysis.
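To make the effect concrete, here is a small, self-contained illustration (a sketch with scikit-learn, not taken from the RiskLabAI library): an informative feature and a near-duplicate of it end up sharing importance, so each looks weaker than the underlying signal really is.

```python
# Toy illustration of the substitution effect:
# "signal" and "signal_copy" carry essentially the same information,
# so the impurity-based importance is split between them.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
signal = rng.normal(size=2_000)
features = pd.DataFrame({
    "signal": signal,
    "signal_copy": signal + rng.normal(scale=0.01, size=2_000),  # near-duplicate
    "noise": rng.normal(size=2_000),                             # irrelevant feature
})
labels = pd.Series((signal + rng.normal(scale=0.5, size=2_000) > 0).astype(int))

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(features, labels)
print(pd.Series(forest.feature_importances_, index=features.columns))
# "signal" and "signal_copy" share most of the importance between them,
# while "noise" receives comparatively little.
```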
Methods of Feature Importance
Mean Decrease Impurity (MDI): Used with tree-based classifiers. For each feature, it averages, across all trees and splits, how much the splits on that feature reduce impurity (a sketch follows the pros and cons below).
- Pros: Quick to compute, well-suited for tree-based classifiers.
- Cons: Susceptible to substitution effects, not generalizable to non-tree-based classifiers.
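As a minimal sketch, assuming scikit-learn rather than the RiskLabAI functions: MDI is what a random forest exposes through its `feature_importances_` attribute.

```python
# MDI-style importance: the impurity decrease attributable to each feature,
# averaged over the trees of a random forest and normalized to sum to one.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def mdi_importance(features: pd.DataFrame, labels: pd.Series) -> pd.Series:
    forest = RandomForestClassifier(
        n_estimators=500,
        max_features=1,   # one feature per split, which limits masking between correlated features
        random_state=0,
    )
    forest.fit(features, labels)
    importance = pd.Series(forest.feature_importances_, index=features.columns)
    return importance.sort_values(ascending=False)
```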
Mean Decrease Accuracy (MDA): A more universal method that can be applied to any classifier. Each feature is permuted (shuffled) in the out-of-sample data, and the resulting drop in performance measures how much the model relied on that feature (sketched after the pros and cons below).
- Pros: Applicable to any classifier.
- Cons: Computationally expensive, susceptible to substitution effects.
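A minimal MDA-style sketch, using scikit-learn's `permutation_importance` as a stand-in for the RiskLabAI implementation (which, for financial data, would typically rely on a purged cross-validation scheme rather than a single split):

```python
# MDA-style importance: shuffle each feature in the held-out data and record
# how much the out-of-sample score drops; a large drop means the model relied on it.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def mda_importance(features: pd.DataFrame, labels: pd.Series) -> pd.Series:
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, shuffle=False  # preserve time order
    )
    model = RandomForestClassifier(n_estimators=500, random_state=0).fit(x_train, y_train)
    result = permutation_importance(model, x_test, y_test, n_repeats=10, random_state=0)
    importance = pd.Series(result.importances_mean, index=features.columns)
    return importance.sort_values(ascending=False)
```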

Figure: MDI feature importance computed on a synthetic dataset.
Both MDI and MDA feature importances are available in the RiskLabAI library, for both Python and Julia.

Figure: MDA feature importance computed on a synthetic dataset.
The exact function signatures for the MDA calculation can be found in the RiskLabAI package for each language.
Understanding Feature Importance with SFI and Orthogonal Features
Single Feature Importance (SFI)
Single Feature Importance (SFI) evaluates the out-of-sample (OOS) performance score of each feature on its own. Because every feature is scored in isolation, it avoids the substitution effects that can distort joint methods like MDI and MDA.
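A minimal sketch of the idea (a hypothetical helper, not the RiskLabAI signature): fit the classifier on one feature at a time and record its cross-validated OOS score.

```python
# SFI: the cross-validated out-of-sample score of a classifier trained on each
# feature in isolation, so no other feature can substitute for it.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

def single_feature_importance(features: pd.DataFrame, labels: pd.Series, n_splits: int = 5) -> pd.Series:
    cv = KFold(n_splits=n_splits, shuffle=False)  # keep time order for financial data
    scores = {}
    for column in features.columns:
        model = RandomForestClassifier(n_estimators=200, random_state=0)
        scores[column] = cross_val_score(
            model, features[[column]], labels, cv=cv, scoring="neg_log_loss"
        ).mean()
    return pd.Series(scores).sort_values(ascending=False)
```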

Orthogonal Features
Working with orthogonal features, for example the principal components of the standardized feature matrix as discussed above, reduces the dimensionality of the feature set and mitigates substitution effects. This method also provides a safeguard against overfitting.
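A minimal sketch of the orthogonalization step, assuming scikit-learn's `StandardScaler` and `PCA` rather than the RiskLabAI API; it also returns the eigenvalues, since they are needed for the consistency check described in the next section.

```python
# Orthogonal features: standardize, project onto principal components, and run
# the importance analysis on the components instead of the raw (collinear) features.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def orthogonal_features(features: pd.DataFrame, variance_threshold: float = 0.95):
    """Return the components that explain `variance_threshold` of the variance,
    together with their eigenvalues (explained variances)."""
    standardized = StandardScaler().fit_transform(features)
    pca = PCA(n_components=variance_threshold)  # a float keeps just enough components
    components = pca.fit_transform(standardized)
    columns = [f"PC_{i + 1}" for i in range(components.shape[1])]
    return pd.DataFrame(components, index=features.index, columns=columns), pca.explained_variance_
```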
How to Verify Your Features?
Weighted Kendall's Tau: compare the ranking of the orthogonal features' importances against the ranking of their associated eigenvalues. A value closer to 1 indicates a more consistent relationship between what the supervised importance method selects and what the unsupervised PCA deems informative.
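A sketch of that check using `scipy.stats.weightedtau` (the variable names are illustrative): rank the principal components by eigenvalue and correlate that ranking with the importances the classifier assigned to them.

```python
# Consistency check: does the supervised importance ranking of the orthogonal
# features agree with the unsupervised eigenvalue ranking from PCA?
import numpy as np
from scipy.stats import weightedtau

def importance_eigenvalue_consistency(importances: np.ndarray, eigenvalues: np.ndarray) -> float:
    """Both arrays are aligned per principal component; values near 1 mean the
    components that explain the most variance are also the most important ones."""
    pc_rank = np.argsort(np.argsort(-eigenvalues)) + 1     # rank 1 = largest eigenvalue
    correlation, _ = weightedtau(importances, pc_rank ** -1.0)
    return correlation
```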
Research Methodologies:
- Per-instrument Feature Importance: compute feature importance for each financial instrument in parallel, then aggregate the results across instruments.
- Features Stacking: combine the datasets of multiple instruments into one, normalizing each instrument's features as necessary, so that a single classifier determines the most important features across all instruments (a minimal sketch follows this list).
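A minimal sketch of the stacking step (the helper name and the z-score normalization are illustrative choices): each instrument's features are standardized on their own history, indexed by instrument, and concatenated into a single dataset.

```python
# Features stacking: normalize each instrument's features separately, then stack
# everything so one classifier (and one importance run) covers all instruments.
import pandas as pd

def stack_instrument_features(datasets: dict) -> tuple:
    """`datasets` maps an instrument name to its (features, labels) pair;
    all instruments share the same feature columns."""
    stacked_features, stacked_labels = [], []
    for name, (features, labels) in datasets.items():
        normalized = (features - features.mean()) / features.std()  # per-instrument z-score
        normalized.index = pd.MultiIndex.from_product([[name], features.index])
        labels = labels.copy()
        labels.index = normalized.index
        stacked_features.append(normalized)
        stacked_labels.append(labels)
    return pd.concat(stacked_features), pd.concat(stacked_labels)
```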
References
- López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
- López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.