
Hyper-Parameter Tuning with Cross-Validation

Importance of Hyper-Parameter Tuning

Hyperparameter tuning is crucial for optimizing machine learning (ML) algorithms, and well-tuned models perform better out of sample. Cross-validation (CV) plays a vital role in tuning, but in finance the conventional k-fold approach often falls short, because labels that overlap in time leak information between the training and test sets. This post focuses on using the Purged k-fold CV method for hyperparameter optimization.

Purged-Kfold Integration into MLJBase

Grid search is often a sensible first step in hyperparameter tuning, particularly when little is known about the data's underlying structure. In MLJBase, GridSearchCV takes a CV generator as an argument, so our PurgedKFold class can be passed in to keep the search from overfitting to leaked information.
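To make the idea of a purged CV generator concrete, here is a minimal sketch in plain Python. It is not the RiskLabAI PurgedKFold class: the function name `purged_kfold_splits`, the `label_end` input (the position at which each observation's label is fully determined), and the integer `embargo` are all assumptions made for illustration. The sketch also takes the conservative choice of dropping any training observation whose label ends at or after the first test observation.

```python
def purged_kfold_splits(label_end, n_splits=3, embargo=0):
    """Yield (train, test) index lists for contiguous, purged k-fold CV.

    label_end[i] is the (hypothetical) position at which the label of
    observation i is fully determined; it must satisfy label_end[i] >= i.
    """
    n = len(label_end)
    if not 1 < n_splits <= n:
        raise ValueError("n_splits must be between 2 and the sample count")
    if any(end < i for i, end in enumerate(label_end)):
        raise ValueError("each label must end at or after its own position")

    fold_size, remainder = divmod(n, n_splits)
    start = 0
    for k in range(n_splits):
        # Contiguous test fold; the first `remainder` folds get one extra item.
        stop = start + fold_size + (1 if k < remainder else 0)
        test = list(range(start, stop))
        last_test_end = max(label_end[i] for i in test)

        # Purge: before the test fold, keep only labels that are fully
        # determined before the first test observation.
        train_before = [i for i in range(start) if label_end[i] < start]
        # Purge + embargo: after the test fold, resume only once every test
        # label has ended, then skip `embargo` further observations.
        train_after = list(range(last_test_end + 1 + embargo, n))

        yield train_before + train_after, test
        start = stop


# Ten observations whose labels each span the next two positions.
label_end = [min(i + 2, 9) for i in range(10)]
splits = list(purged_kfold_splits(label_end, n_splits=5, embargo=1))
for train, test in splits:
    print("test:", test, "train:", train)
```

In Python, scikit-learn's `GridSearchCV` accepts an iterable of `(train, test)` index pairs through its `cv` argument, so a generator like this one could be passed to it directly. A production version would work on timestamps rather than integer positions.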


Python:

# Python code for Purged-Kfold
# will be provided here.
# This is not available at the
# time of writing this article

Julia:

function Syntheticbacktesting(
    forecast::Float64,
    halfLife::Float64,
    σ::Float64,
    maximumIteration::Int64 = 1000,
    maximumHoldingPeriod::Int64 = 100,
    profitTakingRange::AbstractVector = LinRange(0.5, 10, 20),
    stopLossRange::AbstractVector = LinRange(0.5, 10, 20),
    seed::Float64 = 0.0
)::Matrix
    # function body not shown in this excerpt


Non-Negative Parameters

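Some ML algorithms accept only non-negative hyperparameters, such as C in the SVC classifier and gamma in the RBF kernel. For such parameters, sampling from a log-uniform distribution is often more effective than sampling from a uniform one, because the model's response tends to depend on the parameter's order of magnitude rather than its absolute value: moving C from 0.01 to 1 can matter as much as moving it from 1 to 100.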

A variable $x$ follows a log-uniform distribution between $a>0$ and $b>a$ if $\log[x]$ is uniformly distributed between $\log[a]$ and $\log[b]$. Its CDF and PDF are:

$$F[x]=\left\{\begin{array}{cl} \frac{\log[x]-\log[a]}{\log[b]-\log[a]} & \text{for } a \leq x \leq b \\ 0 & \text{for } x<a \\ 1 & \text{for } x>b \end{array}\right.$$

$$f[x]=\left\{\begin{array}{cl} \frac{1}{x \log[b / a]} & \text{for } a \leq x \leq b \\ 0 & \text{for } x<a \\ 0 & \text{for } x>b \end{array}\right.$$
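The CDF above can be inverted to draw samples: if $u$ is uniform on $[0, 1)$, then $x = a\,(b/a)^u$ is log-uniform between $a$ and $b$. The following is a minimal sketch using only the Python standard library; the function names and the example bounds are illustrative assumptions, not part of the RiskLabAI API.

```python
import math
import random


def log_uniform_cdf(x, a, b):
    """CDF of the log-uniform distribution on [a, b], with 0 < a < b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (math.log(x) - math.log(a)) / (math.log(b) - math.log(a))


def log_uniform_pdf(x, a, b):
    """PDF of the log-uniform distribution on [a, b], with 0 < a < b."""
    if x < a or x > b:
        return 0.0
    return 1.0 / (x * math.log(b / a))


def sample_log_uniform(a, b, size, seed=None):
    """Draw samples by inverting the CDF: x = a * (b / a) ** u."""
    rng = random.Random(seed)
    return [a * (b / a) ** rng.random() for _ in range(size)]


# Hypothetical search range for a parameter such as C: 0.01 to 100.
samples = sample_log_uniform(1e-2, 1e2, size=10_000, seed=0)
share_below_one = sum(x < 1.0 for x in samples) / len(samples)
print(f"share of samples below 1.0: {share_below_one:.3f}")
```

About half of the draws fall below 1.0, the geometric midpoint of the range. A uniform sampler over the same range would place roughly 99% of its draws above 1.0 and would barely explore the small values.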

Limitations of Accuracy as a Measure

Accuracy alone doesn't provide a meaningful evaluation in finance-related ML, particularly for investment strategies. It counts a wrong prediction made with high confidence the same as one made with low confidence, so it ignores the probabilities associated with the predictions. Cross-entropy loss, or log loss, is a better performance metric because it incorporates those probabilities.

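The formula for log loss is:

$$L[Y, P]=-\log[\text{Prob}[Y \mid P]]=-N^{-1} \sum_{n=0}^{N-1} \sum_{k=0}^{K-1} y_{n, k} \log[p_{n, k}]$$

where $N$ is the number of observations, $K$ is the number of labels, $p_{n, k}$ is the probability assigned to label $k$ for observation $n$, and $y_{n, k}$ is 1 if observation $n$ carries label $k$ and 0 otherwise.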

Accuracy doesn't suffice for hyperparameter tuning in financial applications. It should ideally be supplemented or replaced with metrics that better capture the complexities of financial decision-making.
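As an illustration of the gap between the two metrics, the sketch below scores two hypothetical classifiers that make identical hard predictions and so have identical accuracy, but differ in how confident they are. The data are invented for the example, and clipping probabilities with `eps` is an assumption added to avoid taking $\log[0]$. Because $y_{n,k}$ is one-hot, the inner sum over $k$ reduces to the log-probability of the true label.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true label."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0); with one-hot y only the true label's
        # probability contributes to the inner sum over k.
        p_true = min(max(row[label], eps), 1.0 - eps)
        total += math.log(p_true)
    return -total / len(y_true)


def accuracy(y_true, probs):
    """Share of observations whose most probable label is the true one."""
    hits = 0
    for label, row in zip(y_true, probs):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == label
    return hits / len(y_true)


# Invented example: both models get the last observation wrong.
y_true = [1, 0, 1, 1]
cautious = [[0.4, 0.6], [0.6, 0.4], [0.4, 0.6], [0.6, 0.4]]
overconfident = [[0.01, 0.99], [0.99, 0.01], [0.01, 0.99], [0.99, 0.01]]

for name, probs in [("cautious", cautious), ("overconfident", overconfident)]:
    print(name, accuracy(y_true, probs), round(log_loss(y_true, probs), 4))
```

Both models score 0.75 on accuracy, yet the overconfident one has a markedly higher log loss because it assigned 99% probability to its one wrong call. In a strategy that sizes bets by predicted probability, that mistake is the costly one.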

Note: All of these functionalities are available in both Python and Julia in the RiskLabAI library.

References

De Prado, M. L. (2018). Advances in financial machine learning. John Wiley & Sons.
De Prado, M. L. (2020). Machine learning for asset managers. Cambridge University Press.