Evaluation - K-Fold Cross Validation¶

K-Fold is used to estimate how well the model generalizes before deployment. It repeatedly splits the dataset into training and validation sets. Instead of doing one train/test split, the dataset is split in equal parts(folds) and the model is trained K times using different train/validation splits.

Test dataset must not be used in Cross Validation to avoid Data Leakage
For classification problems, specially imbalanced datasets use Stratified K-Fold
In large-scale ML pipelines, K-Fold is often too expensive.

flowchart LR
    Data([Data]) e1@ --> Prep[Preprocessing]
    Prep e2@ --> Train[Training]
    Train e3@ --> CV[Cross-Validation]
    CV e4@ --> Tuning[Tuning]
    Tuning e5@ --> Final[Final Evaluation]
    Final e6@ --> Deploy[[Deployment]]

    %% Animation
    e1@{ animate: true }
    e2@{ animate: true }
    e3@{ animate: true }
    e4@{ animate: true }
    e5@{ animate: true }
    e6@{ animate: true }

It helps detect:

• overfitting • data leakage • unstable models

Example¶

Training:

Fold	Train On	Validate On
1	folds 2-5	fold 1
2	folds 1,3-5	fold 2
3	folds 1-2,4-5	fold 3
4	folds 1-3,5	fold 4
5	folds 1-4	fold 5

Evaluation:

Fold	Accuracy
1	0.82
2	0.79
3	0.84
4	0.80
5	0.83

Computation:

mean accuracy = 0.816
std = 0.018
# The model will likely achieve ~81.6% accuracy on new data.

The mean metric is your best estimate of how the model will perform on unseen data and std tells you model stability. A low std means that model perform consistently cross folds, so is desired to achieve a high mean with low variance.