The toolbox has an extra procedure to facilitate cross-validation.
In cross-validation a dataset is split into N
batches. Of these, N-1 batches
are used to train a classifier, and the left-out batch is
used to evaluate it. This is repeated N
times, each time leaving out a different batch, and the N performances
are averaged. The advantage is that, given a limited training set, it is
still possible to obtain a relatively good classifier and to estimate its
performance on an independent set.
In practice, this cross-validation procedure is applied over and over again, not only to evaluate and compare the performance of classifiers, but also to optimize hyperparameters. To keep the procedure as flexible as possible, the cross-validation routine is kept as simple as possible. An index vector is generated that indicates to which batch each object in a training set belongs. By repeatedly calling the procedure, the batches are combined into a different training and evaluation set for each fold. The following piece of code shows how this is done in practice:
a = oc_set(gendatb,1); % make or get some data
nrbags = 10; % we are doing 10-fold crossvalidation
I = nrbags; % initialization
% now start the 10 folds:
for i=1:nrbags
    % extract the training (x) and validation (z) sets, and
    % update the index vector I:
    [x,z,I] = dd_crossval(a,I);
    % do something useful with the training and evaluation:
    w = gauss_dd(x,0.1);
    e(i) = dd_auc(z*w*dd_roc);
end
fprintf('AUC (10-fold) %5.3f (%5.3f)\n',mean(e),std(e));
Note that the procedure takes class priors into account: it tries to keep the proportion of objects per class in each fold the same as in the total dataset.
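To make the idea of a class-aware index vector concrete, the following is a minimal sketch in plain MATLAB, without any toolbox calls. It is not the implementation of dd_crossval; it only illustrates one way such a vector could be built. The names labels, nrfolds and foldidx, and the toy data of 15 and 10 objects in two classes, are assumptions made for this example.

```matlab
% Hypothetical example data: class labels for 25 objects in two classes.
labels  = [ones(15,1); 2*ones(10,1)];
nrfolds = 5;                          % number of batches (folds)

% Build the index vector: foldidx(j) is the batch that object j belongs to.
foldidx = zeros(size(labels));
classes = unique(labels);
for c = 1:numel(classes)
    members = find(labels == classes(c));          % objects of this class
    members = members(randperm(numel(members)));   % shuffle within the class
    % deal the shuffled objects round-robin over the folds, so that each
    % fold receives (almost) the same number of objects from this class
    foldidx(members) = mod((0:numel(members)-1)', nrfolds) + 1;
end

% Rotate through the folds: batch i is held out, the rest is for training.
for i = 1:nrfolds
    trainmask = (foldidx ~= i);
    testmask  = (foldidx == i);
    fprintf('fold %d: %d training objects, %d evaluation objects\n', ...
        i, sum(trainmask), sum(testmask));
end
```

Because the objects are distributed per class, every fold in this toy example contains three objects of class 1 and two of class 2, which matches the 15:10 ratio of the full set. The logical masks trainmask and testmask play the role of the training and evaluation sets returned in each iteration of the loop above.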