In this chapter I have collected some remarks that are worth seeing
once, but that did not fit into the flow of the previous chapters.
- If you want to know more about a classifier or
function, always try the help command.
- Also have a look at the file Contents.m. This contains
the full list of functions and classifiers defined in the toolbox.
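As a small illustration of the two remarks above, the following sketch shows both calls. It assumes that the toolbox directory is called dd_tools and is on the Matlab path; the directory name is an assumption and may differ in your installation.

```matlab
% Show the help text of a single function or classifier, here svdd.
help svdd
% Matlab prints the Contents.m of a directory when help is called with
% the directory name (assumed here to be dd_tools).
help dd_tools
```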
- In older versions of the toolbox, the width parameter $\sigma$
in the support vector data description was optimized
automatically. This was done such that a prespecified fraction
of the objects was on the boundary (so these are
support vectors with $0 < \alpha_i < C$). Another parameter, $C$,
was set such that another prespecified fraction of objects was
outside the boundary (the support vectors with $\alpha_i = C$).
This second fraction had a default value and was often ignored in
practical experiments. But this sometimes led to poor results, and
created a lot of confusion. If you really want to use the old version,
and if you're lucky that I included it, it is still available
under newsvdd.m.
I decided to consider the width parameter $\sigma$ as a hyper-parameter.
This parameter is no longer optimized automatically, but has to be
set by the user. To obtain the prespecified error on the target set,
the parameter $C$ is set accordingly. The other fraction parameter
is removed.
Another complaint about the first implementation of the svdd was that
it was aimed entirely at the RBF kernel. That was because the
optimization simplifies significantly under this assumption. With
ksvdd or incsvdd this restriction is now lifted. In particular,
incsvdd is recommended because it does not rely on external
quadratic programming optimizers, which always create problems.
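A minimal sketch of training with a non-RBF kernel through incsvdd is given below. The argument order (data, fraction, kernel type, kernel parameter) and the meaning of 'p' as a polynomial kernel are assumptions, mirroring the kwhiten call used elsewhere in this chapter; check help incsvdd for the actual definition.

```matlab
% Target data: class '1' of the banana set.
x = target_class(gendatb([50 0]),'1');
% Assumed signature: incsvdd(data, fraction, kernel type, kernel parameter).
% 'p' with parameter 2 is taken to be a polynomial kernel of degree 2.
w = incsvdd(x,0.1,'p',2);
```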
- There is also a set of functions
for visualizing the output of a classifier in 2D. One can define a grid
of objects around a 2D dataset, and store that grid in a new dataset. That
dataset can be classified by the classifier, and the outputs can be mapped
back onto the feature space. The user can thus inspect the output of the
classifier for the whole feature space around the target class.
This is done explicitly in the following code:
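>> x = target_class(gendatb([50 0]),'1');  % target data: class '1' of the banana set
>> w = svdd(x,0.1,5);                      % svdd with error 0.1 and width 5
>> scatterd(x);                            % the scatter plot defines the axes
>> griddat = gendatgrid;                   % grid of objects over the current axes
>> out = griddat*w;                        % PRTools convention: dataset*mapping
>> plotg(out);                             % plot the classifier output on the grid
>> hold on;
>> scatterd(x);                            % draw the data on top of the output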
- There is also one function which is in essence not
a one-class classifier, but a preprocessor: the kernel whitening
kwhiten. This mapping does not classify data; it only transforms it
into a new dataset. The hope is that the data is transformed into a shape
that can be described better by one-class classifiers. The easiest way
to work with this type of preprocessing is to exploit some PRTools
techniques:
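>> x = target_class(gendatb([50 0]),'1');
>> w_kpca = kwhiten(x,0.99,'p',2);   % train the kernel whitening on x
>> w = gauss_dd(x*w_kpca,0.1);       % train gauss_dd on the whitened data (dataset*mapping)
>> W = w_kpca*w;                     % combine both mappings sequentially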
This W can now be used as a normal classifier.
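For example, the combined mapping can be inspected with the grid functions mentioned earlier. This is only a sketch that reuses the calls from this chapter, written in the usual PRTools convention of dataset*mapping; it is not a separate feature of the toolbox.

```matlab
% Train the whitening and the classifier, and combine them.
x = target_class(gendatb([50 0]),'1');
w_kpca = kwhiten(x,0.99,'p',2);
w = gauss_dd(x*w_kpca,0.1);
W = w_kpca*w;
% Plot the output of the combined mapping in the original 2D feature space.
scatterd(x);            % defines the axes used by gendatgrid
griddat = gendatgrid;   % grid of objects around the data
out = griddat*W;        % whitening and classification in one step
plotg(out);
hold on;
scatterd(x);
```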
- I'm not responsible for the correct functioning of the toolbox,
but of course I do my best to make the toolbox as useful and bug-free
as possible. If you find a bug, please email me
at D.M.J.Tax@prtools.org. I'm also very interested to hear from people
who have defined new one-class classifiers.