

General remarks

In this chapter I have collected some remarks that are important to see at least once, but that did not fit into the flow of the previous chapters.

  1. If you want to know more about a classifier or function, always try the help command.
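
    For instance, to see the documentation of the support vector data description:

      >> help svdd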

  2. Also have a look at the file Contents.m. This contains the full list of functions and classifiers defined in the toolbox.
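
    When the toolbox is on the Matlab path, the same list can be displayed with the help command (assuming the toolbox directory is called dd_tools):

      >> help dd_tools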

  3. In older versions of the toolbox, the width parameter $\sigma$ in the support vector data description was optimized automatically. This was done such that a prespecified fraction ${\rm\tt fracrej}$ of the objects was on the boundary (these are the support vectors with $0<\alpha_i<1/(N\nu)$). Another parameter $C$ was set such that another prespecified fraction ${\rm\tt fracerr}$ of the objects was outside the boundary (the support vectors with $\alpha_i=1/(N\nu)$). The default of this fraction was ${\rm\tt fracerr}=0.01$, and it was often ignored in practical experiments. But this sometimes led to poor results and created a lot of confusion. If you really want to use it, and if you are lucky that I included it, it is still available in newsvdd.m.

    I have therefore decided to treat the parameter $\sigma$ as a hyperparameter: it is not optimized automatically, but has to be set by the user. The parameter $C$ is set such that the prespecified error ${\rm\tt fracrej}$ is obtained on the target set. The parameter ${\rm\tt fracerr}$ has been removed. A sketch of how $\sigma$ can be tuned by hand is shown below.
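
    A minimal sketch of such a manual tuning (the dataset, the range of widths, and the use of dd_error here are only example choices; see also the chapter on cross-validation):

      >> x = oc_set(gendatb([50 50]),'1');       % banana data, class '1' is target
      >> for sigma = [1 2 5 10 20]
      >>   w = svdd(target_class(x),0.1,sigma);  % train on the target objects only
      >>   e = dd_error(x*w);                    % e(1): target error, e(2): outlier error
      >>   fprintf('sigma = %2d: %.3f %.3f\n',sigma,e(1),e(2));
      >> end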

    Another complaint about the first implementation of the svdd was that it was completely aimed at the RBF kernel. That was because the optimization simplifies significantly under this assumption. Using ksvdd or incsvdd this restriction is now lifted. In particular incsvdd is recommended, because it does not rely on external quadratic programming optimizers, which often cause problems.
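
    For example, training an incremental SVDD could look as follows (a sketch; the kernel type 'r' and the calling convention are assumptions, check help incsvdd for the exact definition):

      >> x = target_class(gendatb([50 0]),'1');
      >> w = incsvdd(x,0.1,'r',5);   % RBF kernel with an assumed width of 5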

  4. There is also a set of functions for visualizing the output of a classifier in 2D. One can define a grid of objects around a 2D dataset and store it in a dataset. That dataset can be classified by the classifier, and the outputs can be plotted over the feature space. The user can thus inspect the output of the classifier for the whole feature space around the target class.

    This is explicitly done in the following code:

      >> x = target_class(gendatb([50 0]),'1');  % banana-shaped target class
      >> w = svdd(x,0.1,5);                      % train an svdd with sigma = 5
      >> scatterd(x);                            % scatterplot, defines the axes
      >> griddat = gendatgrid;                   % grid of objects over the axes
      >> out = griddat*w;                        % classify the grid objects
      >> plotg(out);                             % plot the classifier output
      >> hold on;
      >> scatterd(x);                            % plot the data on top
    

  5. There is also one function which is in essence not a one-class classifier but a preprocessor: the kernel whitening kwhiten. This mapping does not classify data, it only transforms it into a new dataset. The hope is that the data is transformed into a shape which can be described better by one-class classifiers. The easiest way to work with this type of preprocessing is to exploit some PRTools techniques:
      >> x = target_class(gendatb([50 0]),'1');  % banana-shaped target class
      >> w_kpca = kwhiten(x,0.99,'p',2);         % kernel whitening mapping
      >> w = gauss_dd(x*w_kpca,0.1);             % train on the whitened data
      >> W = w_kpca*w;                           % preprocessing + classifier
    
    This W can now be used as a normal classifier.
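
    For instance, an independent test set can be evaluated directly on the combined mapping (a sketch; the test set and the use of dd_error are only example choices):

      >> z = oc_set(gendatb([20 20]),'1');  % independent test set
      >> e = dd_error(z*W);                 % error of the combined mapping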

  6. I am not responsible for the correct functioning of the toolbox, but of course I do my best to make it as useful and bug-free as possible. Please email me at D.M.J.Tax@prtools.org when you have found a bug. I am also very interested to hear when people have defined new one-class classifiers.

