The first and most important thing when applying one-class classifiers is the dataset and its preprocessing. All one-class classifiers require a Matlab dataset object whose objects are labeled target or outlier. To create a one-class dataset, three functions are supplied: gendatoc, oc_set and target_class. What are the differences between the three?
gendatoc builds a one-class dataset from target data xt and outlier data xo:
>> xt = randn(40,2);
>> xo = gendatb([20,0]);
>> x = gendatoc(xt,xo);
In gendatoc, xt and xo do not both have to be defined; either can be empty (xt = [] or xo = []). Labeling a Matlab array as outlier is therefore very easily done:
>> xo = 10*randn(25,2);
>> x = gendatoc([],xo);
If xt or xo is a Prtools dataset, this data is converted back to normal Matlab arrays. That means that the label information in these datasets is lost. All data in xt will be labeled target and all data in xo outlier, without exception.
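The loss of label information can be checked with a short sketch like the one below. It assumes that Prtools and this toolbox are on the Matlab path and uses only functions that appear in this section, plus size. The variable names a and xout are chosen here for illustration only.

```matlab
% Sketch, assuming Prtools and the one-class toolbox are on the path.
a  = gendatb([20,20]);         % Prtools dataset: 40 objects with two class labels
xo = 10*randn(25,2);           % plain Matlab array, used here as outlier data

x = gendatoc(a,xo);            % the original class labels of a are discarded

[xt,xout] = target_class(x);   % split x into its target and outlier objects
nt = size(xt,1)                % number of objects labeled target
no = size(xout,1)              % number of objects labeled outlier
```

Going by the description above, all 40 objects of a should end up in xt, regardless of their original class, and the 25 rows of xo should end up in xout.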
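oc_set relabels an existing dataset: the class you name becomes the target class and all other objects become outlier:
>> x = gendatb([20,20]); % 40 objects in 2D
>> x = oc_set(x,'1')
You still have 40 objects; half are labeled target, the other half outlier.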
This function oc_set also accepts several classes to be labeled as target class. When a 10-class problem is loaded, a subset of these classes can be assigned to be target class:
>> load nist16; % this is an example of a 10-class dataset,
>>              % it might not be available everywhere
>> a
2000 by 256 dataset with 10 classes: [200 200 200 200 200 200 200 200 200 200]
>> x = oc_set(a,[1 5 6]) % select three classes
Class 0 is used as target class.
Class 4 is used as target class.
Class 5 is used as target class.
(3 classes as target), 2000 by 256 dataset with 2 classes: [600 1400]
When you don't supply labels, it is assumed that all data is target data:
>> x = rand(20,2);
>> x = oc_set(x)
This constructs a dataset containing 20 target objects in 2D. All objects are now labeled target. When you want to label this data as outlier, you have to supply it as the second argument: x = oc_set([],x).
target_class keeps only the objects of the target class and discards the rest:
>> x = gendatb([20,20]); % 40 objects in 2D
>> x = target_class(x,'1') % 20 objects in 2D
Now dataset x contains just the 20 target objects. You can achieve the same in this way:
>> x = gendatb([20,20]); % 40 objects in 2D
>> x = oc_set(x,'1');
>> x = target_class(x) % 20 objects in 2D
but this is less efficient.
In some cases you may need to extract the outlier data. It is returned as the second output argument of target_class (here x is a 40-object dataset relabeled with oc_set, as above):
>> [xt,xo] = target_class(x) % xo contains 20 outlier objects
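To put the three functions side by side, the sketch below applies each of them to the same two-class dataset. It assumes Prtools and this toolbox are on the Matlab path. The names a, x1, x2, x3 and n are illustrative.

```matlab
% Sketch, assuming Prtools and the one-class toolbox are on the path.
a = gendatb([20,20]);          % 40 objects in 2D, two classes of 20 objects

x1 = gendatoc(a,[]);           % labels ignored: every object is labeled target
x2 = oc_set(a,'1');            % class '1' becomes target, the rest outlier
x3 = target_class(a,'1');      % only the objects of class '1' are kept

% Number of objects in each resulting dataset
n = [size(x1,1) size(x2,1) size(x3,1)]
```

Following the descriptions in this section, x1 and x2 should both keep all 40 objects (with different labelings), while x3 should contain only the 20 target objects. In short: gendatoc assigns labels by argument position, oc_set relabels by class, and target_class selects by class.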