Dataset Biomed (targetcl. carrier)

Basic characteristics Biomed (targetcl. carrier)

The purpose of the analysis is to develop a screening procedure to detect carriers and to describe its effectiveness. Entries with missing values have been removed. The data can be downloaded as a mat-file containing a PRTools dataset.

Target objects: 67
Outlier objects: 127
Features: 5

Unsupervised PCA Biomed (targetcl. carrier)

On the left, the PCA scatterplot is shown; on the right, the retained variance for a varying number of features.
On the left, the PCA scatterplot of the data rescaled to unit variance is shown; on the right, the retained variance.
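
The retained-variance curves described above can be illustrated with a minimal NumPy sketch. It assumes PCA on mean-centered data, with optional rescaling of each feature to unit variance. The function name `retained_variance` and the synthetic data are illustrative only and are not part of PRTools or dd_tools.

```python
import numpy as np

def retained_variance(X, unit_variance=False):
    """Cumulative fraction of variance retained by the first k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if unit_variance:
        # Rescale every feature to unit variance before PCA.
        Xc = Xc / Xc.std(axis=0, ddof=1)
    # Squared singular values of centered data are proportional to the PCA eigenvalues.
    s = np.linalg.svd(Xc, compute_uv=False)
    eigvals = s ** 2
    return np.cumsum(eigvals) / eigvals.sum()

rng = np.random.default_rng(0)
# Synthetic stand-in with 194 objects and 5 features on very different scales
# (NOT the Biomed data).
X = rng.normal(size=(194, 5)) * np.array([100.0, 10.0, 5.0, 1.0, 0.1])

raw = retained_variance(X)
scaled = retained_variance(X, unit_variance=True)

# Number of components needed to retain 95% of the variance in each case.
n_raw = int(np.searchsorted(raw, 0.95) + 1)
n_scaled = int(np.searchsorted(scaled, 0.95) + 1)
print(n_raw, n_scaled)
```

Without rescaling, a feature with a large numeric range dominates the first component, which is why the two curves can look very different for the same data.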

Supervised Fisher Biomed (targetcl. carrier)

On the left, the Fisher scatterplot is shown; on the right, the ROC curve along the Fisher direction.
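
As a rough illustration of the supervised view, the sketch below computes a Fisher direction from labeled target and outlier objects and the area under the ROC curve of the projections onto it. It assumes the within-class scatter is the unweighted sum of the two class covariances, which may differ from the weighting used to produce the plots. The helper names and the synthetic data are hypothetical.

```python
import numpy as np

def fisher_direction(X_target, X_outlier):
    """Fisher direction w = Sw^{-1} (m_target - m_outlier)."""
    m_t, m_o = X_target.mean(axis=0), X_outlier.mean(axis=0)
    # Assumption: within-class scatter as the unweighted sum of class covariances.
    Sw = np.cov(X_target, rowvar=False) + np.cov(X_outlier, rowvar=False)
    return np.linalg.solve(Sw, m_t - m_o)

def auc(scores_target, scores_outlier):
    """Probability that a target scores above an outlier (ties count half)."""
    diff = scores_target[:, None] - scores_outlier[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

rng = np.random.default_rng(0)
# Synthetic stand-in: 67 targets and 127 outliers in 5 dimensions (NOT the Biomed data).
X_target = rng.normal(0.0, 1.0, size=(67, 5))
X_outlier = rng.normal(1.0, 1.0, size=(127, 5))

w = fisher_direction(X_target, X_outlier)
auc_fisher = auc(X_target @ w, X_outlier @ w)
print(round(100 * auc_fisher, 1))
```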

Results Biomed (targetcl. carrier)

The experiments are performed using dd_tools. A rudimentary explanation of the classifiers is given in the classifier section.

512, 0 outliers, AUC (x100), 5x strat. 10-fold

Classifier               Preprocessing
                         none          unit var      PCA 95%
Gauss                    62.1 (1.0)    60.0 (1.4)    63.4 (0.5)
Min.Cov.Determinant      53.4 (0.9)    53.4 (0.9)    56.0 (0.9)
Mixture of Gaussians     42.6 (1.5)    46.7 (2.0)    43.1 (1.4)
Naive Parzen             53.4 (0.9)    53.4 (0.9)    50.4 (0.8)
Parzen                   38.5 (1.2)    49.1 (0.8)    48.4 (0.4)
k-means                  40.6 (2.9)    56.0 (2.9)    41.0 (1.9)
1-Nearest Neighbors      27.2 (0.8)    46.7 (1.2)    25.0 (0.9)
k-Nearest Neighbors      27.2 (0.8)    46.7 (1.2)    25.0 (0.9)
Nearest-neighbor dist    44.4 (2.6)    57.5 (1.6)    42.1 (1.7)
Principal comp.          62.6 (1.1)    42.3 (1.3)    59.9 (3.5)
Self-Organ. Map          68.4 (2.1)    46.5 (3.0)    67.9 (1.0)
Auto-enc network         51.9 (5.3)    54.6 (2.5)    45.0 (3.3)
MST                      31.9 (1.1)    47.1 (1.7)    33.7 (1.2)
L_1-ball                 66.9 (0.6)    73.2 (1.5)    64.0 (0.9)
k-center                 32.1 (4.5)    41.4 (4.9)    27.8 (2.6)
Support vector DD        50.0 (0.0)    42.9 (2.2)    41.6 (0.9)
Minimax Prob. DD         23.9 (0.5)    51.6 (1.4)    24.3 (0.8)
LinProg DD               25.4 (0.6)    64.6 (0.8)    24.4 (1.0)
Lof DD                   49.4 (2.5)    50.4 (2.5)    45.8 (2.4)
Lof range DD             41.9 (2.0)    53.0 (1.5)    41.6 (1.4)
Loci DD                  49.4 (1.2)    50.2 (1.3)    48.0 (0.8)
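
To make the evaluation protocol in the table header concrete, here is a minimal sketch of 5 times repeated stratified 10-fold cross-validation of AUC (x100) for a simple Gaussian one-class model. It is not the dd_tools implementation. It assumes that the model is trained on the target objects of the training folds only, that each test fold contains both classes, and that the value in parentheses is the standard deviation over the repeats, which the table itself does not state. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def gauss_scores(X_train_target, X_test):
    """Negative squared Mahalanobis distance to a Gaussian fitted on targets only."""
    mu = X_train_target.mean(axis=0)
    cov = np.cov(X_train_target, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # small ridge for stability (an assumption)
    d = X_test - mu
    return -np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)

def auc(scores, is_target):
    """Probability that a target scores above an outlier (ties count half)."""
    diff = scores[is_target][:, None] - scores[~is_target][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def stratified_folds(is_target, n_folds, rng):
    """Assign a fold index to each object, keeping class proportions per fold."""
    folds = np.empty(is_target.size, dtype=int)
    for cls in (True, False):
        idx = rng.permutation(np.flatnonzero(is_target == cls))
        folds[idx] = np.arange(idx.size) % n_folds
    return folds

def repeated_cv_auc(X, is_target, n_repeats=5, n_folds=10, seed=0):
    rng = np.random.default_rng(seed)
    per_repeat = []
    for _ in range(n_repeats):
        folds = stratified_folds(is_target, n_folds, rng)
        fold_aucs = []
        for k in range(n_folds):
            test = folds == k
            train_target = ~test & is_target  # outliers are never used for training
            scores = gauss_scores(X[train_target], X[test])
            fold_aucs.append(auc(scores, is_target[test]))
        per_repeat.append(np.mean(fold_aucs))
    return 100 * float(np.mean(per_repeat)), 100 * float(np.std(per_repeat))

rng = np.random.default_rng(1)
# Synthetic stand-in: 67 targets, 127 outliers, 5 features (NOT the Biomed data).
X = np.vstack([rng.normal(0.0, 1.0, size=(67, 5)),
               rng.normal(1.5, 1.0, size=(127, 5))])
is_target = np.arange(194) < 67

mean_auc, std_auc = repeated_cv_auc(X, is_target)
print(f"{mean_auc:.1f} ({std_auc:.1f})")
```

An AUC (x100) below 50 in such a table means the classifier ranks outliers above targets more often than not on the held-out folds.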

Classifier projection spaces

The first classifier projection spaces are obtained by computing the classifier label disagreements (with the threshold set at 10% target error) and applying MDS to the resulting distance matrix between classifiers.

[Figure panels: Original, Unit variance, PCA mapped]
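
The label-disagreement construction can be sketched as follows. Each classifier's threshold is set so that about 10% of the target objects are rejected, the distance between two classifiers is the fraction of objects they label differently, and the distance matrix is embedded in two dimensions. Classical (Torgerson) MDS is used here as a stand-in, since the exact MDS variant is not specified. All function names and the random scores are hypothetical.

```python
import numpy as np

def labels_at_target_error(scores, is_target, target_error=0.10):
    """Accept objects at or above the threshold that rejects `target_error` of the targets."""
    threshold = np.quantile(scores[is_target], target_error)
    return scores >= threshold

def disagreement_matrix(label_sets):
    """Pairwise fraction of objects on which two classifiers assign different labels."""
    L = np.asarray(label_sets, dtype=float)  # shape (n_classifiers, n_objects)
    return np.abs(L[:, None, :] - L[None, :, :]).mean(axis=2)

def classical_mds(D, n_components=2):
    """Classical MDS: eigendecomposition of the double-centered squared distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

rng = np.random.default_rng(0)
is_target = np.arange(194) < 67
# Hypothetical output scores of four classifiers on 194 objects (higher = more target-like).
scores = rng.normal(size=(4, 194)) + is_target

labels = [labels_at_target_error(s, is_target) for s in scores]
D = disagreement_matrix(labels)
Y = classical_mds(D)  # one 2-D point per classifier
print(Y.shape)
```

Classifiers that end up close together in the embedding make similar accept/reject decisions at this operating point, regardless of how different their underlying models are.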

Classifier projection spaces (ranking disagreements)

The second versions of the classifier projection spaces are obtained by computing the classifier ranking disagreements and applying MDS to the resulting distance matrix between classifiers.

[Figure panels: Original, Unit variance, PCA mapped]
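
The exact definition of ranking disagreement is not given here. One plausible reading, shown purely as an assumption, measures how differently two classifiers order the same objects by their output scores, using a distance derived from the Spearman rank correlation. The function names are hypothetical, and ties in the scores are broken by position rather than averaged.

```python
import numpy as np

def ranks(scores):
    """Rank of each object by score (0 = lowest); ties are broken by position."""
    order = np.argsort(scores, kind="stable")
    r = np.empty(scores.size, dtype=float)
    r[order] = np.arange(scores.size)
    return r

def ranking_disagreement(score_sets):
    """Assumed distance (1 - Spearman rho) / 2: 0 for identical, 1 for reversed rankings."""
    R = np.vstack([ranks(np.asarray(s, dtype=float)) for s in score_sets])
    rho = np.corrcoef(R)
    return np.clip((1.0 - rho) / 2.0, 0.0, 1.0)

a = np.array([0.1, 0.5, 0.3, 0.9, 0.7])
# Three hypothetical classifiers: a, a rescaled copy (same ranking), and a reversed one.
D_rank = ranking_disagreement([a, 10.0 * a, -a])
print(np.round(D_rank, 2))
```

Unlike the label-based distance, this measure does not depend on a threshold, so it compares classifiers over the whole ROC curve rather than at a single operating point. The resulting matrix can be passed to any MDS routine.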