Dataset Survival <5yr

Basic characteristics Survival <5yr

Haberman's Survival Data from UCI. The dataset contains cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. The class with a survival of less than 5 years is used as the target class. Download mat-file with PRTools dataset.

Target objects: 81
Outlier objects: 225
Features: 3

Unsupervised PCA Survival <5yr

On the left, the PCA scatterplot is shown; on the right, the retained variance for a varying number of features.
On the left, the PCA scatterplot of the data rescaled to unit variance is shown; on the right, the retained variance.
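
The retained-variance curve described above can be sketched as a cumulative sum over the PCA eigenvalues. This is a minimal illustration, not the code used to produce the plots: the function names and the eigenvalues are hypothetical, and the 0.95 default only mirrors the "PCA 95%" preprocessing label used in the results.

```python
from itertools import accumulate

def retained_variance(eigenvalues):
    """Cumulative fraction of total variance kept by the first k components."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [c / total for c in accumulate(ordered)]

def components_for(eigenvalues, fraction=0.95):
    """Smallest number of components whose retained variance reaches `fraction`."""
    for k, kept in enumerate(retained_variance(eigenvalues), start=1):
        if kept >= fraction - 1e-12:  # small tolerance for rounding
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues for a 3-feature dataset (illustrative values only,
# not computed from the Haberman data).
example_eigenvalues = [6.0, 3.0, 1.0]
curve = retained_variance(example_eigenvalues)
n_kept = components_for(example_eigenvalues, 0.95)
```

With these made-up eigenvalues the first two components retain 90% of the variance, so all three are needed to reach 95%.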

Supervised Fisher Survival <5yr

On the left, the Fisher scatterplot is shown; on the right, the ROC curve along this direction.
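
An ROC curve along a single direction can be sketched by sweeping a threshold over the one-dimensional projections. The sketch below assumes that higher projected values mean "more target-like"; the function name and the scores are hypothetical and do not come from the Haberman data.

```python
def roc_points(target_scores, outlier_scores):
    """Return (outliers accepted, targets accepted) fractions per threshold.

    An object is accepted as target when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(outlier_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    for t in thresholds:
        targets_accepted = sum(s >= t for s in target_scores) / len(target_scores)
        outliers_accepted = sum(s >= t for s in outlier_scores) / len(outlier_scores)
        points.append((outliers_accepted, targets_accepted))
    return points

# Hypothetical 1-D projections (illustrative values only).
pts = roc_points([0.9, 0.8, 0.4], [0.7, 0.3, 0.1])
```

Each point pairs the fraction of outliers accepted with the fraction of targets accepted; the curve starts at (0, 0) and ends at (1, 1).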

Results Survival <5yr

The experiments are performed using dd_tools. A rudimentary explanation of the classifiers is given in the classifier section.

617, 0 outliers, AUC (x100), 5x strat. 10-fold

Classifier              Preprocessing
                        none          unit var      PCA 95%
Gauss                   49.0 (3.3)    49.3 (3.1)    50.1 (3.1)
Min.Cov.Determinant     48.5 (1.0)    48.5 (1.0)    49.9 (1.4)
Mixture of Gaussians    43.7 (0.7)    42.9 (2.0)    44.1 (1.0)
Naive Parzen            44.6 (1.6)    44.6 (1.6)    51.2 (1.4)
Parzen                  44.8 (2.0)    45.4 (1.7)    45.4 (1.9)
k-means                 47.8 (3.6)    45.9 (5.4)    47.7 (2.9)
1-Nearest Neighbors     52.4 (2.5)    47.7 (4.0)    50.4 (3.6)
k-Nearest Neighbors     52.4 (2.5)    47.7 (4.0)    50.4 (3.6)
knn, opt-AUC            50.4 (5.4)    42.5 (2.9)    47.8 (2.3)
Nearest-neighbor dist   49.0 (6.6)    50.5 (4.1)    47.8 (7.2)
Principal comp.         51.3 (1.6)    48.3 (2.1)    47.4 (3.9)
Self-Organ. Map         46.3 (2.6)    41.6 (2.6)    46.3 (3.8)
Auto-enc network        47.7 (1.7)    49.3 (3.8)    49.2 (2.0)
Spanning Tree           46.3 (4.0)    44.1 (3.2)    45.5 (5.5)
L_1-ball                48.6 (1.0)    51.5 (1.4)    54.4 (2.4)
k-center                49.7 (3.9)    48.5 (2.8)    48.1 (3.7)
Support vector DD       49.3 (3.1)    48.5 (2.5)    48.2 (2.6)
Minimax Prob. DD        42.7 (1.3)    50.7 (1.0)    42.5 (1.5)
LinProg DD              42.7 (1.4)    53.6 (2.5)    42.5 (1.9)
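
The evaluation protocol named in the table header (AUC x100 over 5 repetitions of stratified 10-fold cross-validation) can be sketched as follows. This is not the dd_tools implementation: the toy scorer, the synthetic data, and all function names are hypothetical, and reporting the standard deviation over the repetitions is only an assumption about what the bracketed numbers mean.

```python
import random
import statistics

def auc(target_scores, outlier_scores):
    """Probability that a random target outscores a random outlier (ties count 0.5)."""
    wins = 0.0
    for t in target_scores:
        for o in outlier_scores:
            wins += 1.0 if t > o else 0.5 if t == o else 0.0
    return wins / (len(target_scores) * len(outlier_scores))

def stratified_folds(labels, n_folds, rng):
    """Assign indices to folds so every fold keeps the class proportions."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def mean_distance_scorer(train_targets):
    """Toy one-class model (assumption): score = minus distance to the target mean."""
    dim = len(train_targets[0])
    mean = [sum(x[d] for x in train_targets) / len(train_targets) for d in range(dim)]
    return lambda x: -(sum((x[d] - mean[d]) ** 2 for d in range(dim)) ** 0.5)

def repeated_cv_auc(X, labels, n_repeats=5, n_folds=10, seed=0):
    """Mean and standard deviation over repetitions of the fold-averaged AUC (x100)."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(n_repeats):
        fold_aucs = []
        for test_idx in stratified_folds(labels, n_folds, rng):
            held_out = set(test_idx)
            # One-class training: only target objects outside the test fold.
            train_targets = [X[i] for i in range(len(X))
                             if i not in held_out and labels[i] == 1]
            score = mean_distance_scorer(train_targets)
            t = [score(X[i]) for i in test_idx if labels[i] == 1]
            o = [score(X[i]) for i in test_idx if labels[i] == 0]
            fold_aucs.append(auc(t, o))
        per_repeat.append(100 * statistics.mean(fold_aucs))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)

# Synthetic, well-separated data (illustrative only): 30 targets, 60 outliers.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)] + \
    [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(60)]
labels = [1] * 30 + [0] * 60
mean_auc, std_auc = repeated_cv_auc(X, labels)
```

On this scale, a value of 50 corresponds to chance-level ranking of targets against outliers.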

Classifier projection spaces

The first classifier projection spaces are obtained by computing the classifier label disagreements (with the threshold set at 10% target error) and applying MDS to the resulting distance matrix between classifiers.

Panels: Original, Unit variance, PCA mapped.
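
A label-disagreement distance matrix of the kind described above can be sketched as follows. The sketch makes several assumptions: higher scores mean "more target-like", the threshold is the score below which roughly 10% of the target objects fall, and the disagreement is the fraction of objects labelled differently. The classifier names and scores are hypothetical, and the MDS step itself is not shown; the resulting matrix would be passed to any MDS routine.

```python
def threshold_at_target_error(target_scores, target_error=0.10):
    """Threshold that rejects roughly `target_error` of the target objects."""
    ordered = sorted(target_scores)
    n_reject = int(target_error * len(ordered))
    return ordered[n_reject]

def accept(scores, threshold):
    """Label each object: True = accepted as target, False = rejected."""
    return [s >= threshold for s in scores]

def label_disagreement(labels_a, labels_b):
    """Fraction of objects on which two classifiers give different labels."""
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def disagreement_matrix(all_labels):
    """Symmetric matrix of pairwise label disagreements between classifiers."""
    n = len(all_labels)
    return [[label_disagreement(all_labels[i], all_labels[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical scores of three classifiers on the same 10 targets and 3 outliers.
target_scores = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "B": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "C": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
outlier_scores = {"A": [0, 1, 5], "B": [0, 1, 5], "C": [0, 1, 5]}

names = sorted(target_scores)
all_labels = []
for name in names:
    thr = threshold_at_target_error(target_scores[name])
    all_labels.append(accept(target_scores[name] + outlier_scores[name], thr))

D = disagreement_matrix(all_labels)  # input for an MDS routine
```

In this toy example classifiers A and C label every object identically, so their distance is zero and an MDS embedding would place them on the same point.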

Classifier projection spaces

The second versions of the classifier projection spaces are obtained by computing the classifier ranking disagreements and applying MDS to the resulting distance matrix between classifiers.

Panels: Original, Unit variance, PCA mapped.
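
The page does not define the ranking disagreement precisely. One plausible reading, used here purely as an assumption, is the fraction of object pairs that two classifiers order differently (a normalised Kendall tau distance). The function name and scores below are hypothetical.

```python
from itertools import combinations

def ranking_disagreement(scores_a, scores_b):
    """Fraction of object pairs that the two classifiers rank in opposite order."""
    pairs = list(combinations(range(len(scores_a)), 2))
    discordant = sum(
        (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0
        for i, j in pairs
    )
    return discordant / len(pairs)

# Hypothetical scores of two classifiers on the same three objects.
same_order = ranking_disagreement([1, 2, 3], [10, 20, 30])
one_swap = ranking_disagreement([1, 2, 3], [1, 3, 2])
reversed_order = ranking_disagreement([1, 2, 3], [3, 2, 1])
```

Unlike the label disagreement, this measure needs no threshold, since it depends only on the order the classifiers assign to the objects.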