Dataset Survival >5yr

Basic characteristics Survival >5yr

225 target objects
81 outlier objects
3 features

Haberman's Survival Data from UCI. The dataset contains cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. The class of patients who survived more than 5 years is used as the target class. Download mat-file with PRTools dataset.

Unsupervised PCA Survival >5yr

On the left, the PCA scatterplot is shown; on the right, the retained variance for a varying number of features.
On the left, the PCA scatterplot of the data rescaled to unit variance is shown; on the right, the retained variance.
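The retained-variance curves described above can be reproduced in outline as follows. This is a minimal sketch using NumPy, not the code that generated the plots; the function name `retained_variance` and the toy data are hypothetical, and the real dataset would have to be loaded separately.

```python
import numpy as np

def retained_variance(X, unit_variance=False):
    """Cumulative fraction of variance kept by the first k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each feature
    if unit_variance:
        Xc = Xc / Xc.std(axis=0, ddof=1)      # rescale each feature to unit variance
    cov = np.cov(Xc, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first
    return np.cumsum(eigvals) / eigvals.sum()

# Hypothetical toy data: three features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 0.1])

raw = retained_variance(X)                        # dominated by the large-scale feature
scaled = retained_variance(X, unit_variance=True) # features contribute more evenly
```

Without rescaling, the feature with the largest scale dominates the first component; after rescaling to unit variance, the variance is spread more evenly over the components.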

Supervised Fisher Survival >5yr

On the left, the Fisher scatterplot is shown; on the right, the ROC curve along this direction.
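As an illustration of what "the ROC curve along this direction" means, the sketch below computes a Fisher direction from labeled target and outlier objects, projects both classes onto it, and traces the ROC curve of the projected values. It assumes the standard two-class Fisher criterion (within-class scatter inverse times the mean difference); the helper names and the synthetic data are hypothetical and only borrow the class sizes from this page.

```python
import numpy as np

def fisher_direction(X_target, X_outlier):
    """Unit vector w proportional to S_w^{-1} (m_target - m_outlier)."""
    m_t, m_o = X_target.mean(axis=0), X_outlier.mean(axis=0)
    # Within-class scatter: sum of the two class scatter matrices
    S_w = (np.cov(X_target, rowvar=False) * (len(X_target) - 1)
           + np.cov(X_outlier, rowvar=False) * (len(X_outlier) - 1))
    w = np.linalg.solve(S_w, m_t - m_o)
    return w / np.linalg.norm(w)

def roc_points(target_scores, outlier_scores):
    """Outlier-accepted and target-accepted rates; higher score = more target-like."""
    thresholds = np.sort(np.concatenate([target_scores, outlier_scores]))[::-1]
    target_acc = [(target_scores >= t).mean() for t in thresholds]
    outlier_acc = [(outlier_scores >= t).mean() for t in thresholds]
    return np.array([0.0] + outlier_acc), np.array([0.0] + target_acc)

# Synthetic stand-in data (NOT the Haberman measurements)
rng = np.random.default_rng(0)
X_target = rng.normal(size=(225, 3))
X_outlier = rng.normal(size=(81, 3)) + np.array([2.0, 0.0, 0.0])

w = fisher_direction(X_target, X_outlier)
fpr, tpr = roc_points(X_target @ w, X_outlier @ w)
# Area under the ROC curve by the trapezoid rule
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

Note that the Fisher mapping uses the outlier labels, which is why it is listed as the supervised view of the data.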

Results Survival >5yr

The experiments are performed using dd_tools. A rudimentary explanation of the classifiers is given in the classifier section.

616, 0 outliers, AUC (x100), 5x stratified 10-fold

Classifier              Preprocessing
                        none          unit var      PCA 95%
Gauss                   59.2 (2.4)    60.1 (2.8)    59.2 (2.4)
Min.Cov.Determinant     69.5 (0.6)    69.5 (0.6)    69.5 (0.6)
Mixture of Gaussians    65.2 (1.9)    65.2 (2.4)    65.1 (2.0)
Naive Parzen            64.8 (1.3)    64.8 (1.3)    67.5 (2.4)
Parzen                  66.5 (1.2)    65.3 (2.6)    66.5 (1.2)
k-means                 64.9 (2.6)    62.3 (3.0)    65.4 (1.6)
1-Nearest Neighbors     50.8 (2.9)    50.2 (3.2)    48.8 (2.5)
k-Nearest Neighbors     50.8 (2.9)    50.2 (3.2)    48.8 (2.5)
knn, opt-AUC            49.8 (4.4)    52.2 (2.8)    49.2 (4.8)
Nearest-neighbor dist   47.2 (4.0)    49.6 (2.5)    49.0 (4.7)
Principal comp.         53.9 (0.9)    47.2 (3.9)    53.9 (0.9)
Self-Organ. Map         62.8 (2.2)    62.1 (2.8)    62.8 (2.2)
Auto-enc network        59.3 (4.4)    59.7 (1.6)    60.1 (2.4)
Spanning Tree           49.0 (5.0)    49.8 (4.3)    49.9 (5.9)
L_1-ball                51.0 (1.1)    49.0 (0.6)    57.2 (1.0)
k-center                47.3 (2.5)    56.1 (4.6)    52.8 (0.5)
Support vector DD       53.4 (2.7)    58.4 (0.9)    52.5 (3.4)
Minimax Prob. DD        66.9 (0.7)    53.3 (0.5)    66.9 (0.7)
LinProg DD              63.0 (3.5)    55.0 (2.8)    62.4 (3.6)
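To make the evaluation protocol concrete, here is a rough Python sketch of 5 times repeated stratified 10-fold cross-validation with AUC (x100) as the score. It is not dd_tools code. Several details are assumptions: the model is a simplified Gaussian with independent features, it is trained on the target objects of the training folds only, and the value in parentheses is taken to be the standard deviation over the five repetitions. All function names and the synthetic data are hypothetical.

```python
import random
import statistics

def auc(target_scores, outlier_scores):
    """Probability that a random target outscores a random outlier (ties count 0.5)."""
    wins = 0.0
    for t in target_scores:
        for o in outlier_scores:
            wins += 1.0 if t > o else 0.5 if t == o else 0.0
    return wins / (len(target_scores) * len(outlier_scores))

def stratified_folds(labels, k, rng):
    """Split indices into k folds with roughly equal class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def gauss_fit(rows):
    """Per-feature mean and standard deviation (independent-feature assumption)."""
    return [(statistics.fmean(c), statistics.stdev(c)) for c in zip(*rows)]

def gauss_score(model, row):
    """Negative squared standardized distance: higher means more target-like."""
    return -sum(((x - m) / s) ** 2 for x, (m, s) in zip(row, model))

def repeated_cv_auc(X, labels, repeats=5, k=10, seed=0):
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        fold_aucs = []
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            # Train on target objects (label 1) outside the held-out fold
            train = [X[i] for i in range(len(X))
                     if i not in held_out and labels[i] == 1]
            model = gauss_fit(train)
            t = [gauss_score(model, X[i]) for i in test_idx if labels[i] == 1]
            o = [gauss_score(model, X[i]) for i in test_idx if labels[i] == 0]
            fold_aucs.append(auc(t, o))
        per_repeat.append(100 * statistics.fmean(fold_aucs))
    return statistics.fmean(per_repeat), statistics.stdev(per_repeat)

# Synthetic stand-in with the class sizes of this dataset (NOT the real data)
data_rng = random.Random(1)
X = ([[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(225)]
     + [[data_rng.gauss(0, 3) for _ in range(3)] for _ in range(81)])
labels = [1] * 225 + [0] * 81
mean_auc, std_auc = repeated_cv_auc(X, labels)
```

An AUC (x100) near 50 means the model orders targets and outliers no better than chance, which is how several rows of the table should be read.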

Classifier projection spaces

The first classifier projection spaces are obtained by computing the classifier label disagreements (with the threshold set at 10% target error) and applying MDS to the resulting distance matrix between classifiers:



Original



Unit variance



PCA mapped
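The construction of such a projection space can be sketched as follows: threshold each classifier so that it rejects 10% of the target objects, measure for every pair of classifiers the fraction of objects they label differently, and embed the resulting distance matrix in two dimensions. The page does not say which MDS variant was used; classical (eigendecomposition-based) MDS is an assumption here, as are the helper names and the four made-up score vectors.

```python
import numpy as np

def labels_at_target_error(scores, is_target, target_error=0.10):
    """Accept objects at or above the score that rejects `target_error` of the targets."""
    threshold = np.quantile(scores[is_target], target_error)
    return scores >= threshold

def disagreement_matrix(label_sets):
    """Pairwise fraction of objects on which two classifiers give different labels."""
    L = np.asarray(label_sets)
    n = len(L)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.mean(L[i] != L[j])
    return D

def classical_mds(D, dims=2):
    """Classical MDS: eigendecomposition of the double-centered squared distances."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

# Four hypothetical classifiers scoring 100 objects (first 70 are targets)
rng = np.random.default_rng(0)
is_target = np.arange(100) < 70
base = rng.normal(size=100)
scores = [base, base + 0.01 * rng.normal(size=100), rng.normal(size=100), -base]

labels = [labels_at_target_error(s, is_target) for s in scores]
D = disagreement_matrix(labels)
coords = classical_mds(D)   # one 2-D point per classifier
```

Classifiers that make nearly the same accept/reject decisions end up close together in the embedding, regardless of how their raw scores are scaled.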

Classifier projection spaces

The second versions of the classifier projection spaces are obtained by computing the classifier ranking disagreements and applying MDS to the resulting distance matrix between classifiers:



Original



Unit variance



PCA mapped
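The page does not define how a ranking disagreement is measured. One plausible reading, shown here purely as an assumption, is the fraction of object pairs that two classifiers order differently (a normalized Kendall tau distance). The function name `ranking_disagreement` is hypothetical.

```python
from itertools import combinations

def ranking_disagreement(scores_a, scores_b):
    """Fraction of object pairs that the two classifiers rank in opposite order."""
    pairs = list(combinations(range(len(scores_a)), 2))
    discordant = sum(
        1 for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0
    )
    return discordant / len(pairs)

same = ranking_disagreement([1, 2, 3], [10, 20, 30])      # 0.0: identical ordering
reversed_ = ranking_disagreement([1, 2, 3], [3, 2, 1])    # 1.0: fully reversed
```

Unlike label disagreement, this measure needs no threshold, so it compares classifiers over their whole operating range. Computing it for every pair of classifiers gives a distance matrix that any MDS routine can embed.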