

Cost curve

Figure 5.2: The cost curve derived from the same dataset and classifier as in Figure 5.1.

Another way to plot the performance of a classifier is the cost curve [DH00]. As Figure 5.1 shows, many thresholds are suboptimal: there is often another operating point for which at least one of the errors is lower and the other is no higher. For instance, the operating point $FP=0.38, TP=0.72$ is suboptimal, because the operating point $FP=0.38, TP=0.92$ has a much higher true positive rate at the same false positive rate. For a relatively large range of misclassification costs, the operating point $FP=0.38, TP=0.92$ will be the optimal one.

The cost curve makes this visible. The (normalized) expected cost is computed for a varying cost ratio between the two classes. Each operating point appears as a straight line in this plot. Figure 5.2 shows the cost curve for the same dataset and classifier as the ROC curve in Figure 5.1. The operating point marked in Figure 5.1 is indicated by the dotted line in Figure 5.2. The operating points that together form the lower hull are indicated by the thick line; they are the best operating points over the range of costs. This cost curve is obtained as follows:

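  >> a = oc_set(gendatb,1);   % banana dataset, class 1 labeled as target
  >> w = gauss_dd(a,0.1);     % Gaussian data description, 10% target rejection
  >> c = a*w*dd_costc;        % compute the cost curve on the same data
  >> plotcostc(c)             % plot the cost curve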
For more information, see [DH00].
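
To make concrete why each operating point appears as a line, here is a sketch of the normalized expected cost in the formulation of [DH00]. The symbols $PC(+)$, $p(+)$, $p(-)$, $C(-|+)$ and $C(+|-)$ are notation assumed here for illustration; the exact axis scaling used by dd_costc may differ.

$$ PC(+) = \frac{p(+)\,C(-|+)}{p(+)\,C(-|+) + p(-)\,C(+|-)}, \qquad NE[C] = (1-TP)\,PC(+) + FP\,\bigl(1-PC(+)\bigr) $$

Here $p(+)$ and $p(-)$ are the class priors, $C(-|+)$ is the cost of rejecting a positive object and $C(+|-)$ is the cost of accepting a negative one. An operating point $(FP, TP)$ is therefore a straight line from $NE[C]=FP$ at $PC(+)=0$ to $NE[C]=1-TP$ at $PC(+)=1$. Of the two operating points discussed above, $FP=0.38, TP=0.92$ runs from $0.38$ to $0.08$, while $FP=0.38, TP=0.72$ runs from $0.38$ to $0.28$. The first line is never above the second; at $PC(+)=0.5$, for example, the costs are $0.23$ and $0.33$.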


David M.J. Tax 2006-07-26