Fondazione Bruno Kessler - Technologies of Vision

contains material from
Template Matching Techniques in Computer Vision: Theory and Practice
Roberto Brunelli © 2009 John Wiley & Sons, Ltd

3.1 The ROC curve

A binary classification task can be considered as a binary hypothesis testin problem where one of the two competing hypotheses H0 and H1 must hold. The two basic probabilities characterizing the Neyman-Pearson approach to testing are the false alarm error probability PF and the detection probability PD. The former is the probability of returning H1 when the true world state is described by H0 and is also known as the probability of a type I error (or false acceptance rate, FAR). The latter, also known as the test power, gives the probability with which H1 is returned (by the classifier) when the true world state is H1. Neyman-Pearson classification, which maximizes PD under a specified bound on PF , results in a simple thresholding operation on the likelihood ratio value Λ(⃗x) for a pattern ⃗x under hypothesis H0:

Λ(⃗x) H1≷ ν
    H0
(3.1)

The relation between the two probabilities, specifically PD as a function of PF , is usually represented with a receiver operating characteristic (ROC) curve that is extensively discussed in Appendix TM:C:

PD = PD (PF )
(3.2)

The quantity 1 - PD represents the false negative error probability and is also known as the false rejection rate (FRR). The ROC curve is often reported as

FAR  = FAR  (FRR  )
(3.3)

While the ROC curve provides detailed information on the trade-off between the two types of errors, classification system are often synthetically characterized by means of the equal error rate (EER), the intersection of the ROC curve with the diagonal

FAR  = FRR
(3.4)

When the data distribution under the two competing hypotheses is Gaussian with the same covariance matrix (and different means) the probabilities considered above can be computed in close form (see Section TM:3.3) and the multidimensional case does not present significant differences from the one dimensional one. The key parameter, fixing the maximum achieavable performance, is the separation of the distributions mean with respect to distribution standard deviation. With reference to Equations TM:3.50-58, we can from parameter ν to

     ν   σ
z = ---+ --0
    σ0    2
(3.5)

in order to compute PD = PD(PF ) exploint the fact that the Q-function is simply the complement to 1 of the distribution function (pnorm)

1 z   <- function(nu, s) (nu/s + s/2) 
2 # generate a sequence of threhsolds 
3 nus <- seq(-10,10,by=0.1) 
4 # transform them to z (with sigma = 3) ... 
5 zs  <- z(nus, 3) 
6 tm.dev("figures/normalRoc") 
7 plot(1-pnorm(zs),  1 - pnorm(zs-3), type="l", lty=1, 
8 ...      xlab="False alarm rate", ylab="Detection rate") 
9 zs  <- z(nus, 2) 
10 lines(1-pnorm(zs),  1 - pnorm(zs-2), type="l", lty=2) 
11 zs  <- z(nus, 1) 
12 lines(1-pnorm(zs),  1 - pnorm(zs-1), type="l", lty=3) 
13 grid() 
14 legend(0.6,0.6, c("sigma=3", "sigma=2", "sigma=1"), lty=c(1,2,3)) 
15 dev.off()


PIC

Figure 3.1: When data are normally distributed the maximum achievable performance is determined by the separation of the classes relative to the standard deviation of the distribution σ0: the higher σ0, the closer to the upper left corner the curve.