Use of Extreme Value Statistics in Modeling Biometric Systems

Cameron Allison

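2 Similarity Scores. Two types of matching: a genuine sample matched against the enrolled sample, and an impostor sample matched against the enrolled sample. [Figure: genuine and impostor samples matched against an enrolled sample to produce matching scores.]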

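3 [Figure: probability densities of impostor and genuine matching scores (example scores 0.32 for an impostor sample and 0.95 for a genuine sample), with the decision threshold t, the EER threshold, and the FAR and FRR regions marked along the matching-score axis.] A False Accept (FA) occurs when a presented identity is incorrectly accepted (false positive). A False Reject (FR) occurs when a presented identity is incorrectly rejected (false negative). Together, the False Accept Rate (FAR) and False Reject Rate (FRR) characterize the performance of an authentication system, and both can be expressed in terms of the cumulative distributions of the scores: for a system that accepts scores above the threshold $t$, $\mathrm{FAR}(t) = 1 - F_I(t)$ and $\mathrm{FRR}(t) = F_G(t)$, where $F_I$ and $F_G$ are the impostor and genuine score CDFs.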
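To make the FAR/FRR definitions concrete, here is a minimal sketch that estimates both rates from samples of scores. It assumes higher scores mean a better match and that a claim is accepted when the score is strictly above the threshold; the function names and the small score lists are hypothetical, made up for illustration. The EER is approximated by scanning the observed scores as candidate thresholds.

```python
def far_frr(genuine, impostor, t):
    # FAR(t) = 1 - F_I(t): fraction of impostor scores accepted (score > t)
    far = sum(1 for s in impostor if s > t) / len(impostor)
    # FRR(t) = F_G(t): fraction of genuine scores rejected (score <= t)
    frr = sum(1 for s in genuine if s <= t) / len(genuine)
    return far, frr


def approx_eer(genuine, impostor):
    # Scan the observed scores as candidate thresholds; keep the first one
    # where |FAR - FRR| is smallest and report the average of the two rates.
    best_t, best_gap, best_rates = None, float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        if abs(far - frr) < best_gap:
            best_t, best_gap, best_rates = t, abs(far - frr), (far, frr)
    return best_t, sum(best_rates) / 2


genuine = [0.95, 0.9, 0.8, 0.4]    # made-up scores, higher = better match
impostor = [0.32, 0.1, 0.5, 0.2]
print(far_frr(genuine, impostor, 0.45))  # (0.25, 0.25)
print(approx_eer(genuine, impostor))     # (0.4, 0.25)
```

With only a handful of scores the rates move in coarse steps, which is exactly the tail-data problem the later slides address.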

4 Objectives. Score-level modeling in terms of probability density functions (PDFs) or cumulative distribution functions (CDFs). Data: the genuine scores and the impostor scores. Methods: extreme value statistics. Issues: tail distribution estimation; reliability assurance through confidence intervals.

5 Score-level modeling. Applications: multimodal fusion; system performance optimization and evaluation; prediction of how systems scale to very large populations; confidence interval estimation. How sure are we that the measured performance will hold up? What is the sample variance? How sure are we that the measured performance beats a performance requirement? How many samples are needed to establish this?

6 Distribution modeling. Classical statistics methods: parametric methods, such as the Gaussian (justified by the central limit theorem) and mixture models; non-parametric methods, such as kernel density estimates.

7 Distribution modeling. Classical statistics methods (continued). Limitations: they model the AVERAGE behavior of a process and rely on assumptions about the form of the distribution. The analysis is therefore largely insensitive to the tails of the distribution, which is exactly where biometric systems are prone to error.

8 Distribution modeling. Extreme value theory (EVT) models the EXTREME behavior of a process (the tail of a distribution) and is designed to deal with very small data sets.

9 Biometric score modeling. Earlier methods: empirical estimation. Given a set of match scores (genuine or impostor) $X = \{X_1, \dots, X_M\}$ drawn from a population with distribution $F(x) = \mathrm{Prob}(X \le x)$, a simple, naive estimator called the empirical distribution function is used:
$\hat F(t_0) = \frac{1}{M} \sum_{i=1}^{M} \mathbf{1}\{X_i \le t_0\}$,
where the estimate is simply the count of the $X_i$ in $X$ that do not exceed $t_0$, divided by $M$.
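The empirical distribution function is just a normalized count, so sorting the scores once and using a binary search gives it for any threshold. A minimal sketch (the name `empirical_cdf` and the sample values are hypothetical):

```python
from bisect import bisect_right


def empirical_cdf(scores):
    xs = sorted(scores)
    m = len(xs)

    def f_hat(t0):
        # count of X_i <= t0 (bisect_right places ties to the left), divided by M
        return bisect_right(xs, t0) / m

    return f_hat


F = empirical_cdf([0.2, 0.5, 0.5, 0.9])
print(F(0.1), F(0.5), F(1.0))  # 0.0 0.75 1.0
```

The estimate is a step function with jumps of size $1/M$, so beyond the largest observed score it cannot say anything about the tail.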

10 Biometric score modeling. Earlier methods: empirical estimation. [Figure: empirical CDF $F(x) = \mathrm{Prob}(X \le x)$ and PDF $p(x)$ plotted against the score $x$.]

11 Biometric score modeling. Earlier methods: simulation approach. Given a set of match scores, use the same empirical distribution function and define an auxiliary function $H(t)$ from it. Between the sample scores, the value of $H(t)$ is linearly interpolated, and the estimated distribution $F(t)$ is then expressed in terms of $H(t)$.

12 Biometric score modeling. Earlier methods: simulation approach (continued). After a further change of variable, the resulting function is the exponential quantile of the empirical distribution, and the estimated distribution is expressed through it. To generate simulation data, one starts with a uniform random variable $U$ on $[0, 1]$; data following the estimated distribution are then generated by transforming $U$ through the inverse of that distribution.
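The exact interpolation formula of the simulation approach is not recoverable from the slides, so the sketch below assumes a simple variant: the quantile function is linearly interpolated between the sorted sample scores, and inverse-transform sampling maps $U \sim \mathrm{Uniform}[0, 1)$ through it. `make_sampler` is a hypothetical name, and the code needs at least two scores.

```python
import random


def make_sampler(scores, rng=None):
    xs = sorted(scores)                  # order statistics X_(1) <= ... <= X_(M)
    m = len(xs)
    rng = rng or random.Random()

    def draw():
        u = rng.random()                 # U ~ Uniform[0, 1)
        pos = u * (m - 1)                # fractional position between order statistics
        k = int(pos)
        frac = pos - k
        if k >= m - 1:
            return xs[-1]
        # linear interpolation of the quantile function (inverse of the smoothed CDF)
        return xs[k] + frac * (xs[k + 1] - xs[k])

    return draw


draw = make_sampler([0.1, 0.4, 0.9], random.Random(0))
sims = [draw() for _ in range(2000)]
```

Note that every simulated score stays between the smallest and largest observed scores. This is one reason empirically based simulation cannot extrapolate into the tail.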

13 Extreme Value Theory and Tail CDF. Why is the tail distribution important? Most biometric systems are expected to have an equal error rate (EER) of 5% or lower, and many fingerprint verification systems achieve an EER as low as 2% [1]. Threshold setting, as well as assessing the claimed performance of vendor systems, requires accurate estimation of the tail distributions of both genuine and impostor scores (see Fig. 1).

14 Extreme Value Theory and Tail CDF. Difficulty: lack of data in the tail. With a 95% threshold, the 5% tail contains only 139 scores; a 2% tail leaves only 56 scores. [Figure: empirical distribution of 2800 genuine scores of the CUBS data generated on FVC2004 fingerprint images.]

15 Extreme Value Theory and Tail CDF. EVT focuses explicitly on accurate modeling of the tails of distributions and is therefore well suited to modeling biometric matching scores. For biometric scores truncated at $t$, leaving only the tail data, the distribution function (d.f.) of the excess $y = x - t$ is defined as:
$F_t(y) = \mathrm{Prob}(X - t \le y \mid X > t) = \frac{F(t + y) - F(t)}{1 - F(t)}$.

16 Extreme Value Theory (EVT). Approaches: Block Maxima (GEV); Threshold approach (GPD).

17 Extreme Value Theory (EVT). Block Maxima (GEV). Example: biometric system monitoring, modeling the extreme daily recorded matching scores. Take the block maximum: the extreme daily matching score (the minimum of the genuine scores or the maximum of the impostor scores) for each day, $M_n = \max\{X_1, \dots, X_n\}$. The $m$ collected daily records form the data points for $M_n$ (the training data).

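18 Extreme Value Theory (EVT). Block Maxima (GEV). Extremal Types Theorem: if, for suitable constants $a_n > 0$ and $b_n$, the distribution of the normalized maximum $(M_n - b_n)/a_n$, with $M_n = \max\{X_1, \dots, X_n\}$, converges (as $n \to \infty$) to a non-degenerate distribution $G$, then
$G(x) = \exp\left\{ -\left[ 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right]^{-1/\xi} \right\}, \quad 1 + \xi \frac{x - \mu}{\sigma} > 0,$
with the limit $G(x) = \exp\left\{ -\exp\left[ -\frac{x - \mu}{\sigma} \right] \right\}$ when $\xi = 0$. $G(x)$ is called the Generalized Extreme Value (GEV) distribution and has three parameters: the shape parameter $\xi$, the location parameter $\mu$, and the scale parameter $\sigma > 0$.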

19 Extreme Value Theory (EVT). Block Maxima (GEV). The extremal types theorem is another limit theorem, analogous to the Central Limit Theorem. Central Limit Theorem: if i.i.d. random variables have a finite variance, their suitably normalized sum is approximately normally distributed. Extremal types theorem: the suitably normalized maximum of many i.i.d. random variables, when it has a non-degenerate limit, is asymptotically GEV distributed.
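The extremal types theorem can be checked numerically. The sketch below simulates hypothetical daily monitoring data. Scores are assumed to be i.i.d. Exp(1), chosen only because its normalizing constants are known ($a_n = 1$, $b_n = \ln n$). The sketch compares the empirical distribution of the normalized daily maxima with the $\xi = 0$ member of the GEV family (the Gumbel distribution). The number of days and scores per day are illustrative.

```python
import math
import random


def daily_maxima(n_days, scores_per_day, rng):
    # Each "day" produces scores_per_day i.i.d. Exp(1) scores; keep only the daily maximum M_n.
    return [max(rng.expovariate(1.0) for _ in range(scores_per_day))
            for _ in range(n_days)]


def gumbel_cdf(x, mu=0.0, sigma=1.0):
    # GEV distribution with shape xi = 0
    return math.exp(-math.exp(-(x - mu) / sigma))


rng = random.Random(42)
n = 200
maxima = daily_maxima(2000, n, rng)
for x in (-1.0, 0.0, 1.0, 2.0):
    # normalized maximum (M_n - b_n) / a_n with a_n = 1, b_n = ln n for Exp(1) data
    emp = sum(1 for mx in maxima if mx - math.log(n) <= x) / len(maxima)
    print(f"x={x:+.1f}  empirical={emp:.3f}  GEV(xi=0)={gumbel_cdf(x):.3f}")
```

Each day contributes a single number here, which is the inefficiency that the threshold models on the next slides avoid.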

20 Extreme Value Theory (EVT). Threshold Models. Model exceedances over a high threshold $u$: $\{Y \mid X > u\}$, where $Y = X - u$. This makes more efficient use of the data. For example, given daily measurements of biometric matching scores, block maxima use only one number per day, whereas threshold models use all values above the threshold.

21 Extreme Value Theory (EVT). Threshold Models: designed specifically for modeling the tail distribution, enabling accurate estimation of the tail.

22 Extreme Value Theory (EVT). Threshold Models for tail estimation. Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables with distribution function $F$. Consider the sequence of sample excesses $Y_1, Y_2, \dots$, where $Y_i = X_i - u$ is defined for all $X_i > u$. The conditional excess distribution function is then defined as
$F_u(y) = \mathrm{Prob}(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)}, \quad y \ge 0.$

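23 Extreme Value Theory (EVT). Threshold Models. Theorem: the distribution of $Y := X - u \mid X > u$ converges (as $u$ approaches the upper end point of $F$) to
$G_{\xi,\sigma}(y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}$ for $\xi \ne 0$, and $G_{0,\sigma}(y) = 1 - e^{-y/\sigma}$ for $\xi = 0$,
where $y \ge 0$ (and $y \le -\sigma/\xi$ when $\xi < 0$). $G_{\xi,\sigma}$ is called the Generalized Pareto Distribution (GPD) and has two parameters: the shape parameter $\xi$ and the scale parameter $\sigma$. The shape parameter $\xi$ is the same parameter as in the GEV distribution.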
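The two-parameter GPD CDF translates directly into code. The sketch below handles the $\xi = 0$ (exponential) case separately. It also returns 1 beyond the finite upper end point $-\sigma/\xi$ that exists when $\xi < 0$. `gpd_cdf` is a hypothetical helper name.

```python
import math


def gpd_cdf(y, xi, sigma):
    # G_{xi,sigma}(y) for excesses y = x - u
    if y < 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma)        # exponential special case
    z = 1.0 + xi * y / sigma
    if z <= 0.0:                                 # beyond the upper end point when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


print(gpd_cdf(1.0, 1.0, 1.0))  # 0.5
```

As $\xi \to 0$ the general formula approaches the exponential branch, so the two branches join continuously.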

24 Extreme Value Theory (EVT) Threshold Models theory

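25 Extreme Value Theory (EVT). Tail estimation of the underlying distribution. The relation between the conditional excess d.f. $F_u$ and the underlying distribution $F$ is
$F(u + y) = \left(1 - F(u)\right) F_u(y) + F(u)$.
Therefore, for $x \ge u$ (the tail), remembering that $y = x - u$,
$F(x) = \left(1 - F(u)\right) F_u(x - u) + F(u)$.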

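26 Extreme Value Theory (EVT). Tail estimated by GPD: approximate $F_u$ by the GPD $G$, and estimate $F(u)$ (the CDF evaluated at $u$) by the empirical distribution value $\hat F(u) = (n - N_u)/n$, where $n$ is the total number of data and $N_u$ is the number of data above the threshold $u$. We have
$\hat F(x) = \left(1 - \frac{n - N_u}{n}\right) G(x - u) + \frac{n - N_u}{n}$,
which simplifies to
$\hat F(x) = 1 - \frac{N_u}{n}\left(1 - G(x - u)\right)$.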

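27 Extreme Value Theory (EVT). Tail estimated by GPD: in the two-parameter case, with estimates $\hat\xi$ and $\hat\sigma$,
$\hat F(x) = 1 - \frac{N_u}{n}\left(1 + \hat\xi\,\frac{x - u}{\hat\sigma}\right)^{-1/\hat\xi}, \quad x \ge u,$
where $N_u$ is the number of data above the threshold $u$ and $n$ is the total number of data.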
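The two-parameter tail estimator is a single expression once $\hat\xi$, $\hat\sigma$, $u$, $n$ and $N_u$ are known. A minimal sketch follows. The helper name and the numbers in the usage line are hypothetical, and $\hat\xi \ne 0$ is assumed.

```python
def tail_cdf(x, u, xi, sigma, n, n_u):
    # F_hat(x) = 1 - (N_u / n) * (1 + xi * (x - u) / sigma) ** (-1 / xi), for x >= u, xi != 0
    if x < u:
        raise ValueError("the GPD tail estimate applies only for x >= u")
    z = 1.0 + xi * (x - u) / sigma
    if z <= 0.0:            # beyond the finite upper end point when xi < 0
        return 1.0
    return 1.0 - (n_u / n) * z ** (-1.0 / xi)


# hypothetical values: 10 scores, 2 above the threshold u = 0, xi = 1, sigma = 1
print(tail_cdf(1.0, 0.0, 1.0, 1.0, 10, 2))  # 0.9
```

At $x = u$ the estimate reduces to the empirical value $1 - N_u/n$. Beyond $u$ it follows the fitted GPD instead of the sparse empirical steps.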

28 Fitting biometric data to GPD. Fitting procedure: data collection; exploratory graphical tools (the Quantile-Quantile plot (QQ-plot) and the sample mean excess plot); fitting by maximum likelihood estimation.

29 Fitting biometric data to GPD. Data collection. CUBS fingerprint matching data: 2800 genuine matching scores and 4950 impostor matching scores. NIST biometric matching scores: fingerprint, iris, face. Ultra-Scan fingerprint matching data: training and testing sets, both containing impostor scores and 1000 genuine scores.

30 Fitting biometric data to GPD. Data collection. [Figure: histogram of the CUBS genuine scores (flipped); sample size 2800.]

31 Fitting biometric data to GPD. Exploratory graphical tools: the Quantile-Quantile plot (QQ-plot) is used to check whether a data set comes from a population with a given distribution. A quantile is obtained from the inverse of the CDF. A reference line is also plotted: if the two sets come from populations with the same distribution, the points should fall approximately along this reference line.

32 Fitting biometric data to GPD. Quantile-Quantile plot (QQ-plot). The QQ-plot against the exponential distribution is a very useful guide for visually examining the hypothesis that the data come from an exponential distribution, which is the special case $\xi = 0$ of the GPD. The QQ-plot of the quantiles of the empirical distribution function against the exponential distribution is defined as the set of points
$\left\{ \left( X_{k:n},\; G^{-1}\!\left(\frac{k}{n+1}\right) \right) : k = 1, \dots, n \right\}$,
where the data set is ordered as $\{X_{1:n}, \dots, X_{n:n}\}$, $X_{k:n}$ is the $k$-th value in the ordered list, and $G$ is the exponential distribution function.
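The QQ-plot points against the standard exponential can be computed directly. For the standard exponential, $G^{-1}(p) = -\ln(1 - p)$. The sketch assumes the plotting position $k/(n+1)$; other plotting positions are also in common use. `exponential_qq_points` is a hypothetical name, and plotting is left to whatever tool is at hand.

```python
import math


def exponential_qq_points(data):
    xs = sorted(data)                          # X_{1:n} <= ... <= X_{n:n}
    n = len(xs)
    # pair each order statistic with the exponential quantile G^{-1}(k / (n + 1))
    return [(xs[k - 1], -math.log(1.0 - k / (n + 1))) for k in range(1, n + 1)]


pts = exponential_qq_points([3.0, 1.0, 2.0])
```

Roughly linear points support an exponential tail. A systematic curvature suggests $\xi \ne 0$, in which case the full GPD is needed.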

33 Fitting biometric data to GPD. Quantile-Quantile plot (QQ-plot). [Figure: QQ-plot of the CUBS data against the exponential distribution.]

34 Fitting biometric data to GPD. Sample mean excess plot, used for threshold selection. Theorem: if $X$ has a GPD with parameters $\xi < 1$ and $\sigma$, the excess over a threshold $u$ also has a GPD, and its mean excess function over this threshold is given by [13]
$e(u) = E(X - u \mid X > u) = \frac{\sigma + \xi u}{1 - \xi}, \quad \sigma + \xi u > 0,$
which is a linear function of $u$. If the empirical mean excess of the given scores has a linear shape, then a GPD may be an appropriate model for the overall distribution.

35 Fitting biometric data to GPD. Sample mean excess plot. When the mean excess plot has a linear shape only beyond a certain point, that point is a candidate for the threshold. The sample mean excess plot is defined as the plot of the points
$\left\{ \left( X_{k:n},\; e_n(X_{k:n}) \right) : k = 1, \dots, n - 1 \right\}$,
where the data set is sorted in ascending order as $\{X_{1:n}, \dots, X_{n:n}\}$ and the sample mean excess function is defined as
$e_n(u) = \frac{\sum_{i=1}^{n} (X_i - u)\,\mathbf{1}\{X_i > u\}}{\sum_{i=1}^{n} \mathbf{1}\{X_i > u\}}$.
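The sample mean excess plot can be computed in linear time after sorting. Suffix sums of the sorted scores give each $e_n(X_{k:n})$ without rescanning the data. A minimal sketch, with the hypothetical name `mean_excess_points`; tied scores are handled by counting only values strictly above $u$:

```python
from bisect import bisect_right


def mean_excess_points(data):
    xs = sorted(data)
    n = len(xs)
    suffix = [0.0] * (n + 1)                 # suffix[j] = xs[j] + ... + xs[n-1]
    for j in range(n - 1, -1, -1):
        suffix[j] = suffix[j + 1] + xs[j]
    pts = []
    for k in range(n - 1):                   # the maximum has no exceedances
        u = xs[k]
        j = bisect_right(xs, u)              # first index with X > u
        cnt = n - j
        if cnt > 0:
            pts.append((u, (suffix[j] - cnt * u) / cnt))   # e_n(u)
    return pts


print(mean_excess_points([1.0, 2.0, 4.0]))  # [(1.0, 2.0), (2.0, 2.0)]
```

For exponential data ($\xi = 0$) the points scatter around the constant $\sigma$. A rising or falling straight line indicates $\xi > 0$ or $\xi < 0$.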

36 Fitting biometric data to GPD. Sample mean excess plot. The sample mean excess plot of the CUBS data shows a general linear pattern. Close inspection of the plot suggests that a threshold can be chosen at $u = 0.45$.

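37 Fitting biometric data to GPD. Fitting by maximum likelihood estimation. Taking the first derivative of the GPD gives its density $g_{\xi,\sigma}(y) = \frac{1}{\sigma}\left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi - 1}$. For a sample $x = \{x_1, \dots, x_n\}$, the log-likelihood function for the GPD is
$L(\xi, \sigma \mid x) = \sum_{i=1}^{n} \log g_{\xi,\sigma}(x_i)$.
Therefore
$L(\xi, \sigma \mid x) = -n \log \sigma - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{n} \log\left(1 + \frac{\xi x_i}{\sigma}\right)$,
which is maximized over $(\xi, \sigma)$ to obtain the estimates $(\hat\xi, \hat\sigma)$.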
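The slides do not say which optimizer was used to maximize this log-likelihood. One possible stdlib-only sketch uses a standard profile-likelihood trick, which is an assumption here and not the authors' method. Reparametrize with $\theta = \xi/\sigma$. For fixed $\theta$, setting $\partial L/\partial \xi = 0$ gives $\xi = \frac{1}{n}\sum_i \log(1 + \theta x_i)$, leaving a one-dimensional search over $\theta$. The function names, grid size and search range are illustrative choices. The inputs are the positive excesses $x_i - u$.

```python
import math
import random


def sample_gpd(n, xi, sigma, rng):
    # inverse-transform draws from GPD(xi, sigma), xi != 0; used only to try the fit
    return [sigma / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]


def profile_loglik(theta, ys):
    # GPD log-likelihood maximized over xi for fixed theta = xi / sigma
    n = len(ys)
    if abs(theta) < 1e-12:
        # xi -> 0 limit: exponential model with sigma = sample mean
        return -n * math.log(sum(ys) / n) - n
    if 1.0 + theta * max(ys) <= 0.0:
        return -math.inf
    s = sum(math.log1p(theta * y) for y in ys)          # equals n * xi
    return -n * math.log(s / (n * theta)) - s - n


def fit_gpd_mle(ys, grid_size=400, iters=60):
    n = len(ys)
    lo = -(1.0 - 1e-9) / max(ys)         # keeps 1 + theta * y > 0 for every excess
    hi = 50.0 * n / sum(ys)              # heuristic upper end of the search range
    step = (hi - lo) / grid_size
    grid = [lo + i * step for i in range(grid_size + 1)]
    best = max(grid, key=lambda t: profile_loglik(t, ys))
    # golden-section refinement inside the neighbouring grid cells
    a, b = max(lo, best - step), min(hi, best + step)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if profile_loglik(c, ys) > profile_loglik(d, ys):
            b = d
        else:
            a = c
    theta = (a + b) / 2.0
    if abs(theta) < 1e-12:
        return 0.0, sum(ys) / n
    xi = sum(math.log1p(theta * y) for y in ys) / n
    return xi, xi / theta                # (xi_hat, sigma_hat)


rng = random.Random(7)
ys = sample_gpd(2000, 0.2, 1.0, rng)     # synthetic excesses, true xi = 0.2, sigma = 1
xi_hat, sigma_hat = fit_gpd_mle(ys)
```

A library optimizer (or a dedicated GPD fitting routine) would normally replace the hand-written grid and golden-section search. The profile form is shown because it makes the one-dimensional structure of the problem visible.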

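38 Confidence interval for CDF using GPD. Derive a closed-form confidence interval for the tail estimate obtained with the GPD. Recall the tail estimate
$\hat F(x) = 1 - \frac{N_u}{n}\left(1 + \hat\xi\,\frac{x - u}{\hat\sigma}\right)^{-1/\hat\xi}$,
and rename it as $h(\hat\xi, \hat\sigma)$.

39 Confidence interval for CDF using GPD. The first-order Taylor expansion around the estimated parameters gives
$h(\hat\xi, \hat\sigma) \approx h(\xi, \sigma) + \frac{\partial h}{\partial \xi}(\hat\xi - \xi) + \frac{\partial h}{\partial \sigma}(\hat\sigma - \sigma)$,
so the variance of $h(\hat\xi, \hat\sigma)$ is given asymptotically by
$\mathrm{Var}\,h(\hat\xi, \hat\sigma) \approx \nabla h^{T}\, \Sigma\, \nabla h$,
where $\nabla h = \left(\frac{\partial h}{\partial \xi}, \frac{\partial h}{\partial \sigma}\right)^{T}$ and $\Sigma$ is the asymptotic covariance matrix of $(\hat\xi, \hat\sigma)$.

40 Confidence interval for CDF using GPD. The inverse of the Fisher information matrix gives the asymptotic covariance of the MLE (Smith 1984 [18]); for $\xi > -1/2$ and $N_u$ exceedances,
$\Sigma = \frac{1 + \xi}{N_u}\begin{pmatrix} 1 + \xi & -\sigma \\ -\sigma & 2\sigma^2 \end{pmatrix}$.
Thus, for the CDF estimate at significance level $\alpha$, the asymptotic $(1 - \alpha)100\%$ confidence interval is
$h(\hat\xi, \hat\sigma) \pm z_{1-\alpha/2}\sqrt{\nabla h^{T}\, \Sigma\, \nabla h}$,
where $z_{1-\alpha/2}$ is the standard normal quantile.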
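The delta-method interval can be sketched in a few lines. The sketch makes several assumptions. The ratio $N_u/n$ is treated as fixed, so only $(\hat\xi, \hat\sigma)$ contribute variance. The asymptotic GPD covariance $\frac{1+\xi}{N_u}\begin{pmatrix} 1+\xi & -\sigma \\ -\sigma & 2\sigma^2 \end{pmatrix}$ is used (valid for $\xi > -1/2$). The estimate must satisfy $\hat\xi \ne 0$. The gradient of $h$ is written out analytically via $\ln t = -\frac{1}{\xi}\ln\left(1 + \frac{\xi y}{\sigma}\right)$. `tail_cdf_ci` and the parameter values in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def tail_cdf_ci(x, u, xi, sigma, n, n_u, alpha=0.05):
    y = x - u
    z = 1.0 + xi * y / sigma
    t = z ** (-1.0 / xi)                         # GPD survival probability at y
    f_hat = 1.0 - (n_u / n) * t                  # h(xi_hat, sigma_hat)
    # partial derivatives of ln t with respect to xi and sigma
    dlnt_dxi = math.log(z) / xi ** 2 - y / (xi * sigma * z)
    dlnt_dsigma = y / (sigma ** 2 * z)
    g_xi = -(n_u / n) * t * dlnt_dxi             # dh/dxi
    g_sigma = -(n_u / n) * t * dlnt_dsigma       # dh/dsigma
    # asymptotic covariance of (xi_hat, sigma_hat) from the Fisher information
    c = (1.0 + xi) / n_u
    v_xx, v_xs, v_ss = c * (1.0 + xi), -c * sigma, c * 2.0 * sigma ** 2
    var = g_xi ** 2 * v_xx + 2.0 * g_xi * g_sigma * v_xs + g_sigma ** 2 * v_ss
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(var)
    # clip to [0, 1] because a CDF value cannot leave that range
    return f_hat, max(0.0, f_hat - half), min(1.0, f_hat + half)


est, lower, upper = tail_cdf_ci(1.0, 0.0, 0.2, 1.0, 1000, 100)
```

At $x = u$ the gradient vanishes and the interval collapses to the point estimate. This reflects the assumption that $N_u/n$ carries no sampling error.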

41 Experiments. Fitting with GPD. GPD fits to impostor data of the training set. [Figure: GPD fit to the tails of impostor data.] The red lines are the GPD estimates, which lie within the empirical estimates of the data, represented by solid darker dots. The thick vertical bars are examples of confidence intervals.

42 Experiments. Fitting with GPD. GPD fits to genuine data of the training set. [Figure: GPD fit to the tails of genuine data.] The red lines are the GPD estimates, which lie within the empirical estimates of the data, represented by solid darker dots. The thick vertical bars are examples of confidence intervals.

43 Experiments. Fitting with GPD. Test impostor fitting: the test CDFs (solid darker dots) match or are close to the estimates from the training CDFs (green line). The thick vertical bar is again an example of a confidence interval. (The confidence interval for impostor data is too small to see in the plot.)

44 Experiments. Fitting with GPD. [Figure: test impostor tail fitting.]

45 Experiments. Fitting with GPD. Test genuine fitting: the test CDFs (solid darker dots) match or are close to the estimates from the training CDFs (green line). The thick vertical bar is again an example of a confidence interval.

46 Experiments. Fitting with GPD. [Figure: ROC with an example of a confidence box.] Note that the box is narrow in the direction of the impostor score because impostor data are plentiful.

47 Conclusion. EVT is the most scientific approach to an inherently difficult problem: predicting the probability of a rare event. Extreme value theory is a theoretically supported method for problems that require accurate modeling of the tails of probability distributions. EVT is a feasible method for modeling biometric data. The derivation of the confidence interval for the GPD estimate, and of confidence bands for the ROC, is an important contribution to the field of biometrics.
