Estimation and sample size calculations for matching performance of biometric authentication 1


Estimation and sample size calculations for matching performance of biometric authentication 1

Michael E. Schuckers
Department of Mathematics, Computer Science and Statistics, St. Lawrence University, Canton, NY USA and Center for Identification Technology Research (CITeR)

Abstract

Performance of biometric authentication devices can be measured in a variety of ways. The most common way is by calculating the false accept and false reject rates, usually referred to as FAR and FRR, respectively. In this paper we present two methodologies for creating confidence intervals for matching error rates. The approach that we take is based on a general parametric model. Because we utilize a parametric model, we are able to invert our confidence intervals to develop appropriate sample size calculations that account for both the number of attempts per person and the number of individuals to be tested, a first for biometric authentication testing. The need for sample size calculations that achieve this is currently acute in biometric authentication. These methods are approximate and their small sample performance is assessed through a simulation study. The distribution we use for simulating data is one that arises repeatedly in actual biometric tests.

Key words: False match rate, false non-match rate, intra-individual correlation, logit transformation, Beta-binomial distribution, confidence intervals, Monte Carlo simulation, sample size calculations

1991 MSC: 62F11, 62F25, 62N99

Email address: schuckers@stlawu.edu (Michael E. Schuckers).

1 This research was made possible by generous funding from the Center for Identification Technology Research (CITeR) at West Virginia University and by NSF grant CNS, which is cooperatively funded by the National Science Foundation and the United States Department of Homeland Security.

Preprint submitted to Pattern Recognition, 21 November 2005

1 Introduction

Biometric authentication devices, or biometric authenticators, aim to match a presented physiological image against one or more stored physiological images. The matching performance of a biometric authenticator (BA) is an important aspect of its overall performance. Since each matching decision from a BA results in either a reject or an accept, the two most common measures of performance are the false accept rate and the false reject rate. These rates are often abbreviated by FAR and FRR, respectively, and the estimation of these quantities is the focus of this paper. The methods described herein are equally applicable to false match rates and false non-match rates, since data for these summaries are collected in a similar manner. As acknowledged in a variety of papers, including [1], [2], and more recently [3], there is an ongoing need for assessing the uncertainty in the estimation of these error rates for a BA. In particular, there is an immense need for tools that assess the uncertainty in estimation via confidence intervals (CIs) and for sample size calculations grounded in such intervals. Along these lines, the question of how many individuals to test is particularly difficult because biometric devices are generally tested on multiple individuals and each individual is tested multiple times. In this paper we present two CI methodologies for estimating the matching error rates of a BA. We then invert the better performing of these CIs to create sample size calculations for error rate estimation.

Several methods for the estimation of FAR and FRR appear in the biometrics literature. These generally fall into two approaches: parametric and non-parametric. Among the parametric approaches, [4] uses the binomial distribution to make confidence intervals and obtain sample size calculations, while [5] models the features by a multivariate Gaussian distribution to obtain estimates of the error rates.
The approach of [6] is to assume that individual error rates follow a Gaussian distribution. In a similar vein, [7] assumes a Beta distribution for the individual error rates and uses maximum likelihood estimation to create CIs. Several non-parametric approaches have also been considered. [8] outlined exact methods for estimating the error rates for binomial data as well as for estimating the FAR when cross-comparisons are used. Two resampling methods have been proposed: the block bootstrap or subsets bootstrap of [1] and the balanced repeated replicates approach of [9]. It is worth noting that non-parametric methods do not allow for sample size calculations, since it is not possible to invert these calculations. The approach taken in this paper will be a parametric one that allows for sample size calculations. In addition to CI methods, several approximate methods have been proposed for sample size calculations. Doddington's Rule [10] states that one should collect data until there are 30 errors. Likewise, the Rule of 3 [2] states that

3/(the number of attempts) is an appropriate upper bound for a 95% CI for the overall error rate when zero errors are observed. However, both of these methods make use of the binomial distribution, which is often an unacceptable choice for biometric data [8]. [6] developed separate calculations for the number of individuals to test and for the number of tests per individual. Here we propose a single formula that accounts for both.

In this paper we propose two methods for estimating the error rates for BAs. The first of these methods is a simplification of the methodology based on maximum likelihood estimation that is found in [7]. The advantage of the approach taken here is that it does not depend on numerical maximization methods. The second is a completely new method based on a transformation. For both methods we present simulations to describe how these methods perform under a variety of simulated conditions. In order to validate the simulation strategy, we show that the versatile distribution used in our simulation is a good fit for real data collected on BAs. Finally, we develop sample size calculations based on the second methodology, since that method performed better.

The paper is organized in the following manner. Section 2 describes a general model formulated by [11] for dealing with overdispersion in binary data. There we also introduce this model and the notation we will use throughout the paper. Section 3 introduces our CI methodologies based on this model. Results from a simulation study of the performance of these CIs are found in Section 4. In that section, we also present an argument for the validity and applicability of the simulated data. Section 5 derives sample size calculations based on the second model. Finally, Section 6 contains a discussion of the results presented here and their implications.

2 Extravariation Model

As mentioned above, several approaches to modelling error rates from a BA have been developed.
In order to develop sample size calculations, we take a flexible parametric approach. Previously, [7] presented an extravariation model for estimating FARs and FRRs based on the Beta-binomial distribution, which assumes that error rates for each individual follow a Beta distribution. Here we follow [11] in assuming the first two moments of a Beta-binomial model, but we do not utilize the assumptions on the shape of the Beta distribution for individual error rates found in [7]. We also replace the numerical estimation methods therein with closed-form calculations. Details of this model are given below.

We begin by assuming an underlying population error rate, either FAR or FRR, of π. Following [2], let n be the number of comparison pairs tested and let m_i be the number of decisions made regarding the i-th comparison pair, with

i = 1, 2, ..., n. We define a comparison pair broadly to encompass any measurement of a biometric image against another image, of an image against a template, or of a template against a template. This enables us to model both FAR and FRR in the same manner. Then for the i-th comparison pair, let X_i represent the observed number of incorrect decisions from the m_i attempts, and let p_i = m_i^{-1} X_i represent the observed proportion of errors from the m_i observed decisions for the i-th comparison pair. We assume that the X_i's are conditionally independent given m_i, n, π and ρ. Then,

E[X_i | π, ρ, m_i] = m_i π
Var[X_i | π, ρ, m_i] = m_i π (1 − π) (1 + (m_i − 1) ρ)          (1)

where ρ is a term representing the degree of extravariation in the model. The assumption of conditional independence is the same one that is implicit in the subsets bootstrap of [1]. The ρ found in (1) is often referred to as the intra-class correlation coefficient, see e.g. [12] or [13]. Here we will refer to it as the intra-comparison correlation. The model in (1) reduces to the binomial if ρ = 0 or m_i = 1 for all i and, thus, (1) is a generalization of the binomial that allows for within-comparison correlation.

3 Confidence Intervals

The previous section introduced notation for the extravariation model. Here we use this model for estimating an error rate π. Suppose that we have n observed X_i's from a test of a biometric authentication device. We can then use that data to estimate the parameters of our model. Let

π̂ = (Σ_{i=1}^n m_i)^{-1} Σ_{i=1}^n X_i

ρ̂ = [Σ_{i=1}^n (X_i (X_i − 1) − 2 π̂ (m_i − 1) X_i + m_i (m_i − 1) π̂²)] / [Σ_{i=1}^n m_i (m_i − 1) π̂ (1 − π̂)]          (2)

This estimation procedure for ρ is given by [14], while here we use a traditional unbiased estimator of π.
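For concreteness, the moment estimators in (2) can be sketched in a few lines of code. This is an illustrative sketch, not code from the paper; the function name and the list-based interface are our own.

```python
# Moment-based estimators of pi and rho from equation (2); a sketch,
# not the paper's code. X[i] is the number of errors observed for
# comparison pair i, and m[i] is the number of decisions for that pair.

def estimate_pi_rho(X, m):
    n = len(X)
    pi = sum(X) / sum(m)  # pooled error-rate estimate pi-hat
    num = sum(X[i] * (X[i] - 1)
              - 2 * pi * (m[i] - 1) * X[i]
              + m[i] * (m[i] - 1) * pi ** 2
              for i in range(n))
    den = sum(m[i] * (m[i] - 1) * pi * (1 - pi) for i in range(n))
    rho = num / den  # intra-comparison correlation estimate rho-hat
    return pi, rho
```

For example, with X = (1, 0, 2, 1) and m_i = 5 for every pair, this gives π̂ = 4/20 = 0.2 and a small negative ρ̂, reflecting slightly less spread across pairs than the binomial predicts.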

3.1 Traditional Confidence Interval

We simplify the approach of [7] by replacing the maximum likelihood estimates with a moments-based approach. Thus we have an estimate, π̂, of the error rate, π, and an estimate, ρ̂, of the intra-comparison correlation, ρ. We use these to evaluate the standard error of π̂ following (1), assuming that the image pairs tested are conditionally independent of each other. The estimated variance of π̂ is then

V̂[π̂] = V̂[Σ_i X_i / Σ_i m_i] = (Σ_i m_i)^{-2} Σ_i V̂[X_i] = π̂ (1 − π̂) (1 + (m₀ − 1) ρ̂) / Σ_i m_i          (3)

where

m₀ = m̄ − [Σ_{i=1}^n (m_i − m̄)²] / [m̄ n (n − 1)]          (4)

and m̄ = n^{-1} Σ_{i=1}^n m_i. Note that in the notation of [7], (1 + (m₀ − 1) ρ) = C. We can create a nominally 100(1 − α)% CI for π from this. Using the results in [15] about the sampling distribution of π̂, we get the following interval:

π̂ ± z_{1−α/2} [π̂ (1 − π̂) (1 + (m₀ − 1) ρ̂) / Σ_i m_i]^{1/2}          (5)

where z_{1−α/2} represents the 100(1 − α/2)-th percentile of a Gaussian distribution. Our use of the Gaussian or Normal distribution is justified by the asymptotic properties of these estimators [14,15].

3.2 Logit Confidence Interval

One of the traditional difficulties with estimation of proportions near zero (or one) is that sampling distributions of the estimated proportions are non-Gaussian. Another problem is that CIs for proportions, such as that given in (5), are not constrained to fall within the interval (0, 1). The latter is specifically noted by [2]. One method that has been used to compensate for both of these is to transform the proportions to another scale. Many transformations for proportions have been proposed, including the logit, probit and arcsine of the square root, e.g. [14]. [16] has an extensive discussion of transformed CIs of the kind that we are proposing here. Below we use the logit or log-odds transformation to create CIs for the error rate π. [17] offers a specific discussion of CIs based on a logit transformation for binomial proportions.
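The traditional interval (5), with m₀ computed as in (4), can be sketched as follows. The helper below is illustrative rather than the paper's code; the 97.5th Normal percentile is hard-coded for a 95% interval rather than computed from α.

```python
import math

# Traditional 95% CI for pi from equation (5), with m0 from (4).
# A sketch under the paper's assumptions, not the paper's own code.

def traditional_ci(pi_hat, rho_hat, m):
    n = len(m)
    mbar = sum(m) / n
    # m0 from (4); note that m0 = m when all m_i are equal
    m0 = mbar - sum((mi - mbar) ** 2 for mi in m) / (mbar * n * (n - 1))
    z = 1.959963985  # ~97.5th percentile of the standard Normal (alpha = 0.05)
    se = math.sqrt(pi_hat * (1 - pi_hat) * (1 + (m0 - 1) * rho_hat) / sum(m))
    return pi_hat - z * se, pi_hat + z * se
```

With π̂ = 0.01, ρ̂ = 0.2 and n = 1000 pairs of m = 5 attempts each, this yields an interval of roughly (0.0063, 0.0137), symmetric about π̂.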

Table 1
Sample confidence intervals based on the traditional approach. Columns: Modality, Rate, n, m, π̂, ρ̂, Lower Endpoint, Upper Endpoint; rows: Hand, Finger and Face, each for FAR and FRR. [Numeric entries not recovered from the source.]

The logit or log-odds transformation is one of the most commonly used transformations in statistics. For this reason, we focus on the logit over other transformations. Define logit(π) ≡ ln(π/(1 − π)), the natural logarithm of the odds of an error occurring. The logit function has a domain of (0, 1) and a range of (−∞, ∞). One advantage of using the logit transformation is that we move from a bounded parameter space, π ∈ (0, 1), to an unbounded one, logit(π) ∈ (−∞, ∞). Thus, our approach is as follows. We first transform our estimand, π, and our estimator, π̂, to γ ≡ logit(π) and γ̂ ≡ logit(π̂), respectively. Next, we create a CI for γ using an approximation to the standard error of γ̂. Finally, we invert the endpoints of that interval back to the original scale. Letting ilogit(γ) ≡ logit^{-1}(γ) = e^γ / (1 + e^γ), we can create a 100(1 − α)% CI using γ̂. To do this we use a Delta method expansion for the estimated standard error of γ̂. (The Delta method, as it is known in the statistical literature, is simply a one-step Taylor series expansion of the variance. See [14] for more details.) Then our CI on the transformed scale is

γ̂ ± z_{1−α/2} [(1 + (m₀ − 1) ρ̂) / (π̂ (1 − π̂) m̄ n)]^{1/2}          (6)

where m̄ = n^{-1} Σ_{i=1}^n m_i. Thus (6) gives a CI for γ = logit(π), and we will refer to the endpoints of this interval as γ_L and γ_U for lower and upper, respectively. The final step for making a CI for π is to take the ilogit of both endpoints of this interval, which results in (ilogit(γ_L), ilogit(γ_U)). Thus an approximate (1 − α)100% CI for π is

(ilogit(γ_L), ilogit(γ_U)).          (7)

The interval (7) is asymmetric because the logit is not a linear transformation. This differs from traditional CIs, which are plus or minus a margin of error.

Table 2
Sample confidence intervals based on the logit approach. Columns: Modality, Rate, n, m, π̂, ρ̂, Lower Endpoint, Upper Endpoint; rows: Hand, Finger and Face, each for FAR and FRR. [Numeric entries not recovered from the source.]

However, this interval has the same properties as other CIs. (See [18] for a rigorous definition of a CI.) In addition, this interval is guaranteed to fall inside the interval (0, 1) as long as at least one error is observed. In the next section we focus on how well the CIs found in (6) and (7) perform for reasonable values of π, ρ, m, and n.

3.3 Examples

To illustrate these methods we present example CIs for both of the CI methods given above. Results for the traditional approach and the logit approach are found in Tables 1 and 2, respectively. The data used for these intervals comes from [19]. In that paper the authors investigated data from three biometric modalities (face, fingerprint and hand geometry) and recorded the match scores for ten within-individual image pairs for each of 50 people and for five between-individual image pairs for those same 50 individuals. Note that the between-individual cross-comparisons here are not symmetric, and thus there were 50 × 49 = 2450 comparison pairs in the sense we are using here. Thus there are 500 decisions comparing an individual to themselves and 12,250 decisions comparing an individual to another individual. Several things are apparent from these results. For this data the two intervals produce similar endpoints on the same data. This is a result of the relatively large n's. As noted earlier, the logit CI is asymmetric and has intervals that are larger, while the traditional confidence interval is symmetric.
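The logit interval of (6) and (7) can be sketched similarly: build the interval on the log-odds scale, then map its endpoints back through the inverse logit. Again this is an illustrative sketch (it assumes at least one observed error, so π̂ > 0), not the paper's code.

```python
import math

# Logit-based 95% CI for pi from equations (6)-(7). The interval is
# symmetric on the log-odds scale and asymmetric after back-transformation.

def ilogit(g):
    return math.exp(g) / (1 + math.exp(g))

def logit_ci(pi_hat, rho_hat, m, z=1.959963985):
    n = len(m)
    mbar = sum(m) / n
    m0 = mbar - sum((mi - mbar) ** 2 for mi in m) / (mbar * n * (n - 1))
    gamma = math.log(pi_hat / (1 - pi_hat))  # logit transform of pi-hat
    se = math.sqrt((1 + (m0 - 1) * rho_hat) / (pi_hat * (1 - pi_hat) * mbar * n))
    return ilogit(gamma - z * se), ilogit(gamma + z * se)
```

For π̂ = 0.01, ρ̂ = 0.2 and 1000 pairs with m = 5, this gives roughly (0.0069, 0.0145): the upper arm is wider than the lower one, as the text describes.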

Table 3
Goodness-of-fit test results for hand geometry FARs from data found in [19]. Columns: Threshold, π̂, p-value. [Numeric entries not recovered from the source.]

4 Assessing Performance

To test the small sample performance of these CIs we simulate data from a variety of different scenarios. Simulations were run because they give the best gauge of performance for statistical methodology under a wide variety of parameter combinations. We will refer to each parameter combination as a scenario. Assuming that the simulated data is similar in structure to observed data, we get a much better understanding of performance from simulation than from looking at a single observed set of data. Below we argue that the distribution that we use for simulation is an excellent fit to data from [19]. Further details on the Monte Carlo approach to evaluating statistical methodology can be found in [20].

Under each scenario, 1000 data sets were generated, and from each data set a nominally 95% CI was calculated. The percentage of times that π is captured inside these intervals is recorded and referred to as the empirical coverage probability or, simply, the coverage. For a 95% CI, we should expect the coverage to be 95%. However, this is not always the case, especially for small sample sizes. Below we consider a full factorial simulation study using the following values: n = (1000, 2000), m = (5, 10), π = (0.005, 0.01, 0.05, 0.1), and ρ = (0.1, 0.2, 0.4, 0.8). For simplicity we let m_i = m, i = 1, ..., n for these simulations. These values were chosen to determine their impact on the coverage of the confidence intervals. Specifically, the values of π were chosen to be representative of possible values for a BA, while the values of ρ were chosen to represent a larger range than would be expected. Performance for both of the methods given in this paper is exemplary when 0.1 < π < 0.9. Because of the symmetry of binary estimation, it is sufficient to consider only values of π less than 0.5.
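One simulation scenario can be sketched as follows: draw each pair's error rate p_i from a Beta distribution with mean π and intra-class correlation ρ (so a = π(1 − ρ)/ρ and b = (1 − π)(1 − ρ)/ρ), then draw X_i from a Binomial(m, p_i). This is a standard way to generate Beta-binomial data and is our own sketch, not the paper's simulation code.

```python
import random

# Generate n Beta-binomial counts X_i with E[X_i] = m * pi and
# intra-comparison correlation rho, as in the simulation study.

def simulate_betabinomial(n, m, pi, rho, rng=None):
    rng = rng or random.Random()
    a = pi * (1 - rho) / rho        # Beta shape parameters chosen so that
    b = (1 - pi) * (1 - rho) / rho  # mean = pi and ICC = 1/(a + b + 1) = rho
    X = []
    for _ in range(n):
        p = rng.betavariate(a, b)                          # pair-specific error rate
        X.append(sum(rng.random() < p for _ in range(m)))  # m Bernoulli decisions
    return X
```

Feeding the resulting X_i's to the estimators and interval routines above, and repeating 1000 times per scenario, yields the empirical coverages reported below.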

Table 4
Goodness-of-fit test results for fingerprint FARs from data found in [19]. Columns: Threshold, π̂, p-value. [Numeric entries not recovered from the source.]

4.1 Goodness-of-Fit Tests

Because it is easy to generate from a Beta-binomial, we would like to utilize this distribution for our simulations. To determine whether or not the Beta-binomial distribution is appropriate for generating data, we considered biometric decision data from [19], for which we computed goodness-of-fit test statistics and p-values; these are discussed by several authors, e.g. [21]. The idea of a goodness-of-fit test is that we fit the distribution to be tested and determine if the observed data are significantly different from this structure. Summaries based on the data are compared to summaries based on the null distributional form. In the case of the Beta-binomial distribution, we compare the expected counts if the data perfectly followed a Beta-binomial distribution to the observed counts. [21] gives an excellent introduction to these tests. Tables 3, 4 and 5 summarize the results of these tests for FARs across the three modalities. Tables 6, 7 and 8 repeat that analysis for FRRs. Note that for hand match scores we accept below the given threshold, while for finger and face match scores we accept above the given threshold. For all of these tables, small p-values indicate lack of fit, i.e. that the null hypothesis that the Beta-binomial distribution fits this data should be rejected. A more general goodness-of-fit test is given by [22] for the case when the value of m_i varies across comparison pairs. Looking at Tables 3 to 5 as well as Tables 6 to 8, we can readily see that the Beta-binomial fits both FAR and FRR quite well for all three modalities.
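A Pearson-type goodness-of-fit statistic of the kind described here can be sketched as follows, assuming a common m across comparison pairs. The Beta-binomial probabilities are computed from method-of-moments shape parameters; in a real analysis, categories with very small expected counts would be pooled before comparing the statistic to a chi-square distribution. This is our own illustrative sketch, not the exact test used in the paper.

```python
import math

# Pearson goodness-of-fit statistic comparing observed counts of k errors
# (k = 0..m) with Beta-binomial expected counts. Shape parameters a, b
# come from method-of-moments values of (pi, rho).

def betabinom_gof_stat(X, m, pi, rho):
    a = pi * (1 - rho) / rho
    b = (1 - pi) * (1 - rho) / rho
    n = len(X)

    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    stat = 0.0
    for k in range(m + 1):
        # Beta-binomial pmf: C(m, k) * B(k + a, m - k + b) / B(a, b)
        pk = math.comb(m, k) * math.exp(log_beta(k + a, m - k + b) - log_beta(a, b))
        observed = sum(1 for x in X if x == k)
        stat += (observed - n * pk) ** 2 / (n * pk)
    return stat  # compare to a chi-square with (m + 1) - 1 - 2 degrees of freedom
```

As a sanity check, with m = 1, π = 0.5 and ρ = 0.5 the model puts probability 1/2 on each of k = 0 and k = 1, so a sample consisting of one 0 and one 1 fits exactly and the statistic is zero.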

Table 5
Goodness-of-fit test results for facial recognition FARs from data found in [19]. Columns: Threshold, π̂, p-value. [Numeric entries not recovered from the source.]

Table 6
Goodness-of-fit test results for hand geometry FRRs from data found in [19]. Columns: Threshold, π̂, p-value. [Numeric entries not recovered from the source.]

Only two of the fifty-one thresholds considered resulted in a rejection of the Beta-binomial distribution as inappropriate. This is approximately what we would expect by chance alone using a significance level of 5%. For this analysis we reported on a subset of thresholds that produced FARs and FRRs near or below 0.1. This choice was made because it is unlikely that a BA would be implemented with error rates above that cutoff. It is important to note that we are not arguing here that binary decision data, the X_i's, from a biometric experiment will always follow a Beta-binomial distribution. Nor are we stating that Beta-binomial data is necessary for the use of these CIs. (As mentioned above, we are only specifying the first two moments of the distribution of the X_i's rather than specifying a particular shape for the distribution.) Rather, what we conclude from the results in Tables 3 through 8 is that the Beta-binomial is a reasonable distribution for simulation of small sample decision data, since it fit data from three different modalities well. Thus we generate the X_i's from a Beta-binomial distribution to test the performance of the CI methods specified above.

4.2 Simulation Results

Before presenting the simulation results, it is necessary to summarize our goals. First, we want to determine for which combinations of parameters the methodology achieves coverage close to the nominal level, in this case 95%. Second, because we are dealing with simulations, we should focus on overall trends rather than on specific outcomes. If we repeated these same simulations again, we would see slight changes in the coverages of individual scenarios, but the overall trends should remain. Third, we would like to be able to categorize which parameter combinations give appropriate performance. We use the Monte Carlo approach because it is more complete than the evaluation of a single real data set would be. See, e.g., [20] for a complete discussion.
Evaluations from observed test data give a less complete assessment of how well an estimation method performs, since there is no way to consider all the possible parameter combinations from such data.

Traditional confidence interval performance

Using (5) we calculated coverage for each scenario. The results of this simulation can be found in Table 9. Several clear patterns emerge. Coverage increases as π increases, as ρ decreases, as n increases and as m increases. This is exactly as we would have expected. More observations should increase our ability to accurately estimate π. Similarly, the assumption of approximate Normality will be most appropriate when π is moderate (far from zero and far from one) and when ρ is small. This CI performs well except when π < 0.01 and ρ is large.

Table 7
Goodness-of-fit test results for fingerprint FRRs from data found in [19]. Columns: Threshold, π̂, p-value. [Numeric entries not recovered from the source.]

Table 8
Goodness-of-fit test results for facial recognition FRRs from data found in [19]. Columns: Threshold, π̂, p-value. [Numeric entries not recovered from the source.]

There is quite a range of coverages across scenarios [the high, low and mean coverage values were not recovered from the source]. One way to think about ρ is that it governs how much independent (in a statistical sense) information can be found in the data. Higher values of ρ indicate that there is less independent information in the data. This performance is not surprising, since binary data is difficult to assess when there is a high degree of correlation within a comparison. One reasonable rule of thumb is that this CI performs well when the effective sample size satisfies n* π ≥ 10, where

n* = n m̄ / (1 + (m₀ − 1) ρ)          (8)

is referred to as the effective sample size in the statistics literature [23].

Logit confidence interval performance

To assess how well this second interval estimates π, we repeated the simulation using (6) to create our intervals. Output from these simulations is summarized in Table 10. Again, coverage should be approximately 95% for a nominally 95% CI. Looking at the results found in Table 10, we note that there are very similar patterns to those found in the previous section. However, it is clear that the coverage here is generally higher than for the traditional interval. As before, our interest is in overall trends. In general, coverage increases as π increases, as ρ decreases, as m increases and as n increases. [The high, low and mean coverage values were not recovered from the source.] Only one of the coverages, for the scenario with n = 1000, m = 5 and ρ = 0.4, is of concern here. That value seems anomalous when compared to the coverage obtained when n = 1000, m = 5 and ρ = 0.8. Otherwise the CI based on a logit transformation performed extremely well. Overall, coverage for the logit CI is higher than for the traditional confidence interval. It performs well when n* π ≥ 5. Thus, use of this CI is appropriate when the number of comparison pairs is roughly half what would be needed for the traditional CI.
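The rule of thumb above is easy to apply in code. A small sketch of our own, with equal m_i so that m₀ = m:

```python
# Effective sample size n* = n * mbar / (1 + (m0 - 1) * rho) from (8),
# here with equal m_i so m0 = m. Rule of thumb: the traditional interval
# needs n* x pi >= 10, the logit interval only n* x pi >= 5.

def effective_sample_size(n, m, rho):
    return n * m / (1 + (m - 1) * rho)

# e.g. n = 1000 pairs, m = 5 attempts, rho = 0.2: n* = 5000 / 1.8,
# roughly 2778, so the logit interval can handle pi down to about
# 5 / 2778 (0.0018) and the traditional interval down to about
# 10 / 2778 (0.0036).
```

Note that with ρ = 0 the effective sample size is simply nm, while as ρ grows toward 1 it shrinks toward n, matching the intuition that highly correlated attempts add little independent information.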
5 Sample size calculations

As discussed earlier and highlighted by [3], there is a pressing need for appropriate sample size calculations for testing of BAs. Here we present sample size calculations using the logit-transformed interval, since it gives better coverage and requires fewer observations for its usage than the traditional approach. (It is straightforward to solve (5), as we do below for (6), to achieve a specified margin of error.) Because of the way that matching performance for BAs is assessed, there are effectively two sample sizes for a biometric test: n and m. The calculations given below solve for n, the number of comparison pairs, conditional on knowledge of m, the number of decisions per comparison pair.

Table 9
Empirical coverage probabilities for the traditional confidence interval, for each combination of n = (1000, 2000), m = (5, 10), π = (0.005, 0.01, 0.05, 0.1) and ρ = (0.1, 0.2, 0.4, 0.8). Each cell represents the coverage based on 1000 simulated data sets. [Numeric entries not recovered from the source.]

Table 10
Empirical coverage probabilities for the logit confidence interval, for each combination of n = (1000, 2000), m = (5, 10), π = (0.005, 0.01, 0.05, 0.1) and ρ = (0.1, 0.2, 0.4, 0.8). Each cell represents 1000 simulated data sets. [Numeric entries not recovered from the source.]

Our sample size calculations require the specification of a priori estimates of π and ρ. This is typical of any sample size calculation. In the next section we discuss suggestions for selecting values of π and ρ as part of a sample size calculation. The asymmetry of the logit interval provides a challenge relative to the typical sample size calculation. Thus, rather than specifying the margin of error as is typical, we will specify the desired upper bound for the CI, call it π_max. Given the nature of BAs and their usage, it seems somewhat natural to specify the highest acceptable value for the upper end of the interval. We then set the upper endpoint of (6) equal to logit(π_max) and solve for n. Given (6), we can determine the appropriate sample size needed to estimate π, with a certain level of confidence 1 − α, up to a specified upper bound π_max. Since it is not possible to simultaneously solve for m and n, we propose a conditional solution. First, specify appropriate values for π, ρ, π_max, and 1 − α. Second, fix m, the number of attempts per comparison. We assume for sample size calculations that m_i = m for all i. (If significant variability in the m_i's is anticipated, then we recommend using a value of m that is slightly less than the anticipated average of the m_i's.) Third, solve for n, the number of comparison pairs to be tested. We then find n via the following equation, given the other quantities:

n = [z_{1−α/2} / (logit(π_max) − logit(π))]² × (1 + (m − 1) ρ) / (m π (1 − π)).          (9)

The above follows directly from (6). To illustrate this, suppose we want to estimate π with an upper bound of π_max = 0.01 with 99% confidence, and we believe π to be 0.005 and ρ to be 0.2. If we plan on testing each comparison pair 5 times, we would need

n = [z_{0.995} / (logit(0.01) − logit(0.005))]² × (1 + (5 − 1) 0.2) / (5 (0.005) (0.995)) = 985.          (10)

So we would need to test 985 comparison pairs 5 times each to achieve a 99% CI with an upper bound of 0.01. If asymmetric cross-comparisons are to be used among multiple individuals, then one could replace the n solved for in (9) with n*(n* − 1), where n* is the number of individuals, and solve for n*. In the example above, n* = 32 would be the required number of individuals. In the case of symmetric cross-comparisons, one would solve n*(n* − 1)/2 ≥ 985, which yields n* = 45 individuals, assuming the conditions specified above. Table 11 contains additional values of n for given values of m. In addition, this table contains mn, the total number of decisions that would be needed to achieve the specified upper bound for this CI. Clearly the relationship between mn and n is non-linear. This concurs with
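Equation (9) and the worked example translate directly into code. A sketch of our own; z for 99% confidence is hard-coded rather than computed from α, and we round up to the next whole comparison pair.

```python
import math

# Number of comparison pairs n from equation (9), given a priori pi and
# rho, attempts m per pair, and the desired CI upper bound pi_max.

def logit(p):
    return math.log(p / (1 - p))

def pairs_needed(pi, rho, m, pi_max, z=2.575829304):  # z_{0.995} for 99% confidence
    n = (z / (logit(pi_max) - logit(pi))) ** 2 \
        * (1 + (m - 1) * rho) / (m * pi * (1 - pi))
    return math.ceil(n)
```

With π = 0.005, ρ = 0.2, m = 5 and π_max = 0.01, this reproduces the n = 985 of the worked example (10).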

the observation of [2] when they discuss the non-stationarity of collecting biometric data.

Table 11
Values of n necessary to create a 99% confidence interval with π = 0.005, π_max = 0.01, ρ = 0.2 for various values of m, together with the total number of decisions mn. [Numeric entries not recovered from the source.]

6 Discussion

The recent Biometric Research Agenda stated clearly that one of the fundamental needs for research on BAs was the development of "statistical understanding of biometric systems sufficient to produce models useful for performance evaluation and prediction" [3, p. 3]. The methodologies discussed in this paper are a significant step toward that goal. This paper adds two significant tools for testers of biometric identification devices: well-understood CI methodology and a formula for determining the number of individuals to be tested. These are significant advances on core issues in the evaluation, assessment and development of biometric authentication devices. Below we discuss the properties of these methods and outline some future directions for research in this area.

The models we have developed are based on the following widely applicable assumptions. First, we assume that the moments of the X_i's are given by (1). Second, we assume that attempts made for each comparison are conditionally independent given the model parameters. We reiterate that an analysis of data found in [19] suggests that these are reasonable assumptions. For any BA, its matching performance is often critical to the overall performance of the system in which it is embedded. In this paper we have presented two new methodologies for creating a CI for an error rate. The logit-transformed CI, (6), had superior performance to the traditional CI. This methodology did well when n* π ≥ 5. Though this study presented results only for 95% CIs, it

is reasonable to assume performance will be similar for other confidence levels. Further, we have presented methodology for determining the number of attempts needed for making a CI. This is an immediate consequence of using a parametric CI. Because of the asymmetry of this CI, it is necessary to specify the upper bound for the CI as well as specifying m, π and ρ. All sample size calculations carried out before data is collected require estimates of parameters. To choose estimates we suggest the following possibilities, in order of importance. (1) Use estimates for π and ρ from previous studies collected under similar circumstances. (2) Conduct a pilot study with some small number of comparisons and a value of m that will likely be used in the full experiment. That will allow for reasonable estimates of π and ρ. (3) Make a reasonable estimate based on knowledge of the BA and the environment in which it will be tested. One strategy here is to overestimate π and ρ, which will generally yield an n larger than is needed. As outlined above, this now gives BA testers an important tool for determining the number of comparisons and the number of decisions per comparison pair necessary for assessing a single FAR or FRR.

References

[1] R. M. Bolle, N. K. Ratha, S. Pankanti, Error analysis of pattern recognition systems: the subsets bootstrap, Computer Vision and Image Understanding 93 (2004).

[2] T. Mansfield, J. L. Wayman, Best practices in testing and reporting performance of biometric devices, on the web at ast/biometrics/media/bestpractice.pdf (2002).

[3] E. P. Rood, A. K. Jain, Biometric research agenda, Report of the NSF Workshop (2003).

[4] W. Shen, M. Surette, R. Khanna, Evaluation of automated biometrics-based identification and verification systems, Proceedings of the IEEE 85 (9) (1997).

[5] M. Golfarelli, D. Maio, D. Maltoni, On the error-reject trade-off in biometric verification systems, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997).

[6] I. Guyon, J. Makhoul, R. Schwartz, V. Vapnik, What size test set gives good error rate estimates?, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1) (1998).

[7] M. E. Schuckers, Using the beta-binomial distribution to assess performance of a biometric identification device, International Journal of Image and Graphics 3 (3) (2003).

[8] J. L. Wayman, Confidence interval and test size estimation for biometric data, in: Proceedings of IEEE AutoID '99, 1999.

[9] R. J. Michaels, T. E. Boult, Efficient evaluation of classification and recognition systems, in: Proceedings of the International Conference on Computer Vision and Pattern Recognition.

[10] G. R. Doddington, M. A. Przybocki, A. F. Martin, D. A. Reynolds, The NIST speaker recognition evaluation: overview, methodology, systems, results, perspective, Speech Communication 31 (2-3) (2000).

[11] D. F. Moore, Modeling the extraneous variance in the presence of extra-binomial variation, Applied Statistics 36 (1) (1987).

[12] G. W. Snedecor, W. G. Cochran, Statistical Methods, 8th Edition, Iowa State University Press.

[13] W. G. Cochran, Sampling Techniques, 3rd Edition, John Wiley & Sons, New York.

[14] J. L. Fleiss, B. Levin, M. C. Paik, Statistical Methods for Rates and Proportions, John Wiley & Sons, Inc.

[15] D. F. Moore, Asymptotic properties of moment estimators for overdispersed counts and proportions, Biometrika 73 (3) (1986).

[16] A. Agresti, Categorical Data Analysis, John Wiley & Sons, New York.

[17] R. G. Newcombe, Logit confidence intervals and the inverse sinh transformation, The American Statistician 55 (3) (2001).

[18] M. J. Schervish, Theory of Statistics, Springer-Verlag, New York.

[19] A. Ross, A. K. Jain, Information fusion in biometrics, Pattern Recognition Letters 24 (13) (2003).

[20] J. E. Gentle, Random Number Generation and Monte Carlo Methods, Springer-Verlag.

[21] D. D. Wackerly, W. Mendenhall III, R. L. Scheaffer, Mathematical Statistics with Applications, 6th Edition, Duxbury.

[22] S. T. Garren, R. L. Smith, W. W. Piegorsch, Bootstrap goodness-of-fit test for the beta-binomial model, Journal of Applied Statistics 28 (5) (2001).

[23] L. Kish, Survey Sampling, John Wiley & Sons, New York.


More information

Better than best: matching score based face registration

Better than best: matching score based face registration Better than best: based face registration Luuk Spreeuwers University of Twente Fac. EEMCS, Signals and Systems Group Hogekamp Building, 7522 NB Enschede The Netherlands l.j.spreeuwers@ewi.utwente.nl Bas

More information

Nonparametric Estimation of Distribution Function using Bezier Curve

Nonparametric Estimation of Distribution Function using Bezier Curve Communications for Statistical Applications and Methods 2014, Vol. 21, No. 1, 105 114 DOI: http://dx.doi.org/10.5351/csam.2014.21.1.105 ISSN 2287-7843 Nonparametric Estimation of Distribution Function

More information

STAT 113: Lab 9. Colin Reimer Dawson. Last revised November 10, 2015

STAT 113: Lab 9. Colin Reimer Dawson. Last revised November 10, 2015 STAT 113: Lab 9 Colin Reimer Dawson Last revised November 10, 2015 We will do some of the following together. The exercises with a (*) should be done and turned in as part of HW9. Before we start, let

More information

Parallel line analysis and relative potency in SoftMax Pro 7 Software

Parallel line analysis and relative potency in SoftMax Pro 7 Software APPLICATION NOTE Parallel line analysis and relative potency in SoftMax Pro 7 Software Introduction Biological assays are frequently analyzed with the help of parallel line analysis (PLA). PLA is commonly

More information

VOID FILL ACCURACY MEASUREMENT AND PREDICTION USING LINEAR REGRESSION VOID FILLING METHOD

VOID FILL ACCURACY MEASUREMENT AND PREDICTION USING LINEAR REGRESSION VOID FILLING METHOD VOID FILL ACCURACY MEASUREMENT AND PREDICTION USING LINEAR REGRESSION J. Harlan Yates, Mark Rahmes, Patrick Kelley, Jay Hackett Harris Corporation Government Communications Systems Division Melbourne,

More information

DOWNLOAD PDF BIG IDEAS MATH VERTICAL SHRINK OF A PARABOLA

DOWNLOAD PDF BIG IDEAS MATH VERTICAL SHRINK OF A PARABOLA Chapter 1 : BioMath: Transformation of Graphs Use the results in part (a) to identify the vertex of the parabola. c. Find a vertical line on your graph paper so that when you fold the paper, the left portion

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 13: The bootstrap (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 30 Resampling 2 / 30 Sampling distribution of a statistic For this lecture: There is a population model

More information

Client Dependent GMM-SVM Models for Speaker Verification

Client Dependent GMM-SVM Models for Speaker Verification Client Dependent GMM-SVM Models for Speaker Verification Quan Le, Samy Bengio IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland {quan,bengio}@idiap.ch Abstract. Generative Gaussian Mixture Models (GMMs)

More information