Technical White Paper
Behaviometrics
Measuring FAR/FRR/EER in Continuous Authentication
Draft version 12/22/2009
Table of Contents

Background
Calculating FAR/FRR/EER in the field of Biometrics
Calculating FAR/FRR/EER on Behavio
Training the behavioral profiles
Introducing level of Trust
1.1.1 How trust can be applied
1.1.2 A practical example of trust
Processing the collected behaviors
1.1.3 Making Behavio utilize the collected behaviors
Calculating the FAR/FRR/EER ratios
Delimitations
Results
Background

The word Behaviometrics derives from the terms behavioral and biometrics. Behavioral refers to the way a person behaves, and biometrics, in an information security context, refers to technologies and methods that measure and analyze biological characteristics of the human body for authentication purposes, for example fingerprints, retinas and voice patterns. In other words, Behaviometrics, or behavioral biometrics, is a measurable behavior used to recognize or verify the identity of a person. Behaviometrics focuses on behavioral patterns rather than physical attributes.

After a user is verified with traditional security techniques, such as passwords, Behaviometrics can extend the protection beyond the login. It can continuously monitor the user during the whole working session to create an ongoing authentication process.

The purpose of this paper is to present a methodology for calculating the performance of a continuous behavioral authentication system. Standard procedures used for biometrics are not sufficient since they were not developed to authenticate users continuously. Hence a new methodology is needed.

A biometric authentication system decides whether a user is accepted into a system. If a user who should not be accepted is accepted, it is called a false accept. If a user who should be accepted is rejected, it is called a false reject. The proportion of impostor attempts that are falsely accepted is called the false accept rate (FAR), while the proportion of attempts by correct users that are falsely rejected is called the false reject rate (FRR).

A behavioral continuous authentication system uses a set of behavioral traits to calculate a similarity ratio between the current user's behavior and the expected behavior. The similarity can be combined with a threshold so that if the similarity drops below the set threshold the user will be detected.
It is because the similarity is gathered over time, and because a threshold is needed to accept or reject a user, that the old methods are not sufficient.

Calculating FAR/FRR/EER in the field of Biometrics

Biometric systems generally separate impostors from the correct user by matching a score against a threshold. The score expresses how similar a sample and a template are; the higher the score, the more similar they are. The threshold is a line: all scores above it are considered to belong to the correct user, while all scores below it are considered to belong to an impostor.
[Figure: Classifying Samples by using a Threshold. Score (0 to 100) for Sample 1 to Sample 5.]

The performance of a biometric system is usually measured in terms of false accept rate (FAR), false reject rate (FRR) and equal error rate (EER). The false accept rate is the percentage of invalid inputs that are incorrectly accepted (a match between the input and a non-matching template). The false reject rate is the percentage of valid inputs that are incorrectly rejected (a failure to detect a match between the input and a matching template). The equal error rate indicates the accuracy of the system: the FAR and FRR curves intersect at a certain point, and the value at which the FAR and FRR are equal is the equal error rate.

[Figure: ROC Curve. Series: FAR, FRR, EER. x-axis: Threshold level (0 to 100); y-axis: Accept / Reject Ratio (%) (0 to 100).]

In theory, the correct users should always score higher than the impostors. A single threshold could then be used to separate the correct user from the impostors. In general, the matching algorithm makes a decision based on a threshold which determines how close to a template the input needs to be for it to be considered a match. If the threshold is lowered, there will be fewer false rejects but more false accepts. Correspondingly, a higher threshold will reduce the false accept rate but increase the false reject rate. In some cases impostor patterns generate scores that are higher than those of the correct user's patterns. For that reason, however the threshold is chosen, some classification errors occur.
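The relationship between threshold, FAR, FRR and EER described above can be illustrated with a small sketch. The scores below are made up, the function names `far_frr` and `approximate_eer` are hypothetical, and the choice to accept a score that is exactly equal to the threshold is an assumption; the sketch only shows the general technique of sweeping a threshold over two score sets.

```python
from typing import Iterable, Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) in percent for one threshold.

    Assumption: a score at or above the threshold is accepted.
    """
    false_accepts = sum(1 for score in impostor if score >= threshold)
    false_rejects = sum(1 for score in genuine if score < threshold)
    return (100.0 * false_accepts / len(impostor),
            100.0 * false_rejects / len(genuine))


def approximate_eer(genuine: Sequence[float], impostor: Sequence[float],
                    thresholds: Iterable[float]) -> Tuple[float, float, float]:
    """Return (threshold, FAR, FRR) where FAR and FRR are closest to each other."""
    candidates = [(t, *far_frr(genuine, impostor, t)) for t in thresholds]
    # min() keeps the first candidate when several thresholds tie.
    return min(candidates, key=lambda c: abs(c[1] - c[2]))


# Made-up similarity scores on a 0 to 100 scale (illustration only).
genuine_scores = [85, 90, 70, 65, 95, 80, 55, 88, 92, 75]
impostor_scores = [30, 45, 20, 60, 50, 35, 72, 25, 40, 15]

threshold, far, frr = approximate_eer(genuine_scores, impostor_scores, range(0, 101))
print(f"threshold={threshold} FAR={far:.1f}% FRR={frr:.1f}%")
```

Because one impostor score (72) is higher than some genuine scores, no threshold in this example gives zero errors on both sides, which is the overlap problem the text describes.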
Depending on the threshold, anywhere between all and none of the impostor patterns are falsely accepted by the system. The choice of threshold value becomes a problem when the score distributions of the correct user and the impostors overlap.

When biometric systems are compared, some of them are specified with only a FAR value. A single FAR without the corresponding FRR is not sufficient, since the system with the lowest FAR may have a high FRR. Where the threshold is adjustable, there is no reasonable way to decide which system performs better just by looking at individual FAR and FRR values. To get a threshold-independent performance measurement, the EER can be used. The lower the EER, the more accurate the system is considered to be.

Calculating FAR/FRR/EER on Behavio

To calculate the FAR/FRR/EER ratings for Behavio, our desktop behaviometric security solution, a test group of 40 users was selected. They used Behavio in a real-world environment for 3 months. A behavioral monitor installed on each subject's computer collected behavioral data about the specific applications used and the associated keyboard and mouse events.

Training the behavioral profiles

In the beginning the profile is empty and Behavio has to learn the behavior of the user. At an early stage it is difficult to differentiate between persons, so initially Behavio assumes that the correct user is handling the computer. To handle the evolution of the user's behavior, the system has to tolerate small shifts and gradually make the necessary changes to the profile.

The amount of training of a behavioral profile is measured in insertions. An insertion is a keystroke event of some sort. Each time the profile is updated with a key press statistic or a key flight statistic, the number of insertions increases by one. A typical keystroke (moving between two letters) triggers 5 events, so in theory it would increase the number of insertions by 5.
However, in the real world this is not true. Factors such as the quarantine, pauses and immeasurable samples have to be taken into account. The time it takes to train a profile varies depending on how fast the user types and how much the user uses the computer. Our studies show that a typical user averages between 200 and 600 milliseconds between keystrokes and has to type about 0.5 keystrokes to get 1 insertion. This means that an average user has to type about 10 000 keystrokes to achieve 20 000 insertions. For a fast typist this requires roughly 30 minutes of active writing, and for a slow typist about 100 minutes.

Introducing level of Trust

Trust can be measured as a percentage. At 100% the system fully trusts that it is the correct user; conversely, if the trust level reaches 0%, the system triggers detection.
[Figure 1: The concept of trust. Labels: Similarity, Confidence Ratio, Threshold. Legend: + increase of trust (shown as +, ++, +++), - decrease of trust (shown as -, --).]

The similarity score is mapped against a trust model by using a threshold. If the user is above the threshold the trust increases, and if the user is below the threshold the trust decreases. Staying above the threshold raises the trust level toward 100%; the further above the threshold the user is, the quicker the trust reaches its maximum.

Some users will be detected faster than others, for two reasons:

1) How much the behavior differs between the active user and the template. If the difference is large, detection will be faster than if the difference is small.
2) How much trust the previous user has achieved. If the previous user has worked up to being fully trusted, it will take longer for the incorrect user to reach the not-trusted level.

Abnormal actions can also trigger a decrease in trust, such as using key combinations that have never been used before or repeatedly providing immeasurable samples (compare this to pushing your forehead against a fingerprint scanner).

1.1.1 How trust can be applied

These numbers are just examples to demonstrate how trust could be calculated:

A user passed a test (i.e. is over the threshold), which has to increase the trust: +1% (can be tied to level of success)
A user failed a test (i.e. is below the threshold): -1% (can be tied to level of breach)
A user hit a key that has never been used before (is it really the same user?): -1%
A user triggers the immeasurable sample alert: -2%

1.1.2 A practical example of trust

Event                               Change in trust   Trust
Passed test (10 times)              +10%              10%
Failed test                         -1%               9%
Key that has not been used          -1%               8%
Passed a test                       +1%               9%
Triggers invalid sample (3 times)   -6%               3%
Failed test (3 times)               -3%               0% (Detection!)

Table 1: A practical example of trust
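The example adjustments in 1.1.1 and the sequence in Table 1 can be replayed with a few lines of Python. This is only a sketch: the event names, the `replay` helper, the starting trust of 0% and the clamping to the 0 to 100% range are assumptions made for illustration, not a description of how Behavio is implemented.

```python
from typing import Dict, Iterable, List, Tuple

# Example adjustments in percentage points, taken from section 1.1.1.
# The event names themselves are hypothetical.
TRUST_DELTAS: Dict[str, int] = {
    "passed_test": +1,
    "failed_test": -1,
    "unused_key": -1,
    "immeasurable_sample": -2,
}


def replay(events: Iterable[Tuple[str, int]], start: int = 0) -> List[int]:
    """Return the trust level (in percent) after each (event, repetitions) step.

    Assumption: trust is clamped to the range 0 to 100 after every single event.
    """
    trust = start
    levels = []
    for name, repetitions in events:
        for _ in range(repetitions):
            trust = max(0, min(100, trust + TRUST_DELTAS[name]))
        levels.append(trust)
    return levels


# The rows of Table 1, in order.
TABLE_1_EVENTS = [
    ("passed_test", 10),
    ("failed_test", 1),
    ("unused_key", 1),
    ("passed_test", 1),
    ("immeasurable_sample", 3),
    ("failed_test", 3),
]

levels = replay(TABLE_1_EVENTS)
print(levels)                      # trust after each row of Table 1
print("detected:", levels[-1] == 0)
```

The values returned by `replay` match the Trust column of Table 1 row by row, ending at 0%, the level at which detection is triggered.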
Processing the collected behaviors

Normally Behavio installs hooks into the system to retrieve keyboard and mouse data. The retrieved data then goes through the comparison engine, which calculates the probability that the sample belongs to the profile. The calculated statistics are handed over to the validator, which determines whether it is the correct user or not.

[Figure 2: Logical presentation of system flow. Components: System, Monitor, Behavioral Profile, Comparator, Validator.]

1.1.3 Making Behavio utilize the collected behaviors

To have Behavio evaluate the previously collected data, the monitor is replaced by a database reader which reads the data gathered from the test group. The comparator then performs its calculations in the normal manner, but instead of passing the results to the validator it stores the statistics so that they can be analyzed in a FAR/FRR tool.

[Figure 3: Logical presentation of system flow after modifications. Components: Behavioral Data, Data Reader, Behavioral Profile, Comparator, Validator, Statistics, BehavioSec FAR/FRR tool.]
Calculating the FAR/FRR/EER ratios

The FAR/FRR tool calculates the ratios by measuring the amount of time that the level of trust has been above and below the threshold. There is no upper or lower cap on how much trust a user can gain or lose.

Delimitations

For this paper the trial is limited to keyboard input. Only users with a completeness level of 20 000 insertions have a behavioral profile generated and compared against. Only measurable samples are used.

Results

The results of the trial show that there is a significant difference in behavior between users. During the trial Behavio achieved an EER of 3.05% for the users in the test group. With a false reject rate under 1%, only 4.3% of the input from other users would be falsely accepted, and at a false accept rate below 5%, 0% of the input from the correct user is incorrectly rejected.

[Figure: Ratio (%) (0 to 100) plotted against Threshold (0 to 100).]
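The time-based measurement described under "Calculating the FAR/FRR/EER ratios" could be sketched as follows. The paper does not specify the tool's data format, so this is an interpretation under stated assumptions: each session is represented as a hypothetical trace of (duration in seconds, trust level) pairs, the threshold is applied to the trust level, FAR is taken as the share of impostor time spent at or above the threshold, and FRR as the share of correct-user time spent below it. The traces are made-up numbers, not trial data.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical representation: one trace per session, each entry being
# (duration in seconds, trust level during that interval).
Trace = Sequence[Tuple[float, float]]


def time_share_at_or_above(traces: Iterable[Trace], threshold: float) -> float:
    """Percentage of the total recorded time spent at or above the threshold."""
    above = 0.0
    total = 0.0
    for trace in traces:
        for duration, trust in trace:
            total += duration
            if trust >= threshold:
                above += duration
    return 100.0 * above / total


def far_frr_by_time(genuine: Iterable[Trace], impostor: Iterable[Trace],
                    threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) in percent, both measured as shares of time."""
    far = time_share_at_or_above(impostor, threshold)
    frr = 100.0 - time_share_at_or_above(genuine, threshold)
    return far, frr


# Made-up traces (illustration only).
genuine_traces = [[(60, 80), (20, 40), (120, 95)]]
impostor_traces = [[(20, 70), (120, 10), (60, 30)]]

far, frr = far_frr_by_time(genuine_traces, impostor_traces, threshold=50)
print(f"FAR={far:.1f}% FRR={frr:.1f}%")
```

Repeating the calculation over a range of thresholds and locating the point where the two percentages meet would give a time-based EER of the kind reported in the results.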