Keystroke Biometric Authentication System

Background

For general background information, see the references below: "Overview of Biometric Projects" and "Feature Data Format."

We have been exploring keystroke biometric applications. Keystroke biometric systems measure typing characteristics believed to be unique to an individual and difficult to duplicate. There are two commercial products currently used for hardening passwords (short input) in existing computer security schemes. The keystroke biometric is one of the less-studied biometrics; researchers tend to collect their own data and no known studies have compared identification techniques on a common database. Nevertheless, the published literature is optimistic about the potential of keystroke dynamics to benefit computer system security and usability.

The keystroke biometric has several possible applications. One is an authentication process (binary accept/reject response: yes, you are the person you claim to be, or no, you are not). For example, password entry could be "hardened" by adding a keystroke authentication process as a second stage, following password matching, before allowing user entry. Thus, if the password is not entered in the user's normal keystroke pattern, the system could ask the user to reenter it. For example, a user on a particular occasion might be drinking a cup of coffee and entering the password uncharacteristically with one hand. The system could then reject the password, sending the user a message like, "Please reenter your password in your normal manner," and after, say, three tries, possibly reject the user entirely. Upon receiving the message, the user would likely put down the coffee cup and enter the password in his/her normal fashion in order to be accepted. Another use of such an authentication process is to authenticate students taking online tests by their keystroke patterns.

A second application is to identify an individual from his/her keystroke pattern (one-of-n response). Suppose, for example, there has been a problem with the circulation of offensive emails from easily accessible desktops in a work environment. The security department wants to reduce this problem by collecting keystroke biometric data from all employees and developing a keystroke biometric identification system.

Over the last five years, we in CSIS at Pace University have developed keystroke biometric systems for identification (one-of-n response) and for authentication (accept/reject response). We have presented experimental results at several internal and three external conferences, and have recently had a book chapter accepted for publication. The next paragraph contains the abstract of the book chapter; for the full paper and related slides see Keystroke Book Chapter and Slides.

ABSTRACT: A novel keystroke biometric system for long-text input was developed and evaluated for user identification and authentication applications. The system consists of a Java applet to collect raw keystroke data over the internet, a feature extractor, and pattern classifiers to make identification or authentication decisions. Experiments on over 100 subjects investigated two input modes - copy and free-text input - and two keyboard types - desktop and laptop keyboards. The system can accurately identify or authenticate individuals if the same type of keyboard is used to produce the enrollment and questioned input samples. Longitudinal experiments quantified performance degradation over intervals of several weeks and over an interval of two years. Additional experiments investigated the system's hierarchical model, parameter settings, assumptions, and sufficiency of enrollment samples and input-text length. Although evaluated on input texts up to 650 keystrokes, we found that input of 300 keystrokes, roughly four lines of text, is sufficient for the important applications described.

Project

Code modification and experiments (customers Robert Zack and Dr. Tappert)

Last semester's team made good progress in obtaining some interesting keystroke authentication experimental results; see the link below to their technical paper. However, their program uses only the first-choice information as to whether the best match is within-class or between-class, and then simply counts the result for each test sample in the file to obtain an overall result.

This semester's project will involve changing the code somewhat and running experiments. This semester the classifier will output a results file that contains, for each test difference-vector, the match result information shown in the following table.
Test Sample ID (User-Sample# / User-Sample#), e.g. 4-1/4-2 (within)

Choice   Within/Between   Sample ID Matched   Match Distance
1        W (strong)       4-4/4-5             13
2        B                5-4/6-2             24
3        B                7-2/8-1             25
4        W (weak)         7-4/7-5             27
5        B                5-8/6-9             29
...      ...              ...                 ...
A sample ID consists of a user number followed by a sample number -- for example, 4-1 indicates user 4 and sample 1 from that user (remember that a user usually types several input samples). A vector-difference sample consists of the IDs of the two samples differenced -- for example, test sample 4-1/4-2 indicates user 4 sample 1 differenced with user 4 sample 2 (the slash separates the two sample IDs).
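As an illustrative sketch (the function names are hypothetical, not part of the existing programs), the within/between label of a difference vector can be recovered directly from its ID string:

```python
# Parse a difference-vector sample ID such as "4-1/4-2", assuming the
# User-Sample#/User-Sample# format described above.

def parse_pair(pair_id):
    """Split '4-1/4-2' into ((user, sample), (user, sample)) integer tuples."""
    left, right = pair_id.split("/")
    u1, s1 = (int(x) for x in left.split("-"))
    u2, s2 = (int(x) for x in right.split("-"))
    return (u1, s1), (u2, s2)

def is_within(pair_id):
    """A difference vector is within-class when both samples share a user number."""
    (u1, _), (u2, _) = parse_pair(pair_id)
    return u1 == u2

print(is_within("4-1/4-2"))  # True: both samples from user 4
print(is_within("5-4/6-2"))  # False: users 5 and 6
```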

Another (uncomplicated) program will be implemented to take the results file as input and output the overall result. This will allow us to simulate various scoring procedures. For example, to simulate the current nearest-neighbor procedure, we would simply count the first-choice results (W, which is correct, in the example table above). To simulate the 3-nearest-neighbor procedure, we would take the majority of the first 3 choices (B, which is incorrect, in the example table above). To simulate the 5-nearest-neighbor procedure, we would take the majority of the first 5 choices (B, which is incorrect, in the example table above). To simulate a cross-country-like procedure with the top two runners from a team counting, add the ranks of the first two W's and the first two B's, resulting in a tie (1+4=5 versus 2+3=5, low score wins). Instead of adding the ranks, add the distances of the first two W's and the first two B's, which results in W (40 for W versus 49 for B, low score wins, which is correct).
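To make these scoring procedures concrete, here is a small Python sketch applied to the five choices of the example table above (the function names are illustrative, not part of the planned program):

```python
from collections import Counter

# The five choices from the example table: (class, match distance).
choices = [("W", 13), ("B", 24), ("B", 25), ("W", 27), ("B", 29)]

def knn(choices, k):
    """Majority class among the first k choices (k-nearest-neighbor rule)."""
    counts = Counter(cls for cls, _ in choices[:k])
    return counts.most_common(1)[0][0]

def rank_sum(choices, cls, top=2):
    """Sum the 1-based ranks of the first `top` choices of class cls."""
    ranks = [i + 1 for i, (c, _) in enumerate(choices) if c == cls]
    return sum(ranks[:top])

def distance_sum(choices, cls, top=2):
    """Sum the match distances of the first `top` choices of class cls."""
    dists = [d for c, d in choices if c == cls]
    return sum(dists[:top])

print(knn(choices, 1))  # W  (1-nearest neighbor, correct)
print(knn(choices, 3))  # B  (3-nearest neighbor, incorrect)
print(knn(choices, 5))  # B  (5-nearest neighbor, incorrect)
print(rank_sum(choices, "W"), rank_sum(choices, "B"))          # 5 5   (tie)
print(distance_sum(choices, "W"), distance_sum(choices, "B"))  # 40 49 (W wins)
```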

We will also further develop a new concept: strong versus weak enrollment. With strong enrollment, enrollment (training) samples have been taken from the user being authenticated; with weak enrollment, they have not. The system experiments so far have used weak enrollment.

Test-taker portion of project (customers Dr. Mary Villani and Dr. Tappert)

Here we continue to develop a test-taker authentication application designed to verify (authenticate) the identity of students taking online quizzes or tests. This application is becoming more important as the population of online classes grows and instructors become increasingly concerned about evaluation security and academic integrity.

Your primary tasks on this aspect of the project are as follows:

References

Fast Agile XP Deliverables

We will use the agile methodology, particularly Extreme Programming (XP) which involves small releases and fast turnarounds in roughly two-week iterations. Many of these deliverables can be done in parallel by different members or subsets of the team. The following is the current list of deliverables (ordered by the date initiated, deliverable modifications marked in red, deliverable date marked in bold red if programming involved, completion date and related comments marked in green, pseudo-code marked in blue):
  1. 2/11 - 3/28. Rerun the authentication experiments described in the book chapter Table 4 only. Completed: obtained better results than the book chapter using 230 (not 239) linguistic-model features (trained on first 18 and tested on second 18 subjects), and determined that the book chapter results were obtained using the 254 touch-type-model features.
  2. 2/11 - Ongoing. Rewrite the user manual for just authentication, basically cut-and-paste the appropriate parts of the previous manual, and possibly improve it for clarity (by updating the user manual at regular intervals it will be in good shape by the end of the semester). Two manuals are being written: a user manual for those taking an online test and a technical manual for those running experiments. This is ongoing documentation to be completed by the end of the semester.
  3. 2/12 - 3/28. Modify the classifier program to produce an output file of the top n Within/Between choices and distances (the current program simply outputs the overall results). Through this incremental development we will eventually get to the format of the table above. This increment will produce the following choice table (the more detailed table requires changes to the feature extractor program).
    Test Sample ID, Within/Between

    Choice   Within/Between   Match Distance
    1        W                13
    2        B                24
    3        B                25
    4        W                27
    5        B                29
    ...      ...              ...
    n        B                129
    For each of the four experiments of deliverable 1, obtain an output file that consists of a sequence of choice tables, one for each of the test samples -- for example, for DeskCopy you would have 180 + 3825 = 4005 choice tables.

    4/7 The following is a small example BAS output file with n=3 and 4 test samples (two W and two B):
    1, W
       1   W   13
       2   B   24
       3   B   25
    2, W
       1   W   12
       2   B   27
       3   W   28
    3, B
       1   W   13
       2   B   14
       3   B   15
    4, B
       1   B   11
       2   W   24
       3   B   35
    The actual BAS output file for DeskCopy should contain 4005 test samples, and please use n=10.
    Test with deliverable 1 results: performance = #correct 1st choice/#test samples (obtain as file is written).
    Deliverable 4 output on this sample file should be:
    1-nearest neighbor results: FAR=1/2, FRR=0/2, Performance=3/4, Test samples=4.
    3-nearest neighbor results: FAR=0/2, FRR=1/2, Performance=3/4, Test samples=4.

  4. 2/13 - Create a simple program to process the classifier output file (deliverable 3) and obtain the FRR, FAR, and Performance (overall accuracy) of first choices (1-nearest neighbor, also simply called nearest neighbor), of the majority of the top three choices (3-nearest neighbor), and of the majority of the top five choices (5-nearest neighbor). Obtain FRR, FAR, and Performance for each of the four experiments of deliverable 1 using the 1-nearest neighbor, 3-nearest neighbor, 5-nearest neighbor procedures.
    The following pseudo-code implements the k-nearest neighbor algorithm:
    input value for constant k  /k is the k-nearest-neighbor value used in the majority
    input value for constant n (default 10)  /n is the number of choices
    within_total = 0            /initialize variables
    within_correct = 0
    within_wrong = 0
    between_total = 0
    between_correct = 0
    between_wrong = 0
    while not done              /read and process each choice table in the file of choice tables
    {
       test = read(class)       /class = W or B for the test sample
       for i = 1 to k           /read the first k choices (k <= n)
          choice(i) = read(class)  /class = W or B for the ith choice
       for i = k+1 to n         /n = number of choices (usually 10)
          read and discard choice  /discard the remaining n-k choices
       if test = W then
       {
          within_total = within_total + 1
          if test = majority(choice(1),...,choice(k)) then within_correct = within_correct + 1
          else within_wrong = within_wrong + 1
       }
       else                     /test = B
       {
          between_total = between_total + 1
          if test = majority(choice(1),...,choice(k)) then between_correct = between_correct + 1
          else between_wrong = between_wrong + 1
       }
    }                           /end of while loop
    FAR = between_wrong/between_total
    FRR = within_wrong/within_total
    performance = (within_correct + between_correct) / (within_total + between_total)
    print (FAR, FRR, performance, within_total+between_total)
           /within_total+between_total is a check on the number of choice tables in the file
    
    Note that the 1-nearest neighbor procedure should duplicate the results of the four experiments of deliverable 1. Are the 3-nearest neighbor and 5-nearest neighbor procedures an improvement over the 1-nearest neighbor procedure?
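As a sanity check, the pseudo-code above can be sketched in Python and run against the small n=3 example file of deliverable 3; it reproduces the FAR, FRR, and Performance values stated there (names like `evaluate` are illustrative, not from the existing programs):

```python
from collections import Counter

# Choice tables from the small n=3 example file: (test class, choice classes).
tables = [
    ("W", ["W", "B", "B"]),
    ("W", ["W", "B", "W"]),
    ("B", ["W", "B", "B"]),
    ("B", ["B", "W", "B"]),
]

def evaluate(tables, k):
    """Return (FAR, FRR, performance) for the k-nearest-neighbor rule."""
    within_total = within_wrong = between_total = between_wrong = 0
    for test, choices in tables:
        decision = Counter(choices[:k]).most_common(1)[0][0]
        if test == "W":
            within_total += 1
            if decision != "W":
                within_wrong += 1       # false rejection
        else:
            between_total += 1
            if decision != "B":
                between_wrong += 1      # false acceptance
    far = between_wrong / between_total
    frr = within_wrong / within_total
    correct = (within_total - within_wrong) + (between_total - between_wrong)
    return far, frr, correct / (within_total + between_total)

print(evaluate(tables, 1))  # (0.5, 0.0, 0.75): FAR=1/2, FRR=0/2, Performance=3/4
print(evaluate(tables, 3))  # (0.0, 0.5, 0.75): FAR=0/2, FRR=1/2, Performance=3/4
```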
  5. 2/13 - Create a simple program to process the classifier output file and obtain an ROC curve as follows, and obtain the ROC curves for the four experiments of deliverable 1. This pseudo-code uses weighted choices to output (FAR,FRR) pairs to create the ROC curve:
    for i = 0 to 55                /the sum of the weighted choices has a range of 0-55 (unweighted is 0-10)
    {
       within_total = 0            /initialize variables
       within_correct = 0
       within_wrong = 0
       between_total = 0
       between_correct = 0
       between_wrong = 0
       while not done              /read and process each choice table in the file of choice tables
       {
          test = read(class)       /class = W or B for the test sample
          score = 0                /reset score, the sum of the weighted choices, for each test sample
          for j = 1 to 10          /compute score
          {
             choice(j) = read(class)  /read this choice value
             if choice(j) = W then score = score + (10-j+1)  /unweighted is: score = score + 1
          }
          if test = W then
          {
             within_total = within_total + 1
             if score >= i then within_correct = within_correct + 1
             else within_wrong = within_wrong + 1
          }
          else                     /test = B
          {
             between_total = between_total + 1
             if score < i then between_correct = between_correct + 1
             else between_wrong = between_wrong + 1
          }                        
       }                           /end of while loop
       FAR = between_wrong/between_total
       FRR = within_wrong/within_total
       print (i, FAR, FRR)
    }                              /end of for loop 
           /the list of (FAR,FRR) pairs will generate the ROC curve (e.g., input into Excel)    
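A minimal Python sketch of this weighted-score sweep, exercised on the same four example choice tables from deliverable 3 (so n=3 and the threshold range is 0-6 rather than 0-55; function names are illustrative):

```python
# Choice tables from the small n=3 example file: (test class, choice classes).
tables = [
    ("W", ["W", "B", "B"]),
    ("W", ["W", "B", "W"]),
    ("B", ["W", "B", "B"]),
    ("B", ["B", "W", "B"]),
]

def weighted_score(choices):
    """Sum weight n-j+1 for each W at 1-based rank j (n = number of choices)."""
    n = len(choices)
    return sum(n - j + 1 for j, c in enumerate(choices, start=1) if c == "W")

def roc_points(tables, n):
    """One (threshold, FAR, FRR) point per threshold i in 0..n(n+1)/2."""
    points = []
    for i in range(n * (n + 1) // 2 + 1):
        within_total = within_wrong = between_total = between_wrong = 0
        for test, choices in tables:
            score = weighted_score(choices)
            if test == "W":
                within_total += 1
                if score < i:           # falsely rejected
                    within_wrong += 1
            else:
                between_total += 1
                if score >= i:          # falsely accepted
                    between_wrong += 1
        points.append((i, between_wrong / between_total,
                       within_wrong / within_total))
    return points

for i, far, frr in roc_points(tables, n=3):
    print(i, far, frr)   # (FAR,FRR) pairs to plot, e.g. in Excel
```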
    
  6. 2/16 - Run two small-training, strong-enrollment authentication experiments:
  7. 2/16 - Run big-training, strong-enrollment authentication experiments. Make three runs of each experiment, limiting to 5000, 10000, and 20000 inter-class samples:
  8. 2/17 - Write detailed descriptions of the data formats and include samples (this should be part of the Technical Manual)
  9. 4/1 - Describe/investigate why 230 (rather than 239) linguistic-model features were used for deliverable 1 (requires examining the code)
  10. 4/1 - This deliverable is crucial for getting a working system to authenticate online test takers. After extracting features, the feature extraction program calculates the min and max of each feature measurement over all the data in the input file in order to standardize the measurements to the range 0-1. While this works fine for a large file (several samples from many users), it is not appropriate for a single sample from one user (or even for a small file of several samples). Therefore, we must output an intermediate file of the feature min and max values computed from a large input sample file, and then read in and use the values from this file when authenticating individual users. Perform the following modifications to the feature extraction program, and check the BAS program to ensure it handles the train and test input files appropriately:
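A minimal Python sketch of this two-phase standardization, with hypothetical function names and a JSON intermediate file assumed purely for illustration (the actual programs may use a different format):

```python
import json

def compute_ranges(samples):
    """Per-feature (min, max) over all enrollment feature vectors."""
    mins = [min(col) for col in zip(*samples)]
    maxs = [max(col) for col in zip(*samples)]
    return mins, maxs

def save_ranges(mins, maxs, path):
    """Write the min/max values to the intermediate file."""
    with open(path, "w") as f:
        json.dump({"min": mins, "max": maxs}, f)

def load_ranges(path):
    """Read the min/max values back when authenticating a single user."""
    with open(path) as f:
        d = json.load(f)
    return d["min"], d["max"]

def standardize(sample, mins, maxs):
    """Scale one feature vector into [0, 1] using the stored ranges,
    clipping values that fall outside the enrollment range."""
    out = []
    for x, lo, hi in zip(sample, mins, maxs):
        if hi == lo:
            out.append(0.0)   # constant feature: avoid divide-by-zero
        else:
            out.append(min(1.0, max(0.0, (x - lo) / (hi - lo))))
    return out

# Toy enrollment file of three 2-feature samples, then one questioned sample.
enroll = [[100, 0.2], [140, 0.5], [120, 0.8]]
mins, maxs = compute_ranges(enroll)
print(standardize([130, 0.35], mins, maxs))  # approximately [0.75, 0.25]
```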