Biometric Authentication System

Background

For general background information see Overview of Biometric Projects.

The dichotomy model is a powerful, but little-explored, technique for biometric authentication (verification). A comparison of this technique with other authentication techniques could produce an outstanding dissertation in the area of biometrics, especially if it is supported by comparative experiments, which could be performed on our extensive biometric (especially keystroke) databases. The dichotomy model was used in Dr. Cha's dissertation (see the key paper for this model) and in an on-line fingerprint verification study. See also the subset of dichotomy slides from a conference paper.

Last semester a generic dichotomy-model authentication system was created. It accepts feature-vector data in the specified Feature Data Format and converts it into a two-class dichotomy-model authentication data file. The two authentication classes are the within-class (same person) and the between-class (different people) categories. The conversion either generates all possible difference vectors or limits the number of within-class/between-class samples (say, to 500/500 or 1000/1000).

In general, if n people provide m biometric samples each, there are m*(m-1)*n/2 within-class pairs (m*(m-1)/2 pairs per person, times n people) and m*m*n*(n-1)/2 between-class pairs (m*m sample pairs for each of the n*(n-1)/2 pairs of people); see the key paper reference above. The number of between-class pairs usually far exceeds the number of within-class pairs. When the numbers of within-class and between-class pairs are both large (possibly in the millions), the training and test samples can be drawn at random up to a fixed limit rather than fully enumerated. For each pair, a difference vector is computed by taking the absolute difference between corresponding components of the two feature vectors. Because our biometric features are in the range 0-1, the difference vector features will also be in the range 0-1.
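The conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of last semester's system: the function names (pair_counts, to_dichotomy), the in-memory (person, vector) layout, and the sample values are all hypothetical, and the real program reads the Feature Data Format instead.

```python
from itertools import combinations

def pair_counts(n, m):
    """Number of (within-class, between-class) pairs for n people, m samples each."""
    within = n * m * (m - 1) // 2
    between = m * m * n * (n - 1) // 2
    return within, between

def to_dichotomy(samples):
    """samples: list of (person_id, feature_vector) tuples (hypothetical layout).
    Returns two lists of absolute-difference vectors: within-class, between-class."""
    within, between = [], []
    for (id_a, vec_a), (id_b, vec_b) in combinations(samples, 2):
        # Component-wise absolute difference; stays in 0-1 if the inputs are in 0-1.
        diff = [abs(a - b) for a, b in zip(vec_a, vec_b)]
        if id_a == id_b:
            within.append(diff)
        else:
            between.append(diff)
    return within, between

# Made-up data: n = 2 people, m = 2 samples each, 2 features per sample.
samples = [
    ("alice", [0.10, 0.80]),
    ("alice", [0.15, 0.70]),
    ("bob",   [0.60, 0.20]),
    ("bob",   [0.55, 0.30]),
]
within, between = to_dichotomy(samples)
print(len(within), len(between))   # pair totals found by enumeration
print(pair_counts(2, 2))           # the same totals from the formulas
print(pair_counts(30, 10))         # a larger, hypothetical data set
```

Full enumeration as above is practical only for small data sets; for larger ones the pairs would be sampled rather than listed exhaustively.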

After the dichotomy-model conversion, authentication performance can be measured by running the available nearest-neighbor program on the converted data (the conversion and nearest-neighbor programs might eventually be combined). The nearest-neighbor technique computes the Euclidean distance from each test sample to every training sample and assigns the test sample to the class of the nearest training sample. The team will be given a textbook (Guide to Biometrics, by Bolle, et al., Springer 2004, ISBN 0387400893), which must be returned at the end of the semester. It describes the performance statistics, namely False Accept Rate (FAR) and False Reject Rate (FRR), that should be obtained on the various biometric data sets.
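As a rough illustration of the nearest-neighbor step and the two error rates, the following Python sketch classifies difference vectors and counts errors. It is not the available nearest-neighbor program: the function names, the "within"/"between" string labels, and the data are assumptions made for the example. It uses the usual convention that FAR is the fraction of between-class (different people) test pairs classified as within-class, and FRR is the fraction of within-class (same person) test pairs classified as between-class.

```python
import math

def nearest_label(test_vec, training):
    """Return the label of the training sample closest to test_vec (Euclidean distance).
    training: list of (difference_vector, label), label is "within" or "between"."""
    return min(training, key=lambda item: math.dist(test_vec, item[0]))[1]

def far_frr(training, testing):
    """Compute (FAR, FRR) for a labeled test set using 1-nearest-neighbor."""
    false_accepts = false_rejects = 0
    n_between = n_within = 0
    for vec, truth in testing:
        predicted = nearest_label(vec, training)
        if truth == "between":
            n_between += 1
            if predicted == "within":
                false_accepts += 1   # different people accepted as the same person
        else:
            n_within += 1
            if predicted == "between":
                false_rejects += 1   # same person rejected
    far = false_accepts / n_between if n_between else 0.0
    frr = false_rejects / n_within if n_within else 0.0
    return far, frr

# Made-up difference vectors; training and testing sets are disjoint.
training = [
    ([0.05, 0.10], "within"),
    ([0.02, 0.04], "within"),
    ([0.50, 0.60], "between"),
    ([0.45, 0.40], "between"),
]
testing = [
    ([0.04, 0.08], "within"),
    ([0.40, 0.50], "within"),
    ([0.48, 0.55], "between"),
    ([0.06, 0.05], "between"),
]
far, frr = far_frr(training, testing)
print(f"FAR = {far:.2f}, FRR = {frr:.2f}")
```

math.dist requires Python 3.8 or later. This brute-force search compares every test sample with every training sample, which is one reason to limit the number of generated pairs.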

Project

We will continue to test the system developed last semester; see the Biometric Authentication System Technical Paper (fall 2007) and associated slides. Because all the programs from last semester should be operational, minimal programming should be required for this project. You will receive data in the specified format from one or more of the biometric teams (mouse movement, stylometry, and keystroke). Use the available programs to prepare sets of inter-class and intra-class data for training and testing and to obtain performance results. The testing sets must be independent of (different from) the training sets, and these data sets can be the output of the conversion program.

Your first task is to understand the existing system by reading last semester's technical paper and communicating with your subject matter expert (the team leader from last semester). Then rerun some of the experiments to confirm your understanding.

End-of-February Checkpoint.
Find the answers to the following questions:

  1. Your system operates on feature vector data where each feature value is in the range 0-1. Because these values have 8-10 decimal places, and single precision carries only about 7 significant decimal digits, we might need double precision to obtain the best accuracy. Is double precision currently being used (check the code)?
  2. Also, determine how many decimal places were used in the three types of data from last semester (mouse movement, stylometry, and keystroke).
  3. Because a full generation of inter-class samples can be rather large (many thousands), the system allows a random subset of the samples to be generated instead (we used 500 last semester). Find out whether it is possible to generate different random sets of, say, 500 samples (e.g., there might be a seed used to start the random number generator). This would allow us to run, for example, 10 different sets of 500 samples and average the results.
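To make the idea in question 3 concrete, here is a minimal Python sketch of seeded subset sampling. It does not describe how the existing system works (that is what the question asks you to find out); the function names, the stand-in pair list, and the evaluate callback are all hypothetical.

```python
import random

def sample_pairs(pairs, k, seed):
    """Draw up to k pairs without replacement; the same seed gives the same subset."""
    rng = random.Random(seed)          # private generator, independent of global state
    return rng.sample(pairs, min(k, len(pairs)))

def average_over_seeds(pairs, k, seeds, evaluate):
    """Run evaluate() on one random subset per seed and average the results.
    evaluate is a hypothetical callback returning a number, e.g. an error rate."""
    results = [evaluate(sample_pairs(pairs, k, seed)) for seed in seeds]
    return sum(results) / len(results)

# Stand-in for a large list of between-class pairs (just integers here).
all_pairs = list(range(10000))

subset_a = sample_pairs(all_pairs, 500, seed=1)
subset_b = sample_pairs(all_pairs, 500, seed=1)   # same seed: identical subset
subset_c = sample_pairs(all_pairs, 500, seed=2)   # different seed: different subset

# Ten runs of 500 samples each, averaged; the "evaluation" here is a placeholder
# that just reports the subset size.
mean_size = average_over_seeds(all_pairs, 500, range(10), evaluate=len)
print(subset_a == subset_b, subset_a == subset_c, mean_size)
```

Recording the seed used for each run makes an experiment repeatable, while changing the seed yields the independent subsets needed for averaging.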

Midterm Checkpoint (our second classroom meeting).
By this checkpoint you should have rerun some of the experiments from last semester and be confident that you understand the existing system. You might also correct any problems in the code that were identified at the end-of-February checkpoint.