Keystroke Biometric System

We have a computer program that captures key and mouse data. The keystroke data captured by the program consists of the ID and duration of each key pressed, and the transition time from one key press to the next. The mouse information is similar: which mouse button (left or right) was pressed, how long it was held down, and so on. This program will be extensively revised for appropriate experimental use. Initially, we must ensure that the system can capture accurate timing information.

We are exploring keystroke biometric applications and their corresponding experimental procedures and system designs. General information on biometric systems and performance evaluation can be found in textbooks and other references. For these studies we will use the performance methodology and notation from the text Guide to Biometrics, by Bolle et al. (Springer 2004). These studies are in the area of pattern recognition, and for simplicity we will use the Nearest Neighbor classification method (see, for example, the textbook Pattern Classification by Duda, Hart, and Stork (Wiley 2001) and the website Nearest Neighbor Classifier) with the Euclidean distance metric.

Keystroke Biometric Applications

The keystroke biometric has several possible applications. One application is for hardening password entry by adding a keystroke authentication process (accept/reject response) as a second stage following password matching before allowing user entry. Thus, if the password is not entered in the normal keystroke pattern, the system could ask the user to reenter it. For example, a user on a particular occasion might be drinking a cup of coffee and be entering the password uncharacteristically with one hand. The system, then, could reject the password, sending the user a message like, "Please reenter your password in your normal manner," and after, say, three tries, possibly rejecting the user entirely. The user upon receiving the reject message would likely put down the coffee cup and enter the password in his/her normal fashion in order to be accepted.
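The second-stage check described above can be sketched as follows. This is only an illustration under stated assumptions: the password text is taken to have already matched, the enrolled template (a typical feature vector for the user), the distance threshold, and the function name authenticate are all hypothetical, and comparing a Euclidean distance against a threshold is one plausible accept/reject rule rather than a finalized design.

```python
import math

def authenticate(attempts, template, threshold, max_tries=3):
    """Return ("accept", tries_used) or ("reject", tries_used).

    attempts  -- feature vectors from successive password entries
                 (the password text itself is assumed to have matched)
    template  -- hypothetical enrolled feature vector for this user
    threshold -- hypothetical maximum Euclidean distance for acceptance
    """
    tries = 0
    for sample in attempts[:max_tries]:
        tries += 1
        if math.dist(sample, template) <= threshold:
            return "accept", tries
        # Otherwise the system would ask the user to reenter the
        # password in his/her normal manner.
    return "reject", tries

# Illustrative values only: the first entry is typed uncharacteristically,
# the second entry is close to the enrolled pattern.
template = [0.5, 0.5]
result = authenticate([[0.9, 0.1], [0.52, 0.49]], template, threshold=0.1)
```

In this sketch the user is rejected entirely only after max_tries entries have all fallen outside the threshold.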

Your task for this application is to design and then run an experiment to test it. The design and testing steps are as follows:

A second application is to identify an individual from his/her keystroke pattern on a sample of text input and editing. Suppose, for example, there has been a problem with the circulation of offensive emails from easily accessible desktops in a work environment. The security department wants to reduce this problem by collecting keystroke biometric data from all employees and developing a keystroke biometric identification system (one-of-n response). The design and testing steps for this application are as follows:

Sample training and testing texts follow: Editing instructions for these texts can be created by specifying a reasonable number of replace, insert, and delete operations.

Methodology for Developing the Keystroke Biometric System

The system will be modularized based on the key stages of pattern recognition systems, and we will develop the desired program modules sequentially. Such modularization is standard practice in systems development: it provides a clear separation of processing functionality, which makes the system easier to use, maintain, and extend.

Our keystroke biometric system will contain three modules: data capture, feature extraction, and pattern classification. The data capture program will first be developed and finalized (frozen). The output of this program will be a raw data file of the keystroke timing data, and we must ensure that the system captures accurate timing information. Once this program is finalized, experimental data can be collected while the other modules are being developed for particular applications.
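As a rough illustration of what the raw timing data might contain, the sketch below derives the duration and transition time for each key from press and release timestamps. The KeyEvent record, its field names, and the use of milliseconds are assumptions made for this example, and the transition is taken here to be the press-to-press interval; the actual capture program may define these differently. The timestamps are assumed to come from a monotonic, high-resolution clock (for instance Python's time.perf_counter).

```python
from dataclasses import dataclass

@dataclass
class KeyEvent:
    # Hypothetical raw record: key ID plus press and release timestamps (ms).
    key_id: str
    press_ms: float
    release_ms: float

def timing_records(events):
    """Return (key_id, duration_ms, transition_ms) for each key event.

    duration   = release time minus press time of the same key
    transition = press time of the next key minus press time of this key
                 (None for the last key, which has no successor)
    """
    records = []
    for i, ev in enumerate(events):
        duration = ev.release_ms - ev.press_ms
        if i + 1 < len(events):
            transition = events[i + 1].press_ms - ev.press_ms
        else:
            transition = None
        records.append((ev.key_id, duration, transition))
    return records

# Illustrative timestamps for typing "the".
events = [
    KeyEvent("t", 0.0, 95.0),
    KeyEvent("h", 140.0, 220.0),
    KeyEvent("e", 260.0, 330.0),
]
records = timing_records(events)
```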

The second module of the system will extract appropriate features and output a file containing, for each record, the Subject ID and the feature vector. Although a preliminary description of the feature measurements for the second application is presented above, these measurements need to be finalized. Two preprocessing steps are performed on the feature measurements: outlier removal and feature standardization. Outlier removal consists of removing any measurement that is more than two standard deviations from the mean, as obtained from the training data. We standardize our measurements using the ranging method, where raw measurement x is converted to x' as follows:
x' = (x - xmin) / (xmax - xmin)
using the min and max of the measurement over the samples in the training data. This provides measurement values in the range 0-1 to give each measurement roughly equal weight. The feature-vector output files will allow us to explore and compare various pattern classifiers on the same feature data.
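The two preprocessing steps can be sketched for a single measurement as follows. Several details here are assumptions, since the text does not fix them: the population standard deviation is used, the min and max are taken over the training values that survive outlier removal, a removed measurement is represented as None, and the function names are hypothetical.

```python
import statistics

def fit_preprocessing(train_values):
    """Estimate preprocessing parameters for one measurement from training data."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)  # assumption: population std dev
    # Drop training values more than two standard deviations from the mean.
    kept = [v for v in train_values if abs(v - mean) <= 2 * std]
    return {"mean": mean, "std": std, "min": min(kept), "max": max(kept)}

def standardize(value, params):
    """Return the ranged value, or None if the value is an outlier."""
    if abs(value - params["mean"]) > 2 * params["std"]:
        return None  # outlier: measurement removed
    span = params["max"] - params["min"]
    if span == 0:
        return 0.0  # constant measurement; avoid division by zero
    return (value - params["min"]) / span

# Illustrative key-duration values in milliseconds; 400.0 is an outlier.
train = [100.0, 110.0, 120.0, 130.0, 140.0, 400.0]
params = fit_preprocessing(train)
scaled = standardize(120.0, params)
removed = standardize(400.0, params)
```

Note that a test measurement that passes the outlier check can still fall slightly outside the range 0-1, because the min and max come from the training data only.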

The third module of the system will perform the pattern classification task. Initially, we will use the nearest neighbor classifier with the Euclidean distance metric. Finally, we might decide to create a fourth module to perform a statistical analysis of the results, which would separate the analysis from the classification that produces those results.
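A minimal sketch of the nearest neighbor classifier with the Euclidean distance metric follows. It assumes the training data is a list of (Subject ID, feature vector) pairs, as in the output of the second module; the function name and the sample values are illustrative only.

```python
import math

def nearest_neighbor(query, training):
    """Return (subject_id, distance) of the training vector closest to query.

    training -- list of (subject_id, feature_vector) pairs
    """
    best_id, best_dist = None, math.inf
    for subject_id, vector in training:
        d = math.dist(query, vector)  # Euclidean distance
        if d < best_dist:
            best_id, best_dist = subject_id, d
    return best_id, best_dist

# Illustrative standardized feature vectors for two subjects.
training = [
    ("S01", [0.20, 0.40, 0.10]),
    ("S02", [0.80, 0.50, 0.90]),
    ("S01", [0.25, 0.35, 0.15]),
]
label, dist = nearest_neighbor([0.75, 0.55, 0.85], training)
```

For the identification application, the returned Subject ID is the one-of-n response; the returned distance could be passed to a later analysis module.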