Keystroke Biometric System


We have been exploring keystroke biometric applications. Keystroke biometric systems measure typing characteristics believed to be unique to an individual and difficult to duplicate. There is a commercial product, BioPassword, currently used for hardening passwords (short input) in existing computer security schemes. The keystroke biometric is one of the less-studied biometrics; researchers tend to collect their own data, and no known studies have compared identification techniques on a common database. Nevertheless, the published literature is optimistic about the potential of keystroke dynamics to benefit computer system security and usability.

The keystroke biometric has several possible applications. One application is authentication, which gives a binary accept/reject response: yes, you are the person you claim to be, or no, you are not. For instance, password entry could be "hardened" by adding a keystroke authentication process as a second stage, after password matching and before the user is allowed entry. Thus, if the password is not entered in the user's normal keystroke pattern, the system could ask the user to reenter it. For example, a user on a particular occasion might be drinking a cup of coffee and entering the password uncharacteristically with one hand. The system could then reject the password, sending the user a message like "Please reenter your password in your normal manner," and, after, say, three tries, possibly reject the user entirely. On receiving the message, the user would likely put down the coffee cup and enter the password in his/her normal fashion in order to be accepted. Another use of such an authentication process is to authenticate students taking online tests by their keystroke patterns.
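To make the two-stage idea concrete, here is a minimal Python sketch. It assumes the enrolled profile is a per-feature mean and standard deviation of the user's key timings and scores an entry by its mean absolute z-score. The names `harden_login` and `pattern_distance`, the z-score rule, and the threshold value are illustrative assumptions, not the method used by BioPassword or by our system.

```python
from itertools import islice
from typing import Iterable, Sequence, Tuple

MAX_TRIES = 3      # "after, say, three tries" from the text
THRESHOLD = 2.0    # hypothetical cut-off; a real system would tune this on enrollment data


def pattern_distance(timings: Sequence[float], mean: Sequence[float],
                     std: Sequence[float]) -> float:
    """Mean absolute z-score of one entry's key timings against the enrolled profile.

    Assumes every std value is positive and all sequences have the same length.
    """
    return sum(abs(t - m) / s for t, m, s in zip(timings, mean, std)) / len(timings)


def harden_login(attempts: Iterable[Tuple[str, Sequence[float]]], password: str,
                 mean: Sequence[float], std: Sequence[float]) -> str:
    """Two-stage check: password match first, then a keystroke accept/reject decision."""
    for typed, timings in islice(attempts, MAX_TRIES):
        if typed != password:                                  # stage 1: password matching
            return "reject"
        if pattern_distance(timings, mean, std) <= THRESHOLD:  # stage 2: keystroke check
            return "accept"
        print("Please reenter your password in your normal manner.")
    return "reject"                                            # no acceptable entry in MAX_TRIES
```

With a profile of means [100, 120, 90] ms and standard deviations [10, 15, 10] ms, an entry timed [105, 118, 95] ms is accepted, while a slow one-handed entry such as [180, 260, 170] ms triggers the reenter message; three such entries in a row end in rejection.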

A second application is to identify an individual from his/her keystroke pattern (one-of-n response). Suppose, for example, there has been a problem with the circulation of offensive emails from easily accessible desktops in a work environment. The security department wants to reduce this problem by collecting keystroke biometric data from all employees and developing a keystroke biometric identification system.
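A one-of-n decision can be illustrated with the simplest pattern classifier, a 1-nearest-neighbor rule over feature vectors. This is a hedged sketch, not a description of the classifier in our system: the function name `identify`, the Euclidean metric, and the assumption that features are already standardized are all illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence


def identify(questioned: Sequence[float],
             enrolled: Dict[str, List[Sequence[float]]]) -> Optional[str]:
    """One-of-n identification with a 1-nearest-neighbor rule.

    Returns the enrolled user owning the enrollment sample closest (Euclidean
    distance) to the questioned feature vector, or None if nobody is enrolled.
    Assumes all vectors are already standardized to comparable scales.
    """
    best_user: Optional[str] = None
    best_dist = math.inf
    for user, samples in enrolled.items():
        for sample in samples:
            dist = math.dist(questioned, sample)   # requires Python 3.8+
            if dist < best_dist:
                best_user, best_dist = user, dist
    return best_user
```

Unlike authentication, this rule always names some enrolled user; a deployed system would likely also need a reject option for questioned samples that are far from everyone.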

Over the last four years, we in CSIS at Pace University have developed a keystroke biometric identification system (one-of-n response); for last fall's project, see the Projects page at IT691-CS691 - Fall 2006. We have presented experimental results at three external conferences and several internal ones. The next paragraph contains the abstract of our most recent conference paper; for the full paper, see Keystroke Conference Paper (slides).

ABSTRACT: For long-text input of 650 keystrokes, a biometric system was developed for applications such as identifying perpetrators of inappropriate e-mail or fraudulent Internet activity. A Java applet collected raw keystroke data over the Internet, appropriate long-text-input features were extracted, and a pattern classifier made identification decisions. Experiments were conducted on a total of 118 subjects using two input modes (copy and free-text input) and two keyboard types (desktop and laptop keyboards). Results indicate that the keystroke biometric can accurately identify an individual who sends inappropriate email (free text) if sufficient enrollment samples are available and if the same type of keyboard is used to produce the enrollment and questioned input samples. For laptop keyboards we obtained 99.5% identification accuracy on 36 users, which decreased to 97.9% on a larger population of 47 users. For desktop keyboards we obtained 98.3% accuracy on 36 users, which decreased to 93.3% on a larger population of 93 users. Accuracy decreases significantly when subjects use different keyboard types or different input modes for enrollment and testing.
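The abstract does not list the long-text-input features. As an illustration only, the following sketch derives two common kinds of keystroke timing features, key hold times and key-to-key transition times, from raw press/release events. The event tuple format and the function name `extract_features` are hypothetical and may not match what our Java applet actually records.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical raw record: (key, press_time_ms, release_time_ms), in typing order.
Event = Tuple[str, float, float]


def extract_features(events: List[Event]) -> Tuple[Dict[str, float],
                                                    Dict[Tuple[str, str], float]]:
    """Average hold time per key and average transition time per key pair.

    A transition is measured here as the gap from releasing one key to pressing
    the next; it is negative when the typist overlaps keys.
    """
    holds: Dict[str, List[float]] = defaultdict(list)
    gaps: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for key, press, release in events:
        holds[key].append(release - press)
    for (k1, _, release1), (k2, press2, _) in zip(events, events[1:]):
        gaps[(k1, k2)].append(press2 - release1)
    return ({k: mean(v) for k, v in holds.items()},
            {pair: mean(v) for pair, v in gaps.items()})
```

For 650-keystroke input, such per-key and per-pair averages (possibly with standard deviations) would form one feature vector per sample; rare keys and pairs are exactly where the fallback question discussed next arises.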

The latest phase of the keystroke biometric effort centered on fallback, which addresses questions like "What do you do if you have an incomplete or insufficient data set?" Two new fallback models were developed, one based on touch-typing principles and the other on statistical analysis of the data. Beyond the keystroke biometric effort, fallback models can be applied to any setting where data are insufficient or missing and the overall goal is decision making with incomplete or imperfect information. For example, the age-old marketing research problem of "survey non-response," where a respondent omits some of his/her answers, is an excellent opportunity to apply this work. Other areas where it might apply include national security, general operations management, attrition prediction in industries such as banking and telecom, and stock outages in industries such as retail. See the 2007 CSIS Student/Faculty Research Day Paper.
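The touch-typing fallback model is not described here in detail. One plausible reading, sketched under stated assumptions, is a back-off rule: if a key has too few samples for a reliable mean, borrow samples from other keys struck by the same finger, and if those are also scarce, pool all keys. The partial finger map, the `MIN_SAMPLES` threshold, and `hold_time_with_fallback` are hypothetical, not the model from the paper.

```python
from statistics import mean
from typing import Dict, List

MIN_SAMPLES = 5   # hypothetical: fewer samples than this makes a per-key mean unreliable

# Hypothetical, partial touch-typing map: key -> finger that strikes it.
FINGER = {
    "r": "left_index", "t": "left_index", "f": "left_index", "g": "left_index",
    "e": "left_middle", "d": "left_middle",
    "y": "right_index", "u": "right_index", "h": "right_index", "j": "right_index",
    "i": "right_middle", "k": "right_middle",
}


def hold_time_with_fallback(key: str, holds: Dict[str, List[float]]) -> float:
    """Mean hold time for `key`, backing off to same-finger keys, then to all keys."""
    own = holds.get(key, [])
    if len(own) >= MIN_SAMPLES:
        return mean(own)
    finger = FINGER.get(key)
    if finger is not None:
        same_finger = [h for k, hs in holds.items() if FINGER.get(k) == finger for h in hs]
        if len(same_finger) >= MIN_SAMPLES:
            return mean(same_finger)
    # Last resort: pool every key; raises StatisticsError if there is no data at all.
    return mean([h for hs in holds.values() for h in hs])
```

The same back-off pattern (specific estimate, then a related group, then the overall population) carries over to the non-keystroke settings mentioned above, with the grouping chosen from domain knowledge.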


This semester's project will have two primary focuses. First, we will collect additional data from as many participants (subjects) as possible, including each team member: five data samples from each of the four quadrants (a total of 20 samples, like the 36-subject data in the main paper). We will also determine how to manage the growing quantity of data. Most participants in the previous studies entered all their data in a short time, usually in one sitting (the same day), so those data do not show how an individual's typing varies over time. Therefore, we will collect three (possibly four) full data sets as above from at least every team member, with two-week intervals between data entries.
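Assuming the four quadrants are the two input modes (copy and free text) crossed with the two keyboard types (desktop and laptop) from the abstract, each full data set is 2 × 2 × 5 = 20 samples, and three sets taken two weeks apart span 28 days and 60 samples per person. A small sketch that enumerates one subject's plan could double as a checklist for managing the growing data; the `Entry` record and `collection_plan` function are hypothetical.

```python
from datetime import date, timedelta
from itertools import product
from typing import List, NamedTuple

KEYBOARDS = ("desktop", "laptop")
MODES = ("copy", "free")
SAMPLES_PER_QUADRANT = 5
INTERVAL = timedelta(weeks=2)


class Entry(NamedTuple):
    subject: str
    session: int      # 1-based session number
    due: date         # planned date of the session
    keyboard: str
    mode: str
    sample: int       # 1-based sample number within the quadrant


def collection_plan(subject: str, start: date, sessions: int = 3) -> List[Entry]:
    """Every sample one subject should enter: sessions x 4 quadrants x 5 samples."""
    return [
        Entry(subject, s + 1, start + s * INTERVAL, keyboard, mode, n)
        for s in range(sessions)
        for keyboard, mode in product(KEYBOARDS, MODES)
        for n in range(1, SAMPLES_PER_QUADRANT + 1)
    ]
```

Passing `sessions=4` covers the "possibly four" case, which adds a fourth set two weeks after the third.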

Second, and most importantly, we will format the feature-vector data (especially the 36-subject data together with similar newly obtained data) for ease of processing by other project systems, specifically the Biometric Authentication System and the Data Mining Systems.
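The target format for the Biometric Authentication System and the Data Mining Systems is not specified here. Assuming a plain CSV file with one labeled feature vector per row would suit both, a minimal export could look like this; the column names, the row layout, and `write_feature_csv` are hypothetical.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO, Tuple

# Hypothetical row layout: (subject, keyboard, mode, sample number, feature values).
Row = Tuple[str, str, str, int, Sequence[float]]


def write_feature_csv(rows: Iterable[Row], n_features: int, out: TextIO) -> None:
    """Write one labeled feature vector per line, with a header row."""
    writer = csv.writer(out)
    header = ["subject", "keyboard", "mode", "sample"]
    writer.writerow(header + [f"f{i}" for i in range(1, n_features + 1)])
    for subject, keyboard, mode, sample, features in rows:
        if len(features) != n_features:   # catch ragged vectors before export
            raise ValueError(f"{subject} sample {sample}: expected "
                             f"{n_features} features, got {len(features)}")
        writer.writerow([subject, keyboard, mode, sample, *features])


if __name__ == "__main__":
    buf = io.StringIO()
    write_feature_csv([("S01", "desktop", "copy", 1, [0.1, 0.2])], 2, buf)
    print(buf.getvalue())
```

When writing to a file rather than a string buffer, open the file with `newline=""`, as the Python `csv` documentation recommends.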

Also, if time permits, we want to rerun some of the previous experiments and possibly improve the current method of running experiments. We also want to run some experiments with the new data.