Keystroke Biometric
Intrusion Detection

Background

According to Wikipedia (January 2011), "Keystroke logging (often called keylogging) is the action of tracking (or logging) the keys struck on a keyboard, typically in a covert manner so that the person using the keyboard is unaware that their actions are being monitored." Parents often install keylogger software on the home computer so they can track what their kids do on the computer and particularly what websites they visit.

Some keylogger software records not only the sequence of keys struck but also their timing information, that is, when each key is pressed and when it is released. If this timing information is sufficiently accurate, it can be used for biometric purposes.
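As a rough illustration of how press and release times can become biometric measurements, the sketch below computes two quantities commonly used in keystroke analysis: dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). The `KeyEvent` record, its field names, and the millisecond timestamps are assumptions made for this example; they are not the Fimbel keylogger's output format, and the features PKBS actually extracts may differ.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """One keystroke with hypothetical press/release timestamps in milliseconds."""
    key: str
    press_ms: float
    release_ms: float


def dwell_times(events):
    """Time each key was held down: release time minus press time."""
    return [e.release_ms - e.press_ms for e in events]


def flight_times(events):
    """Gap between releasing one key and pressing the next one."""
    return [nxt.press_ms - cur.release_ms for cur, nxt in zip(events, events[1:])]


# Made-up timings for typing "the"; not real collected data.
events = [
    KeyEvent("t", 0.0, 95.0),
    KeyEvent("h", 140.0, 220.0),
    KeyEvent("e", 260.0, 330.0),
]

print(dwell_times(events))   # [95.0, 80.0, 70.0]
print(flight_times(events))  # [45.0, 40.0]
```

A flight time can be negative when a typist presses the next key before releasing the previous one, which is one reason timestamp accuracy matters.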

Over the last seven or so years we have developed the powerful Pace University Keystroke Biometric System (PKBS). This system was developed for text input applications like online exams requiring, for example, short text answers to questions. This system requires users to key text into a Java applet to produce PKBS input files.

Recently we have gone beyond text input to determine the utility of PKBS for arbitrary types of keyboard input: text, spreadsheet, program execution, etc. Initial work on this problem has been described in a Research Day 2011 paper from the Spring 2011 project and a Technical paper from the Fall 2011 project. These projects used the keylogger developed by Eric Fimbel.

Project

This semester's project involves collecting keyboard input and running experiments -- no programming is involved. Although it will take time to learn how to run the system, members from last semester's team have offered to help with this. We will continue to use the Fimbel keylogger to obtain arbitrary keyboard input for keystroke biometric analysis.

The application of interest is intrusion detection. Intrusion detection, by which we mean the discovery that someone other than the authentic user is using a computer, has become of interest to various organizations, including the US Government. Given the scenario where an authentic user leaves his system unlocked and unattended, the question becomes how quickly and how accurately the unauthorized use of that computer can be detected. Our solution is to detect the intruder from an analysis of his keystroke input, which would presumably differ substantially from that of the authorized user.

The experiments will involve running data through PKBS, obtaining results, and analyzing the results. This will take several steps:

  1. Install the Fimbel keylogger on users' machines and collect data files
  2. Convert the Fimbel keylogger output files into the PKBS input format files
  3. Prepare the training and testing files for input to PKBS. This usually involves running the Feature Extractor program to produce a feature file and then separating that file into training and testing files.
  4. Run the training and testing files through PKBS to obtain output files
  5. Run the PKBS output files through the BAS Calculator program to obtain FAR, FRR, and overall performance
  6. Run the BAS Calculator output through the ROC Curve Data Generator program to obtain Receiver Operating Characteristic (ROC) curves
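To make the quantities in steps 5 and 6 concrete, here is a minimal sketch of how a False Accept Rate (FAR) and a False Reject Rate (FRR) can be computed from match scores, and how sweeping the decision threshold yields the points of an ROC curve. This is not the BAS Calculator or the ROC Curve Data Generator; the score convention (higher means more likely the authentic user), the function names, and the sample values are all assumptions for illustration.

```python
def far_frr(scores, is_genuine, threshold):
    """Return (FAR, FRR) when a sample is accepted if score >= threshold.

    FAR: fraction of impostor samples wrongly accepted.
    FRR: fraction of genuine samples wrongly rejected.
    """
    genuine = [s for s, g in zip(scores, is_genuine) if g]
    impostor = [s for s, g in zip(scores, is_genuine) if not g]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor sample")
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr


def roc_points(scores, is_genuine):
    """Sweep every observed score as a threshold; return (threshold, FAR, FRR) tuples."""
    return [(t, *far_frr(scores, is_genuine, t)) for t in sorted(set(scores))]


# Hypothetical scores: three genuine samples followed by three intruder samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
is_genuine = [True, True, True, False, False, False]

far, frr = far_frr(scores, is_genuine, 0.5)
print(f"FAR={far:.3f} FRR={frr:.3f}")

for threshold, far_t, frr_t in roc_points(scores, is_genuine):
    print(threshold, far_t, frr_t)
```

Raising the threshold lowers FAR at the cost of a higher FRR; the ROC curve plots that trade-off across all thresholds.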

This semester we will collect five types of keystroke/mouse input data (detailed instructions will be provided by Ned Bakelman):

  1. Text data: from applications like Microsoft Word and email
  2. Spreadsheet data: from applications like Microsoft Excel
  3. Browser data: from Microsoft Internet Explorer, Firefox, etc., using applications like Google, etc.
  4. Simulated intruder input: to be described by Ned Bakelman
  5. Open data: typical computer keyboard input activity -- email, IM, Facebook, web activity, etc. (any of types 1-3 above, and try to capture a variety)

Code and Instructions

All code and instructions will be provided by customer Ned Bakelman.

Fast Agile XP Deliverables

We will use the agile methodology, particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations. Many of these deliverables can be done in parallel by different members or subsets of the team.

The following is the current list of deliverables (ordered by the date initiated, initiated date marked in bold red if programming involved, deliverable modifications marked in red, completion date and related comments marked in green, pseudo-code marked in blue):

  1. 2/1. Plan the semester's work with customer Ned Bakelman.
  2. 2/5. Data Collection.
  3. 2/5. Practice experiment to learn the experimental procedures.