Keylogger Keystroke Biometric System

Background

According to Wikipedia (January 2011), "Keystroke logging (often called keylogging) is the action of tracking (or logging) the keys struck on a keyboard, typically in a covert manner so that the person using the keyboard is unaware that their actions are being monitored." Parents often install keylogger software on the home computer so they can track what their kids do on the computer and particularly what websites they visit.

Some keylogger software records not only the sequence of keys struck but also their timing information, that is, when each key is pressed and when it is released. If this timing information is sufficiently accurate, it can be used for biometric purposes.
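As an illustration of how press and release times become timing measurements, here is a minimal Python sketch. The KeyEvent record and the function names are hypothetical, and the two measurements shown (how long a key is held, and the gap between releasing one key and pressing the next) are common in keystroke research but are not claimed to be the features PKBS actually uses.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical record for one keystroke; times are in milliseconds."""
    key: str
    press_ms: float
    release_ms: float


def dwell_times(events):
    """Time each key is held down (release minus press)."""
    return [e.release_ms - e.press_ms for e in events]


def flight_times(events):
    """Gap from releasing one key to pressing the next.

    The value is negative when the next key is pressed before the
    previous key has been released (overlapping keystrokes).
    """
    return [b.press_ms - a.release_ms for a, b in zip(events, events[1:])]


# Made-up sample in which the last two keystrokes overlap.
events = [
    KeyEvent("h", 0.0, 95.0),
    KeyEvent("i", 140.0, 230.0),
    KeyEvent("!", 210.0, 300.0),
]

print(dwell_times(events))   # [95.0, 90.0, 90.0]
print(flight_times(events))  # [45.0, -20.0]
```

The negative value in the second list shows why release times matter: a logger that records only key presses cannot detect overlapping keystrokes.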

Over the last seven or so years we have developed the Pace University Keystroke Biometric System (PKBS) for text input. Users type text into a Java applet, which produces the PKBS input files.

In this project we will go beyond text input to determine the utility of PKBS for arbitrary types of keyboard input: text, spreadsheet, program execution, etc. Initial work on this problem, from last semester's project, is described in the Research Day 2011 paper. That project used the Fimbel keylogger to capture keystroke data files and developed a converter that turns the Fimbel keylogger output files into PKBS input format files.

Project

Initial experiments will involve running arbitrary keystroke data through PKBS, obtaining results, and analyzing the results. This will involve three steps:
  1. Install the Fimbel keylogger on users' machines
  2. Convert the Fimbel keylogger output files into PKBS input format files
  3. Run converted data files through the PKBS (a user guide for running PKBS will be provided)

In order to handle different types of input (other than text), we may have to develop an enhanced version of the system (PKBS-2). These modifications will likely be straightforward extensions; for example, additional features will likely be added to handle the greater amount of numeric input in spreadsheets. Results will then be obtained from PKBS-2 and analyzed.
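To make the idea of additional features for numeric input concrete, here is a small Python sketch. The feature names and the (key, press_ms, release_ms) tuple layout are assumptions made for illustration only; the actual PKBS-2 feature set is not specified here.

```python
from statistics import mean


def numeric_features(keystrokes):
    """Summarize digit-key usage in one sample.

    keystrokes: list of (key, press_ms, release_ms) tuples (hypothetical layout).
    Returns the fraction of keystrokes that are digit keys and the mean
    time those digit keys were held down.
    """
    digit_holds = [release - press
                   for key, press, release in keystrokes
                   if key.isdigit()]
    total = len(keystrokes)
    return {
        "digit_fraction": len(digit_holds) / total if total else 0.0,
        "mean_digit_hold_ms": mean(digit_holds) if digit_holds else 0.0,
    }


# Made-up spreadsheet entry: three digit keys and one Tab.
sample = [
    ("4", 0.0, 80.0),
    ("2", 150.0, 240.0),
    ("Tab", 300.0, 360.0),
    ("7", 420.0, 490.0),
]

print(numeric_features(sample))
```

Features of this kind would be appended to the existing feature vector, which is why such an extension can stay small.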

Data Capture Instructions and Code from Ned Bakelman

Here are the Keylogger Data Capture Instructions and the PACE_DataCapture Folder.

Fast Agile XP Deliverables

We will use the agile methodology, particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations. Many of these deliverables can be done in parallel by different members or subsets of the team.

The following is the current list of deliverables (ordered by the date initiated, initiated date marked in bold red if programming involved, deliverable modifications marked in red, completion date and related comments marked in green, pseudo-code marked in blue):

  1. 9/22 (ongoing task over the next five weeks). Completed with old system - although the data may no longer be good, skip this item for now. Over the next five weeks collect one-hour keystroke data samples, for a total of 10 one-hour samples per team member. The 10 one-hour samples should be spread out over the 5 weeks, preferably two per week, and certainly no more than one per day. During each one-hour session you should use the computer as you normally do for dealing with email, searching the Internet, etc.
  2. 10/17 Spreadsheet - Strong Training. Mostly completed. Over the next week collect spreadsheet data-entry samples, preferably no more than two per day, for a total of 10 samples per team member. Use this Excel Template for entering the data. Run a preliminary experiment with the Excel data:
    1. Take the ten Fimbel Excel-input data files from each of the 5 team members and convert them into PKBS raw data files (total of 50 files)
    2. Convert the 50 PKBS raw data files into a single file of feature vectors
    3. Split the feature vector file into two files - one for training and one for testing - each file containing 25 feature vectors (5 samples x 5 users)
    4. Using the training and testing feature files (each file containing 25 feature-vector records) as input, run the PKBS to obtain strong-training performance results (Note: in strong training you train on the same subjects you test on, but use different data.)
    5. For validation, reverse the train and test files and rerun to obtain performance results
    Work with Vinnie and Ned if you have difficulty with any of these steps. Try to complete this deliverable and present the results at the midterm meeting.
  3. 10/31 Spreadsheet - Weak Training. Collect 10 spreadsheet data-entry samples from each of 5 non-team members. Use the previously used Excel Template for entering the data. Run a weak-training experiment with the Excel data:
    1. Take the ten Fimbel Excel-input data files from each of the 10 users (5 team members + 5 others) and convert them into PKBS raw data files (total of 100 files)
    2. Convert the 100 PKBS raw data files into a single file of feature vectors
    3. Split the feature vector file into two files - one for training and one for testing - each file containing 50 feature vectors (10 samples x 5 users, so the two files contain different users)
    4. Using the training and testing feature files (each file containing 50 feature-vector records) as input, run the PKBS to obtain weak-training performance results
    5. For validation, reverse the train and test files and rerun to obtain performance results
    Try to complete this deliverable for technical paper draft 2.
  4. 10/31 Text->Spreadsheet and Spreadsheet->Text - Weak Training. Use text-input data from previous projects (John Stewart or Robert Zack) and spreadsheet data from the experiment with the 5 team members above (10 samples from each team member). If available, use the spreadsheet data from all 10 users (5 team members + 5 non-team members).
    1. Take the text-input files from previous projects and the 50 PKBS raw data files from the above deliverable and obtain a single file of feature vectors
    2. Split the feature vector file into two files - the text-input records for training and the spreadsheet records for testing
    3. Using the training and testing feature files as input, run the PKBS to obtain weak-training performance results (train text, test spreadsheet)
    4. Reverse the train and test files and rerun to obtain performance results (train spreadsheet, test text)
    Try to complete this deliverable for technical paper draft 2.
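The train/test splits in the spreadsheet deliverables above can be sketched in Python as follows. This is only an illustration: each record is a (user, sample) pair standing in for a real feature vector, the function names are hypothetical, and the weak-training split assumes (by contrast with the strong-training note and from the 10 samples x 5 users file sizes) that training and testing use different users.

```python
from collections import defaultdict


def strong_split(records):
    """Same users in both files, different samples.

    The first half of each user's samples goes to training,
    the second half to testing.
    """
    by_user = defaultdict(list)
    for record in records:
        by_user[record[0]].append(record)
    train, test = [], []
    for user in sorted(by_user):
        samples = by_user[user]
        half = len(samples) // 2
        train += samples[:half]
        test += samples[half:]
    return train, test


def weak_split(records):
    """Different users in the two files (assumed meaning of weak training).

    The first half of the users goes to training, the rest to testing.
    """
    users = sorted({record[0] for record in records})
    train_users = set(users[: len(users) // 2])
    train = [r for r in records if r[0] in train_users]
    test = [r for r in records if r[0] not in train_users]
    return train, test


# Stand-in data: 5 users x 10 samples, then 10 users x 10 samples.
five_users = [(f"user{u:02d}", s) for u in range(5) for s in range(10)]
ten_users = [(f"user{u:02d}", s) for u in range(10) for s in range(10)]

strong_train, strong_test = strong_split(five_users)
weak_train, weak_test = weak_split(ten_users)

print(len(strong_train), len(strong_test))  # 25 25
print(len(weak_train), len(weak_test))      # 50 50
```

The validation step in each deliverable corresponds to swapping the two returned lists and rerunning the system.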