Keystroke Biometric: ROC Experiments

Keystroke Biometric Background: read & understand this section

Last semester's project refactored the code and involved considerable programming; see the Fall 2009 Technical Report. This semester's project will be quite different.

Project

This semester will mostly involve running experiments using existing code. However, there are several program modules that you will need to learn how to run. There will also be some programming, probably in Python.

The experiments will focus on obtaining Receiver Operating Characteristic (ROC) curves and on running related large data experiments.
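
As background for the ROC work, the sketch below shows one way to turn genuine (same-person) and impostor (different-person) match scores into ROC operating points by sweeping a decision threshold and computing the false accept rate (FAR) and false reject rate (FRR). This is a minimal illustration, not part of the existing code; the score lists and function names are made up for the example.

    # Minimal ROC sketch: sweep a threshold over genuine and impostor
    # match scores and record (threshold, FAR, FRR) at each point.
    # Assumes a higher score means stronger evidence of a match.

    def roc_points(genuine, impostor):
        """Return a list of (threshold, FAR, FRR) tuples."""
        thresholds = sorted(set(genuine) | set(impostor))
        points = []
        for t in thresholds:
            far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
            frr = sum(s < t for s in genuine) / len(genuine)     # genuines rejected
            points.append((t, far, frr))
        return points

    if __name__ == "__main__":
        genuine = [0.91, 0.85, 0.78, 0.88, 0.95]   # same-person comparisons (example data)
        impostor = [0.40, 0.55, 0.62, 0.30, 0.58]  # different-person comparisons (example data)
        for t, far, frr in roc_points(genuine, impostor):
            print(f"threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")

Plotting the FAR/FRR pairs over the threshold sweep then gives the ROC curve for an experiment.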

We might also run some experiments involving 'weak' and 'strong' enrollment. We recently focused on authentication experiments that used 'weak enrollment' data, where only non-test-subject data were used to train the system. Strong enrollment uses test-subject data (and possibly additional non-test-subject data) to train the system, and then uses independent (different) test-subject data to test the system.
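
For concreteness, the difference between the two enrollment schemes amounts to how the available feature samples are partitioned into training and test sets. The sketch below illustrates this with a hypothetical data layout (a dict mapping subject id to that subject's list of feature samples); it is not taken from the existing modules.

    # Illustrative partitioning for weak vs. strong enrollment.
    # 'samples' maps subject id -> list of feature samples (hypothetical layout).

    def weak_enrollment_split(samples, test_subjects):
        """Weak enrollment: train only on non-test-subject data."""
        train = {s: v for s, v in samples.items() if s not in test_subjects}
        test = {s: v for s, v in samples.items() if s in test_subjects}
        return train, test

    def strong_enrollment_split(samples, test_subjects, n_enroll=5):
        """Strong enrollment: train on part of each test subject's data
        (plus, optionally, non-test-subject data) and test on the remaining,
        independent samples from those same subjects."""
        train, test = {}, {}
        for s, v in samples.items():
            if s in test_subjects:
                train[s] = v[:n_enroll]   # enrollment samples
                test[s] = v[n_enroll:]    # held-out samples from the same subject
            else:
                train[s] = v              # additional non-test-subject training data
        return train, test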

Fast Agile XP Deliverables

We will use the agile methodology, particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations. Some of these deliverables might be done in parallel by different members or subsets of the team. The following is the current list of deliverables (ordered by the date initiated; deliverable modifications marked in red, deliverable dates marked in bold red if programming is involved, completion dates and related comments marked in green, pseudo-code marked in blue):
  1. 2/1-2/7 Keystroke Deliverable 1 Instructions
  2. 2/1-2/7 Keystroke Deliverable 2 Instructions
  3. 2/1-2/7 Keystroke Deliverable 3 Instructions
  4. 2/3 Data Collection
    For experimental purposes we need keystroke data samples collected over time at two-week intervals. Each team member is to record five keystroke samples in alternate weeks over a nine-week interval, with data sets collected in Week 4 (Feb 11), Week 6 (Feb 25), Week 8 (Mar 11), Week 10 (Mar 25), and Week 11 (Apr 8, after Spring Break). Thus, each team member will record a total of 25 data samples (5 samples at each of 5 recording times). These data are to be collected using the existing data collection method; obtain the details from your customer, Robert Zack. Team 4 (Test-Taker Setup & Data Collection) is in charge of this operation and will check the data.
  5. 2/11-2/22 Keystroke Deliverable 5 Instructions (Guide A)
  6. 2/13 Keystroke Deliverable 6 Instructions (Guide B)
  7. 3/9 Keystroke Deliverable 7 Instructions
  8. 3/25-4/3 Completed by Robert Zack. Run the feature extraction program on the raw data file from Team 4 to obtain a feature file, manually split the feature file into a training file (18 subjects, first 5 feature samples from each subject) and a test file (18 subjects, second 5 feature samples from each subject), and run BAS on these training and test files to obtain results similar to the common deliverables from the beginning of the semester (a sketch of the file split appears after this list).
    Note:
    The experiment at the beginning of the semester used different subjects for training and testing (we call that "weak" training), whereas this experiment uses the same subjects (but different data samples) for training and testing (we call this "strong" training).
    [Team 5 should run this experiment with both the old and the improved (real 239 features) feature extraction programs.]
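
For deliverable 8, the following is a hedged sketch of the manual training/test split (first 5 feature samples per subject for training, second 5 for testing). It assumes a plain-text feature file with one sample per line and the subject identifier in the first whitespace-delimited field; the actual BAS feature-file format may differ, so the parsing would need to be adjusted accordingly, and the file names are placeholders.

    # Hypothetical split of a feature file into training and test files:
    # first n_train samples per subject -> training, next n_train -> test.

    from collections import defaultdict

    def split_feature_file(feature_path, train_path, test_path, n_train=5):
        by_subject = defaultdict(list)
        with open(feature_path) as f:
            for line in f:
                if line.strip():
                    subject = line.split()[0]   # assumes subject id is the first field
                    by_subject[subject].append(line)

        with open(train_path, "w") as train, open(test_path, "w") as test:
            for subject, lines in by_subject.items():
                train.writelines(lines[:n_train])            # first 5 samples
                test.writelines(lines[n_train:n_train * 2])  # second 5 samples

    if __name__ == "__main__":
        split_feature_file("features.txt", "train.txt", "test.txt")  # placeholder file names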