Keystroke Biometric: ROC Experiments
Keystroke Biometric Background: read & understand this section
Last semester's project refactored the code and involved considerable programming; see the
Fall 2009 Technical Report.
This semester's project will be quite different:
it will mostly involve running experiments using existing code.
However, there are several program modules that you will need to learn how to run.
There will also be some programming, probably in Python.
The experiments will focus on obtaining Receiver Operating Characteristic (ROC) curves
and on running related large data experiments.
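As a rough illustration of what an ROC curve summarizes, the sketch below sweeps a decision threshold over match scores and records the false accept rate and true accept rate at each threshold. It is not the project's existing code: the function name `roc_points`, the score lists, and the convention that a higher score means a better match are all assumptions made for this example.

```python
def roc_points(genuine_scores, impostor_scores):
    """Return (false_accept_rate, true_accept_rate) pairs, one per threshold.

    Assumption: a higher score means the sample looks more like the
    claimed user, and a sample is accepted when score >= threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        # Fraction of genuine attempts accepted at this threshold.
        tar = sum(s >= t for s in genuine_scores) / len(genuine_scores)
        # Fraction of impostor attempts accepted at this threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((far, tar))
    return points


# Made-up scores, for illustration only.
genuine = [0.9, 0.8, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2]
for far, tar in roc_points(genuine, impostor):
    print(f"FAR={far:.2f}  TAR={tar:.2f}")
```

Plotting the true accept rate against the false accept rate for all thresholds gives the ROC curve; the false reject rate at each point is simply 1 minus the true accept rate.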
We might also run some experiments involving 'weak' and 'strong' enrollment.
We recently focused on authentication experiments that used 'weak enrollment' data,
where only non-test-subject data were used to train the system.
Strong enrollment uses test-subject data (and possibly additional non-test-subject data)
to train the system, and then uses independent (different) test-subject data to test the system.
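The distinction between weak and strong enrollment can be checked mechanically for any proposed training/test split. The following is a minimal sketch, assuming each sample is identified by a hypothetical (subject_id, sample_id) pair; the function name and the "mixed" and "invalid" labels are inventions of this example, not terms used by the project.

```python
def enrollment_type(train, test):
    """Classify a split as 'weak', 'strong', 'mixed', or 'invalid'.

    train, test: iterables of (subject_id, sample_id) pairs
    (a hypothetical way of identifying samples).
    """
    train = set(train)
    test = set(test)
    if train & test:
        # The same sample appears in both sets, so the test data
        # are not independent of the training data.
        return "invalid"
    train_subjects = {subject for subject, _ in train}
    test_subjects = {subject for subject, _ in test}
    if not (train_subjects & test_subjects):
        # Trained only on non-test-subject data.
        return "weak"
    if test_subjects <= train_subjects:
        # Every test subject also contributed (different) training samples;
        # extra non-test subjects in training are allowed.
        return "strong"
    # Some test subjects were enrolled and some were not.
    return "mixed"


print(enrollment_type([("a", 1), ("b", 1)], [("c", 1)]))
print(enrollment_type([("a", 1), ("z", 1)], [("a", 2)]))
```

The first call describes a weak split (subject c never appears in training), and the second a strong split (subject a appears in both sets with different samples).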
Fast Agile XP Deliverables
We will use the agile methodology,
particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations.
Some of these deliverables might be done in parallel by different members or subsets of the team.
The following is the current list of deliverables
(ordered by the date initiated, deliverable modifications marked in red,
deliverable date marked in bold red if programming involved,
completion date and related comments marked in green,
pseudo-code marked in blue):
Keystroke Deliverable 1 Instructions
Keystroke Deliverable 2 Instructions
Keystroke Deliverable 3 Instructions
- 2/3 Data Collection
For experimental purposes we need keystroke data samples over time at two-week intervals.
Each team member is to record five keystroke samples per session,
in alternate weeks over a nine-week interval, with data sets collected in
Week 4 (Feb 11), Week 6 (Feb 25), Week 8 (Mar 11), Week 10 (Mar 25), and Week 11 (Apr 8, after Spring Break).
Thus, each team member will record a total of 25 data samples (5 samples at each of 5 recording times).
These data are to be collected using the existing data collection method.
Obtain details about using the existing data collection method from your customer, Robert Zack.
Team 4 (Test-Taker Setup & Data Collection) is in charge of this operation and will check the data.
- Do not repeat the same sample "choice" (e.g., write a letter or email to a friend) during a sample collection session.
- Use the same login information (First Name/Last Name) for each sample collection session.
There is no application validation at this time; any login will be accepted.
Care should be taken to monitor and ensure that there is login consistency.
- Each set of five samples from a subject should be spaced at roughly two-week intervals,
plus or minus no more than three days.
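The spacing rule above can be checked with a few lines of Python. This is only a sketch: the function name `check_spacing` is hypothetical, the rule is interpreted here as applying to the gap between consecutive sessions, and the year 2010 is an assumption (the page gives only months and days).

```python
from datetime import date


def check_spacing(session_dates, interval_days=14, tolerance_days=3):
    """Return (previous, current, gap_days) for every gap outside tolerance.

    Interpretation used here: consecutive sessions should be
    interval_days apart, plus or minus tolerance_days.
    """
    ordered = sorted(session_dates)
    problems = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = (cur - prev).days
        if abs(gap - interval_days) > tolerance_days:
            problems.append((prev, cur, gap))
    return problems


# Planned collection dates from the schedule above (year assumed).
planned = [
    date(2010, 2, 11),
    date(2010, 2, 25),
    date(2010, 3, 11),
    date(2010, 3, 25),
    date(2010, 4, 8),
]
print(check_spacing(planned))
```

An empty list means every pair of consecutive sessions falls within the allowed window; each entry in a non-empty list identifies a pair of sessions that are too close together or too far apart.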
Keystroke Deliverable 5 Instructions (Guide A)
Keystroke Deliverable 6 Instructions (Guide B)
Keystroke Deliverable 7 Instructions
- 3/25-4/3 Completed by Robert Zack.
Run the feature extraction program on the raw data file from Team 4 to obtain a feature file,
manually split the feature file into a training file (18 subjects, first 5 feature samples from each subject)
and a test file (18 subjects, second 5 feature samples from each subject),
and run BAS on these training and test files to obtain results
similar to the common deliverables at the beginning of the semester.
The experiment at the beginning of the semester used different subjects for training and testing
(we call that "weak" training) and
this experiment uses the same subjects (but different data samples) for training and testing
(we call this "strong" training).
[Team 5 should run this experiment with both the old and the improved (real 239 features) feature extraction program.]
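The manual split described above (first 5 feature samples per subject for training, second 5 for testing) could also be scripted. The sketch below assumes the feature data have already been read into a list of (subject_id, feature_vector) rows in recording order; the actual feature file layout is not described here, so the row format and the function name `split_first_second` are hypothetical.

```python
from collections import defaultdict


def split_first_second(rows, per_part=5):
    """Split rows into training (first per_part samples per subject)
    and test (next per_part samples per subject).

    rows: list of (subject_id, feature_vector), in recording order
    (an assumed in-memory format, not the real feature file layout).
    """
    seen = defaultdict(int)  # samples encountered so far, per subject
    train, test = [], []
    for subject, features in rows:
        index = seen[subject]
        seen[subject] += 1
        if index < per_part:
            train.append((subject, features))
        elif index < 2 * per_part:
            test.append((subject, features))
        # Samples beyond the first 2 * per_part are ignored in this sketch.
    return train, test


# Two made-up subjects with ten one-value "feature vectors" each.
rows = [(s, [i]) for s in ("s1", "s2") for i in range(10)]
train, test = split_first_second(rows)
print(len(train), len(test))
```

Because the same subjects appear in both output sets with different samples, this produces the "strong" training arrangement described above.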