Keystroke Biometric: Data/Feature Experiments
Keystroke Biometric Background: read & understand this section
Last semester's project checked the resolution of the raw data
(see the Fall 2009 Technical Report).
This semester's work will be substantially different, although the first set of experiments will involve the data.
Some programming will be required, especially for the feature-related studies.
The key-press and key-release timing data captured in the keystroke work should have a resolution in milliseconds
because the clock records to that accuracy.
Last semester's project, however, found that the last digit was frozen in a good portion of the data,
yielding a resolution of only centiseconds.
This semester we will determine whether this resolution difference results in a difference in authentication accuracy,
and we anticipate that data with the higher resolution will yield higher accuracy.
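As a minimal sketch of how the frozen last digit might be detected, and of how centisecond data could be simulated from millisecond data for a side-by-side accuracy comparison, consider the following. It assumes timestamps are integer milliseconds; the function names are hypothetical and not part of the existing system.

```python
def last_digit_frozen(timestamps_ms):
    """Return True if every timestamp ends in the same decimal digit."""
    return len({t % 10 for t in timestamps_ms}) == 1


def to_centiseconds(timestamps_ms):
    """Zero the millisecond digit to simulate centisecond-resolution data."""
    return [t - (t % 10) for t in timestamps_ms]


if __name__ == "__main__":
    sample = [1234, 1450, 1787]          # illustrative values, not real data
    print(last_digit_frozen(sample))     # digits 4, 0, 7 differ
    print(to_centiseconds(sample))
```

Running the authentication experiment once on the original data and once on the output of `to_centiseconds` would be one way to isolate the effect of resolution.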
Earlier work indicates that the key-press durations are roughly normally distributed.
Last semester's project verified this and also found that the
key transition times are roughly log-normally distributed,
a phenomenon that was discussed in a Recent Paper.
We will try to apply a correction that converts the log-normal feature data into normally distributed data
in an attempt to improve the authentication accuracy of the system.
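One possible form of this correction is the natural logarithm: if a quantity is log-normally distributed, its logarithm is normally distributed. The sketch below assumes all transition times are strictly positive; the function name is hypothetical, and how to treat zero or negative transition times (overlapping keys) is left open.

```python
import math


def log_transform(times_ms):
    """Map log-normal data toward normal data by taking natural logs."""
    for t in times_ms:
        if t <= 0:
            # The log is undefined here; the handling of such values
            # is an open design decision, so this sketch just refuses.
            raise ValueError(f"non-positive time: {t}")
    return [math.log(t) for t in times_ms]


if __name__ == "__main__":
    print(log_transform([50.0, 120.0, 400.0]))  # illustrative values only
```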
Another way to improve authentication accuracy might be to eliminate outliers in the feature distributions,
and we will also explore this.
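The outlier criterion has not been fixed yet. As one hedged possibility, the sketch below drops values more than `k` sample standard deviations from the mean; both the rule and the default `k = 2.0` are assumptions made for illustration.

```python
import statistics


def remove_outliers(values, k=2.0):
    """Keep only values within k sample standard deviations of the mean."""
    if len(values) < 2:
        return list(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= k * sd]


if __name__ == "__main__":
    # Illustrative durations in ms with one obviously extreme value.
    print(remove_outliers([100, 102, 98, 101, 99, 100, 103, 97, 100, 500]))
```

Because a single extreme value inflates both the mean and the standard deviation, a rule based on the median might be more robust; that alternative is worth comparing.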
Fast Agile XP Deliverables
We will use the agile methodology,
particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations.
Some of these deliverables might be done in parallel by different members or subsets of the team.
The following is the current list of deliverables
(ordered by the date initiated, deliverable modifications marked in red,
deliverable date marked in bold red if programming involved,
completion date and related comments marked in green,
pseudo-code marked in blue):
Keystroke Deliverable 1 Instructions
Keystroke Deliverable 2 Instructions
Keystroke Deliverable 3 Instructions
- 2/3- Data Collection
For experimental purposes we need keystroke data samples over time at two-week intervals.
Each team member is to record five keystroke samples,
in alternate weeks over a nine-week interval, with data sets collected in
Week 4 (Feb 11), Week 6 (Feb 25), Week 8 (Mar 11), Week 10 (Mar 25), and Week 11 (Apr 8, after Spring Break).
Thus, each team member will record a total of 25 data samples (5 samples at each of 5 recording times).
These data are to be collected using the existing data collection method.
Obtain details about using the existing data collection method from your customer Robert Zack.
Team 4 (Test-Taker Setup & Data Collection) is in charge of this operation and will check the data.
- Do not repeat the same sample "choice" (e.g., write a letter or email to a friend) during a sample collection session.
- Use the same Login information (First Name/Last Name) for each weekly sample collection session.
There is no application validation at this time; any login will be accepted.
Care should be taken to monitor and ensure that there is login consistency.
- Each set of five samples from a subject should be spaced at roughly two-week intervals,
plus or minus no more than three days.
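The spacing rule above can be checked mechanically. The following is a small sketch, assuming the session dates for a subject are available as `datetime.date` values; the function name is hypothetical and the year 2010 is an assumption.

```python
from datetime import date


def spacing_ok(session_dates, target_days=14, tolerance_days=3):
    """True if consecutive sessions are target_days apart, within tolerance."""
    ordered = sorted(session_dates)
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    return all(abs(g - target_days) <= tolerance_days for g in gaps)


if __name__ == "__main__":
    # The five scheduled recording dates (year assumed to be 2010).
    schedule = [date(2010, 2, 11), date(2010, 2, 25), date(2010, 3, 11),
                date(2010, 3, 25), date(2010, 4, 8)]
    print(spacing_ok(schedule))
```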
- 2/9-3/10 239 Features (biofeature package)
The recent articles that describe the system list 239 feature measurements (see Appendix of the recent paper submission).
However, the feature vectors in the recent experiments only show 230 feature measurements.
Somewhere in the recent code changes we have apparently lost 9 features. Your task here is as follows:
- Count the number of features in the feature vector to confirm that it is not 239 and that there is indeed a problem.
- Check the code against the article's list of 239 features to determine which ones are missing.
- Report your findings to your customer and instructor.
- Fix the code to agree with the article's list of 239 features.
- Rerun the first three deliverables with the new code and report the results.
Is there an improvement with the 239 features?
Great job! The 1NN error rate went from 4.00% to 2.65%, a decrease of 34% (1.35/4.00).
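The counting and comparison steps might be sketched as follows. This assumes each feature can be given a name, so that the article's list and the extractor's output can be compared; the names shown are hypothetical placeholders, not the actual feature names.

```python
EXPECTED_COUNT = 239  # number of features listed in the article


def count_is_wrong(feature_vector, expected=EXPECTED_COUNT):
    """True if the vector does not have the expected number of features."""
    return len(feature_vector) != expected


def missing_features(expected_names, extracted_names):
    """Return expected names absent from the extractor output, in order."""
    extracted = set(extracted_names)
    return [name for name in expected_names if name not in extracted]


if __name__ == "__main__":
    # Hypothetical placeholder names for illustration only.
    expected = ["dur_a", "dur_e", "trans_th", "trans_he"]
    extracted = ["dur_a", "trans_th"]
    print(missing_features(expected, extracted))
```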
- 2/24- Identify appropriate key logger software
We have been using a Java applet to capture the raw keystroke data.
Now we will investigate the use of key loggers to capture the data.
Many companies have free trials of their key logger software.
Your task is to explore using a number of these systems, such as the following
found with a Google search of "key logger":
Download several of these software systems (free trials) and use them to capture keystroke samples,
such as the "Hello World!" example shown in the "Keystroke Biometric Background" article.
Try to find three key logger systems that capture the key-press and key-release times
of the keyboard keys with an accuracy to the millisecond, similar to our Java applet.
In describing your solution to this deliverable, give an example of the output obtained from the three key logger systems.
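To judge whether a key logger's output is usable, it may help to derive the two basic timing measurements from it. The sketch below assumes the output can be parsed into `(key, press_ms, release_ms)` records ordered by press time, and it assumes a transition is measured from one key's release to the next key's press; both the record layout and that definition are assumptions, and the sample values are invented.

```python
def durations_and_transitions(events):
    """Compute key-press durations and release-to-press transition times.

    events: list of (key, press_ms, release_ms), ordered by press time.
    """
    durations = [release - press for _, press, release in events]
    transitions = [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]
    return durations, transitions


if __name__ == "__main__":
    # Invented timings for the first letters of "Hello", not logger output.
    events = [("H", 0, 95), ("e", 150, 230), ("l", 300, 370)]
    print(durations_and_transitions(events))
```

If every value a logger produces is a multiple of 10 or more, that logger probably does not meet the millisecond requirement.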
Modify the feature extractor program (biofeature package)
Background and rationale:
The current keystroke biometric system was developed for running experiments and
is not appropriate for deployment in an actual test-taker authentication application.
The feature extraction program removes outliers and standardizes the feature measurements into the range 0-1.
These procedures perform best on large quantities of data involving many subjects
because we want the feature data for an individual subject to be unique (within the various 0-1 feature ranges)
relative to that of other subjects.
It is not reasonable, therefore, to extract features on a small amount of data,
such as a single sample from a user needing authentication, without mixing it with data from many other subjects.
For this reason, for experimental purposes we usually run the feature extractor on all the data and
then split the data into training and testing data.
To facilitate an actual authentication process, we will apply the same standardization min-max values to the testing data
that were obtained from the training data.
This will allow us to authenticate several users at a time in batch mode or one at a time in real-time.
The feature extractor will run as before for training,
only now it will output a file containing the standardization x-min and x-max values for each feature.
Then, for testing, these recorded x-min and x-max values will be read in and used to perform the standardization.
In summary, the revised feature extractor will have the option (by switch) of doing one of two things:
- computing, using, and outputting an x-min/x-max file (file output)
- inputting an x-min/x-max file and then using it (file input)
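The two modes can be sketched as below, using the usual min-max formula x' = (x - x_min) / (x_max - x_min). This is only an illustration: the JSON file format, the function names, and the choice to clip test values that fall outside the training range into 0-1 are all assumptions, not the behavior of the existing biofeature package.

```python
import json


def fit_min_max(rows):
    """Compute per-feature x-min and x-max from training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, x_min, x_max, clip=True):
    """Standardize rows into 0-1 using previously computed x-min/x-max."""
    scaled_rows = []
    for row in rows:
        scaled = []
        for x, lo, hi in zip(row, x_min, x_max):
            v = 0.0 if hi == lo else (x - lo) / (hi - lo)
            if clip:  # assumption: out-of-range test values are clipped
                v = min(1.0, max(0.0, v))
            scaled.append(v)
        scaled_rows.append(scaled)
    return scaled_rows


def save_min_max(path, x_min, x_max):
    """File output mode: write the x-min/x-max values."""
    with open(path, "w") as f:
        json.dump({"x_min": x_min, "x_max": x_max}, f)


def load_min_max(path):
    """File input mode: read the x-min/x-max values back."""
    with open(path) as f:
        data = json.load(f)
    return data["x_min"], data["x_max"]
```

In training, `fit_min_max` and `save_min_max` would be followed by `apply_min_max` on the training rows; in testing, `load_min_max` would replace the fit step, so that a single new sample can be standardized on its own.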
Redo Deliverables 1-3 with a slight variation.
- Extract features on only the 18-training-subject samples and output the file of x-min/x-max values.
- Extract features on only the 18-test-subject samples by reading in and using the x-min/x-max values from above.
- Run the authentication classifier to obtain accuracy results on these training and testing feature files.
(Note: accuracy might be a little lower than before.)
- 3/25-4/3 Part 1 by Robert - 4/9 Part 2 Strong vs Weak Training
Run this on the improved (239-feature) program. Get from Robert the files needed to run the test
and the Part 1 results to compare against.
Run the feature extraction program on the raw data file from Team 4 to obtain a feature file,
manually split the feature file into a training file (18 subjects, first 5 feature samples from each subject)
and a test file (18 subjects, second 5 feature samples from each subject),
and run BAS on these training and test files to obtain results
similar to the common deliverables at the beginning of the semester.
The experiment at the beginning of the semester used different subjects for training and testing
(we call that "weak" training) and
this experiment uses the same subjects (but different data samples) for training and testing
(we call this "strong" training).
[Team 5 should run this experiment with both the old and the improved (real 239 features) feature extraction program.]
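The manual split for strong training could also be scripted. The sketch below assumes the feature file has been read into `(subject_id, feature_vector)` pairs in recording order; that layout and the function name are assumptions for illustration.

```python
from collections import defaultdict


def split_strong(rows, n_train=5, n_test=5):
    """Split so each subject's first n_train samples train, next n_test test."""
    seen = defaultdict(int)   # samples encountered so far per subject
    train, test = [], []
    for subject, features in rows:
        index = seen[subject]
        seen[subject] += 1
        if index < n_train:
            train.append((subject, features))
        elif index < n_train + n_test:
            test.append((subject, features))
        # any further samples from this subject are ignored
    return train, test


if __name__ == "__main__":
    # Two made-up subjects with ten one-feature samples each.
    rows = [(s, [float(i)]) for s in ("s01", "s02") for i in range(10)]
    train, test = split_strong(rows)
    print(len(train), len(test))
```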
- 4/12- Another Weak Training Experiment
The last deliverable compared strong versus weak training but used different subjects for testing.
Now, we will run a weak training experiment,
but test on the same 18 subjects used in the strong training experiment of the previous deliverable.
For training we will use the 18-subject training data used in the baseline (original weak training) experiment,
and for testing we will use the 18-subject testing data used in the previous deliverable.
Again, you will need to combine the raw data files, run feature extraction, split the output file
appropriately into training and testing feature files, and run the BAS program on the two feature files.
Testing on the same data in both cases will give us a fairer, more direct comparison,
and these results should also be included in your technical paper.