Keystroke Biometric: Refactor System and Conduct Experiments
Keystroke Biometric Background: read & understand this section
Code Objective 1
The Keystroke Biometric System has evolved over the past six years,
with features and functions added along the way.
In order to improve the system so that it can handle larger subject and sample sizes,
refactoring of the Java code with a focus on the classification component is required.
This will require at least the following steps:
- The classifier-input feature data, currently stored in text-readable files, must be stored in binary files.
- The system, which currently requires the Eclipse IDE to run, should run as a standalone Java application.
- Errors and warnings in the code for all components should be resolved and obsolete code updated.
- Duplicating earlier experimental results will mark the successful completion of this objective.
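As an illustration of the binary-storage step above, here is a minimal sketch in Java. The class name `FeatureFileIO` and the file layout (row count, column count, then the values as doubles) are assumptions made for this example; the existing text-file format and the real feature dimensions are not specified here.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: stores a feature matrix in a simple binary layout. */
final class FeatureFileIO {

    /** Writes the row and column counts as ints, then every value as a double. */
    static void write(Path file, double[][] features) throws IOException {
        int cols = features.length == 0 ? 0 : features[0].length;
        // Validate before opening the file so a bad matrix never leaves a partial file.
        for (double[] row : features) {
            if (row.length != cols) {
                throw new IllegalArgumentException("All samples must have " + cols + " features");
            }
        }
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(features.length);
            out.writeInt(cols);
            for (double[] row : features) {
                for (double value : row) {
                    out.writeDouble(value);
                }
            }
        }
    }

    /** Reads a matrix previously stored by write(). */
    static double[][] read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int rows = in.readInt();
            int cols = in.readInt();
            double[][] features = new double[rows][cols];
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    features[r][c] = in.readDouble();
                }
            }
            return features;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("features", ".bin");
        double[][] sample = {{0.12, 0.34, 0.56}, {0.78, 0.90, 0.11}}; // made-up values
        write(file, sample);
        System.out.println(Arrays.deepToString(read(file)));
        Files.delete(file);
    }
}
```

Storing the dimensions in the header lets the reader allocate the matrix once, which may matter for the larger subject and sample sizes this objective targets.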
Code Objective 2
Fix the feature extractor code to resolve the discrepancy in the number of features that was discovered last semester.
Code Objective 3
Additional code modification tasks include converting several Python scripts into Java
so that we have a common code base.
These scripts are used to process classifier output files
to derive Receiver Operating Characteristic (ROC) curves that graphically illustrate system performance.
Code Objective 4
Modify the data collection, feature extractor, and associated components
so that they support strong-enrollment experiments.
Code Objective 5
Result sets from operations of the Keystroke Biometric System are stored in flat text files.
It is important to modify the application so that it efficiently uses database technology
instead of flat files for the collection and storage of data.
This helps with the scalability problem and other analysis objectives.
We recently focused on authentication experiments that used 'weak enrollment' data,
where only non-test-subject data were used to train the system.
Strong enrollment uses test-subject data (and possibly additional non-test-subject data)
to train the system, and then uses independent (different) test-subject data to test the system.
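The difference between the two enrollment modes can be sketched as a data-partitioning step. This is only an illustration: the `Sample` class, the `enrollCount` parameter, and the rule that the first `enrollCount` samples of each test subject are used for training are assumptions, not the system's actual design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of weak- versus strong-enrollment data partitioning. */
final class EnrollmentSplit {

    /** Hypothetical sample record: a subject ID plus the sample's index for that subject. */
    static final class Sample {
        final String subjectId;
        final int sampleIndex;

        Sample(String subjectId, int sampleIndex) {
            this.subjectId = subjectId;
            this.sampleIndex = sampleIndex;
        }
    }

    /** Training and test partitions produced by a split. */
    static final class Split {
        final List<Sample> train = new ArrayList<>();
        final List<Sample> test = new ArrayList<>();
    }

    /** Weak enrollment: train only on subjects that are not test subjects. */
    static Split weak(List<Sample> all, Set<String> testSubjects) {
        Split split = new Split();
        for (Sample s : all) {
            if (testSubjects.contains(s.subjectId)) {
                split.test.add(s);
            } else {
                split.train.add(s);
            }
        }
        return split;
    }

    /**
     * Strong enrollment: each test subject's first enrollCount samples join the
     * training set (along with all non-test-subject data); the remaining,
     * independent test-subject samples are used for testing.
     */
    static Split strong(List<Sample> all, Set<String> testSubjects, int enrollCount) {
        Split split = new Split();
        for (Sample s : all) {
            boolean isTestSubject = testSubjects.contains(s.subjectId);
            if (isTestSubject && s.sampleIndex >= enrollCount) {
                split.test.add(s);
            } else {
                split.train.add(s);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        List<Sample> all = new ArrayList<>();
        for (String id : new String[] {"s1", "s2", "s3"}) {
            for (int i = 0; i < 4; i++) {
                all.add(new Sample(id, i));
            }
        }
        Set<String> testSubjects = Set.of("s1");
        Split w = weak(all, testSubjects);
        Split s = strong(all, testSubjects, 2);
        System.out.println("weak:   train=" + w.train.size() + " test=" + w.test.size());
        System.out.println("strong: train=" + s.train.size() + " test=" + s.test.size());
    }
}
```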
Experimental Objective 1
Run "weak-enrollment" experiments using a new data set collected this semester
and analyze classifier accuracy by comparing results to the longitudinal data from the last semester.
Experimental Objective 2
Conduct "strong-enrollment" experiments to analyze classifier accuracy
and compare the results with the "weak-enrollment" experiments in Experimental Objective 1 and with results from last semester's data.
Fast Agile XP Deliverables
We will use the agile methodology,
particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations.
Many of these deliverables can be done in parallel by different members or subsets of the team.
The following is the current list of deliverables
(ordered by the date initiated, deliverable modifications marked in red,
deliverable date marked in bold red if programming involved,
completion date and related comments marked in green,
pseudo-code marked in blue):
Deliverables Common to all Keystroke Projects
Deliverables specific to this project:
- 10/5 10/15 (Good Job!)
Clean up the BAS (new version) code:
- Eliminate all error/warning messages.
- Eliminate unnecessary code to support the ProcessClassifierOutput.java process for
producing the kNN classifier performance results from the new BAS program.
A separate package or integration with the new BAS java package will satisfy
the objective. Currently, it is a redundant package that contains errors and unused code.
- Make the existing new BAS program parameter driven.
This includes setting the input and output file names,
the number of nearest neighbors (NNs), the top X samples, etc., so that these values
do not have to be hard-coded and edited in the source for each change.
- 10/5 10/22 (Good Job!)
Create a standalone Java application to replace the Eclipse IDE BAS (newer version),
and test it on Common Deliverables 2 & 3.
Modify the standalone BAS program to give it the capability to run large data sets.
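One way the parameter-driven BAS deliverable could be approached is with a standard `java.util.Properties` file read at startup. The sketch below is an assumption-laden illustration: the class name `BasConfig`, the property keys (`input.file`, `output.file`, `knn.neighbors`, `top.samples`), and the default values are all hypothetical and do not come from the BAS code.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical run settings for BAS, loaded from a properties file. */
final class BasConfig {
    final String inputFile;
    final String outputFile;
    final int nearestNeighbors; // number of NNs for the kNN classifier
    final int topSamples;       // the "top X samples" setting

    private BasConfig(Properties p) {
        inputFile = require(p, "input.file");
        outputFile = require(p, "output.file");
        // Defaults below are placeholders chosen for this sketch only.
        nearestNeighbors = intSetting(p, "knn.neighbors", 1);
        topSamples = intSetting(p, "top.samples", 10);
    }

    /** Loads settings from any character source, e.g. a file reader. */
    static BasConfig load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return new BasConfig(p);
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required setting: " + key);
        }
        return value.trim();
    }

    private static int intSetting(Properties p, String key, int fallback) {
        String value = p.getProperty(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BasConfig <settings.properties>");
            return;
        }
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            BasConfig config = load(reader);
            System.out.println("input=" + config.inputFile
                    + " output=" + config.outputFile
                    + " k=" + config.nearestNeighbors
                    + " top=" + config.topSamples);
        }
    }
}
```

A properties file keeps each experiment's settings outside the source, so a run can be repeated or varied without recompiling.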
Feature Extractor Deliverable. Refactor the biofeature source code.
This application is currently launched by selecting the UserInterface.java code in the biofeature package.
Remove errors and warnings, as was done for the BAS package.
The code should be optimized, including enhancements to the file chooser and status reporting functionality similar to BAS.
When the code is error/warning free, create a standalone application for the feature extractor.
The default Fallback Method should also be changed to "Linguistic" from "TouchType" as part of this deliverable.
Python Code Deliverable.
- This deliverable is to process the classifier output into a form suitable for ROC Curve generation.
First convert the Python code to Java so that the code base is consistent with BAS and the BioFeature components.
The converted programs should run in Eclipse and as a standalone application.
The individual scripts can be combined into a single menu driven application (preferred)
or can be maintained as separate modules.
- The un-weighted or weighted programs should ask the user for the name of the input and output files.
From a scan of the first block in the input file, the ‘1/W block’,
the program should learn how many W/B choices are in range to set the index in the program.
Typically, n=10, 15 or 20 are used, but any value is possible.
If n=10, there will be blocks of 10 W/B choices in the file;
if n=15, there will be blocks of 15 W/B choices in the file; etc.
If the scan functionality cannot be completed within a few iterations,
a chooser should be used instead to prompt the user for the n value to set the range.
If this approach is used,
the program should handle the error condition that results if the n selection does not match the data file.
Output from this program should be FRR, FAR, FRR%, and FAR%.
- Python Code
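The un-weighted FRR/FAR computation described in the Python Code Deliverable can be sketched in Java as follows. Several things here are assumptions for illustration only: the in-memory `Block` representation (the real classifier output file format is not described here), the decision rule that a sample is accepted when at least m of its n choices are W, and the reading of FRR and FAR as raw error counts with FRR% and FAR% as rates. The sketch learns n from the first block and rejects any block whose length does not match.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: un-weighted FRR/FAR from blocks of W/B choices. */
final class RocSketch {

    /** One classified test sample: its true class and its ranked W/B choices. */
    static final class Block {
        final char truth;     // 'W' = within-class (genuine), 'B' = between-class (impostor)
        final String choices; // one 'W' or 'B' character per nearest neighbor, e.g. "WWBWB"

        Block(char truth, String choices) {
            this.truth = truth;
            this.choices = choices;
        }
    }

    /** Learns n from the first block and checks that every block agrees with it. */
    static int blockSize(List<Block> blocks) {
        int n = blocks.get(0).choices.length();
        for (Block b : blocks) {
            if (b.choices.length() != n) {
                throw new IllegalArgumentException("Block size does not match n=" + n);
            }
        }
        return n;
    }

    /**
     * Returns {FRR count, FAR count, FRR%, FAR%} under the assumed rule that a
     * sample is accepted when at least m of its choices are 'W'.
     */
    static double[] rates(List<Block> blocks, int m) {
        blockSize(blocks); // validates the blocks before counting
        int genuine = 0, impostor = 0, falseRejects = 0, falseAccepts = 0;
        for (Block b : blocks) {
            long withinVotes = b.choices.chars().filter(c -> c == 'W').count();
            boolean accepted = withinVotes >= m;
            if (b.truth == 'W') {
                genuine++;
                if (!accepted) falseRejects++;
            } else {
                impostor++;
                if (accepted) falseAccepts++;
            }
        }
        double frrPercent = genuine == 0 ? 0.0 : 100.0 * falseRejects / genuine;
        double farPercent = impostor == 0 ? 0.0 : 100.0 * falseAccepts / impostor;
        return new double[] {falseRejects, falseAccepts, frrPercent, farPercent};
    }

    public static void main(String[] args) {
        // Made-up blocks, for demonstration only.
        List<Block> blocks = Arrays.asList(
                new Block('W', "WWB"), new Block('W', "WBB"),
                new Block('B', "WWW"), new Block('B', "BBB"));
        int n = blockSize(blocks);
        // Sweeping the threshold m yields one (FAR%, FRR%) point per value, i.e. an ROC curve.
        for (int m = 0; m <= n; m++) {
            System.out.println("m=" + m + " " + Arrays.toString(rates(blocks, m)));
        }
    }
}
```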