Feature-Vector and Feature-Vector-Difference Data Formats

Feature-Vector Data Files

The data is stored as a plain-text file or an equivalent spreadsheet. Fields within a record are comma-delimited, and items within a field are slash-delimited.

Feature Value Normalization

The following pseudo-code normalizes the feature values into the range 0-1.
for i = 1 to number_of_features
     min = feature_value (i,1) {initialize min and max to the first sample}
     max = feature_value (i,1)
     for j = 2 to number_of_samples {find min and max}
          if feature_value (i,j) < min then min = feature_value (i,j)
          if feature_value (i,j) > max then max = feature_value (i,j)
     end
     for j = 1 to number_of_samples {normalize}
          if max > min then feature_value (i,j) = (feature_value (i,j) - min) / (max - min)
          else feature_value (i,j) = 0 {assumed convention for a constant feature; avoids division by zero}
     end
end
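
The normalization above could be written in Python roughly as follows. This is a minimal sketch, not a prescribed implementation: the function name normalize_features and the list-of-lists layout (one inner list per sample) are assumptions for illustration, and mapping a constant feature to 0.0 is an assumed convention.

```python
def normalize_features(samples):
    """Min-max normalize each feature (column) into the range 0-1.

    `samples` is a list of equal-length lists of floats, one list per sample
    (an assumed layout). Returns a new list of lists; the input is unchanged.
    """
    if not samples:
        return []
    num_features = len(samples[0])
    normalized = [list(row) for row in samples]
    for i in range(num_features):
        column = [row[i] for row in samples]
        lo, hi = min(column), max(column)
        span = hi - lo
        for j in range(len(samples)):
            # Assumption: a feature with the same value in every sample
            # (span == 0) is mapped to 0.0 to avoid dividing by zero.
            normalized[j][i] = (samples[j][i] - lo) / span if span != 0 else 0.0
    return normalized
```

Taking the minimum and maximum directly from the data avoids relying on sentinel values that a real feature might exceed.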

Example

Keystroke biometric data example created September 2008
8
MaryJones/F/08-01-1981, left-handed, Dell laptop, copy task, 2, 0.13668, 0.53375
MaryJones/F/08-01-1981, left-handed, Dell laptop, copy task, 2, 0.14378, 0.56275
JohnSmith/M/06-01-1980, right-handed, Dell laptop, email task, 2, 0.53628, 0.43865
JohnSmith/M/06-01-1980, right-handed, Dell laptop, email task, 2, 0.43628, 0.53865
ChrisHill/F/02-04-1983, right-handed, Dell desktop, email task, 2, 0.39734, 0.92862
ChrisHill/F/02-04-1983, right-handed, Dell desktop, email task, 2, 0.49924, 0.98861

Notes:

1) The example has three classes: Mary Jones, John Smith, and Chris Hill.
2) Unknown, unavailable, or not relevant data items are indicated by "?".
3) The feature measurements above are shown with five decimal places for simplicity; eight or ten decimal places are recommended for the actual project data.
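
A record in the example could be read with a short Python function such as the one below. This is only a sketch: the field order (subject items, handedness, computer, task, number of features, then the feature values) is inferred from the example records rather than from a formal specification, and the function name parse_record and the dictionary keys are hypothetical.

```python
def parse_record(line):
    """Split one comma-delimited record into its fields.

    Assumed field order, inferred from the example records:
    subject (slash-delimited items), handedness, computer, task,
    number of features, then that many feature values.
    """
    fields = [field.strip() for field in line.split(",")]

    def item(text):
        # "?" marks unknown, unavailable, or not relevant data.
        return None if text == "?" else text

    num_features = int(fields[4])
    raw_values = fields[5:5 + num_features]
    if len(raw_values) != num_features:
        raise ValueError("record has fewer feature values than declared")
    return {
        "subject": [item(part) for part in fields[0].split("/")],
        "handedness": item(fields[1]),
        "computer": item(fields[2]),
        "task": item(fields[3]),
        "features": [None if v == "?" else float(v) for v in raw_values],
    }
```

Unknown items come back as None, so later processing can decide how to treat them.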

Feature-Vector-Difference Data Files

The above three-class example file can easily be converted into a two-class dichotomy-model vector-difference data file. The two authentication classes are the within-class (same person) and the between-class (different people) categories. We perform the conversion by taking all possible difference vectors.

In this case, there are only 3 within-class vector pairs, one for each of the three people, and 12 between-class vector pairs (6*4/2: each of the 6 instances can be compared with the 4 instances from other people, and dividing by 2 eliminates duplicates). In general, if n people provide m biometric samples each, there are m*(m-1)*n/2 within-class pairs and m*m*n*(n-1)/2 between-class pairs. Both counts can be large, and the number of between-class pairs usually far exceeds the number of within-class pairs, so the training and test samples can be drawn at random to limit the number.

For each pair, a difference vector is computed by taking the absolute difference between corresponding vector components. Because our biometric features are in the range 0-1, the difference vector features will also be in the range 0-1.
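
The pair enumeration and the two counting formulas can be checked with a short Python sketch. The names EXAMPLE_SAMPLES, difference_records, and expected_pair_counts are hypothetical, and the (label, features) tuple layout is an assumption; the feature values are taken from the example file.

```python
from itertools import combinations

# (class label, feature vector) for the six records of the example file.
EXAMPLE_SAMPLES = [
    ("MaryJones", [0.13668, 0.53375]),
    ("MaryJones", [0.14378, 0.56275]),
    ("JohnSmith", [0.53628, 0.43865]),
    ("JohnSmith", [0.43628, 0.53865]),
    ("ChrisHill", [0.39734, 0.92862]),
    ("ChrisHill", [0.49924, 0.98861]),
]

def difference_records(samples):
    """Return (kind, difference_vector) for every unordered pair of samples."""
    records = []
    for (label_a, feats_a), (label_b, feats_b) in combinations(samples, 2):
        kind = "same" if label_a == label_b else "different"
        # Absolute difference between corresponding components.
        diff = [abs(a - b) for a, b in zip(feats_a, feats_b)]
        records.append((kind, diff))
    return records

def expected_pair_counts(n, m):
    """Within-class and between-class pair counts for n people, m samples each."""
    within = m * (m - 1) * n // 2
    between = m * m * n * (n - 1) // 2
    return within, between
```

For the example file (n = 3, m = 2), the enumeration yields 3 "same" and 12 "different" records, matching the formulas. When the counts are large, a random subset of the returned list could be drawn, for example with random.sample.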

For the illustrative feature-vector file above, the difference-vector records for the first within-class (same person) pair and for the first between-class (different people) pair would be:

Example

same, 2, |0.13668-0.14378|, |0.53375-0.56275|
different, 2, |0.13668-0.53628|, |0.53375-0.43865|
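
The two records above can be evaluated numerically with a small Python helper. This is a sketch only: the function name format_difference_record is hypothetical, and printing five decimal places simply follows the precision of the example file.

```python
def format_difference_record(kind, features_a, features_b, places=5):
    """Build one difference-vector record line from two feature vectors."""
    diffs = [abs(a - b) for a, b in zip(features_a, features_b)]
    values = ", ".join(f"{d:.{places}f}" for d in diffs)
    return f"{kind}, {len(diffs)}, {values}"

# First within-class pair: the two MaryJones records.
print(format_difference_record("same", [0.13668, 0.53375], [0.14378, 0.56275]))
# prints: same, 2, 0.00710, 0.02900

# First between-class pair: first MaryJones record vs. first JohnSmith record.
print(format_difference_record("different", [0.13668, 0.53375], [0.53628, 0.43865]))
# prints: different, 2, 0.39960, 0.09510
```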