Feature-Vector Data Format

The data is stored as a plain-text file or a corresponding spreadsheet. The file has the following form, with fields in a record comma-delimited and items in a field slash-delimited:

Feature Value Normalization

The following pseudo-code normalizes the values of each feature into the range 0 to 1.
for i = 1 to number_of_features
     min = feature_value (i,1) {initialize to the first sample's value}
     max = feature_value (i,1)
     for j = 2 to number_of_samples {find min and max}
          if feature_value (i,j) < min then min = feature_value (i,j)
          if feature_value (i,j) > max then max = feature_value (i,j)
     end
     for j = 1 to number_of_samples {normalize}
          if max > min then feature_value (i,j) = (feature_value (i,j) - min) / (max - min)
          else feature_value (i,j) = 0 {constant feature: avoids division by zero; 0 is an arbitrary choice}
     end
end
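
The same min-max normalization can be sketched in Python. This is an illustrative sketch, not part of the format definition: the function name normalize_features, the list-of-lists layout (one inner list per sample, so the indices are swapped relative to the pseudo-code), and the choice to map a constant feature to 0.0 are all assumptions made for the example.

```python
def normalize_features(samples):
    """Min-max normalize each feature (column) of `samples` into the range 0 to 1.

    `samples` is a list of equal-length lists of floats, one inner list per sample.
    Returns a new list of lists; the input is left unchanged.
    """
    if not samples:
        return []
    num_features = len(samples[0])
    normalized = [list(row) for row in samples]
    for i in range(num_features):
        column = [row[i] for row in samples]
        lo, hi = min(column), max(column)
        span = hi - lo
        for row in normalized:
            # Assumption: a constant feature (span == 0) is mapped to 0.0
            # to avoid dividing by zero.
            row[i] = (row[i] - lo) / span if span != 0 else 0.0
    return normalized


# Three samples with two features each.
result = normalize_features([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(result)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Taking the minimum and maximum directly from the data avoids relying on sentinel values that the feature measurements might exceed.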

Example

Keystroke biometric data example created September 2008
8
MaryJones/F/08-01-1981, left-handed, Dell laptop, copy task, 2, 0.13668, 0.53375
MaryJones/F/08-01-1981, left-handed, Dell laptop, copy task, 2, 0.14378, 0.56275
JohnSmith/M/06-01-1980, right-handed, Dell laptop, email task, 2, 0.53628, 0.43865
JohnSmith/M/06-01-1980, right-handed, Dell laptop, email task, 2, 0.43628, 0.53865
ChrisHill/F/02-04-1983, right-handed, Dell desktop, email task, 2, 0.39734, 0.92862
ChrisHill/F/02-04-1983, right-handed, Dell desktop, email task, 2, 0.49924, 0.98861

Notes:

1) The example has three classes: Mary Jones, John Smith, and Chris Hill.
2) Unknown, unavailable, or not relevant data items are indicated by "?".
3) Although the feature measurements are shown above with five decimal places for simplicity, eight or ten decimal places are recommended for the actual project data.
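
The delimiter conventions (commas between fields, slashes between items) and the "?" marker from note 2 can be illustrated with a short Python sketch. The function name parse_record and the choice to represent "?" as None are assumptions made for this example; the sketch only splits a record into its parts and does not interpret what each field means.

```python
def parse_record(line):
    """Split one record into fields (comma-delimited) and items (slash-delimited).

    Returns a list of fields, each a list of item strings with surrounding
    whitespace removed. Items equal to "?" are returned as None.
    """
    fields = []
    for raw_field in line.split(","):
        items = [item.strip() for item in raw_field.split("/")]
        fields.append([None if item == "?" else item for item in items])
    return fields


record = parse_record(
    "MaryJones/F/08-01-1981, left-handed, Dell laptop, copy task, 2, 0.13668, 0.53375"
)
print(record[0])  # ['MaryJones', 'F', '08-01-1981']
print(record[1])  # ['left-handed']
```

Values are kept as strings here; converting the feature measurements to floating-point numbers is left to the caller.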