Clustering Discrimination
on Fusing Weak Biometrics

Background

Several biometric systems are sometimes fused (combined) to achieve greater discrimination power than any single biometric provides. It may also be possible to use pattern recognition clustering techniques to assist in the discrimination task. These methods seem particularly appropriate for the weak behavioral biometrics (keystroke, stylometry, voice, gait, etc.), which tend to have lower discrimination power than the stronger physiological biometrics such as iris and fingerprint.

The 2008 federal Higher Education Opportunity Act requires institutions of higher learning to make greater access control efforts to assure that the students of record are those actually accessing the systems and taking exams in online courses, adopting identification technologies as they become more widely available. To meet these needs, keystroke and stylometry biometrics were investigated at Pace University with the aim of developing a robust system to authenticate (verify) online test takers. The performance of the stylometry system on online tests, however, was rather poor, and simply fusing the keystroke and stylometry systems by combining their features did not improve on the performance of the keystroke system alone. This work was described in last semester's technical paper from Research Day 2011 and extended in the IJCB2011 paper to be presented at the International Joint Conference on Biometrics in October 2011.

Project

This semester we will explore a clustering technique as a way of improving the discrimination power of the individual keystroke and stylometry biometric systems.

The idea of using clustering is as follows. For each of the systems we anticipate that users will fall into a set of clusters, with users having similar traits falling into the same cluster. For example, there might be keystroke system clusters of fast left-handed touch typists, fast right-handed touch typists, slow left-handed hunt-and-peck typists, slow right-handed hunt-and-peck typists, etc. Similarly, there might be stylometry system clusters for long-winded large-vocabulary users, etc. Now, when a user taking an online test tries to be verified, his/her keystroke within-class difference vector (the difference between two samples from the same user) might be close to a between-class difference vector (the difference between a sample from the true user and a sample from another user), making it difficult for the keystroke system alone to verify that user. However, the true user's stylometry sample might be very different from the other user's in terms of the stylometry clusters, thus disambiguating the two users and allowing the first to be verified.
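The reasoning above can be illustrated with a minimal sketch. Python is used here only for illustration, since the project's experiments are expected to use Matlab. Everything specific in it is an assumption made for the example: the feature values, the two stylometry centroids, the keystroke threshold, and the names distance, nearest_cluster and verify are all hypothetical and are not taken from the Pace systems or data. The sketch shows one possible reading of the idea: a claimed identity is accepted only if the keystroke samples are close and the stylometry samples fall into the same stylometry cluster.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_cluster(sample, centroids):
    """Index of the centroid closest to the sample."""
    return min(range(len(centroids)), key=lambda i: distance(sample, centroids[i]))

def verify(enrolled, claimed, style_centroids, key_threshold):
    """Hypothetical decision rule: accept only if the keystroke samples are
    close AND both stylometry samples land in the same stylometry cluster."""
    key_close = distance(enrolled["keystroke"], claimed["keystroke"]) <= key_threshold
    same_style = (nearest_cluster(enrolled["stylometry"], style_centroids)
                  == nearest_cluster(claimed["stylometry"], style_centroids))
    return key_close and same_style

# Hypothetical stylometry centroids: [mean sentence length, vocabulary richness]
style_centroids = [[10.0, 0.3], [25.0, 0.7]]

# Made-up samples: the impostor types much like the true user but writes differently.
true_user = {"keystroke": [0.12, 0.30], "stylometry": [24.0, 0.68]}
same_user_again = {"keystroke": [0.13, 0.29], "stylometry": [26.0, 0.72]}
impostor = {"keystroke": [0.13, 0.31], "stylometry": [9.0, 0.25]}

genuine_ok = verify(true_user, same_user_again, style_centroids, key_threshold=0.05)
impostor_ok = verify(true_user, impostor, style_centroids, key_threshold=0.05)
```

In this made-up case the keystroke distance alone cannot separate the impostor from the true user, because both fall within the threshold. The stylometry cluster membership is what distinguishes them.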

The stylometry and keystroke data obtained last semester and over the summer will be used for these experiments. The data will be made available to the team in a form suitable for Matlab processing.

The project team will implement an algorithm to cluster the feature vector samples of the keystroke and stylometry systems to test the above idea. Matlab will be the preferred software for the experiments. This project is particularly suitable for students who have taken Dr Cha's Data Mining or Pattern Recognition course. Because we are particularly interested in obtaining results on this idea, and given the potential difficulty of the work, students selecting this project will likely receive excellent grades.
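As a starting point for the clustering step, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The project description does not name a specific clustering algorithm, so k-means is an assumption chosen for illustration, and the actual experiments are expected to be carried out in Matlab. The sample values, the two-feature layout and the function names are hypothetical. Seeding the centroids with the first k samples is a simplification that keeps the example deterministic.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(samples, k, iterations=20):
    """Plain k-means, seeded with the first k samples (a simplification)."""
    centroids = [list(s) for s in samples[:k]]
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Assignment step: attach each sample to its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(s, centroids[c]))
                  for s in samples]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical keystroke features: [mean key hold time (ms), mean key-to-key latency (ms)]
samples = [[80, 110], [200, 400], [85, 120], [210, 380], [78, 105], [190, 410]]
centroids, labels = kmeans(samples, k=2)
```

With these made-up values the fast and slow typists separate into two clusters. Real feature vectors would have many more dimensions and would likely need normalization before distances are computed.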