Stylometry applies statistical analysis, pattern recognition, and artificial intelligence techniques to the study of writing style. Its features are typically derived from word frequencies and from patterns in the use of common parts of speech. A framework paper and an MIT thesis describe some existing systems.
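As a rough illustration of a word-frequency feature, the sketch below computes the relative frequency of a few common function words in a text. The word list, the tokenizer, and the function name `word_frequency_features` are assumptions made for this example and are not taken from any of the systems mentioned here. Part-of-speech features would need a tagger and are not shown.

```python
import re
from collections import Counter

# Hypothetical vocabulary of common function words (illustration only);
# a practical feature set would be considerably larger.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is"]


def word_frequency_features(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in `text`."""
    # Simple tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return [0.0] * len(vocabulary)
    # Normalizing by the token count keeps texts of different length comparable.
    return [counts[word] / total for word in vocabulary]


sample = "The cat sat in the hat, and the dog sat in the sun."
features = word_frequency_features(sample)
```

Each text then becomes a fixed-length numeric vector, which is the form a pattern-recognition classifier generally expects.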
This project continues previous work; see the Research Day 2010 paper, the Research Day 2011 paper, and especially the IJCB2011 Conference Paper.
Last semester we developed a reasonably robust Pace University Stylometry Biometric System (PSBS), and Vinnie Monaco is currently enlarging its feature set. The design of the stylometry features is based on the following criteria:
Last semester we used the PSBS in an effort to enhance the Pace University Keystroke Biometric System (PKBS) on the answers entered by students taking online short-answer tests; see the IJCB2011 Conference Paper cited above. Because the stylometry results were rather poor, this semester's project will focus solely on stylometry and on much longer text input, with the aim of obtaining reasonable accuracy with the PSBS.
We have some long-text samples and expect to obtain more from DPS students and graduates teaching at various institutions. Most of this semester's effort will go into running experiments to measure accuracy (e.g., Equal Error Rate) as a function of text length.
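For reference, the Equal Error Rate (EER) is the operating point at which the false accept rate equals the false reject rate. The sketch below approximates it by sweeping a decision threshold over the observed match scores. It assumes similarity scores where higher means "more likely the same author"; the function name `equal_error_rate` and the score values are made up for illustration and are not experimental results or part of the PSBS.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    `genuine` holds scores for same-author comparisons and `impostor`
    holds scores for different-author comparisons (higher = more similar).
    """
    best_gap, eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False reject: a genuine comparison scoring below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False accept: an impostor comparison scoring at or above it.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

Repeating this calculation on samples truncated to different lengths would give one EER value per text length, which is the kind of curve the experiments aim to produce.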