Suggested Research Topics

The following research topics, suggested by Drs. Tappert and Cha, are suitable for D.P.S. or M.S. dissertations and for briefer preliminary studies in research-related courses. Of course, the research conducted for a D.P.S. dissertation must be more extensive than that for an M.S. dissertation, which in turn must be more extensive than a small research study required for a course. For many of these topics, preliminary research has been undertaken either by us or by some of our students. Significant research opportunities exist in each of these topics, either by extending the earlier work or by undertaking new approaches.

There is a clear distinction between research and project work. Research is original, rigorous work that advances knowledge, improves professional practice, and/or contributes to the understanding of a subject. Research methods depend upon the nature of the research: controlled experiments, empirical studies, theoretical analyses, or other methods as appropriate. We require DPS research work to be strong enough that a paper worthy of publication in a refereed journal or conference proceedings can be distilled from it. Project work, on the other hand, uses known technology to develop systems, usually according to specified customer requirements. Many of our projects, however, are developed to provide support for our research, so there is interplay between the project and research activities. For an overview of research and projects in these areas, see Research/Projects Interplay (e-Learn 2007) or, better yet, the more recent Capstone Projects (e-Learn 2011) and associated slides.

We have four major areas of research – biometrics, interactive visual image studies, natural language processing and related forensics studies, and general pattern recognition studies – and several other areas that, depending on the particular study, tend to relate to one or several of these major areas. The four areas are also highly interrelated – biometrics, for example, is basically a subarea of pattern recognition. For listings of various studies in these areas, see Dr. Tappert's publication list and Dr. Cha's publication list.


For an overview of biometrics, see An introduction to biometric recognition by Jain, et al., 2004 or the textbook used in our DPS course on Emerging Information Technologies. The main problem we consider is the task of establishing the distinctiveness of each individual in a population when there is a set of measurements that have an inherent variability for each individual. This task of establishing individuality can be thought of as showing the distinctiveness of the individual classes with a small error rate in discrimination. This is important for most forensic science applications such as writer, face, iris, fingerprint, speaker, or bite mark identification. All these applications face the problem of scientifically establishing individuality, a need underscored by court rulings such as Daubert v. Merrell Dow Pharmaceuticals, which set standards for the admissibility of scientific expert testimony and has led to challenges of assumptions such as the uniqueness of handwriting. Dr. Cha's dissertation, for example, was on writer individuality, where he employed a dichotomy model of feature vector differences that is inferable to the general population. There are other biometric applications for which studies can be made using the same model, and preliminary studies have been made on some of these (see below).
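To make the idea of a dichotomy model of feature vector differences more concrete, here is a minimal sketch in Python. It assumes that each sample is a numeric feature vector tagged with a person identifier, and it turns the many-class problem into a two-class one by labeling every pair of samples as "within" (same person) or "between" (different people). The function name, the labels, and the toy data are hypothetical, and the classifier that would then be trained on the difference vectors is omitted.

```python
from itertools import combinations


def dichotomy_pairs(samples):
    """Convert (person_id, feature_vector) samples into labeled difference vectors.

    Each unordered pair of samples yields one absolute-difference vector,
    labeled "within" if both samples come from the same person and
    "between" otherwise.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(samples, 2):
        diff = [abs(a - b) for a, b in zip(vec_a, vec_b)]
        label = "within" if id_a == id_b else "between"
        pairs.append((diff, label))
    return pairs


# Toy data (hypothetical): two samples each from two people.
samples = [
    ("alice", [1.0, 2.0]),
    ("alice", [1.1, 2.1]),
    ("bob", [3.0, 0.5]),
    ("bob", [2.9, 0.4]),
]
pairs = dichotomy_pairs(samples)
```

Because the two classes are defined on differences rather than on the individuals themselves, a classifier trained this way does not need to have seen a person before, which is one reason such a model may generalize beyond the sampled population.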

Individuality based on keystroke patterns

We have explored the keystroke biometric for long-text input, which had previously received little attention. Four students have completed DPS dissertations in this area, although the last two generalized their research and used the keystroke biometric data only as a case study.
  1. Mary Curtin (completed 2006) explored the feasibility of user identification with the keystroke biometric on long-text input – see the conference paper that summarizes her work, Keystroke Biometric Recognition on Long-Text Input: A Feasibility Study (2006).
  2. Mary Villani (completed 2006) studied user identification accuracy of a long-text-input keystroke biometric system as a function of the two independent variables, keyboard types and input modes – see the conference paper that summarizes her work, Keystroke Biometric Recognition Studies on Long-Text Input under Ideal and Application-Oriented Conditions (2006).
  3. Mark Ritzmann (completed 2008) investigated strategies for managing missing or incomplete information and applied several alternative fallback procedures to the keystroke data. He also generalized related procedures for potential applications to business problems of incomplete information – see his dissertation.
  4. Robert Zack (completed 2010) invented new extensions of the k-nearest-neighbor classification technique to obtain Receiver Operating Characteristic (ROC) curves, and employed a user-authentication keystroke biometric system to verify and illustrate his results – see Biometrics conference article (2010) and associated slides that summarize his work.
In recent years (2009-2010), the keystroke biometric studies have been extended; see Journal article (2010). These extensions have primarily been in two areas. Additional studies are possible using our ever-increasing inventory of keystroke biometric data.

Individuality based on writing style (stylometry)

The linguistic features (words, syntax, etc.) used in writing can be used to identify or verify the author. Several masters-level projects have initiated work in this area (see CSIS Research Day papers), and John Stewart (DPS 2011) is currently working in this area. For a related study that counts word usage, see He Counts Your Words.
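As a simple illustration of how word usage can be turned into a feature vector for authorship studies, the following sketch computes the relative frequency of a few common function words in a text. The word list and the function name are assumptions chosen for illustration; they are not the features used in the studies cited above.

```python
import re
from collections import Counter

# Hypothetical, very short list of function words; real studies use many more.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in"]


def style_vector(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in vocabulary]


vector = style_vector("The cat and the dog")
```

Vectors like this, computed for texts of known and unknown authorship, could then be compared with a distance measure or fed to a classifier.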

The 2008 Higher Education Opportunity Act (HEOA) is a strong motivator for work on the biometrics of keystroke and stylometry. Extending the work of John Stewart, additional studies are possible using our ever-increasing inventory of keystroke and stylometry data.

Speaker individuality

We have conducted preliminary investigations of speaker individuality based on earlier work by Dr. Cha on writer individuality. The idea is to determine whether a speaker's voice is sufficient to accurately verify the identity of the speaker. Related work involves, for example, such topics as whether a speaker can disguise his/her speech sufficiently so as not to be identified. See the CSIS Research Day 2004 paper summarizing an M.S. Dissertation, Jan 2004, Establishing the Uniqueness of the Human Voice for Security Applications.

Individuality based on iris images

See the CSIS Research Day 2004 paper summarizing a study by M.S. Research Seminar student Seung Choi, Use of Histogram Distances in Iris Authentication, and the 2005 Int. J. Graphics, Vision and Image Processing article, On the Individuality of the Iris Biometric.

Individuality based on mouse movement patterns

Mouse movement patterns also have the potential for identification or verification. One could study, for example, mouse movements and mouse clicks for performing the standard editing operations of insert, delete, copy, etc. The mouse is used for moving the cursor to highlight text in cutting, to mark target locations in pasting, and to click on dropdown menus. Editing methods can further differentiate users: for example, some use dropdown menus, while others use shortcut keys. Several masters-level projects have initiated work in this area; see CSIS Research Day papers.

Individuality based on fingerprints

On-line fingerprint verification by Jain, et al., is a good introduction to this area of research.

Biometric system evaluation studies

We are beginning to research various methods of evaluating biometric systems. Our first study in this area is a paper (work performed by an M.S. student) entitled Evaluation of Biometric Identification in Open Systems (slides) presented at the Audio- and Video-based Biometric Person Authentication (AVBPA 2005) Conference.

Another general problem is determining how well a system developed on a small population of users generalizes to larger populations. For example, Dr. Cha proposed in his dissertation a dichotomy model of feature vector differences that is potentially inferable to the general population. It would be interesting to conduct further studies, of either a theoretical or empirical nature, on this or other models.

Interactive Visual Image Research

Combining human and machine capabilities can lead to powerful systems. In this regard we are inspired by a quote often attributed to Albert Einstein: "Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination." We are interested in improving accuracy and speed in visual recognition tasks. Specifically, we want to enhance human-computer interaction in applications of pattern recognition where higher accuracy is required than is currently achievable by automated systems, but where there is enough time for a limited amount of human interaction. This topic has so far received only limited attention from the research community. Our current, model-based approach to interactive recognition originated at RPI and was then investigated jointly at RPI and Pace University (see Interactive flower recognition below). Our NSF Proposal gives a good introduction and background to this area. The proposal's objective was to develop guidelines for the design of mobile interactive object classification systems; to explore where interaction is, and where it is not, appropriate; and to demonstrate working interactive recognition systems. The proposal targeted three very different domains – foreign signs, faces, and skin diseases – and this approach could also be used in other domains.

Interactive flower recognition

Our initial success in recognizing flowers established a methodology for continued work in the area of interactive visual systems; see Flower Study.

Interactive flag recognition

An M.S. student, Eduardo Hart, completed studies resulting in two conference papers: Interactive Flag Identification Using Image Retrieval Techniques and Interactive Flag Identification Using a Fuzzy Neural Technique.

Interactive skin lesion recognition

See the study by M.S. student John Sikorski: Identification of Malignant Melanoma by Wavelet Analysis.

Interactive rare coin grading

Completed 2003 DPS dissertation by Rick Bassett entitled Computer-based Objective Interactive Numismatic System.

Interactive archeological artifact studies

Completed 2005 DPS dissertation by Sheb Bishop entitled Classification of Greek Pottery Shapes and Schools Using Image Retrieval Techniques.

Interactive analysis of paintings

Completed 2005 DPS dissertation by Tom Lombardi entitled The Classification of Style in Fine-Art Painting, and related conference papers.

Natural Language Processing and Related Forensics Research

Pen computing studies involving shorthand and chatroom symbols

Alphabets of shorthand symbols have been developed for handheld devices, notably the Graffiti alphabet for Palm handhelds. An extension of this for even faster input would be to use shorthand symbols that correspond to words and phrases such as the text symbols used for chatroom communication. See the summaries of an M.S. in CS dissertation: the IWFHR 2004 conference paper Use of Chatroom Abbreviations and Shorthand Symbols in Pen Computing and the HCI 2005 conference paper Common Chatroom Abbreviations Speed Pen Computing. This work could be extended.

Automatic extraction of dynamic information from static images of handwriting for forensic studies

Forensic handwriting examination has the challenge of working only with static images of handwriting. However, much of the information relating to the identity of the specific writer, or relating to whether the writing was natural or distorted, is contained in the dynamics of the handwriting. Therefore, it can be important in the field of forensic science to develop digital image processing techniques to evaluate the dynamic and temporal components of the handwriting from static images.

Handwriting style analysis

Handwriting originates from a particular copybook style such as Palmer or Zaner-Bloser that one learns in childhood. Since questioned document examination plays an important investigative and forensic role in many types of crime, it is important to develop a system that helps objectively identify a questioned document's handwriting style. Identifying the copybook style of a questioned document, such as a ransom note, can help to reduce the scope of the suspect population in the identification of an individual writer. Preliminary offline (OCR type) work on this subject resulted in the conference paper, Similarity-Based Handwriting Copybook Style Identification and the Journal of Forensic Document Examiners article Handwriting Copybook Style Identification for Questioned Document Examination. In 2005 Mary Manfredi extended this work using pseudo online data to complete her DPS dissertation entitled "Copybook Style Determination of Pseudo-Online Handwriting Data," summarized in a paper presented at the International Graphonomics Society (IGS 2005) conference, Similarity-Based Copybook Style Analysis Using Pseudo-Online Handwriting.

Handwriting synthesis of a particular writer's style

This topic is related to the handwriting style analysis described above. Forensic examiners would like the capability of creating a handwriting document in a particular writer's style. For example, the FBI might want to synthetically create a ransom note in a particular writer's style.

The detection of handwriting forged by novices

It is known that many forgeries, particularly those by novice forgers, are written slowly in order to accurately capture the writing shape and style of the true writer. Initial work in this area dealt with the development of a fractal number estimate of the wrinkliness of the handwriting from static images, where we found an inverse correlation between the wrinkliness and the speed of the writing. See the summary of an M.S. dissertation presented at the International Graphonomics Society (IGS 2004) conference, The Detection of Forged Handwriting Using a Fractal Number Estimate of Wrinkliness. This research could be extended in several ways.
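One common way to obtain a fractal number from a static image is the box-counting dimension, sketched below. This is an assumption made for illustration: the estimator actually used in the cited work is not described here and may differ. The sketch takes the ink pixels of a binarized handwriting image as (x, y) integer coordinates, counts how many boxes of each size contain ink, and fits the slope of log(count) against log(1/size). The function names and box sizes are hypothetical.

```python
import math


def box_count(points, box_size):
    """Number of box_size x box_size grid cells that contain at least one point."""
    return len({(x // box_size, y // box_size) for x, y in points})


def box_counting_dimension(points, box_sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) versus log(1/s) over the given box sizes."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity checks on shapes with known dimension: a straight stroke and a filled block.
line = [(x, 0) for x in range(64)]
block = [(x, y) for x in range(64) for y in range(64)]
line_dim = box_counting_dimension(line)
block_dim = box_counting_dimension(block)
```

On these two test shapes the estimate comes out close to 1 and 2 respectively; a shaky, slowly written stroke would be expected to fall somewhat above a smooth stroke on such a scale.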

An analyzer that determines a writer's handedness and pen grip from static images of the handwriting

For forensic examinations it is important to be able to determine the probability that a writer of a document was left-handed or right-handed. Determining the writer's pen grip is also of significant value.

Language recognition from short telephone speech input

Completed 2002 DPS dissertation by Jonathan Law entitled An Efficient First Pass of a Two-Stage Approach for Automatic Language Identification of Telephone Speech.

Language recognition from text input

This is a topic that has important applications but has been minimally explored. One application is to find websites in a particular language and possibly in a particular domain as well. Another application is to detect a language shift, say from English to French, within the same document with the purpose of appropriately shifting to the proper accent when converting into speech in a Text-To-Speech (TTS) system. Otherwise, the TTS system when reading such sentences will treat everything as though it were English and likely mispronounce words in the other language. Working in this area, Bashir Ahmed, DPS 2004, completed his dissertation entitled "Detection of Foreign Words and Names in Written Text," summarized in the CSIS Research Day 2004 paper Language Identification from Text Using N-gram Based Cumulative Frequency Addition.
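The following sketch illustrates the general idea behind identifying a language by adding up n-gram frequencies. It assumes that each language is represented by a profile of character n-gram relative frequencies learned from training text, and that a test string is scored against each profile by summing the frequencies of its n-grams; the language with the largest sum wins. The normalization, the n-gram lengths, the function names, and the tiny training texts are all assumptions, and the details of the cited method may differ.

```python
from collections import Counter


def ngrams(text, n_max=3):
    """All character n-grams of length 1..n_max from the space-padded, lowercased text."""
    text = " " + text.lower() + " "
    return [text[i:i + n] for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


def train_profile(text, n_max=3):
    """Map each n-gram in the training text to its relative frequency."""
    counts = Counter(ngrams(text, n_max))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def identify(text, profiles, n_max=3):
    """Return (best_language, scores) by summing profile frequencies of the text's n-grams."""
    scores = {
        lang: sum(profile.get(gram, 0.0) for gram in ngrams(text, n_max))
        for lang, profile in profiles.items()
    }
    return max(scores, key=scores.get), scores


# Toy training data (hypothetical); real profiles need far more text.
profiles = {
    "english": train_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "french": train_profile("le renard brun rapide saute par dessus le chien paresseux et le chat"),
}
english_guess, _ = identify("the dog", profiles)
french_guess, _ = identify("le chat", profiles)
```

To detect a language shift inside a document, the same scoring could be applied to a sliding window of words rather than to the whole text.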

Spam detection

See the CSIS Research Day paper describing preliminary work by an M.S. student from a course assignment, A Neural Network Classifier for Junk E-Mail. This work is currently being extended by DPS 2006 student Ted Markowitz; see CEAS 2006 Conference Paper.

Association Rules in Microarray Data

See paper describing preliminary work by an M.S. student, Mining Association Rules in Microarray Data.

General Pattern Recognition Studies

There are many problems in the area of pattern recognition, and several applications of pattern recognition have been described in the biometrics and interactive visual image sections. Applications to other areas are possible, and more fundamental pattern recognition problems can also be investigated.

Fundamental Pattern Recognition Research

Fundamental pattern recognition research can involve such areas as feature extraction, similarity and distance metrics, and pattern classification techniques. For example, see the following:

A theoretical study of shape recognition

Completed 2006 DPS dissertation study by Carl Abrams; see conference papers.

AI and Pattern Recognition studies relating to the Human Brain

See CSIS Research Day papers describing masters-level projects in this area.

Other Suggested Research

Technology life cycle and business-related studies

In the second-year DPS course on emerging information technologies, we studied the phases of the technology life cycle as described by Kendall and Kurzweil. A dissertation in this area could, for example, compare and contrast the phases of the cycles described by Kendall and Kurzweil (and possibly other versions of the technology life cycle), give examples of technologies as they have gone through the cycle, and present one or more hypotheses with evidence to support them. One hypothesis might be that the time period from invention to general use has been getting shorter and shorter, as might be anticipated by Kurzweil's idea of time speeding up. Another hypothesis might be that the time period from invention to general use for Internet inventions is shorter than for non-Internet inventions.

Combinatorial optimization problem

This is the problem of assigning students (or company employees) to project teams where each student (or employee) lists his/her project preferences, preferences for team meeting locations (based on where they live), preferences for scheduling weekly team meetings (day/time), and work experience. The instructor (or company project manager) can give different weights to the four parameters (and possibly others) and specify a different number of students for each project. The algorithm should optimize the assignment of students to team projects based on the above (or other) constraints, and each team should have a good mix of team member work experiences (leadership as well as technical capability). For preliminary work on this problem, see the team group assignment paper. This is an NP-hard problem that is amenable to heuristic optimization algorithms, such as genetic algorithms, that find good but not necessarily optimal solutions. If you are interested in this problem, you can work with us, or possibly with Lixin Tao or Mike Gargano.
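As a starting point for experimentation, the sketch below uses simple hill climbing with pairwise swaps, a deliberately simpler stand-in for the genetic algorithms mentioned above. It considers only one of the four parameters (project preference) and assumes that the project capacities add up exactly to the number of students. All names, scores, and data are hypothetical.

```python
import random


def total_score(assignment, prefs):
    """Sum of each student's preference score for the project they were given."""
    return sum(prefs[student][project] for student, project in assignment.items())


def assign_teams(prefs, capacity, iterations=2000, seed=0):
    """Hill climbing: swap two students' projects, keep the swap if it does not hurt.

    prefs maps student -> {project: preference score (higher is better)}.
    capacity maps project -> team size; sizes are assumed to sum to len(prefs).
    """
    rng = random.Random(seed)
    students = list(prefs)
    # Start from an arbitrary feasible assignment: fill projects in order.
    slots = [project for project, size in capacity.items() for _ in range(size)]
    assignment = dict(zip(students, slots))
    best = total_score(assignment, prefs)
    for _ in range(iterations):
        a, b = rng.sample(students, 2)
        assignment[a], assignment[b] = assignment[b], assignment[a]
        score = total_score(assignment, prefs)
        if score >= best:
            best = score
        else:
            # Undo a swap that made the total worse.
            assignment[a], assignment[b] = assignment[b], assignment[a]
    return assignment, best


# Toy data (hypothetical): four students, two projects with two places each.
prefs = {
    "ann": {"web": 3, "db": 1},
    "cat": {"web": 1, "db": 3},
    "bob": {"web": 3, "db": 1},
    "dan": {"web": 1, "db": 3},
}
teams, score = assign_teams(prefs, {"web": 2, "db": 2})
```

Because swaps preserve team sizes, every intermediate assignment stays feasible. Hill climbing can stall in a local optimum on larger instances, which is one motivation for population-based methods such as genetic algorithms; the other parameters (meeting location, meeting time, experience mix) could be folded into `total_score` as weighted terms.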

"Look Who's Speaking" Robot

For interactive man/machine dialogue it is desirable to develop a robot that turns to focus on the speaker. This will involve the determination of the location of the speaker from the audio signal of the voice source or from the visual image of the speaker, or both.