Interactive Visual System*

The goal of visual pattern recognition during the past fifty years has been the development of automated systems that rival or even surpass human accuracy, at higher speed and lower cost. Human interaction is considered, if at all, only to deal with "rejects" in the final step.

There are pronounced differences between human and machine cognitive abilities. Humans bring a rich set of contextual constraints and superior noise-filtering abilities to recognition, and so excel at gestalt tasks such as object-background separation. Computers, however, can store thousands of images and the associations between them, never forget a name or a label, and compute geometric moments and probability distributions.

These differences suggest that a system that combines human and machine abilities can, in some situations, outperform either one alone. This is the general goal of CAVIAR (Computer Assisted Visual InterActive Recognition).

Interesting research problems include:

This application of CAVIAR is designed to recognize wild flowers, or other families of similar objects, more accurately than machine vision and faster than most laypersons. It draws on sequential pattern recognition, image databases, expert systems, pen computing, and digital camera technology. For a description of a rudimentary system implemented on a laptop, see the paper "Interactive Visual Pattern Recognition" by G. Nagy and J. Zou, Proc. Int. Conf. Pattern Recognition (ICPR), 2002.

This project concerns the development of the system on a handheld computer together with the direct capture and processing of associated photos. Suggested steps to follow are:

If the code does not fit on the handheld, or cannot be ported or converted to run on it, the likely cause is the size of the pattern recognition portion of the code. The alternative is to run that portion on a server, but that may also be difficult or impractical. Thus, for a minimal system, only the code that provides the human interaction should be ported or converted. The interaction can then be tested by having the system limit the choices based on the number of petals and other features entered by the user. Photos of the resulting choices can then be shown to the user, who makes the final decision.
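The minimal system described above, in which user-entered features limit the choices and the user then decides from photos, can be sketched as follows. This is only an illustration under stated assumptions, not the CAVIAR code: the Species record, the narrow_candidates function, the catalog entries, and the photo paths are all hypothetical placeholders, and exact matching on two features stands in for whatever feature set the real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """One catalog entry. All fields are hypothetical placeholders."""
    name: str
    petals: int
    color: str
    photo: str  # assumed path to a reference photo


# Made-up catalog for illustration only; not real botanical data.
CATALOG = [
    Species("flower_a", 5, "yellow", "photos/flower_a.jpg"),
    Species("flower_b", 5, "white", "photos/flower_b.jpg"),
    Species("flower_c", 3, "white", "photos/flower_c.jpg"),
    Species("flower_d", 6, "yellow", "photos/flower_d.jpg"),
]


def narrow_candidates(catalog, **observed):
    """Keep the species that match every feature the user entered.

    A feature passed as None means "not entered" and is ignored.
    """
    entered = {key: value for key, value in observed.items() if value is not None}
    return [
        species
        for species in catalog
        if all(getattr(species, key) == value for key, value in entered.items())
    ]


if __name__ == "__main__":
    # The user enters a petal count first, then optionally a color.
    remaining = narrow_candidates(CATALOG, petals=5, color=None)
    for species in remaining:
        # On the handheld, these photos would be shown for the final choice.
        print(species.name, species.photo)
```

Exact matching is the simplest possible rule. A user who miscounts petals would eliminate the correct species, so a practical version would probably rank candidates rather than discard them outright.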

It has been suggested that the iPAQ or the Sharp Zaurus would be suitable handheld platforms. Both take a plug-in camera, and both run Linux. We are not sure whether the iPAQ has a camera driver for Linux, and TWAIN is Windows oriented. Other candidates might be Palm OS and Ricoh devices. Yet another is the HP Jornada 586 Pocket PC with the HP pocket camera, as used in a CMU sign translation project demonstrated at ICPR 2002. Although that project is trying to solve a simpler problem fully automatically, its hardware requirements appear to be the same as CAVIAR's.

* Much of the project description background was taken directly from Jie Zou's Web site.