The goal of this project is to develop a set of Natural Language Processing (NLP) tools. This is an opportunity to learn about NLP and to learn how to build research-oriented software tools (algorithms) in Java.

The project customer (mentor) is a research scientist in the Speech Group at the IBM T.J. Watson Research Lab in Yorktown. He will be actively involved throughout the project and will coordinate the team's effort to code the software and perform preliminary experiments. He will instruct the team on graph theory and on how to build several basic NLP algorithms found in the book Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Jurafsky and Martin. The team will also implement a new algorithm for statistical string pattern matching.

In the first semester, the team will study the algorithms, implement them in Java, and perform some basic testing.

In the second semester, the team will extend the toolkit to handle more general and complicated scenarios. The team will also learn how to improve the algorithms based on experimental results.