Linguistic Analysis of DPS Dissertations
The Doctor of Professional Studies (DPS) in Computing at Pace University is a unique doctoral program
that allows active IT professionals to earn a doctorate in three years through part-time study.
There are currently about 70 completed DPS dissertations, see
The goal of this project is to use the DPS dissertations as training datasets so that documents can be classified in real time.
A typical use case would be a user performing searches on the Internet and having documents "scored" in terms of relevance to the training datasets.
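To make the idea of "scoring" concrete, here is a minimal sketch in plain Python. It is not LingPipe or Mallet; it assumes a simple stand-in technique (TF-IDF weighting with cosine similarity against the centroid of the training documents), and the function names `tokenize`, `build_idf`, `tfidf_vector`, `cosine`, and `relevance_score` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(training_docs):
    """Compute a smoothed inverse document frequency for each training term."""
    n_docs = len(training_docs)
    doc_freq = Counter()
    for doc in training_docs:
        doc_freq.update(set(tokenize(doc)))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(text, idf):
    """Map a text to a sparse TF-IDF vector, ignoring terms unseen in training."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(candidate, training_docs):
    """Score a candidate text against the centroid of the training documents."""
    idf = build_idf(training_docs)
    centroid = defaultdict(float)
    for doc in training_docs:
        for term, weight in tfidf_vector(doc, idf).items():
            centroid[term] += weight
    return cosine(tfidf_vector(candidate, idf), centroid)
```

In this sketch the training documents would be the text of the dissertations, and each document found by a search would be passed in as `candidate`. Scores fall between 0 and 1, with 0 meaning the candidate shares no vocabulary with the training set. A topic model from one of the real tools would replace the TF-IDF step.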
This project involves learning information extraction tools, such as LingPipe and Mallet, and applying them to the DPS dissertations.
These are very interesting tools!
For additional information see
- Using a topic extractor of your choice, extract a training data set.
Two good choices are
- Take the references section in each dissertation and see what the scores for each document are; you can normalize them to your particular tool choice.
- Do some searches for scholarly works in the area and see what the scores are for those documents.
- Use some other sources of documents (at least 10), randomly selected or specifically chosen, and see what the results are.
- Analyze a Twitter feed and see whether the application can process it.
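Because different tools report scores on different scales, one way to normalize them is a linear min-max rescaling to the range 0 to 1. The sketch below assumes that approach; the function name `normalize_scores` and the sample document names are hypothetical, and other normalizations (for example z-scores) may suit a particular tool better.

```python
def normalize_scores(scores):
    """Rescale raw scores linearly so the lowest maps to 0.0 and the highest to 1.0.

    `scores` maps a document name to the raw score reported by the chosen tool.
    """
    if not scores:
        return {}
    lowest = min(scores.values())
    highest = max(scores.values())
    if highest == lowest:
        # All documents scored the same, so there is no spread to rescale.
        return {name: 0.0 for name in scores}
    spread = highest - lowest
    return {name: (value - lowest) / spread for name, value in scores.items()}


if __name__ == "__main__":
    # Hypothetical raw scores, e.g. log-likelihoods from a topic model.
    raw = {"reference_a.pdf": -120.0, "reference_b.pdf": -80.0, "reference_c.pdf": -100.0}
    for name, value in sorted(normalize_scores(raw).items()):
        print(name, round(value, 2))
```

Rescaling this way only makes scores comparable within one batch of documents; comparing batches scored by different tools would still need a shared reference set.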
Fast Agile XP Deliverables
We will use the agile methodology, particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations. Many of these deliverables can be done in parallel by different members or subsets of the team.
The following is the current list of deliverables
(ordered by the date initiated,
initiated date marked in bold red if programming involved,
deliverable modifications marked in red,
completion date and related comments marked in green,
pseudo-code marked in blue):
- 2/1. Plan the semester's work with your customer, Rinaldo DiGiorgio.