There are about 60 DPS dissertations, see
We want to categorize the DPS dissertations in a number of ways and collect associated information in one location.
Each dissertation will be given the following:
- subject matter category: software development - agile, software development - non-agile, agile approach in non-software-development areas, pattern recognition - biometrics, pattern recognition - other than biometrics, networking, data mining, web/internet related, etc.
- methods used: surveys, interviews, statistical analysis, programming, etc.
- advisor and other committee members
- related publications (no more than 2)
- years to completion
A database will collect this information with a web interface to facilitate access.
The software will be a MySQL database with PHP providing the web interface.
The database will consist of the following three tables exactly as specified below.
TABLE: Dissertations - contains statistics on each dissertation:
- PK - author (full name in one field)
- class year (2002, 2003, ..., etc.)
- fraction of years not enrolled (usually 0, but some students took time off)
- date of successful defense (mm/dd/yyyy format)
- months-to-completion = month_of_defense + 3 - 12*fraction_of_years_not_enrolled + 12*(year_of_defense - class_year + 1)
- dissertation title
- FK - committee member 1 = advisor (full name in one field)
- FK - committee member 2 (full name in one field)
- FK - committee member 3 (full name in one field)
- FK - committee member 4, if any (full name in one field)
- FK - committee member 5, if any (full name in one field)
- FK - committee member = earlier advisor 1, if any (full name in one field, indicates student-faculty disconnect)
- FK - committee member = earlier advisor 2, if any (full name in one field, indicates another student-faculty disconnect)
- primary subject category (customer provided)
- secondary subject category, if any (customer provided)
- tertiary subject category, if any (customer provided)
- primary method used (customer provided)
- secondary method used, if any (customer provided)
- tertiary method used, if any (customer provided)
- number of pages total
- number of pages without appendices
- number of figures
- number of tables
- number of numbered and cited references
- number of bibliography documents
(some dissertations have a general bibliography in addition to cited references)
- FK - title of external publication 1 (customer provided)
- FK - title of external publication 2 (customer provided)
TABLE: Committee Members:
- PK - name (full name in one field)
- institution (Pace, IBM, etc.)
TABLE: External Publications:
- PK - title of external publication
- FK - author - of related dissertation
- FK - author - committee member 1 = advisor, if any
- FK - author - committee member 2, if any
- FK - author - committee member 3, if any
- FK - author - committee member 4, if any
- FK - author - committee member 5, if any
- other citing information (book chapter/journal/conference, date of publication, etc.)
Project customers will provide lists of subject categories and methods used,
and provide their assignments to each dissertation.
They will also provide the related external publications and authors.
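The months-to-completion formula in the Dissertations table can be checked with a small calculation. The following is a minimal sketch in Python (the project itself specifies MySQL and PHP); the function name `months_to_completion` and its parameter names are assumptions made for illustration, and the defense date is parsed from the mm/dd/yyyy format given above.

```python
from datetime import datetime


def months_to_completion(defense_date, class_year, years_not_enrolled=0.0):
    """Apply the months-to-completion formula from the Dissertations table.

    defense_date       -- date of successful defense, "mm/dd/yyyy"
    class_year         -- class year, e.g. 2002
    years_not_enrolled -- fraction of years not enrolled (usually 0)
    """
    defense = datetime.strptime(defense_date, "%m/%d/%Y")
    return (defense.month + 3
            - 12 * years_not_enrolled
            + 12 * (defense.year - class_year + 1))


# Illustrative inputs only, not taken from any real dissertation record.
print(months_to_completion("05/15/2003", 2002))        # 32.0
print(months_to_completion("05/15/2003", 2002, 0.5))   # 26.0
```

For a defense in May 2003 by a member of the class of 2002 with no time off, the formula gives 5 + 3 - 0 + 12*(2003 - 2002 + 1) = 32 months; half a year not enrolled reduces this by 6 months.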
Customer Rinaldo DiGiorgio also has ideas for accessing information directly from the dissertations
by using context and extraction software; for details see
Automatic Dissertation Information Extraction.
Some reports of interest are:
- appropriate averages of the individual dissertation information: average number of pages, etc.
- table of subject categories with dissertation counts, methods used, and average years-to-completion
- table of dissertation advisors with dissertation counts, breakdown of subject categories and methods used, and average years-to-completion
(this will provide information on the subject categories and methods supported by faculty members)
- table of methods used with average years-to-completion
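As an illustration of the subject-category report, the sketch below groups dissertations by primary subject and returns the dissertation count and average years-to-completion. It is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for the MySQL database, the table is trimmed to four columns, the column names are hypothetical, and the three rows are made-up placeholders rather than real dissertation data. The `GROUP BY` query itself uses only standard SQL aggregates.

```python
import sqlite3

# In-memory SQLite database standing in for the project's MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dissertations (
        author               TEXT PRIMARY KEY,
        primary_subject      TEXT,
        primary_method       TEXT,
        months_to_completion REAL
    )
""")

# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO dissertations VALUES (?, ?, ?, ?)",
    [
        ("Author A", "networking", "programming", 36),
        ("Author B", "networking", "surveys", 60),
        ("Author C", "data mining", "statistical analysis", 42),
    ],
)


def subject_report(conn):
    """Return (subject, dissertation count, average years-to-completion) rows."""
    return conn.execute("""
        SELECT primary_subject,
               COUNT(*)                         AS dissertation_count,
               AVG(months_to_completion) / 12.0 AS avg_years_to_completion
        FROM dissertations
        GROUP BY primary_subject
        ORDER BY primary_subject
    """).fetchall()


for subject, count, avg_years in subject_report(conn):
    print(f"{subject}: {count} dissertation(s), {avg_years:.2f} years on average")
```

The same pattern, grouping by the advisor or the primary method column instead, would give the advisor and method reports; the breakdown of methods within each subject is not covered by this sketch.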
Fast Agile XP Deliverables
We will use the agile methodology,
particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations.
Many of these deliverables can be done in parallel by different members or subsets of the team.
The following is the current list of deliverables
(ordered by the date initiated,
initiated date marked in bold red if programming involved,
deliverable modifications marked in red,
completion date and related comments marked in green,
pseudo-code marked in blue):
- 9/24 10/11
Create an Entity Relationship Diagram (ERD) for the database with tables for dissertations, faculty advisors,
subject categories, methods, etc.
Get approval on the ERD from the customers.
- 9/24 11/18
Construct the database and after minimal database population run some simple queries to test it.
- 9/24 11/18
Populate the database.
The customers will provide subject categories, methods, and related publications for each dissertation.
- 9/24 11/18
Obtain the reports suggested here and possibly others to be determined by the customers.
Simultaneously with the above deliverables, begin work on customer Rinaldo's associated portion of the project.
Fix/modify the following in the database:
- Author table: Abraham Guerra needs the "months-to-completion" value
- Advisor/Dissertation report: add columns for number-of-committees (advisor + other committee member),
median-years-to-completion, number of publications, and if possible add counts to the categories and methods
- Subject/Years report: add number-of-dissertations, median years to completion, and let us know which author is missing a subject
- Method/Years report: add number-of-dissertations, median years to completion, and let us know which author is missing a method
- Create a publication report: total number of publications, number of authors with one or more publications,
average publications per author, number of advisors with one or more publications, average number of advisor publications,
percentages of publications by subject
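Several of the publication report figures reduce to simple counts and ratios. The following is a rough sketch in Python, not the project's PHP implementation; the record layout, the function name `publication_report`, and the sample entries are all hypothetical. It assumes that "average publications per author" is taken over all dissertation authors, including those with no publications, and it leaves out the advisor-based figures.

```python
from collections import Counter

# Hypothetical records: author -> (primary subject, titles of related publications).
dissertations = {
    "Author A": ("networking", ["Paper 1", "Paper 2"]),
    "Author B": ("data mining", ["Paper 3"]),
    "Author C": ("networking", []),
}


def publication_report(dissertations):
    """Compute totals, per-author figures, and percentages of publications by subject."""
    total = sum(len(pubs) for _, pubs in dissertations.values())
    authors_with_pubs = sum(1 for _, pubs in dissertations.values() if pubs)
    average = total / len(dissertations) if dissertations else 0.0

    # Count publications under the primary subject of the related dissertation.
    by_subject = Counter()
    for subject, pubs in dissertations.values():
        by_subject[subject] += len(pubs)
    percent = {s: 100.0 * n / total for s, n in by_subject.items()} if total else {}

    return {
        "total_publications": total,
        "authors_with_publications": authors_with_pubs,
        "average_per_author": average,
        "percent_by_subject": percent,
    }


report = publication_report(dissertations)
for key, value in report.items():
    print(key, value)
```

If the customers instead want the average taken only over authors who have at least one publication, the divisor would change to `authors_with_pubs`.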