Data Mining Customer-Related Subway Incidents
Project Continuation
This is a continuation of an earlier project, and the results of that project can be
found in a Research Day paper entitled
Data Mining Customer-Related Subway Incidents.
Rujul Inamdar (listed as the project's subject matter expert) did most of the data mining work on the earlier project
and will serve as a consultant on this project.
Background
We have a large database of train incident reports from the N.Y. Transit Authority. The data for each incident consists of the time of day, day of week, season (winter, spring,
summer, fall), the station where it occurred, the Borough where it occurred (Manhattan, Brooklyn, Bronx, Queens), and the trouble code. A brief summary of the data in the database and some related material are shown in
Predictive Model for Customer Related Subway Incidents and Analysis of Service Related
Contributory or Causative Factors of Subway Rail Rage.
Project
We want to conduct data mining experiments on the database. We are particularly interested in problems involving train customers, which are recorded under trouble codes for armed customer,
door incident, sick customer, injured customer, unruly customer, vandalism, etc. We will begin by examining the number of incidents of each type over time (from year to year), over seasons,
at various stations and Boroughs, etc. By spotting trends and making discoveries in the data, our overall goal is to improve the system for handling incidents and possibly reduce the
number of incident occurrences.
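As a rough illustration of this first step, the sketch below counts incidents by trouble code and year, and by trouble code and season. The records and field names (`year`, `season`, `borough`, `trouble_code`) are made up for the example and are not the actual schema of the Transit Authority database.

```python
from collections import Counter

# Hypothetical incident records; the field names and values are assumptions
# for illustration, not the real database schema or real data.
INCIDENTS = [
    {"year": 2004, "season": "winter", "borough": "Manhattan", "trouble_code": "sick customer"},
    {"year": 2004, "season": "summer", "borough": "Brooklyn", "trouble_code": "unruly customer"},
    {"year": 2005, "season": "summer", "borough": "Brooklyn", "trouble_code": "unruly customer"},
    {"year": 2005, "season": "summer", "borough": "Queens", "trouble_code": "door incident"},
    {"year": 2005, "season": "winter", "borough": "Manhattan", "trouble_code": "sick customer"},
    {"year": 2005, "season": "summer", "borough": "Brooklyn", "trouble_code": "unruly customer"},
]


def count_by(records, *fields):
    """Count records grouped by the given fields, e.g. trouble code and year."""
    return Counter(tuple(r[f] for f in fields) for r in records)


by_code_and_year = count_by(INCIDENTS, "trouble_code", "year")
by_code_and_season = count_by(INCIDENTS, "trouble_code", "season")

# Print the year-to-year counts for each trouble code.
for (code, year), n in sorted(by_code_and_year.items()):
    print(f"{code:16s} {year}  {n}")
```

The same helper can group by station or Borough by passing a different field name.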
The following is an example scenario that might be investigated. A passenger’s frustration builds while waiting for a train that has been delayed. When the train arrives, the crew
informs the customers of further extensive delays. The customer’s vented displeasure over the perceived poor service escalates from a heated verbal exchange to a shoving match and threats of
further physical harm against a member of the train crew. Related incident reports in the database might include a report of customer threats against a member of the train crew. Then, using the
time, date, and location of the reported threats, we might mine the database to discover that a train coming into that platform was indeed delayed for over thirty minutes and might have been the
source of the customer's anger. On the other hand, an examination of a similar threat might find no obvious poor service that might have caused the anger, perhaps indicating that the customer was
likely irritated by a cause unrelated to train service (mental illness, family problems, etc.). By examining a large number of such incidents, we might be able to estimate percentages on the
pathways of a Customer Aggression Model.
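The matching step in this scenario might be sketched as follows. This is a minimal sketch, not the project's method: the records, the field names, the one-hour look-back window, and the thirty-minute delay threshold are all assumptions chosen for illustration. It flags a threat as possibly service-related when a sufficiently long delay was reported at the same station shortly before it, then reports the share of threats flagged.

```python
from datetime import datetime, timedelta

# Hypothetical threat and delay reports (station names and times are made up).
THREATS = [
    {"station": "Station A", "time": datetime(2005, 7, 1, 17, 45)},
    {"station": "Station B", "time": datetime(2005, 7, 2, 9, 10)},
    {"station": "Station A", "time": datetime(2005, 7, 3, 8, 0)},
]
DELAYS = [
    {"station": "Station A", "time": datetime(2005, 7, 1, 17, 20), "minutes": 35},
    {"station": "Station B", "time": datetime(2005, 7, 2, 14, 0), "minutes": 40},
]


def preceded_by_delay(threat, delays, window=timedelta(hours=1), min_minutes=30):
    """Return True if a delay of at least min_minutes was reported at the
    same station no more than `window` before the threat."""
    return any(
        d["station"] == threat["station"]
        and d["minutes"] >= min_minutes
        and timedelta(0) <= threat["time"] - d["time"] <= window
        for d in delays
    )


def service_related_share(threats, delays):
    """Fraction of threats that follow a qualifying delay (0.0 if no threats)."""
    if not threats:
        return 0.0
    matched = sum(preceded_by_delay(t, delays) for t in threats)
    return matched / len(threats)


share = service_related_share(THREATS, DELAYS)
print(f"Threats preceded by a delay: {share:.0%}")
```

A share computed this way only shows that a delay preceded the threat, not that it caused the anger, so it would be at best a rough estimate for one pathway of the model.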
Through a literature search, we would also like to compare how such incidents are monitored and handled by other large metropolitan subway systems (London, Berlin, Moscow, Paris,
Tokyo, Shanghai, etc.).
Inroads on the following items should be made early in the semester:
- Apply standard data mining techniques to the subway incident database to look for unusual/unexpected correlations. An unexpected correlation often mentioned in data mining textbooks is that
when women send their husbands to the grocery store to buy diapers for their infant child, the husband often picks up a six-pack of beer, and the stores use this correlation to put beer on
shelves near the diapers. Another example is the Analysis of Emergency 911 Calls.
- Determine whether it is feasible to trace a customer's source of anger to train delays or other problems.
- An improvement (refinement) of the customer aggression model should be provided.
- If possible, rough estimates should be determined for the strengths of the links in the customer aggression model.
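One common way to quantify a correlation such as the diapers-and-beer example is with the support, confidence, and lift of an association rule. The sketch below computes these three measures for a single rule over made-up incident records. The field names and data are assumptions for illustration. Treating a rule's confidence as a rough stand-in for a link strength in the customer aggression model is also only an assumption, not something the earlier project established.

```python
# Hypothetical incident records (season and trouble code only; made-up data).
RECORDS = [
    {"season": "summer", "trouble_code": "unruly customer"},
    {"season": "summer", "trouble_code": "unruly customer"},
    {"season": "summer", "trouble_code": "unruly customer"},
    {"season": "summer", "trouble_code": "door incident"},
    {"season": "winter", "trouble_code": "sick customer"},
    {"season": "winter", "trouble_code": "sick customer"},
    {"season": "winter", "trouble_code": "unruly customer"},
    {"season": "winter", "trouble_code": "door incident"},
]


def matches(record, condition):
    """True if the record has every field value listed in the condition."""
    return all(record.get(k) == v for k, v in condition.items())


def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(matches(r, antecedent) for r in records)
    n_c = sum(matches(r, consequent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_both / n if n else 0.0           # P(A and C)
    confidence = n_both / n_a if n_a else 0.0    # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


support, confidence, lift = rule_metrics(
    RECORDS,
    antecedent={"season": "summer"},
    consequent={"trouble_code": "unruly customer"},
)
print(f"support={support:.3f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the consequent occurs more often with the antecedent than it does overall. Weka's Apriori associator searches for such rules automatically rather than testing them one at a time.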
Tools
Weka is a set of Java algorithms for data mining; see the following links