Data Mining Customer-Related Subway Incidents
We have a large database of train incident reports from the N.Y. Transit Authority.
The data for each incident consists of the time of day, day of week,
season (winter, spring, summer, fall), the station and borough where it occurred
(Manhattan, Brooklyn, Bronx, Queens), and the trouble code.
A brief summary of the data in the database and some related material are given in
"Predictive Model for Customer Related Subway Incidents" and
"Analysis of Service Related Contributory or Causative Factors of Subway Rail Rage".
We want to conduct data mining experiments on the database.
We are particularly interested in problems involving train customers, which are recorded under
trouble codes such as armed customer, door incident, sick customer, injured customer,
unruly customer, vandalism, etc.
We will begin by examining how the number of incidents of each type varies over time (from year to year),
over seasons, at various stations and boroughs, etc.
By spotting trends and making discoveries in the data,
our overall goal is to improve the system for handling incidents
and possibly reduce the number of incidents that occur.
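As a first step, the counts described above can be tabulated with a few lines of Python. This is only a minimal sketch: the record layout (year, season, borough, trouble code) and the sample rows are assumptions made up for illustration, not the actual schema or contents of the Transit Authority database.

```python
from collections import Counter

# Hypothetical records: (year, season, borough, trouble_code).
# The real database layout may differ; these rows are invented.
incidents = [
    (2001, "winter", "Manhattan", "sick customer"),
    (2001, "summer", "Brooklyn", "unruly customer"),
    (2002, "winter", "Manhattan", "sick customer"),
    (2002, "summer", "Queens", "door incident"),
    (2002, "summer", "Manhattan", "unruly customer"),
]

# Positions of the fields within each record.
YEAR, SEASON, BOROUGH, CODE = range(4)


def count_by(records, *fields):
    """Count records grouped by the values at the given field positions."""
    return Counter(tuple(r[i] for i in fields) for r in records)


by_year_code = count_by(incidents, YEAR, CODE)    # trend from year to year
by_season = count_by(incidents, SEASON)           # seasonal pattern
by_borough_code = count_by(incidents, BOROUGH, CODE)

for key, n in sorted(by_year_code.items()):
    print(key, n)
```

Grouping by a different combination of fields (for example station and time of day) only requires passing other field positions to count_by.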
The following is an example scenario that might be investigated.
A passenger's frustration builds while waiting for a train that has been delayed.
Upon its arrival, the train crew informs the customers of further extensive delays.
The customer's vented displeasure over the perceived poor service escalates from a heated verbal exchange
to a shoving match and threats of further physical harm against a member of the train crew.
Related incident reports in the database might include a report of customer threats against a member of the train crew.
Then, using the time, date, and location of the reported threats, we might mine the database
to discover that a train coming into that platform
was indeed delayed for over thirty minutes and might have been the source of the customer's anger.
On the other hand, an examination of a similar threat might find no obvious poor service that might have caused the anger,
perhaps indicating that the customer was likely irritated by a cause unrelated to train service
(mental illness, family problems, etc.).
By examining a large number of such incidents, we might be able to estimate percentages on the pathways of a
Customer Aggression Model.
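The matching step in this scenario, and the percentage estimate it leads to, might be sketched as follows. Everything here is an assumption for illustration: the field names, the station labels, the one-hour look-back window, and the sample records are hypothetical, and the database may not record delays in this form at all.

```python
from datetime import datetime, timedelta

# Hypothetical threat reports and delay reports; field names are invented.
threats = [
    {"station": "Station A", "time": datetime(2003, 1, 6, 8, 45)},
    {"station": "Station B", "time": datetime(2003, 1, 6, 17, 10)},
    {"station": "Station A", "time": datetime(2003, 1, 7, 9, 0)},
]
delays = [
    {"station": "Station A", "time": datetime(2003, 1, 6, 8, 30), "minutes": 35},
    {"station": "Station B", "time": datetime(2003, 1, 6, 12, 0), "minutes": 40},
]


def preceded_by_delay(threat, delays, window=timedelta(hours=1), min_minutes=30):
    """True if a delay of more than min_minutes was reported at the same
    station within `window` before the threat (window is an assumed value)."""
    for d in delays:
        gap = threat["time"] - d["time"]
        if (d["station"] == threat["station"]
                and d["minutes"] > min_minutes
                and timedelta(0) <= gap <= window):
            return True
    return False


# Flag each threat, then estimate the share that follows a long delay.
flags = [preceded_by_delay(t, delays) for t in threats]
share_service_related = sum(flags) / len(flags)
print(flags, round(share_service_related, 2))
```

The resulting share is only a rough estimate for one pathway of the model. A threat with no matching delay is not proof of an unrelated cause, since the delay may simply not have been reported.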
Through a literature search,
we would also like to obtain a comparison of how such incidents are monitored and handled
by all the large metropolitan metro systems (London, Berlin, Moscow, Paris, Tokyo, Shanghai, etc.).
Midterm Checkpoint (our second classroom meeting).
Inroads on the following items should be made:
- Perform standard data mining techniques on the subway incident database to look for unusual/unexpected correlations.
An unexpected correlation often mentioned in data mining textbooks is
that when women send their husbands to the grocery store to buy diapers
for their infant child, the husband often picks up a six-pack of beer --
and the stores use this correlation to put beer on shelves near the diapers.
Another example is the
Analysis of Emergency 911 Calls.
- Determine whether it is feasible to establish
whether a customer's source of anger can be traced to train delays or other problems.
- An improvement (refinement) of the customer aggression model should be provided.
- If possible, rough estimates should be determined for the strengths of the links in the customer aggression model.
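The correlation search in the first item is usually framed as association rule mining, where a rule such as "summer implies unruly customer" is scored by its support, confidence, and lift. The sketch below computes these three measures by hand. The attribute=value encoding and the four sample incidents are assumptions for illustration only, not data from the database.

```python
# Hypothetical incidents, each reduced to a set of attribute=value items.
incidents = [
    {"season=summer", "borough=Manhattan", "code=unruly customer"},
    {"season=summer", "borough=Brooklyn", "code=unruly customer"},
    {"season=winter", "borough=Manhattan", "code=sick customer"},
    {"season=summer", "borough=Queens", "code=door incident"},
]


def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_both = support(set(antecedent) | set(consequent), transactions)
    confidence = s_both / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return s_both, confidence, lift


sup, conf, lift = rule_metrics(
    {"season=summer"}, {"code=unruly customer"}, incidents
)
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift well above 1 suggests the two items occur together more often than chance would predict, which is the kind of unexpected correlation worth a closer look. With so few sample rows the numbers here mean nothing beyond showing the arithmetic.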
Weka is a set of Java algorithms for data mining; see the following links