Driver Dynamics Big Data Repository

Background

Driving behavior is believed to be unique to an individual. The way a driver turns the steering wheel depends on hand-eye coordination, hand shape and size, muscle control, foot strength, and experience with the vehicle. Because every automobile driver exhibits such behavior, there is strong motivation to identify and verify drivers based on their driving dynamics.

The customer is trying to build a secure database that can serve as a master data source for driving behavior analysis in Weka (via an ODBC connection). Every day the customer receives 3-4 email blasts of zip files, which together contain 200+ CSV files that map to an equal or greater number of drivers. Each file is keyed by an ID and holds about 113 columns of driving dynamics parameters. The ID is a vehicle ID (which is unique to a driver), but it does not identify the driver by name or similar personal details. It uniquely represents a particular vehicle driven by a particular user, so the same ID can be used to track the same driver in the same vehicle over time.

The multiple email blasts are simply due to a limit on the size of the zip file that can be produced. Because of this limit, each day's data is split across several zip files sent in separate emails, which is difficult for the customer to manage. All CSVs from all zip files received on a given day can safely be combined and treated as the complete data for that day.
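The daily combine step described above can be sketched in Python as follows. This is only an illustration, not the customer's existing .BAT process: the function name combine_day is hypothetical, and the sketch assumes that every CSV starts with a header row and that all CSVs share the same columns.

```python
import csv
import io
import zipfile


def combine_day(zip_paths, out_csv):
    """Merge every CSV found in the given zip files into one CSV file.

    Assumes (not confirmed by the project description) that each CSV
    begins with a header row and that all headers are identical.
    Returns the number of data rows written, header excluded.
    """
    rows_written = 0
    header = None
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        for zip_path in sorted(zip_paths):
            with zipfile.ZipFile(zip_path) as zf:
                for name in sorted(zf.namelist()):
                    if not name.lower().endswith(".csv"):
                        continue  # skip anything that is not a CSV
                    with zf.open(name) as raw:
                        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                        reader = csv.reader(text)
                        file_header = next(reader, None)
                        if file_header is None:
                            continue  # empty file, nothing to copy
                        if header is None:
                            # First CSV seen: its header becomes the output header.
                            header = file_header
                            writer.writerow(header)
                        elif file_header != header:
                            raise ValueError(
                                f"{zip_path}:{name} has a different header"
                            )
                        for row in reader:
                            writer.writerow(row)
                            rows_written += 1
    return rows_written
```

Calling combine_day with the list of zip files saved from one day's emails and an output path would produce a single CSV for that day. Raising an error on a header mismatch is a deliberate choice here, so that a file with a different column layout is noticed rather than silently merged.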

The problem is that the customer has been using MS Access and running a special .BAT program to combine the data files. For instance, one day of data, from 1/10/17, consists of 439 CSV files totaling 925 MB. MS Access has a size limit of 2 GB, so the volume of data is simply too much for it: the customer can store and manipulate only one day of data in Access, and this is preventing the customer from furthering their research.
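To make the size mismatch concrete, the short calculation below uses the two figures given above (925 MB for one day, a 2 GB limit). The function names are hypothetical, 2 GB is taken as 2048 MB, and the yearly estimate assumes every day is about as large as 1/10/17, which the description does not confirm.

```python
ACCESS_LIMIT_MB = 2 * 1024  # 2 GB limit, taken here as 2048 MB
DAY_SIZE_MB = 925           # raw CSV size reported for 1/10/17


def days_that_fit(limit_mb, day_mb):
    """Whole days of raw CSV data that fit under the size limit."""
    return limit_mb // day_mb


def yearly_size_gb(day_mb, days=365):
    """Rough yearly volume in GB, assuming every day matches day_mb."""
    return day_mb * days / 1024


if __name__ == "__main__":
    print("Days of raw CSV under the limit:", days_that_fit(ACCESS_LIMIT_MB, DAY_SIZE_MB))
    print("Estimated GB per year:", round(yearly_size_gb(DAY_SIZE_MB)))
```

Even by raw CSV size, only two days fit under the limit, and the customer reports that in practice only one day is workable. Under the stated assumption, a single year would come to roughly 330 GB, far beyond what a single Access file can hold.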

Project

Database: This semester the database will be populated with driver dynamics data from 2016 and 2017.

See Project Slides Description.