**Textbooks Available Online**

*Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data*, EMC Education Services 2015.

*Big Data Now*, O'Reilly Media 2012.

*Pattern Classification*, Duda, Hart, and Stork. Wiley 2000.

*Pattern Recognition and Machine Learning*, Bishop. Springer 2006.

*The Elements of Statistical Learning*, Hastie, Tibshirani, and Friedman. Springer 2011.

*Data Mining (3rd Ed.)*, Witten, Frank, and Hall. Morgan Kaufmann 2011.

**Other Books**

*Introduction to Algorithms*, Cormen, Leiserson, Rivest, and Stein. MIT Press 2009.

*Big Data, Data Mining, and Machine Learning*, Jared Dean. Wiley 2014.

*Analytics in a Big Data World: The Essential Guide to Data Science and its Applications*, Bart Baesens. Wiley 2014.

*Data Smart: Using Data Science to Transform Information into Insight*, John Foreman. Wiley 2013.

*Doing Data Science: Straight Talk from the Frontline*, Cathy O'Neil and Rachel Schutt. O'Reilly 2013.

**Assignment 1: Bayes Decision Theory vs kNN**

Introduction to pattern classification and machine learning: Duda, Chapter 1

Bayes Decision Theory: Duda, Chapter 2

Non-parametric classification procedures, including kNN: Duda, Chapter 4

kNN procedure: Hastie, Chapter 2

*Bayes' Theorem*:
*New York Times Articles on Bayesian Statistics*
*Video 1*
*Video 2*
*Video 3*

Normal Probability Distribution:
*Univariate Distribution*
*Multivariate Distribution*

*Covariance Matrix*
*Mahalanobis Distance*

*k-Nearest-Neighbor (kNN) algorithm*
*Video 1*
*Video 2*
*Video 3*

**Assignment 2: Linear Regression**

Simple linear regression - Khan Academy

*Formula Derivation (4 parts)*

*Examples (2 parts)*

*R-squared or coefficient of determination (2 parts)*

*Linear regression calculator*

General regression: Bishop, Chapter 1

*General regression analysis via matrix pseudoinverse - Algorithms textbook*

*Linear Regression versus Principal Component Analysis*
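
As a companion to the simple linear regression and R-squared resources listed for this assignment, here is a minimal sketch of the standard least-squares formulas in plain Python. The function names `fit_line` and `r_squared` and the sample data are illustrative assumptions, not part of the assignment materials.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(xs, ys, m, b):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up data that lies close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]
m, b = fit_line(xs, ys)
print("slope:", m, "intercept:", b, "R^2:", r_squared(xs, ys, m, b))
```

The sketch assumes at least two distinct x values and a non-constant y; otherwise `sxx` or `ss_tot` is zero and the division fails.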

**Assignment 3: K-Means Clustering**

*Algorithm when seeds are samples*

*Algorithm when seeds are random points, not samples*
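
The two seeding variants named above differ only in how the initial centers are chosen; the assignment and update loop is the same. The following is a minimal sketch in plain Python, not the linked algorithms themselves. The names `seeds_from_samples`, `seeds_random_points`, and `kmeans`, the bounding-box rule for random seeds, and the keep-old-center rule for empty clusters are all assumptions made for illustration.

```python
import random


def seeds_from_samples(points, k, rng):
    """Pick k distinct data points as the initial centers."""
    return [list(p) for p in rng.sample(points, k)]


def seeds_random_points(points, k, rng):
    """Pick k random points inside the bounding box of the data (assumed rule)."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    return [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]


def kmeans(points, centers, max_iters=100):
    """Alternate assignment and mean-update steps until labels stop changing."""
    centers = [list(c) for c in centers]
    k, dim = len(centers), len(points[0])
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((p[d] - centers[j][d]) ** 2 for d in range(dim)))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(p[d] for p in members) / len(members)
                              for d in range(dim)]
            # An empty cluster keeps its old center (assumed policy); this can
            # happen when seeds are random points rather than samples.
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
rng = random.Random(0)
print(kmeans(points, seeds_from_samples(points, 2, rng)))
print(kmeans(points, seeds_random_points(points, 2, rng)))
```

Seeding from samples guarantees that every cluster starts with at least one member, whereas a random seed point can lie far from all of the data and end up with an empty cluster.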