CSC 591: Foundations of Data Science

Graduate Course for Data Science Track, NCSU, Computer Science Department, 2015

Course Description: Students will learn core data science principles related to statistical data analysis. This course introduces ideas in statistical learning and helps students prepare for advanced courses in data mining and machine learning. It also emphasizes applying these principles to a variety of data analysis tasks using R. Topics: random variables and probability distributions; exploratory data analysis; variable selection; sampling methods; histograms and probability distributions; density estimation; missing data and imputation; mixture models, latent variables, and expectation maximization; regression analysis; discriminant analysis; bagging and boosting; principal component analysis; information theory (entropy, mutual information, Bayesian information criterion); conditional independence; rescaling and low-dimensional summaries; factor analysis; graphical causal models and causal inference; and evaluating predictive models.

Textbooks:

  • (Required): Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. “An Introduction to Statistical Learning with Applications in R” (Free book, with slides, R-code and data).
  • Richard A. Berk. “Statistical Learning from a Regression Perspective.” Springer.
  • Géza Schay. “Introduction to Probability with Statistical Applications.” Springer.
  • Birger Stjernholm Madsen. “Statistics for Non-Statisticians.” Springer.
  • “OpenIntro Statistics,” 3rd ed. https://www.openintro.org/stat/textbook.php
  • David MacKay. “Information Theory, Inference, and Learning Algorithms.” http://www.inference.phy.cam.ac.uk/itprnn/book.html

Offerings: Fall 2015, Fall 2016, Fall 2017, Fall 2018, Fall 2019.