DataCamp. Sign up using the invite link. You must use your @masonlive.gmu.edu email to get free access.
Stanford's Statistical Learning class.
Piazza for discussing HW/project/midterm. You can sign up here.
For each week I provide direct links to DataCamp courses and Statistical Learning videos.
DataCamp: Working with Dates and Times in R
Resampling Methods (ISLR Chapter 5). Videos: a, b, c, d, e, HandsOn
Midterm due. Please submit 3 files to the BB: (a) a first page with your signature acknowledging the honor code; (b) a PDF file with your solution (do not copy-paste large code chunks, just the essential parts that you added on top of my scripts); (c) a zip file with your R scripts.
Linear Model Selection and Regularization (ISLR Chapter 6). Videos: a, b, c, d and e, f, g, h, i, j, k, HandsOn
Project proposals due. The proposal should be half a page and contain: (a) A description of the data to be analyzed. (b) What is the goal of the analysis, e.g. predict y from x or understand relations between y and x? (c) Why is this important? (d) What methods do you think you will use? You can use bullet points.
HW4 Due (ISLR: Ch 8, Ex. 4, 7, 8, 11)
Unsupervised Learning (ISLR Chapter 10). Videos: a, b, c, d, e, HandsOn
DataCamp: Unsupervised Learning in R
Project intermediate reports due
Live session discussing the projects
HW5 Due (ISLR: Ch 10, Ex. 3, 10)
Optimization (I will post my videos on YouTube and will post links here when done)
DataCamp: Communicating with Data in the Tidyverse (Ch 3-4). You can also use the R Markdown materials on the official website.
Final Project Presentations due. Deliver them as an HTML file generated from R Markdown. Your HTML pages will be published on the course page and will be publicly available.
Introduces predictive analytics with applications in engineering, business, finance, health care, and social economic areas. Topics include time series and cross-sectional data processing, data visualization, correlation, linear and multiple regressions, classification and clustering, time series decomposition, factor models and causal models, predictive modeling performance analysis, and case study. Provides a foundation of basic theory and methodology with applied examples to analyze large engineering, social, and econometric data for predictive decision making. Hands-on experiments with R will be emphasized.
Syllabus
Fridays 4:30 pm - 7:10 pm at Sandbridge Hall 107 (Jan 21, 2020 - May 13, 2020)
Grade composition: The grade is based entirely on class participation, homework assignments, the in-class midterm, and the final project.
Diez, Barr and Cetinkaya-Rundel OpenIntro Statistics, OpenIntro, 2015
James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2013.
Kuhn and Johnson, Applied Predictive Modeling, Springer, 2013.
Hyndman and Athanasopoulos, Forecasting: Principles and Practice, OTexts, 2013.
Airbnb (Random Forest)
Facebook (Decision trees and logistic regression)
YouTube (deep learning)
Uber (time series)
Debby Kermer (data services): contact info
UCI ML Repo (Lots of datasets along with descriptions of each)
KDnuggets (Lots of links to datasets relevant for data mining)
EconData (The site has links to plenty of regional, state, and local economic data)
3stages (Searchable listing of 363 Internet sites with social science data)
data.gov (A repository for information collected by the federal government)
Chicago (Urban Analytics)
Here are the courses that cover different aspects of data science:
Statistical modeling (STAT 250, SYST 664, OR 719, STAT 554)
Data management (AIT 614)
Optimization (OR 604)