Starting this week (March 23), this class is moving online. The section below has the schedule and a list of online resources. Before you proceed, you need to sign up for DataCamp; here is the invite link. You will need to use your @masonlive.gmu.edu email to get free access.

Week of March 23:
1. DataCamp: https://learn.datacamp.com/courses/working-with-dates-and-times-in-r
2. Resampling Methods (Chapter 5)

Week of March 30:
1. Midterm due

Week of April 6:
1. Linear Model Selection and Regularization (Chapter 6)
2. Project proposals due

Week of April 12:
1. HW3 due
2. Tree-based Methods (Chapter 8)
3. DataCamp: https://learn.datacamp.com/courses/machine-learning-with-tree-based-models-in-r

Week of April 19:
1. HW4 due
2. Tree-based Methods (Chapter 8)

Week of April 26:
1. Intermediate project reports due
2. Live session discussing the projects

Week of April 27:
1. HW5 due
2. Unsupervised Learning (Chapter 10)

Week of May 4:
1. Final project presentations due. You can upload a video to YouTube or write a blog post.

Graduate standing (undergraduate engineering math: calculus, probability theory, statistics, and some basic computer programming skills).

Week 1: Predicting with probability (OpenIntro Statistics Ch 2, 3)

Week 3: Data and Statistics (OpenIntro Statistics Ch 4, 5)

Week 4: Linear regression (ISLR Ch 3)

Week 5: Model diagnostics (APM Ch 4, ISLR Ch 2, 5)

Week 6: Classification (ISLR Ch 4, APM Ch 12, Tennis)

Week 7: In-class midterm

Week 8: Tree-based methods (ISLR Ch 8)

Week 9: Optimization (Notes)

Regularization

Model Estimation

Robust Model Estimation

Week 10: Lasso and Model Selection (ISLR Ch 6)

Week 11: SVM (ISLR Ch 9)

Week 12: Clustering + PCA (ISLR Ch 10)

Week 13: Time series forecasting (FPP)

Week 14: Final Project presentations

Exam week: Projects due

Students will have an in-class midterm exam and a final project. There are 5 homework assignments; students are encouraged to work in small groups. Each homework has 2-3 "theoretical questions" and 2-3 "hands-on" problems. Theoretical questions will be based on the material covered in class. Hands-on problems will require using R and routines provided by the instructor to perform data analysis tasks. For the final project, a student or a group of students can choose their own data set and a hypothesis to verify. The instructor will have 1-2 data sets/analysis problems available in case students have a hard time identifying one on their own. Work on the final project can begin as soon as class starts. Each group will submit a final report.

You can choose which software you use. I recommend investing the time to learn R. Python is a good choice as well. R is the dominant software package for real-world Predictive Analytics and is used throughout other courses. This open-source software is available for free download at www.r-project.org, and you can find documentation there. A great way to start learning is to buy a book and start working through tutorials. A good guide is Adler’s R in a Nutshell. They have many tutorials to help you get up to speed. You can browse other options by searching ‘R statistics’ on Amazon. If you are new to R (and even if you are not), you should complete a tutorial to familiarize yourself with the language. A great option is the TryR code school.

Take-home midterm 40% + Final project 30% + Homework 30%. Scores of each component are normalized to be out of 100. Grades will be posted on Bb. Cut-offs: 97 (A+), 93 (A), 90 (A-), 87 (B+), 82 (B), 79 (B-), 77 (C+), 73 (C), 70 (C-), 67 (D+), 60 (D).
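As a rough illustration of how the weights and cut-offs combine, here is a minimal Python sketch. The weights (midterm 40%, final project 30%, homework 30%) and the cut-offs are taken from the paragraph above. Everything else is an assumption made for the example: the function names, the sample scores, the rule that a score at or above a cut-off earns that letter, and the "F" label for scores below 60. It is not the instructor's official calculation.

```python
# Weights and cut-offs copied from the grading paragraph above.
WEIGHTS = {"midterm": 0.40, "project": 0.30, "homework": 0.30}
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (82, "B"), (79, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (60, "D"),
]


def final_score(scores):
    """Weighted average of component scores, each already normalized to 0-100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 score to a letter grade.

    Assumptions (not stated in the syllabus): a score at or above a cut-off
    earns that letter, and anything below 60 is reported as "F".
    """
    for cutoff, letter in CUTOFFS:  # cut-offs are listed from highest to lowest
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example = {"midterm": 85, "project": 92, "homework": 88}
total = final_score(example)
print(round(total, 1), letter_grade(total))
```

Keeping the cut-offs in a single list sorted from highest to lowest means the first match is the correct letter, so the lookup needs no nested conditions.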

Upload one file for the report (pdf/word) and one for the code and supplementary material (R/zip).

Handwritten reports must be neat and easily readable.

The report must be CLEAR and BRIEF. Do not try to compensate for a lack of substantive discussion with definitions and redundant explanations. Please address exactly what is asked in the problem.

If you include a code segment in the report body, copy only the essential parts of the code that you added on top of the provided code.

The code must be well separated by problem.

You can submit homework late with no penalty as long as it has not been graded yet (you are simply taking the chance that your HW won't be graded if it is submitted past the due date).

There are two options:

Data analysis. You can deliver results in the form of a report, webpage (a blog post or a dedicated webpage), presentation, or video.

Tutorial on a topic not covered in class. Same delivery options (cannot be done in a group).

You can mix Data analysis and Tutorial, i.e., describe a methodology that was not covered in class and use it to analyze a real-life dataset (can be done in a group).

The data analysis project will be evaluated based on the following criteria:

How adequate is the chosen analysis methodology for the problem at hand

Level of sophistication of the analysis

Correctness of the conclusions (or of the absence of conclusions) drawn from the data

The presentation of the results

Results of the data analysis project should contain:

Problem description and possible hypothesis

Data description

Methodologies used (larger groups should try different methodologies if applicable)

Results and conclusion

Final project attribution

Each student documents his/her contribution

Individual grades will be based on full participation in the project

Other requirements

The data analysis final project is to be done in a group of more than one and fewer than five students

The page limit for the final report is 6 pages. Use the HW submission guidelines. Submit code and data in a separate archive file

For the data analysis project, the team forms the hypothesis, finds the data and performs the analysis. Some suggested data sets are listed on the course page.

Each team presents its final project at the last class of the semester. Given the size of the class, each team will have 6 minutes for its presentation. Presentations can be informal but need to convey the message clearly and be informative and useful to your fellow students. Please respect the time limit!