Graduate standing (Undergraduate engineering math: Calculus, probability theory, statistics, and some basic computer programming skills.)
Week 1: Predicting with probability (OpenIntro Statistics Ch 2, 3)
Week 3: Data and Statistics (OpenIntro Statistics Ch 4, 5)
Week 4: Linear regression (ISLR Ch 3)
Week 5: Model diagnostics (APM Ch 4, ISLR Ch 2, 5)
Week 6: Classification (ISLR Ch 4, APM Ch 12)
Week 7: Take home mid-term
Week 8: Tree-based methods (ISLR Ch 8)
Week 9: Optimization (Notes): regularization, model estimation, robust model estimation
Week 10: Lasso and Model Selection (ISLR Ch 6)
Week 11: SVM (ISLR Ch 9)
Week 12: Clustering + PCA (ISLR Ch 10)
Week 13: Time series forecasting (FPP)
Week 14: Final Project presentations
Exam week: Projects due
Students will have a take-home midterm exam and a final project. There are approximately 6 homework assignments; students are encouraged to work in small groups. Each homework has 2-3 "theoretical" questions and 2-3 "hands-on" problems. Theoretical questions will be based on the material covered in class. Hands-on problems will require using R and routines provided by the instructor to perform data analysis tasks. For the final project, a student or a group of students can choose their own data set and a hypothesis to verify. The instructor will provide 1-2 data sets/analysis problems in case students have a hard time identifying one on their own. Work on the final project can begin as soon as class starts. Each group will submit one final report.
You can choose which software you use. I recommend investing the time to learn R: it is the dominant software package for real-world predictive analytics and is used throughout other courses. Python is a good choice as well. R is open-source software, available for free download at www.r-project.org, where you can also find documentation. A great way to start learning is to buy a book and work through its tutorials. A good guide is Adler's R in a Nutshell; you can browse other options by searching 'R statistics' on Amazon. If you are new to R (and even if you are not), you should complete a tutorial to familiarize yourself with the language. A great option is the Try R course from Code School.
Take-home midterm 40% + final project 30% + homework 30%. Scores for each component are normalized to be out of 100. Grades will be posted on Blackboard (Bb). Cut-offs: 97 (A+), 93 (A), 90 (A-), 87 (B+), 82 (B), 79 (B-), 77 (C+), 73 (C), 70 (C-), 67 (D+), 60 (D)
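As an illustration of the weighting above, here is a minimal Python sketch of how a final course score and letter grade could be computed. It assumes each component score is already normalized to 0-100, that each cut-off is an inclusive lower bound (e.g. exactly 90 earns an A-), and that any total below 60 maps to "F", which the grading policy does not state explicitly. The function names and the sample scores are hypothetical.

```python
# Hypothetical grade calculator. Weights and cut-offs come from the grading
# policy; inclusive cut-offs and the "F" fallback are assumptions.
WEIGHTS = {"midterm": 0.40, "project": 0.30, "homework": 0.30}

# (lower bound, letter), checked from highest to lowest
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (82, "B"), (79, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (60, "D"),
]


def course_score(scores: dict) -> float:
    """Weighted total of component scores, each normalized to 0-100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total: float) -> str:
    """Return the first letter whose cut-off the total meets or exceeds."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"  # assumption: no grade below D is listed


total = course_score({"midterm": 85, "project": 92, "homework": 95})
print(round(total, 1), letter_grade(total))  # prints "90.1 A-"
```

With these sample scores the total is 0.4 x 85 + 0.3 x 92 + 0.3 x 95 = 90.1, which clears the A- cut-off of 90 but not the A cut-off of 93.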
Upload ONE file for the report (PDF or Word) and one file for the code and supplementary material (R script or zip archive).
No scanned handwriting. If you prefer handwriting, submit the report in class. Handwritten reports must be neat and easily readable.
The report must be CLEAR and BRIEF. I would like to assure students that redundant information not only does not earn them more credit but may also result in a penalty. Thus, I advise students not to try to compensate for a missing main discussion by adding definitions and redundant explanations. Answers must be right to the point and address exactly what the problem asks, nothing more and nothing less. Students should also double-check their report to ensure the explanations are clear, simple, and understandable. Any ambiguous explanation may result in a penalty.
Whenever students feel that they need to include a code segment in the report body, they must copy only the shortest possible amount of code. Furthermore, they should not copy the code segments provided by the instructor. Violating this rule may result in a penalty.
The solutions must be written in order: Problem 1, Problem 2.a, Problem 2.b, …
The code must be well separated. When I open the code, I should easily see which part corresponds to each problem, so students must mark each section clearly. Of course, the clearer and more readable the code is, the more likely students are to earn the credit they deserve. Cleaning up homework-sized code takes only a few minutes, and it is well worth taking this short time to avoid losing credit. It is also very good practice for future jobs.
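One possible way to make each problem easy to find is sketched below. The hands-on problems are meant to be done in R, but the syllabus also allows Python, so this sketch uses Python. The banner-comment style, the problem numbering, and the tasks themselves (summary statistics, a least-squares line fit, a prediction) are hypothetical placeholders; any consistent, clearly visible section marker serves the same purpose.

```python
"""HW solutions: hypothetical layout with one labeled section per problem."""
import statistics


# ==========================================================
# Problem 1: summary statistics (hypothetical task)
# ==========================================================
def problem_1(x):
    """Return the sample mean and sample standard deviation of x."""
    return statistics.mean(x), statistics.stdev(x)


# ==========================================================
# Problem 2.a: least-squares fit of y on x (hypothetical task)
# ==========================================================
def problem_2a(x, y):
    """Return (intercept, slope) of the simple linear regression of y on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# ==========================================================
# Problem 2.b: prediction from the fitted line (hypothetical task)
# ==========================================================
def problem_2b(intercept, slope, x_new):
    """Predict y at x_new using the line fitted in Problem 2.a."""
    return intercept + slope * x_new


if __name__ == "__main__":
    # Toy data, used only to show how each section is called in order
    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.0, 4.0, 6.0, 8.0]
    print("Problem 1:", problem_1(x))
    b0, b1 = problem_2a(x, y)
    print("Problem 2.a:", b0, b1)
    print("Problem 2.b:", problem_2b(b0, b1, 5.0))
```

Putting each problem in its own function under a banner keeps the grader's lookup trivial, and the single entry point at the bottom runs the sections in the same order as the report.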
You can submit homework late with no penalty as long as it has not yet been graded (you are simply taking the chance that your homework won't be graded if it is submitted past the due date).
Late submissions are not accepted for the take-home midterm or the final project.
There are two options:
Data analysis. You can deliver results in the form of a report, webpage (a blog post or a dedicated webpage), presentation, or video.
Tutorial on a topic not covered in class. Same delivery options (cannot be done in a group).
You can also mix data analysis and tutorial, i.e., describe a methodology that was not covered in class and use it to analyze a real-life dataset (can be done in a group).
The data analysis project will be evaluated based on the following criteria:
How adequate the chosen analysis methodology is for the problem at hand
Level of sophistication of the analysis
Correctness of the conclusions drawn from the data (or of the decision not to draw any)
Presentation of the results
The results of the data analysis project should contain:
Problem description and possible hypothesis
Data description
Methodologies used (larger groups should try different methodologies if applicable)
Results and conclusions
Final project attribution:
Each student documents his/her contribution
Individual grades will be based on full participation in the project
Other requirements:
Data analysis final projects are to be done in groups of two to four students
The final report is limited to 6 pages. Follow the homework submission guidelines. Submit code and data in a separate archive file
For the data analysis project, the team forms the hypothesis, finds the data and performs the analysis. Some suggested data sets are listed on the course page.
Each team presents its final project at the last class of the semester. Given the size of the class, each team will have 6 minutes for its presentation. Presentations can be informal but must convey the message clearly and be informative and useful to your fellow students. Please respect the time limit!