**Department of Systems Engineering and Operations Research**

**George Mason University**

**Fall 2018**

This is a PhD course focused on developing deep learning predictive models. We will learn both practical and theoretical aspects of deep learning and consider applications in engineering, finance, and artificial intelligence. The course is targeted at students who have completed introductory courses in statistics and optimization. We will make extensive use of computational tools, such as the Python language, both for illustration in class and in homework problems. The class will consist of 12 lectures given by the instructor on several advanced topics in deep learning. In another 3 lectures, students will present on a topic of their choice.

4/20/2018: First class is on Aug 27 at 4:30pm

**Course Materials**: Dropbox folder

**Instructor**: Vadim Sokolov (vsokolov(at)gmu.edu)

**Office**: Engineering Building, Room 2242

**Tel**: 703 993-4533

**Office hours**: By appointment

**Lectures**: Enterprise Hall 77. 4:30-7:10pm on Mondays

**Grades**: 40% homework (individual), 60% final research project (group)

Convex Optimization

Stochastic gradient descent and its variants (Adam, RMSprop, Nesterov acceleration)

Second order methods

ADMM

Regularization (l1, l2 and dropout)

Batch normalization

Theory of deep learning

Universal approximators

Curse of dimensionality

Kernel spaces

Topology and geometry

Computational aspects (accelerated linear algebra, reduced precision calculations, parallelism)

Architectures (CNN, LSTM, MLP, VAE)

Bayesian DL

Deep reinforcement learning

Hyperparameter selection and parameter initialization

Generative models (GANs)

Deep Learning (book page)

Deep Learning with Python (book page)

Learning Deep Architectures for AI (monograph)

Why does Monte Carlo Fail to Work Properly in High-Dimensional Optimization Problems? (paper)

Kolmogorov Superposition Theorem and its application to multivariate function decompositions and image representation (paper)

Revisiting the Unreasonable Effectiveness of Data (blog)

Representation Learning: A Review and New Perspectives (paper)

An Application of Kolmogorov’s Superposition Theorem to Function Reconstruction in Higher Dimensions (dissertation)

On the Representation of Continuous Functions of Several Variables as Superpositions of Continuous Functions of a Smaller Number of Variables (https://link.springer.com/chapter/10.1007/978-94-011-3030-1_55)

On functions of three variables (paper)

Tuning CNN architecture (blog)

Sequence to Sequence Learning with Neural Networks (paper)

Skip RNN (blog and paper)

Learning the Enigma with Recurrent Neural Networks (blog)

LSTM (blog)

VAE with a VampPrior (paper)

Bayesian DL (blog)

Recognition Networks for Approximate Inference in BN20 Networks (paper)

Non-linear regression models for Approximate Bayesian Computation (paper)

DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression (paper)

Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation (paper)

Auto-Encoding Variational Bayes (paper)

Composing graphical models with neural networks for structured representations and fast inference (paper)

Generative Adversarial Networks (presentation)

GANs at OpenAI (blog)

Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks (paper)

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (paper)

Twin Networks: Using the Future as a Regularizer (paper)

Don't Decay the Learning Rate, Increase the Batch Size (paper)

DL Tuning (blog)

50 Years of Data Science by Donoho (paper)

Security (blog)

Unsupervised learning (blog)

Cybersecurity (paper collection)

Stanford's CS231n (course page)

Stanford's STATS385 (course page)

UC Berkeley Stat241B (lectures)

UIUC CSE598 (course page)

TF Playground (Google)

SnakeViz (Python profiler)

PyTorch resources (a curated list of tutorials, papers, projects)