**Department of Systems Engineering and Operations Research**

**George Mason University**

**Fall 2019**

This is a graduate-level course focused on developing deep learning predictive models. We will learn both practical and theoretical aspects of deep learning and consider applications in engineering, finance and artificial intelligence. The course is targeted at students who have completed introductory courses in statistics and optimization. We will make extensive use of computational tools, such as the Python language, both for illustration in class and in homework problems. The class will consist of 9 lectures given by the instructor on several advanced topics in deep learning. In another 5 lectures, students will present on assigned topics.

3/15/2018: No Class on November 27 (Thanksgiving recess)

3/15/2018: First class is on Aug 28 at 7:10pm

3/15/2018: Last class is on Dec 4

**Course Materials**: to be posted

**Instructor**: Vadim Sokolov (vsokolov(at)gmu.edu)

**Office**: Engineering Building, Room 2242

**Tel**: 703 993-4533

**Office hours**: By appointment

**Lectures**: Krug Hall 5. 7:30-10pm on Wed

**Grades**: 40% homework, 60% class presentations

Convex Optimization

Stochastic gradient descent and its variants (Adam, RMSprop, Nesterov acceleration)

Second order methods

ADMM

Regularization (l1, l2 and dropout)

Batch normalization

Theory of deep learning

Universal approximators

Curse of dimensionality

Kernel spaces

Topology and geometry

Computational aspects (accelerated linear algebra, reduced precision calculations, parallelism)

Architectures (CNN, LSTM, MLP, VAE)

Bayesian DL

Deep reinforcement learning

Hyperparameter selection and parameter initialization

Generative models (GANs)

Deep Learning (book page)

Deep Learning with Python (book page)

Learning Deep Architectures for AI (monograph)

Tuning CNN architecture (blog)

Sequence to Sequence Learning with Neural Networks (paper)

Skip RNN (blog and paper)

Learning the Enigma with Recurrent Neural Networks (blog)

LSTM (blog)

Generative Adversarial Networks (presentation)

GANs at OpenAI (blog)

Adaptive Neural Trees (paper)

Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

An intriguing failing of convolutional neural networks and the CoordConv solution

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent (paper)

SGD (link)

Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization

Neural Architecture Search with Reinforcement Learning (code)

Regularized Evolution for Image Classifier Architecture Search

On the importance of initialization and momentum in deep learning

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

The Marginal Value of Adaptive Gradient Methods in Machine Learning

Polyak, Boris, and Pavel Shcherbakov. “Why does Monte Carlo fail to work properly in high-dimensional optimization problems?.” Journal of Optimization Theory and Applications 173, no. 2 (2017): 612-627. (paper)

Leni, Pierre-Emmanuel, Yohan D. Fougerolle, and Frédéric Truchetet. “Kolmogorov superposition theorem and its application to multivariate function decompositions and image representation.” In Signal Image Technology and Internet Based Systems, 2008. SITIS’08. IEEE International Conference on, pp. 344-351. IEEE, 2008. (paper)

Klartag, Bo'az. “A central limit theorem for convex sets.” Inventiones mathematicae 168, no. 1 (2007): 91-131. (paper, slides)

Sun, Chen, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. “Revisiting unreasonable effectiveness of data in deep learning era.” In Computer Vision (ICCV), 2017 IEEE International Conference on, pp. 843-852. IEEE, 2017. (blog)

Bengio, Yoshua, Aaron Courville, and Pascal Vincent. “Representation learning: A review and new perspectives.” IEEE transactions on pattern analysis and machine intelligence 35, no. 8 (2013): 1798-1828. (paper)

Braun, Jürgen. “An application of Kolmogorov's superposition theorem to function reconstruction in higher dimensions.” (2009). (dissertation)

Kolmogorov. “On the Representation of Continuous Functions of Several Variables as Superpositions of Continuous Functions of a Smaller Number of Variables” (paper)

Arnold. “On functions of three variables” (collection of papers)

Bianchini, Monica, and Franco Scarselli. “On the complexity of shallow and deep neural network classifiers.” In ESANN. 2014. (paper)

Girosi, Federico, and Tomaso Poggio. “Representation properties of networks: Kolmogorov's theorem is irrelevant.” Neural Computation 1, no. 4 (1989): 465-469. (paper)

Kůrková, Věra. “Kolmogorov's theorem and multilayer neural networks.” Neural networks 5, no. 3 (1992): 501-506. (paper)

Poggio, Tomaso, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, and Qianli Liao. “Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review.” International Journal of Automation and Computing 14, no. 5 (2017): 503-519. (paper)

Telgarsky, Matus. “Representation benefits of deep feedforward networks.” arXiv preprint arXiv:1509.08101 (2015). (paper)

Montufar, Guido F., Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. “On the number of linear regions of deep neural networks.” In Advances in neural information processing systems, pp. 2924-2932. 2014. (paper)

Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. “Understanding deep learning requires rethinking generalization.” arXiv preprint arXiv:1611.03530 (2016). (paper)

Lin, Henry W., Max Tegmark, and David Rolnick. “Why does deep and cheap learning work so well?.” Journal of Statistical Physics 168, no. 6 (2017): 1223-1247. (paper)

Stéphane Mallat 1: Mathematical Mysteries of Deep Neural Networks (video)

Theory of Deep Learning II: Landscape of the Empirical Risk in Deep Learning

Model-Ensemble Trust-Region Policy Optimization

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

VAE with a VampPrior (paper)

Bayesian DL (blog)

Recognition Networks for Approximate Inference in BN20 Networks (paper)

Non-linear regression models for Approximate Bayesian Computation (paper)

DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression (paper)

Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation (paper)

Auto-Encoding Variational Bayes (paper)

Composing graphical models with neural networks for structured representations and fast inference (paper)

Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks (paper)

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (paper)

Twin Networks: Using the Future as a Regularizer (paper)

Don't Decay the Learning Rate, Increase the Batch Size (paper)

DL Tuning (blog)

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

50 Years of Data Science by Donoho (paper)

Papers with Code (link)

Security (blog)

Unsupervised learning (blog)

Cybersecurity (paper collection)

Stanford's CS231n (course page)

Stanford's STATS385 (course page)

UC Berkeley Stat241B (lectures)

UIUC CSE598 (course page)

TF Playground (Google)

SnakeViz (Python profiler)

PyTorch resources (a curated list of tutorials, papers, projects)