Bayes, AI and Deep Learning
Foundations of Data Science
Preface

Modern AI rests on three foundations: Bayesian reasoning, statistical learning, and deep neural networks. This book develops all three—starting from probability and decision theory, progressing through classical machine learning, and ending with transformers and autonomous agents. Whether you’re a business analyst seeking operational intuition or an engineer building systems, this book provides mathematical depth and practical fluency. Throughout, uncertainty is a first-class citizen: the foundation for better algorithms and clearer thinking about data, models, and decisions.
The material draws on courses we teach to MBAs at the University of Chicago Booth School of Business and to engineers at George Mason University. These two audiences demand different emphases: business students want operational intuition and decision frameworks, while engineers want mathematical rigor and implementation details. Yet each gains from the other, because the same Bayesian methods that power marketing analytics and financial risk modeling also drive robotics and autonomous systems.
When John McCarthy coined “artificial intelligence” in 1956, the field meant expert systems, hand-coded rules, and algorithms. The foundations trace further back: Claude Shannon’s information theory, John von Neumann’s game theory and decision science, and Richard Bellman’s dynamic programming. The shift to data-driven learning, models that generalize from examples rather than follow hand-coded rules, defines modern AI.
The contrast is striking: the Kalman filter navigated Apollo 11 to the moon in 1969, yet today’s autonomous systems, from rocket landings to self-driving cars, rely on optimization and learning algorithms that would have been computationally infeasible then.
How we interact with AI as consumers has evolved through four stages:
- Search. Early search engines answered a single question with a ranked list of webpages. The PageRank algorithm, developed by Google’s founders, used power iteration to rank pages by relevance. Statistical tools such as Kendall’s tau and Spearman’s rank correlation measured how closely the computed ranking matched actual relevance (a short sketch follows this list).
- Suggestions. One of the first widely adopted suggestion systems was Netflix’s movie recommender. It used collaborative filtering to recommend movies to users based on their viewing history and that of others, easing the burden of choice (a second sketch follows this list).
- Summaries. Systems like ChatGPT go beyond retrieval: they synthesize and generalize across domains.
- Agents. Autonomous systems that perceive their environment, reason about goals, and take actions—orchestrating tools to execute multi-step tasks without step-by-step human guidance.
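To make the mechanics of the first stage concrete, here is a minimal sketch of PageRank by power iteration on a toy four-page web, with Kendall’s tau measuring how well the computed ranking agrees with a set of relevance judgments. The link matrix, damping factor, and relevance scores are invented for illustration; they do not come from the chapters that follow.

```python
import numpy as np
from scipy.stats import kendalltau

# Toy 4-page web: links[i, j] = 1 if page i links to page j (illustrative only).
links = np.array([[0, 1, 1, 0],
                  [1, 0, 0, 1],
                  [0, 1, 0, 1],
                  [1, 0, 0, 0]], dtype=float)

# Column-stochastic transition matrix: a random surfer follows an outgoing link uniformly.
M = (links / links.sum(axis=1, keepdims=True)).T

d = 0.85                  # damping factor, the standard choice
n = M.shape[0]
r = np.ones(n) / n        # start from the uniform distribution over pages

# Power iteration: repeatedly apply the damped transition operator until the scores settle.
for _ in range(100):
    r = d * (M @ r) + (1 - d) / n

print("PageRank scores:", r)

# Rank correlation between the computed scores and hypothetical relevance judgments.
relevance = np.array([0.4, 0.3, 0.2, 0.1])
tau, _ = kendalltau(r, relevance)
print("Kendall's tau:", tau)
```

A tau near 1 means the two orderings agree almost everywhere; a tau near −1 means they are reversed.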
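In the same spirit, here is a minimal sketch of user-based collaborative filtering on a made-up ratings matrix: a missing rating is predicted as a similarity-weighted average of ratings from users with similar taste. The ratings and the cosine-similarity weighting are illustrative choices, not a description of Netflix’s production system.

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are movies, 0 means "not yet rated".
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two users' rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def predict(user, movie):
    """Predict a missing rating as a similarity-weighted average over users who rated the movie."""
    others = [u for u in range(R.shape[0]) if u != user and R[u, movie] > 0]
    weights = np.array([cosine_sim(R[user], R[u]) for u in others])
    ratings = np.array([R[u, movie] for u in others])
    return weights @ ratings / weights.sum()

# Predict how user 0 would rate movie 2, which they have not seen.
print("Predicted rating:", predict(0, 2))
```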
Building systems at each stage requires the same foundations: probabilistic reasoning to handle uncertainty, statistical learning to extract patterns, and scalable computation to train models. This book develops all three.
The book is organized into three parts:
- Part 1: Bayesian Learning: Probability, decision theory, and Bayesian inference.
- Part 2: Statistical Learning: Pattern-matching algorithms including regression, decision trees, and generalized linear models.
- Part 3: Deep Learning: Neural network architectures, optimization via gradient descent, convolutional networks for vision, natural language processing, large language models, and autonomous agents.
Chapters include derivations, Python/R code, problems, and case studies.