References

Kolmogorov, A. N. 1938. “On the Analytic Methods of Probability Theory.” Rossíiskaya Akademiya Nauk, no. 5: 5–41.
Acemoglu, Daron, and Pascual Restrepo. 2018. “Artificial Intelligence, Automation and Work.” National Bureau of Economic Research.
Actor, Jonas. 2018. “Computation for the Kolmogorov Superposition Theorem.” MS thesis, Rice University.
Albert, Jim. 1993. “A Statistical Analysis of Hitting Streaks in Baseball: Comment.” Journal of the American Statistical Association 88 (424): 1184–88. https://www.jstor.org/stable/2291255.
Altić, Mirela Slukan. 2013. “Exploring Along the Rome Meridian: Roger Boscovich and the First Modern Map of the Papal States.” In History of Cartography: International Symposium of the ICA, 2012, 71–89. Springer.
Amazon. 2021. “The History of Amazon’s Forecasting Algorithm.” Amazon Science. https://www.amazon.science/latest-news/the-history-of-amazons-forecasting-algorithm.
Amit, Yali, Gilles Blanchard, and Kenneth Wilder. 2000. “Multiple Randomized Classifiers: MRCL.”
Andrews, D. F., and C. L. Mallows. 1974. “Scale Mixtures of Normal Distributions.” Journal of the Royal Statistical Society. Series B (Methodological) 36 (1): 99–102. https://www.jstor.org/stable/2984774.
Arnol’d, Vladimir I. 2006. “Forgotten and Neglected Theories of Poincaré.” Russian Mathematical Surveys 61 (1): 1.
Ayala, Orlando, and Patrice Bechard. 2024. “Reducing Hallucination in Structured Outputs via Retrieval-Augmented Generation.” In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), 228–38. Mexico City, Mexico: Association for Computational Linguistics.
Bach, Francis. 2024. “High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections.” SIAM Journal on Mathematics of Data Science 6 (1): 26–50.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. “Neural Machine Translation by Jointly Learning to Align and Translate.” arXiv. https://arxiv.org/abs/1409.0473.
Barron, Andrew R. 1993. “Universal Approximation Bounds for Superpositions of a Sigmoidal Function.” IEEE Transactions on Information Theory 39 (3): 930–45.
Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss. 1970. “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.” The Annals of Mathematical Statistics 41 (1): 164–71. https://www.jstor.org/stable/2239727.
Baylor, Denis, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, et al. 2017. “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1387–95. ACM.
Behnia, Farnaz, Dominik Karbowski, and Vadim Sokolov. 2021. “Deep Generative Models for Vehicle Speed Trajectories.” arXiv Preprint arXiv:2112.08361. https://arxiv.org/abs/2112.08361.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. “Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off.” Proceedings of the National Academy of Sciences 116 (32): 15849–54.
Benoit, Dries F., and Dirk Van den Poel. 2012. “Binary Quantile Regression: A Bayesian Approach Based on the Asymmetric Laplace Distribution.” Journal of Applied Econometrics 27 (7): 1174–88.
Berge, Travis, Nitish Sinha, and Michael Smolyansky. 2016. “Which Market Indicators Best Forecast Recessions?” FEDS Notes, August.
Bhadra, Anindya, Jyotishka Datta, Nick Polson, Vadim Sokolov, and Jianeng Xu. 2021. “Merging Two Cultures: Deep and Statistical Learning.” arXiv Preprint arXiv:2110.11561. https://arxiv.org/abs/2110.11561.
Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Beijing; Cambridge, MA: O’Reilly Media.
Bojarski, Mariusz, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, et al. 2016. “End to End Learning for Self-Driving Cars.” arXiv Preprint arXiv:1604.07316. https://arxiv.org/abs/1604.07316.
Bonfiglio, Rita, Annarita Granaglia, Raffaella Giocondo, Manuel Scimeca, and Elena Bonanno. 2021. “Molecular Aspects and Prognostic Significance of Microcalcifications in Human Pathology: A Narrative Review.” International Journal of Molecular Sciences 22 (120).
Bottou, Léon, Frank E Curtis, and Jorge Nocedal. 2018. “Optimization Methods for Large-Scale Machine Learning.” SIAM Review 60 (2): 223–311.
Brillinger, David R. 2012. “A Generalized Linear Model With Gaussian Regressor Variables.” In Selected Works of David Brillinger, edited by Peter Guttorp and David Brillinger, 589–606. Selected Works in Probability and Statistics. New York, NY: Springer.
Bryson, Arthur E. 1961. “A Gradient Method for Optimizing Multi-Stage Allocation Processes.” In Proc. Harvard Univ. Symposium on Digital Computers and Their Applications. Vol. 72.
Campagnoli, Patrizia, Sonia Petrone, and Giovanni Petris. 2009. Dynamic Linear Models with R. New York, NY: Springer.
Candes, Emmanuel J, and Michael B Wakin. 2008. “An Introduction to Compressive Sampling: A Sensing/Sampling Paradigm That Goes Against the Common Knowledge in Data Acquisition.” IEEE Signal Processing Magazine 25 (2): 21–30.
Cannon, Alex J. 2018. “Non-Crossing Nonlinear Regression Quantiles by Monotone Composite Quantile Regression Neural Network, with Application to Rainfall Extremes.” Stochastic Environmental Research and Risk Assessment 32 (11): 3207–25.
Carlin, Bradley P, Nicholas G Polson, and David S Stoffer. 1992. “A Monte Carlo Approach to Nonnormal and Nonlinear State-Space Modeling.” Journal of the American Statistical Association 87 (418): 493–500.
Carreira-Perpiñán, Miguel A, and Weiran Wang. 2014. “Distributed Optimization of Deeply Nested Systems.” In AISTATS, 10–19.
Carter, Chris K, and Robert Kohn. 1994. “On Gibbs Sampling for State Space Models.” Biometrika 81 (3): 541–53.
Carvalho, Carlos M, Hedibert F Lopes, Nicholas G Polson, and Matt A Taddy. 2010. “Particle Learning for General Mixtures.” Bayesian Analysis 5 (4): 709–40.
Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika 97 (2): 465–80.
Chernozhukov, Victor, Iván Fernández-Val, and Alfred Galichon. 2010. “Quantile and Probability Curves Without Crossing.” Econometrica 78 (3): 1093–1125. https://www.jstor.org/stable/40664520.
Chib, Siddhartha. 1998. “Estimation and Comparison of Multiple Change-Point Models.” Journal of Econometrics 86 (2): 221–41.
Chung, Hyung Won, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, et al. 2022. “Scaling Instruction-Finetuned Language Models.” arXiv. https://arxiv.org/abs/2210.11416.
Cook, R. Dennis. 2007. “Fisher Lecture: Dimension Reduction in Regression.” Statistical Science, 1–26. https://www.jstor.org/stable/27645799.
Cootner, Paul H. 1967. The Random Character of Stock Market Prices. MIT Press.
Coppejans, Mark. 2004. “On Kolmogorov’s Representation of Functions of Several Variables by Functions of One Variable.” Journal of Econometrics 123 (1): 1–31.
Cover, T., and P. Hart. 1967. “Nearest Neighbor Pattern Classification.” IEEE Transactions on Information Theory 13 (1): 21–27.
Dabney, Will, Georg Ostrovski, David Silver, and Rémi Munos. 2018. “Implicit Quantile Networks for Distributional Reinforcement Learning.” arXiv. https://arxiv.org/abs/1806.06923.
Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2017. “Distributional Reinforcement Learning with Quantile Regression.” arXiv. https://arxiv.org/abs/1710.10044.
Davison, Anthony Christopher. 2003. Statistical Models. Vol. 11. Cambridge University Press.
Dean, Jeffrey, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, et al. 2012. “Large Scale Distributed Deep Networks.” In Advances in Neural Information Processing Systems, 1223–31.
DeGroot, Morris H. 2005. Optimal Statistical Decisions. Wiley classics library ed. Wiley Classics Library. Hoboken, NJ: Wiley-Interscience.
Demb, Robert, and David Sprecher. 2021. “A Note on Computing with Kolmogorov Superpositions Without Iterations.” Neural Networks 144 (December): 438–42.
Devroye, Luc. 1986. Non-Uniform Random Variate Generation. Springer Science & Business Media.
Diaconis, Persi, and Frederick Mosteller. 1989. “Methods for Studying Coincidences.” Journal of the American Statistical Association 84 (408): 853–61.
Diaconis, Persi, and David Freedman. 1987. “A Dozen de Finetti-style Results in Search of a Theory.” In Annales de l’IHP Probabilités Et Statistiques, 23:397–423.
Diaconis, Persi, and Mehrdad Shahshahani. 1981. “Generating a Random Permutation with Random Transpositions.” Probability Theory and Related Fields 57 (2): 159–79.
———. 1984. “On Nonlinear Functions of Linear Combinations.” SIAM Journal on Scientific and Statistical Computing 5 (1): 175–91.
Diaconis, P., and D. Ylvisaker. 1983. “Quantifying Prior Opinion.”
Dixon, Mark J., and Stuart G. Coles. 1997. “Modelling Association Football Scores and Inefficiencies in the Football Betting Market.” Journal of the Royal Statistical Society Series C: Applied Statistics 46 (2): 265–80.
Dixon, Matthew F, Nicholas G Polson, and Vadim O Sokolov. 2019. “Deep Learning for Spatio-Temporal Modeling: Dynamic Traffic Flows and High Frequency Trading.” Applied Stochastic Models in Business and Industry 35 (3): 788–807.
Dreyfus, Stuart. 1962. “The Numerical Solution of Variational Problems.” Journal of Mathematical Analysis and Applications 5 (1): 30–45.
———. 1973. “The Computational Solution of Optimal Control Problems with Time Lag.” IEEE Transactions on Automatic Control 18 (4): 383–85.
Efron, Bradley, and Carl Morris. 1975. “Data Analysis Using Stein’s Estimator and Its Generalizations.” Journal of the American Statistical Association 70 (350): 311–19.
———. 1977. “Stein’s Paradox in Statistics.” Scientific American 236 (5): 119–27.
Enikolopov, Ruben, Vasily Korovkin, Maria Petrova, Konstantin Sonin, and Alexei Zakharov. 2013. “Field Experiment Estimate of Electoral Fraud in Russian Parliamentary Elections.” Proceedings of the National Academy of Sciences 110 (2): 448–52.
Tassone, Eric, and Farzan Rohani. 2017. “Our Quest for Robust Time Series Forecasting at Scale.”
Feller, William. 1971. An Introduction to Probability Theory and Its Applications. Wiley.
Feynman, Richard. n.d. “Feynman :: Rules of Chess.”
Fredholm, Ivar. 1903. “Sur Une Classe d’équations Fonctionnelles.” Acta Mathematica 27: 365–90.
Friedman, Jerome H., and Werner Stuetzle. 1981. “Projection Pursuit Regression.” Journal of the American Statistical Association 76 (376): 817–23.
Frühwirth-Schnatter, Sylvia, and Rudolf Frühwirth. 2007. “Auxiliary Mixture Sampling with Applications to Logistic Models.” Computational Statistics & Data Analysis 51 (April): 3509–28.
———. 2010. “Data Augmentation and MCMC for Binary and Multinomial Logit Models.” In Statistical Modelling and Regression Structures: Festschrift in Honour of Ludwig Fahrmeir, 111–32.
Frühwirth-Schnatter, Sylvia, Rudolf Frühwirth, Leonhard Held, and Håvard Rue. 2008. “Improved Auxiliary Mixture Sampling for Hierarchical Models of Non-Gaussian Data.” Statistics and Computing 19 (4): 479.
Gan, Link, and Alan Fritzler. 2016. “How to Become an Executive.”
García-Arenzana, Nicolás, Eva María Navarrete-Muñoz, Virginia Lope, Pilar Moreo, Carmen Vidal, Soledad Laso-Pablos, Nieves Ascunce, et al. 2014. “Calorie Intake, Olive Oil Consumption and Mammographic Density Among Spanish Women.” International Journal of Cancer 134 (8): 1916–25.
Gramacy, Robert B., and Nicholas G. Polson. 2012. “Simulation-Based Regularized Logistic Regression.” arXiv. https://arxiv.org/abs/1005.3430.
Griewank, Andreas, Kshitij Kulshreshtha, and Andrea Walther. 2012. “On the Numerical Stability of Algorithmic Differentiation.” Computing. Archives for Scientific Computing 94 (2-4): 125–49.
Guan, Xinyu, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang. 2025. “rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking.” arXiv. https://arxiv.org/abs/2501.04519.
Hahn, P. Richard, Jared S. Murray, and Carlos M. Carvalho. 2020. “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion).” Bayesian Analysis 15 (3): 965–1056.
Halevy, Alon, Peter Norvig, and Fernando Pereira. 2009. “The Unreasonable Effectiveness of Data.” IEEE Intelligent Systems 24 (2): 8–12.
Hardt, Moritz, Ben Recht, and Yoram Singer. 2016. “Train Faster, Generalize Better: Stability of Stochastic Gradient Descent.” In International Conference on Machine Learning, 1225–34. PMLR.
Hastie, Trevor, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. 2022. “Surprises in High-Dimensional Ridgeless Least Squares Interpolation.” The Annals of Statistics 50 (2): 949–86.
Held, Leonhard, and Chris C. Holmes. 2006. “Bayesian Auxiliary Variable Models for Binary and Multinomial Regression.” Bayesian Analysis 1 (1): 145–68.
Hermann, Jeremy, and Mike Del Balso. 2017. “Meet Michelangelo: Uber’s Machine Learning Platform.”
Hou, Zhen, Hao Liu, Jiang Bian, Xing He, and Yan Zhuang. 2025. “Enhancing Medical Coding Efficiency Through Domain-Specific Fine-Tuned Large Language Models.” Npj Health Systems 2 (1): 14.
Hyndman, Rob J., and George Athanasopoulos. 2021. Forecasting: Principles and Practice. 3rd ed. Melbourne, Australia: OTexts.
Igelnik, B., and N. Parikh. 2003. “Kolmogorov’s Spline Network.” IEEE Transactions on Neural Networks 14 (4): 725–33.
Immer, Alexander, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch, and Mohammad Emtiyaz Khan. 2021. “Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning.” In International Conference on Machine Learning, 4563–73. PMLR.
Indeed. 2018. “Jobs of the Future: Emerging Trends in Artificial Intelligence.”
Irwin, Neil. 2016. “How to Become a C.E.O.? The Quickest Path Is a Winding One.” The New York Times, September.
Iwata, Shigeru. 2001. “Recentered and Rescaled Instrumental Variable Estimation of Tobit and Probit Models with Errors in Variables.” Econometric Reviews 20 (3): 319–35.
Januschowski, Tim, Yuyang Wang, Kari Torkkola, Timo Erkkilä, Hilaf Hasson, and Jan Gasthaus. 2022. “Forecasting with Trees.” International Journal of Forecasting, Special Issue: M5 competition, 38 (4): 1473–81.
Kaggle. 2020. “M5 Forecasting - Accuracy.” https://kaggle.com/competitions/m5-forecasting-accuracy.
Kallenberg, Olav. 1997. Foundations of Modern Probability. 2nd ed. Springer.
Kalman, R. E., and R. S. Bucy. 1961. “New Results in Linear Filtering and Prediction Theory.” Journal of Basic Engineering 83 (1): 95–108.
Kalman, Rudolph Emil. 1960. “A New Approach to Linear Filtering and Prediction Problems.” Transactions of the ASME–Journal of Basic Engineering 82 (Series D): 35–45.
Keskar, Nitish Shirish, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2016. “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.” arXiv Preprint arXiv:1609.04836. https://arxiv.org/abs/1609.04836.
Keynes, John Maynard. 1921. A Treatise on Probability. Macmillan.
Kingma, Diederik, and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv Preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980.
Klartag, Bo’az. 2007. “A Central Limit Theorem for Convex Sets.” Inventiones Mathematicae 168 (1): 91–131.
Kolmogoroff, Andrei. 1931. “Über Die Analytischen Methoden in Der Wahrscheinlichkeitsrechnung.” Mathematische Annalen 104 (1): 415–58.
Kolmogorov, A. N. 1942. “Definition of Center of Dispersion and Measure of Accuracy from a Finite Number of Observations (in Russian).” Izv. Akad. Nauk SSSR Ser. Mat. 6: 3–32.
———. 1956. “On the Representation of Continuous Functions of Several Variables as Superpositions of Functions of Smaller Number of Variables.” Soviet Math. Dokl. 108: 179–82.
Kreps, David. 1988. Notes On The Theory Of Choice. Boulder: Westview Press.
Levina, Elizaveta, and Peter Bickel. 2001. “The Earth Mover’s Distance Is the Mallows Distance: Some Insights from Statistics.” In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2:251–56. IEEE.
Lin, Zhouhan, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. “A Structured Self-attentive Sentence Embedding.” arXiv. https://arxiv.org/abs/1703.03130.
Lindgren, Georg. 1978. “Markov Regime Models for Mixed Distributions and Switching Regressions.” Scandinavian Journal of Statistics 5 (2): 81–91. https://www.jstor.org/stable/4615692.
Lindley, D. V. 1961. “The Use of Prior Probability Distributions in Statistical Inference and Decisions.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, 4.1:453–69. University of California Press.
Linnainmaa, Seppo. 1970. “The Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding Errors.” Master’s Thesis (in Finnish), Univ. Helsinki, 6–7.
Logan, John A. 1983. “A Multivariate Model for Mobility Tables.” American Journal of Sociology 89 (2): 324–49.
Logunov, A. A. 2004. “Henri Poincare and Relativity Theory.” https://arxiv.org/abs/physics/0408077.
Lorentz, George G. 1976. “The 13th Problem of Hilbert.” In Proceedings of Symposia in Pure Mathematics, 28:419–30. American Mathematical Society.
MacKay, David JC. 1992. “Bayesian Interpolation.” Neural Computation 4 (3): 415–47.
Maharaj, Shiva, Nick Polson, and Vadim Sokolov. 2023. “Kramnik Vs Nakamura or Bayes Vs p-Value.” SSRN Scholarly Paper. Rochester, NY.
Malthouse, Edward, Richard Mah, and Ajit Tamhane. 1997. “Nonlinear Partial Least Squares.” Computers & Chemical Engineering 12 (April): 875–90.
Mehrasa, Nazanin, Yatao Zhong, Frederick Tung, Luke Bornn, and Greg Mori. 2017. “Learning Person Trajectory Representations for Team Activity Analysis.” arXiv Preprint arXiv:1706.00893. https://arxiv.org/abs/1706.00893.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv. https://arxiv.org/abs/1301.3781.
Milman, Vitali D, and Gideon Schechtman. 2009. Asymptotic Theory of Finite Dimensional Normed Spaces: Isoperimetric Inequalities in Riemannian Manifolds. Vol. 1200. Springer.
Nadaraya, E. A. 1964. “On Estimating Regression.” Theory of Probability & Its Applications 9 (1): 141–42.
Naik, Prasad, and Chih-Ling Tsai. 2000. “Partial Least Squares Estimator for Single-Index Models.” Journal of the Royal Statistical Society. Series B (Statistical Methodology) 62 (4): 763–71. https://www.jstor.org/stable/2680619.
Nakkiran, Preetum, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. 2021. “Deep Double Descent: Where Bigger Models and More Data Hurt*.” Journal of Statistical Mechanics: Theory and Experiment 2021 (12): 124003.
Nareklishvili, Maria, Nicholas Polson, and Vadim Sokolov. 2022. “Deep Partial Least Squares for IV Regression.” arXiv Preprint arXiv:2207.02612. https://arxiv.org/abs/2207.02612.
———. 2023a. “Generative Causal Inference,” June. https://arxiv.org/abs/2306.16096.
———. 2023b. “Feature Selection for Personalized Policy Analysis,” July. https://arxiv.org/abs/2301.00251.
Nesterov, Yurii. 1983. “A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²).” Soviet Mathematics Doklady 27: 372–76.
———. 2013. Introductory Lectures on Convex Optimization: A Basic Course. Vol. 87. Springer Science & Business Media.
Nicosia, Luca, Giulia Gnocchi, Ilaria Gorini, Massimo Venturini, Federico Fontana, Filippo Pesapane, Ida Abiuso, et al. 2023. “History of Mammography: Analysis of Breast Imaging Diagnostic Achievements over the Last Century.” Healthcare 11 (1596).
Ostrovskii, GM, Yu M Volin, and WW Borisov. 1971. “Über Die Berechnung von Ableitungen.” Wissenschaftliche Zeitschrift der Technischen Hochschule für Chemie, Leuna-Merseburg 13 (4): 382–84.
Ouyang, Long, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, et al. 2022. “Training Language Models to Follow Instructions with Human Feedback.” Advances in Neural Information Processing Systems 35: 27730–44.
Pan, Zhenyu, Haozheng Luo, Manling Li, and Han Liu. 2025. “Chain-of-Action: Faithful and Multimodal Question Answering Through Large Language Models.” arXiv. https://arxiv.org/abs/2403.17359.
Parzen, Emanuel. 2004. “Quantile Probability and Statistical Data Modeling.” Statistical Science 19 (4): 652–62. https://www.jstor.org/stable/4144436.
Petris, Giovanni. 2010. “An R Package for Dynamic Linear Models.” Journal of Statistical Software 36 (October): 1–16.
Poincaré, Henri. 1898. “La Mesure Du Temps.” Revue de métaphysique Et de Morale 6 (1): 1–13.
Polson, Nicholas G., James G. Scott, and Jesse Windle. 2013. “Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables.” Journal of the American Statistical Association 108 (504): 1339–49.
Polson, Nicholas G, and James Scott. 2018. AIQ: How People and Machines Are Smarter Together. St. Martin’s Press.
Polson, Nicholas G., and Vadim Sokolov. 2023. “Generative AI for Bayesian Computation.” https://arxiv.org/abs/2305.14972.
Polson, Nicholas G, and Vadim Sokolov. 2017. “Deep Learning: A Bayesian Perspective.” Bayesian Analysis 12 (4): 1275–1304.
Polson, Nicholas, and Steven Scott. 2011. “Data Augmentation for Support Vector Machines.” Bayesian Analysis 6 (March).
Polson, Nicholas, and Vadim Sokolov. 2020. “Deep Learning: Computational Aspects.” Wiley Interdisciplinary Reviews: Computational Statistics 12 (5): e1500.
Polson, Nicholas, Vadim Sokolov, and Jianeng Xu. 2021. “Deep Learning Partial Least Squares.” arXiv Preprint arXiv:2106.14085. https://arxiv.org/abs/2106.14085.
Polson, Nick, Fabrizio Ruggeri, and Vadim Sokolov. 2024. “Generative Bayesian Computation for Maximum Expected Utility.” Entropy 26 (12): 1076.
Poplin, Ryan, Avinash V Varadarajan, Katy Blumer, Yun Liu, Michael V McConnell, Greg S Corrado, Lily Peng, and Dale R Webster. 2018. “Prediction of Cardiovascular Risk Factors from Retinal Fundus Photographs via Deep Learning.” Nature Biomedical Engineering 2 (3): 158.
Qian, Chen, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, et al. 2024. “ChatDev: Communicative Agents for Software Development.” arXiv. https://arxiv.org/abs/2307.07924.
Ritter, Hippolyt, Aleksandar Botev, and David Barber. 2018. “A Scalable Laplace Approximation For Neural Networks.”
Robbins, Herbert, and Sutton Monro. 1951. “A Stochastic Approximation Method.” The Annals of Mathematical Statistics 22 (3): 400–407.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2015. Bayesian Data Analysis. 3rd ed. New York: Chapman and Hall/CRC.
Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks: An Overview.” Neural Networks 61: 85–117.
Schmidt-Hieber, Johannes. 2021. “The Kolmogorov–Arnold Representation Theorem Revisited.” Neural Networks 137 (May): 119–26.
Schwertman, Neil C, AJ Gilks, and J Cameron. 1990. “A Simple Noncalculus Proof That the Median Minimizes the Sum of the Absolute Deviations.” The American Statistician 44 (1): 38–39.
Scott, Steven L. 2002. “Bayesian Methods for Hidden Markov Models.” Journal of the American Statistical Association 97 (457): 337–51.
———. 2015. “Multi-Armed Bandit Experiments in the Online Service Economy.” Applied Stochastic Models in Business and Industry 31 (1): 37–45.
Scott, Steven L. 2022. “BoomSpikeSlab: MCMC for Spike and Slab Regression.”
Scott, Steven L., and Hal R. Varian. 2015. “Bayesian Variable Selection for Nowcasting Economic Time Series.” In Economic Analysis of the Digital Economy, 119–35. University of Chicago Press.
Scott, Steven, and Hal Varian. 2014. “Predicting the Present with Bayesian Structural Time Series.” International Journal of Mathematical Modelling and Numerical Optimisation 5 (January): 4–23.
Taylor, Sean J., and Ben Letham. 2017. “Prophet: Forecasting at Scale.” Meta Research. https://research.facebook.com/blog/2017/2/prophet-forecasting-at-scale/.
Shen, Changyu, Enrico G Ferro, Huiping Xu, Daniel B Kramer, Rushad Patell, and Dhruv S Kazi. 2021. “Underperformance of Contemporary Phase III Oncology Trials and Strategies for Improvement.” Journal of the National Comprehensive Cancer Network 19 (9): 1072–78.
Shiryayev, A. N. 1992. “On Analytical Methods in Probability Theory.” In Selected Works of a. N. Kolmogorov: Volume II Probability Theory and Mathematical Statistics, edited by A. N. Shiryayev, 62–108. Dordrecht: Springer Netherlands.
Silver, David, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, et al. 2017. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.” arXiv. https://arxiv.org/abs/1712.01815.
Simpson, Edward. 2010. “Edward Simpson: Bayes at Bletchley Park.” Significance 7 (2): 76–80.
Singh, Pratyush Kumar, Kathryn A. Farrell-Maupin, and Danial Faghihi. 2024. “A Framework for Strategic Discovery of Credible Neural Network Surrogate Models Under Uncertainty.” arXiv. https://arxiv.org/abs/2403.08901.
Smith, A. F. M. 1975. “A Bayesian Approach to Inference about a Change-Point in a Sequence of Random Variables.” Biometrika 62 (2): 407–16. https://www.jstor.org/stable/2335381.
Sokolov, Vadim. 2017. “Discussion of ‘Deep Learning for Finance: Deep Portfolios’.” Applied Stochastic Models in Business and Industry 33 (1): 16–18.
Spiegelhalter, David, and Yin-Lam Ng. 2009. “One Match to Go!” Significance 6 (4): 151–53.
Stein, Charles. 1964. “Inadmissibility of the Usual Estimator for the Variance of a Normal Distribution with Unknown Mean.” Annals of the Institute of Statistical Mathematics 16 (1): 155–60.
Stern, Hal S., and Adam Sugano. 2007. “Inference about Batter-Pitcher Matchups in Baseball from Small Samples.” In Statistical Thinking in Sports, edited by Jim Albert and Ruud H. Koning, 153–65.
Stigler, Stephen M. 1981. “Gauss and the Invention of Least Squares.” The Annals of Statistics, 465–74.
Sun, Duxin, Wei Gao, Hongxiang Hu, and Simon Zhou. 2022. “Why 90% of Clinical Drug Development Fails and How to Improve It?” Acta Pharmaceutica Sinica B 12 (7): 3049–62.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013. “On the Importance of Initialization and Momentum in Deep Learning.” In International Conference on Machine Learning, 1139–47.
Taleb, Nassim Nicholas. 2007. The Black Swan: The Impact of the Highly Improbable. Annotated edition. New York, NY: Random House.
Tarone, Robert E. 1982. “The Use of Historical Control Information in Testing for a Trend in Proportions.” Biometrics, 215–20.
Tesauro, Gerald. 1995. “Temporal Difference Learning and TD-Gammon.” Communications of the ACM 38 (3): 58–68.
Tiao, Louis. 2019. “Pólya-Gamma Bayesian Logistic Regression.” Blog post.
Tikhonov, Andrei N. 1963. “Solution of Incorrectly Formulated Problems and the Regularization Method.” Soviet Mathematics Doklady 4: 1035–38.
Tikhonov, Andrey Nikolayevich. 1943. “On the Stability of Inverse Problems.” Dokl. Akad. Nauk SSSR 39: 195–98.
Tsai, Yao-Hung Hubert, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. “Transformer Dissection: A Unified Understanding of Transformer’s Attention via the Lens of Kernel.” arXiv. https://arxiv.org/abs/1908.11775.
Varian, Hal R. 2010. “Computer Mediated Transactions.” American Economic Review 100 (2): 1–10.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. “Attention Is All You Need.” arXiv. https://arxiv.org/abs/1706.03762.
Vecer, Jan, Frantisek Kopriva, and Tomoyuki Ichiba. 2009. “Estimating the Effect of the Red Card in Soccer: When to Commit an Offense in Exchange for Preventing a Goal Opportunity.” Journal of Quantitative Analysis in Sports 5 (1).
Viterbi, A. 1967. “Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm.” IEEE Transactions on Information Theory 13 (2): 260–69.
Watanabe, Sumio. 2013. “A Widely Applicable Bayesian Information Criterion.” The Journal of Machine Learning Research 14 (1): 867–97.
Watson, Geoffrey S. 1964. “Smooth Regression Analysis.” Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 26 (4): 359–72. https://www.jstor.org/stable/25049340.
Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” arXiv. https://arxiv.org/abs/2201.11903.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.” PhD dissertation, Harvard University.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear Sensitivity Analysis.” In System Modeling and Optimization, 762–70. Springer.
West, Mike, and Jeff Harrison. 1997. Bayesian Forecasting and Dynamic Models. Springer.
Windle, Jesse. 2023. “BayesLogit: Bayesian Logistic Regression.” R package version 2.1.
Windle, Jesse, Nicholas G. Polson, and James G. Scott. 2014. “Sampling Polya-Gamma Random Variates: Alternate and Approximate Techniques.” arXiv. https://arxiv.org/abs/1405.0506.
Wojna, Zbigniew, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, and Julian Ibarz. 2017. “Attention-Based Extraction of Structured Information from Street View Imagery.” arXiv Preprint arXiv:1704.03549. https://arxiv.org/abs/1704.03549.
Wold, Herman. 1975. “Soft Modelling by Latent Variables: The Non-Linear Iterative Partial Least Squares (NIPALS) Approach.” Journal of Applied Probability 12 (S1): 117–42.
Yaari, Menahem E. 1987. “The Dual Theory of Choice Under Risk.” Econometrica 55 (1): 95–115. https://www.jstor.org/stable/1911158.
Yang, An, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, et al. 2025. “Qwen2.5-1M Technical Report.” arXiv Preprint arXiv:2501.15383. https://arxiv.org/abs/2501.15383.
Yao, Shunyu, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” arXiv. https://arxiv.org/abs/2305.10601.
Ye, Yixin, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, and Pengfei Liu. 2025. LIMO: Less Is More for Reasoning.” arXiv. https://arxiv.org/abs/2502.03387.
Zeiler, Matthew D. 2012. ADADELTA: An Adaptive Learning Rate Method.” arXiv Preprint arXiv:1212.5701. https://arxiv.org/abs/1212.5701.
Zhang, Yichi, Anirban Datta, and Sudipto Banerjee. 2018. “Scalable Gaussian Process Classification with Pólya-Gamma Data Augmentation.” arXiv Preprint arXiv:1802.06383. https://arxiv.org/abs/1802.06383.