References

Kolmogorov, A. N. 1938. “On the Analytic Methods of Probability Theory.” Rossíiskaya Akademiya Nauk, no. 5: 5–41.
Acemoglu, Daron, and Pascual Restrepo. 2018. “Artificial Intelligence, Automation and Work.” National Bureau of Economic Research.
Actor, Jonas. 2018. “Computation for the Kolmogorov Superposition Theorem.” MS thesis, Rice University.
Albert, Jim. 1993. “A Statistical Analysis of Hitting Streaks in Baseball: Comment.” Journal of the American Statistical Association 88 (424): 1184–88. https://www.jstor.org/stable/2291255.
Altić, Mirela Slukan. 2013. “Exploring Along the Rome Meridian: Roger Boscovich and the First Modern Map of the Papal States.” In History of Cartography: International Symposium of the ICA, 2012, 71–89. Springer.
Amazon. 2021. “The History of Amazon’s Forecasting Algorithm.” Amazon Science. https://www.amazon.science/latest-news/the-history-of-amazons-forecasting-algorithm.
Amit, Yali, Gilles Blanchard, and Kenneth Wilder. 2000. “Multiple Randomized Classifiers: MRCL.”
Andrews, D. F., and C. L. Mallows. 1974. “Scale Mixtures of Normal Distributions.” Journal of the Royal Statistical Society. Series B (Methodological) 36 (1): 99–102. https://www.jstor.org/stable/2984774.
Arnol’d, Vladimir I. 2006. “Forgotten and Neglected Theories of Poincaré.” Russian Mathematical Surveys 61 (1): 1.
Ayala, Orlando, and Patrice Bechard. 2024. “Reducing Hallucination in Structured Outputs via Retrieval-Augmented Generation.” In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), 228–38. Mexico City, Mexico: Association for Computational Linguistics.
Bach, Francis. 2024. “High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections.” SIAM Journal on Mathematics of Data Science 6 (1): 26–50.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. “Neural Machine Translation by Jointly Learning to Align and Translate.” arXiv. https://arxiv.org/abs/1409.0473.
Barron, Andrew R. 1993. “Universal Approximation Bounds for Superpositions of a Sigmoidal Function.” IEEE Transactions on Information Theory 39 (3): 930–45.
Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss. 1970. “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.” The Annals of Mathematical Statistics 41 (1): 164–71. https://www.jstor.org/stable/2239727.
Baylor, Denis, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, et al. 2017. “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1387–95. ACM.
Behnia, Farnaz, Dominik Karbowski, and Vadim Sokolov. 2021. “Deep Generative Models for Vehicle Speed Trajectories.” arXiv Preprint arXiv:2112.08361. https://arxiv.org/abs/2112.08361.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. “Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off.” Proceedings of the National Academy of Sciences 116 (32): 15849–54.
Benoit, Dries F., and Dirk Van den Poel. 2012. “Binary Quantile Regression: A Bayesian Approach Based on the Asymmetric Laplace Distribution.” Journal of Applied Econometrics 27 (7): 1174–88.
Berge, Travis, Nitish Sinha, and Michael Smolyansky. 2016. “Which Market Indicators Best Forecast Recessions?” FEDS Notes, August.
Bhadra, Anindya, Jyotishka Datta, Nick Polson, Vadim Sokolov, and Jianeng Xu. 2021. “Merging Two Cultures: Deep and Statistical Learning.” arXiv Preprint arXiv:2110.11561. https://arxiv.org/abs/2110.11561.
Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Beijing; Cambridge, MA: O’Reilly Media.
Bojarski, Mariusz, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, et al. 2016. “End to End Learning for Self-Driving Cars.” arXiv Preprint arXiv:1604.07316. https://arxiv.org/abs/1604.07316.
Bonfiglio, Rita, Annarita Granaglia, Raffaella Giocondo, Manuel Scimeca, and Elena Bonanno. 2021. “Molecular Aspects and Prognostic Significance of Microcalcifications in Human Pathology: A Narrative Review.” International Journal of Molecular Sciences 22 (120).
Bottou, Léon, Frank E Curtis, and Jorge Nocedal. 2018. “Optimization Methods for Large-Scale Machine Learning.” SIAM Review 60 (2): 223–311.
Brillinger, David R. 2012. “A Generalized Linear Model With Gaussian Regressor Variables.” In Selected Works of David Brillinger, edited by Peter Guttorp and David Brillinger, 589–606. Selected Works in Probability and Statistics. New York, NY: Springer.
Bryson, Arthur E. 1961. “A Gradient Method for Optimizing Multi-Stage Allocation Processes.” In Proc. Harvard Univ. Symposium on Digital Computers and Their Applications. Vol. 72.
Campagnoli, Patrizia, Sonia Petrone, and Giovanni Petris. 2009. Dynamic Linear Models with R. New York, NY: Springer.
Candes, Emmanuel J, and Michael B Wakin. 2008. “An Introduction to Compressive Sampling: A Sensing/Sampling Paradigm That Goes Against the Common Knowledge in Data Acquisition.” IEEE Signal Processing Magazine 25 (2): 21–30.
Cannon, Alex J. 2018. “Non-Crossing Nonlinear Regression Quantiles by Monotone Composite Quantile Regression Neural Network, with Application to Rainfall Extremes.” Stochastic Environmental Research and Risk Assessment 32 (11): 3207–25.
Carlin, Bradley P, Nicholas G Polson, and David S Stoffer. 1992. “A Monte Carlo Approach to Nonnormal and Nonlinear State-Space Modeling.” Journal of the American Statistical Association 87 (418): 493–500.
Carreira-Perpiñán, Miguel A, and Weiran Wang. 2014. “Distributed Optimization of Deeply Nested Systems.” In AISTATS, 10–19.
Carter, Chris K, and Robert Kohn. 1994. “On Gibbs Sampling for State Space Models.” Biometrika 81 (3): 541–53.
Carvalho, Carlos M, Hedibert F Lopes, Nicholas G Polson, and Matt A Taddy. 2010. “Particle Learning for General Mixtures.” Bayesian Analysis 5 (4): 709–40.
Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika 97 (2): 465–80.
Chernozhukov, Victor, Iván Fernández-Val, and Alfred Galichon. 2010. “Quantile and Probability Curves Without Crossing.” Econometrica 78 (3): 1093–1125. https://www.jstor.org/stable/40664520.
Chib, Siddhartha. 1998. “Estimation and Comparison of Multiple Change-Point Models.” Journal of Econometrics 86 (2): 221–41.
Chung, Hyung Won, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, et al. 2022. “Scaling Instruction-Finetuned Language Models.” arXiv. https://arxiv.org/abs/2210.11416.
Cook, R. Dennis. 2007. “Fisher Lecture: Dimension Reduction in Regression.” Statistical Science, 1–26. https://www.jstor.org/stable/27645799.
Cootner, Paul H. 1967. The Random Character of Stock Market Prices. MIT Press.
Coppejans, Mark. 2004. “On Kolmogorov’s Representation of Functions of Several Variables by Functions of One Variable.” Journal of Econometrics 123 (1): 1–31.
Cover, T., and P. Hart. 1967. “Nearest Neighbor Pattern Classification.” IEEE Transactions on Information Theory 13 (1): 21–27.
Dabney, Will, Georg Ostrovski, David Silver, and Rémi Munos. 2018. “Implicit Quantile Networks for Distributional Reinforcement Learning.” arXiv. https://arxiv.org/abs/1806.06923.
Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2017. “Distributional Reinforcement Learning with Quantile Regression.” arXiv. https://arxiv.org/abs/1710.10044.
Davison, Anthony Christopher. 2003. Statistical Models. Vol. 11. Cambridge University Press.
Dean, Jeffrey, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, et al. 2012. “Large Scale Distributed Deep Networks.” In Advances in Neural Information Processing Systems, 1223–31.
DeGroot, Morris H. 2005. Optimal Statistical Decisions. Wiley classics library ed. Wiley Classics Library. Hoboken, NJ: Wiley-Interscience.
Demb, Robert, and David Sprecher. 2021. “A Note on Computing with Kolmogorov Superpositions Without Iterations.” Neural Networks 144 (December): 438–42.
Devroye, Luc. 1986. Non-Uniform Random Variate Generation. Springer Science & Business Media.
Diaconis, Persi, and Frederick Mosteller. 1989. “Methods for Studying Coincidences.” Journal of the American Statistical Association 84 (408): 853–61.
Diaconis, Persi, and David Freedman. 1987. “A Dozen de Finetti-style Results in Search of a Theory.” In Annales de l’IHP Probabilités Et Statistiques, 23:397–423.
Diaconis, Persi, and Mehrdad Shahshahani. 1981. “Generating a Random Permutation with Random Transpositions.” Probability Theory and Related Fields 57 (2): 159–79.
———. 1984. “On Nonlinear Functions of Linear Combinations.” SIAM Journal on Scientific and Statistical Computing 5 (1): 175–91.
Diaconis, P., and D. Ylvisaker. 1983. “Quantifying Prior Opinion.”
Dixon, Mark J., and Stuart G. Coles. 1997. “Modelling Association Football Scores and Inefficiencies in the Football Betting Market.” Journal of the Royal Statistical Society Series C: Applied Statistics 46 (2): 265–80.
Dixon, Matthew F, Nicholas G Polson, and Vadim O Sokolov. 2019. “Deep Learning for Spatio-Temporal Modeling: Dynamic Traffic Flows and High Frequency Trading.” Applied Stochastic Models in Business and Industry 35 (3): 788–807.
Dreyfus, Stuart. 1962. “The Numerical Solution of Variational Problems.” Journal of Mathematical Analysis and Applications 5 (1): 30–45.
———. 1973. “The Computational Solution of Optimal Control Problems with Time Lag.” IEEE Transactions on Automatic Control 18 (4): 383–85.
Efron, Bradley, and Carl Morris. 1975. “Data Analysis Using Stein’s Estimator and Its Generalizations.” Journal of the American Statistical Association 70 (350): 311–19.
———. 1977. “Stein’s Paradox in Statistics.” Scientific American 236 (5): 119–27.
Enikolopov, Ruben, Vasily Korovkin, Maria Petrova, Konstantin Sonin, and Alexei Zakharov. 2013. “Field Experiment Estimate of Electoral Fraud in Russian Parliamentary Elections.” Proceedings of the National Academy of Sciences 110 (2): 448–52.
Tassone, Eric, and Farzan Rohani. 2017. “Our Quest for Robust Time Series Forecasting at Scale.”
Feller, William. 1971. An Introduction to Probability Theory and Its Applications. Wiley.
Feynman, Richard. n.d. “Feynman :: Rules of Chess.”
Fredholm, Ivar. 1903. “Sur Une Classe d’équations Fonctionnelles.” Acta Mathematica 27: 365–90.
Friedman, Jerome H., and Werner Stuetzle. 1981. “Projection Pursuit Regression.” Journal of the American Statistical Association 76 (376): 817–23.
Frühwirth-Schnatter, Sylvia, and Rudolf Frühwirth. 2007. “Auxiliary Mixture Sampling with Applications to Logistic Models.” Computational Statistics & Data Analysis 51 (April): 3509–28.
———. 2010. “Data Augmentation and MCMC for Binary and Multinomial Logit Models.” In Statistical Modelling and Regression Structures: Festschrift in Honour of Ludwig Fahrmeir, 111–32.
Frühwirth-Schnatter, Sylvia, Rudolf Frühwirth, Leonhard Held, and Håvard Rue. 2008. “Improved Auxiliary Mixture Sampling for Hierarchical Models of Non-Gaussian Data.” Statistics and Computing 19 (4): 479.
Gan, Link, and Alan Fritzler. 2016. “How to Become an Executive.”
García-Arenzana, Nicolás, Eva María Navarrete-Muñoz, Virginia Lope, Pilar Moreo, Carmen Vidal, Soledad Laso-Pablos, Nieves Ascunce, et al. 2014. “Calorie Intake, Olive Oil Consumption and Mammographic Density Among Spanish Women.” International Journal of Cancer 134 (8): 1916–25.
Gramacy, Robert B., and Nicholas G. Polson. 2012. “Simulation-Based Regularized Logistic Regression.” arXiv. https://arxiv.org/abs/1005.3430.
Griewank, Andreas, Kshitij Kulshreshtha, and Andrea Walther. 2012. “On the Numerical Stability of Algorithmic Differentiation.” Computing. Archives for Scientific Computing 94 (2-4): 125–49.
Guan, Xinyu, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang. 2025. “rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking.” arXiv. https://arxiv.org/abs/2501.04519.
Hahn, P. Richard, Jared S. Murray, and Carlos M. Carvalho. 2020. “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion).” Bayesian Analysis 15 (3): 965–1056.
Halevy, Alon, Peter Norvig, and Fernando Pereira. 2009. “The Unreasonable Effectiveness of Data.” IEEE Intelligent Systems 24 (2): 8–12.
Hardt, Moritz, Ben Recht, and Yoram Singer. 2016. “Train Faster, Generalize Better: Stability of Stochastic Gradient Descent.” In International Conference on Machine Learning, 1225–34. PMLR.
Hastie, Trevor, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. 2022. “Surprises in High-Dimensional Ridgeless Least Squares Interpolation.” The Annals of Statistics 50 (2): 949–86.
Held, Leonhard, and Chris C. Holmes. 2006. “Bayesian Auxiliary Variable Models for Binary and Multinomial Regression.” Bayesian Analysis 1 (1): 145–68.
Hermann, Jeremy, and Mike Del Balso. 2017. “Meet Michelangelo: Uber’s Machine Learning Platform.”
Hou, Zhen, Hao Liu, Jiang Bian, Xing He, and Yan Zhuang. 2025. “Enhancing Medical Coding Efficiency Through Domain-Specific Fine-Tuned Large Language Models.” Npj Health Systems 2 (1): 14.
Hyndman, Rob J., and George Athanasopoulos. 2021. Forecasting: Principles and Practice. 3rd ed. Melbourne, Australia: OTexts.
Igelnik, B., and N. Parikh. 2003. “Kolmogorov’s Spline Network.” IEEE Transactions on Neural Networks 14 (4): 725–33.
Immer, Alexander, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch, and Mohammad Emtiyaz Khan. 2021. “Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning.” In International Conference on Machine Learning, 4563–73. PMLR.
Indeed. 2018. “Jobs of the Future: Emerging Trends in Artificial Intelligence.”
Irwin, Neil. 2016. “How to Become a C.E.O.? The Quickest Path Is a Winding One.” The New York Times, September.
Iwata, Shigeru. 2001. “Recentered and Rescaled Instrumental Variable Estimation of Tobit and Probit Models with Errors in Variables.” Econometric Reviews 20 (3): 319–35.
Januschowski, Tim, Yuyang Wang, Kari Torkkola, Timo Erkkilä, Hilaf Hasson, and Jan Gasthaus. 2022. “Forecasting with Trees.” International Journal of Forecasting, Special Issue: M5 competition, 38 (4): 1473–81.
Kaggle. 2020. “M5 Forecasting - Accuracy.” https://kaggle.com/competitions/m5-forecasting-accuracy.
Kallenberg, Olav. 1997. Foundations of Modern Probability. 2nd ed. Springer.
Kalman, R. E., and R. S. Bucy. 1961. “New Results in Linear Filtering and Prediction Theory.” Journal of Basic Engineering 83 (1): 95–108.
Kalman, Rudolph Emil. 1960. “A New Approach to Linear Filtering and Prediction Problems.” Transactions of the ASME–Journal of Basic Engineering 82 (Series D): 35–45.
Keskar, Nitish Shirish, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2016. “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.” arXiv Preprint arXiv:1609.04836. https://arxiv.org/abs/1609.04836.
Keynes, John Maynard. 1921. A Treatise on Probability. Macmillan.
Kingma, Diederik, and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv Preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980.
Klartag, Bo’az. 2007. “A Central Limit Theorem for Convex Sets.” Inventiones Mathematicae 168 (1): 91–131.
Kolmogoroff, Andrei. 1931. “Über Die Analytischen Methoden in Der Wahrscheinlichkeitsrechnung.” Mathematische Annalen 104 (1): 415–58.
Kolmogorov, A. N. 1942. “Definition of Center of Dispersion and Measure of Accuracy from a Finite Number of Observations (in Russian).” Izv. Akad. Nauk SSSR Ser. Mat. 6: 3–32.
———. 1956. “On the Representation of Continuous Functions of Several Variables as Superpositions of Functions of Smaller Number of Variables.” Soviet Math. Dokl. 108: 179–82.
Kreps, David. 1988. Notes On The Theory Of Choice. Boulder: Westview Press.
Levina, Elizaveta, and Peter Bickel. 2001. “The Earth Mover’s Distance Is the Mallows Distance: Some Insights from Statistics.” In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2:251–56. IEEE.
Lin, Zhouhan, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. “A Structured Self-attentive Sentence Embedding.” arXiv. https://arxiv.org/abs/1703.03130.
Lindgren, Georg. 1978. “Markov Regime Models for Mixed Distributions and Switching Regressions.” Scandinavian Journal of Statistics 5 (2): 81–91. https://www.jstor.org/stable/4615692.
Lindley, D. V. 1961. “The Use of Prior Probability Distributions in Statistical Inference and Decisions.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, 4.1:453–69. University of California Press.
Linnainmaa, Seppo. 1970. “The Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding Errors.” Master’s Thesis (in Finnish), Univ. Helsinki, 6–7.
Logan, John A. 1983. “A Multivariate Model for Mobility Tables.” American Journal of Sociology 89 (2): 324–49.
Logunov, A. A. 2004. “Henri Poincare and Relativity Theory.” https://arxiv.org/abs/physics/0408077.
Lorentz, George G. 1976. “The 13th Problem of Hilbert.” In Proceedings of Symposia in Pure Mathematics, 28:419–30. American Mathematical Society.
MacKay, David JC. 1992. “Bayesian Interpolation.” Neural Computation 4 (3): 415–47.
Maharaj, Shiva, Nick Polson, and Vadim Sokolov. 2023. “Kramnik Vs Nakamura or Bayes Vs p-Value.” SSRN Scholarly Paper. Rochester, NY.
Malthouse, Edward, Richard Mah, and Ajit Tamhane. 1997. “Nonlinear Partial Least Squares.” Computers & Chemical Engineering 12 (April): 875–90.
Mehrasa, Nazanin, Yatao Zhong, Frederick Tung, Luke Bornn, and Greg Mori. 2017. “Learning Person Trajectory Representations for Team Activity Analysis.” arXiv Preprint arXiv:1706.00893. https://arxiv.org/abs/1706.00893.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv. https://arxiv.org/abs/1301.3781.
Milman, Vitali D, and Gideon Schechtman. 2009. Asymptotic Theory of Finite Dimensional Normed Spaces: Isoperimetric Inequalities in Riemannian Manifolds. Vol. 1200. Springer.
Nadaraya, E. A. 1964. “On Estimating Regression.” Theory of Probability & Its Applications 9 (1): 141–42.
Naik, Prasad, and Chih-Ling Tsai. 2000. “Partial Least Squares Estimator for Single-Index Models.” Journal of the Royal Statistical Society. Series B (Statistical Methodology) 62 (4): 763–71. https://www.jstor.org/stable/2680619.
Nakkiran, Preetum, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. 2021. “Deep Double Descent: Where Bigger Models and More Data Hurt*.” Journal of Statistical Mechanics: Theory and Experiment 2021 (12): 124003.
Nareklishvili, Maria, Nicholas Polson, and Vadim Sokolov. 2022. “Deep Partial Least Squares for IV Regression.” arXiv Preprint arXiv:2207.02612. https://arxiv.org/abs/2207.02612.
———. 2023a. “Generative Causal Inference,” June. https://arxiv.org/abs/2306.16096.
———. 2023b. “Feature Selection for Personalized Policy Analysis,” July. https://arxiv.org/abs/2301.00251.
Nesterov, Yurii. 1983. “A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²).” Soviet Mathematics Doklady 27: 372–76.
———. 2013. Introductory Lectures on Convex Optimization: A Basic Course. Vol. 87. Springer Science & Business Media.
Nicosia, Luca, Giulia Gnocchi, Ilaria Gorini, Massimo Venturini, Federico Fontana, Filippo Pesapane, Ida Abiuso, et al. 2023. “History of Mammography: Analysis of Breast Imaging Diagnostic Achievements over the Last Century.” Healthcare 11 (1596).
Ostrovskii, GM, Yu M Volin, and WW Borisov. 1971. “Über Die Berechnung von Ableitungen.” Wissenschaftliche Zeitschrift der Technischen Hochschule für Chemie, Leuna-Merseburg 13 (4): 382–84.
Ouyang, Long, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, et al. 2022. “Training Language Models to Follow Instructions with Human Feedback.” Advances in Neural Information Processing Systems 35: 27730–44.
Pan, Zhenyu, Haozheng Luo, Manling Li, and Han Liu. 2025. “Chain-of-Action: Faithful and Multimodal Question Answering Through Large Language Models.” arXiv. https://arxiv.org/abs/2403.17359.
Parzen, Emanuel. 2004. “Quantile Probability and Statistical Data Modeling.” Statistical Science 19 (4): 652–62. https://www.jstor.org/stable/4144436.
Petris, Giovanni. 2010. “An R Package for Dynamic Linear Models.” Journal of Statistical Software 36 (October): 1–16.
Poincaré, Henri. 1898. “La Mesure Du Temps.” Revue de métaphysique Et de Morale 6 (1): 1–13.
Polson, Nicholas G., James G. Scott, and Jesse Windle. 2013. “Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables.” Journal of the American Statistical Association 108 (504): 1339–49.
Polson, Nicholas G, and James Scott. 2018. AIQ: How People and Machines Are Smarter Together. St. Martin’s Press.
Polson, Nicholas G., and Vadim Sokolov. 2023. “Generative AI for Bayesian Computation.” https://arxiv.org/abs/2305.14972.
Polson, Nicholas G, and Vadim Sokolov. 2017. “Deep Learning: A Bayesian Perspective.” Bayesian Analysis 12 (4): 1275–1304.
Polson, Nicholas, and Steven Scott. 2011. “Data Augmentation for Support Vector Machines.” Bayesian Analysis 6 (March).
Polson, Nicholas, and Vadim Sokolov. 2020. “Deep Learning: Computational Aspects.” Wiley Interdisciplinary Reviews: Computational Statistics 12 (5): e1500.
Polson, Nicholas, Vadim Sokolov, and Jianeng Xu. 2021. “Deep Learning Partial Least Squares.” arXiv Preprint arXiv:2106.14085. https://arxiv.org/abs/2106.14085.
Polson, Nick, Fabrizio Ruggeri, and Vadim Sokolov. 2024. “Generative Bayesian Computation for Maximum Expected Utility.” Entropy 26 (12): 1076.
Poplin, Ryan, Avinash V Varadarajan, Katy Blumer, Yun Liu, Michael V McConnell, Greg S Corrado, Lily Peng, and Dale R Webster. 2018. “Prediction of Cardiovascular Risk Factors from Retinal Fundus Photographs via Deep Learning.” Nature Biomedical Engineering 2 (3): 158.
Qian, Chen, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, et al. 2024. “ChatDev: Communicative Agents for Software Development.” arXiv. https://arxiv.org/abs/2307.07924.
Ritter, Hippolyt, Aleksandar Botev, and David Barber. 2018. “A Scalable Laplace Approximation For Neural Networks.”
Robbins, Herbert, and Sutton Monro. 1951. “A Stochastic Approximation Method.” The Annals of Mathematical Statistics 22 (3): 400–407.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2015. Bayesian Data Analysis. 3rd ed. New York: Chapman and Hall/CRC.
Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks: An Overview.” Neural Networks 61: 85–117.
Schmidt-Hieber, Johannes. 2021. “The Kolmogorov–Arnold Representation Theorem Revisited.” Neural Networks 137 (May): 119–26.
Schwertman, Neil C, AJ Gilks, and J Cameron. 1990. “A Simple Noncalculus Proof That the Median Minimizes the Sum of the Absolute Deviations.” The American Statistician 44 (1): 38–39.
Scott, Steven L. 2002. “Bayesian Methods for Hidden Markov Models.” Journal of the American Statistical Association 97 (457): 337–51.
———. 2015. “Multi-Armed Bandit Experiments in the Online Service Economy.” Applied Stochastic Models in Business and Industry 31 (1): 37–45.
Scott, Steven L. 2022. “BoomSpikeSlab: MCMC for Spike and Slab Regression.”
Scott, Steven L., and Hal R. Varian. 2015. “Bayesian Variable Selection for Nowcasting Economic Time Series.” In Economic Analysis of the Digital Economy, 119–35. University of Chicago Press.
Scott, Steven, and Hal Varian. 2014. “Predicting the Present with Bayesian Structural Time Series.” International Journal of Mathematical Modelling and Numerical Optimisation 5 (January): 4–23.
Taylor, Sean J., and Ben Letham. 2017. “Prophet: Forecasting at Scale.” Meta Research. https://research.facebook.com/blog/2017/2/prophet-forecasting-at-scale/.
Shen, Changyu, Enrico G Ferro, Huiping Xu, Daniel B Kramer, Rushad Patell, and Dhruv S Kazi. 2021. “Underperformance of Contemporary Phase III Oncology Trials and Strategies for Improvement.” Journal of the National Comprehensive Cancer Network 19 (9): 1072–78.
Shiryayev, A. N. 1992. “On Analytical Methods in Probability Theory.” In Selected Works of a. N. Kolmogorov: Volume II Probability Theory and Mathematical Statistics, edited by A. N. Shiryayev, 62–108. Dordrecht: Springer Netherlands.
Silver, David, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, et al. 2017. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.” arXiv. https://arxiv.org/abs/1712.01815.
Simpson, Edward. 2010. “Edward Simpson: Bayes at Bletchley Park.” Significance 7 (2): 76–80.
Singh, Pratyush Kumar, Kathryn A. Farrell-Maupin, and Danial Faghihi. 2024. “A Framework for Strategic Discovery of Credible Neural Network Surrogate Models Under Uncertainty.” arXiv. https://arxiv.org/abs/2403.08901.
Smith, A. F. M. 1975. “A Bayesian Approach to Inference about a Change-Point in a Sequence of Random Variables.” Biometrika 62 (2): 407–16. https://www.jstor.org/stable/2335381.
Sokolov, Vadim. 2017. “Discussion of ‘Deep Learning for Finance: Deep Portfolios’.” Applied Stochastic Models in Business and Industry 33 (1): 16–18.
Spiegelhalter, David, and Yin-Lam Ng. 2009. “One Match to Go!” Significance 6 (4): 151–53.
Stein, Charles. 1964. “Inadmissibility of the Usual Estimator for the Variance of a Normal Distribution with Unknown Mean.” Annals of the Institute of Statistical Mathematics 16 (1): 155–60.
Stern, Hal S., and Adam Sugano. 2007. “Inference about Batter-Pitcher Matchups in Baseball from Small Samples.” In Statistical Thinking in Sports, edited by Jim Albert and Ruud H. Koning, 153–65.
Stigler, Stephen M. 1981. “Gauss and the Invention of Least Squares.” The Annals of Statistics, 465–74.
Sun, Duxin, Wei Gao, Hongxiang Hu, and Simon Zhou. 2022. “Why 90% of Clinical Drug Development Fails and How to Improve It?” Acta Pharmaceutica Sinica B 12 (7): 3049–62.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013. “On the Importance of Initialization and Momentum in Deep Learning.” In International Conference on Machine Learning, 1139–47.
Taleb, Nassim Nicholas. 2007. The Black Swan: The Impact of the Highly Improbable. Annotated edition. New York, NY: Random House.
Tarone, Robert E. 1982. “The Use of Historical Control Information in Testing for a Trend in Proportions.” Biometrics, 215–20.
Tesauro, Gerald. 1995. “Temporal Difference Learning and TD-Gammon.” Communications of the ACM 38 (3): 58–68.
Tiao, Louis. 2019. “Pólya-Gamma Bayesian Logistic Regression.” Blog post.
Tikhonov, Andrei N. 1963. “Solution of Incorrectly Formulated Problems and the Regularization Method.” Soviet Mathematics Doklady 4: 1035–38.
Tikhonov, Andrey Nikolayevich. 1943. “On the Stability of Inverse Problems.” Dokl. Akad. Nauk SSSR 39: 195–98.
Tsai, Yao-Hung Hubert, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. “Transformer Dissection: A Unified Understanding of Transformer’s Attention via the Lens of Kernel.” arXiv. https://arxiv.org/abs/1908.11775.
Varian, Hal R. 2010. “Computer Mediated Transactions.” American Economic Review 100 (2): 1–10.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. “Attention Is All You Need.” arXiv. https://arxiv.org/abs/1706.03762.
Vecer, Jan, Frantisek Kopriva, and Tomoyuki Ichiba. 2009. “Estimating the Effect of the Red Card in Soccer: When to Commit an Offense in Exchange for Preventing a Goal Opportunity.” Journal of Quantitative Analysis in Sports 5 (1).
Viterbi, A. 1967. “Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm.” IEEE Transactions on Information Theory 13 (2): 260–69.
Watanabe, Sumio. 2013. “A Widely Applicable Bayesian Information Criterion.” The Journal of Machine Learning Research 14 (1): 867–97.
Watson, Geoffrey S. 1964. “Smooth Regression Analysis.” Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 26 (4): 359–72. https://www.jstor.org/stable/25049340.
Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” arXiv. https://arxiv.org/abs/2201.11903.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.” PhD dissertation, Harvard University.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear Sensitivity Analysis.” In System Modeling and Optimization, 762–70. Springer.
West, Mike, and Jeff Harrison. 1997. Bayesian Forecasting and Dynamic Models. Springer.
Windle, Jesse. 2023. “BayesLogit: Bayesian Logistic Regression.” R package version 2.1.
Windle, Jesse, Nicholas G. Polson, and James G. Scott. 2014. “Sampling Polya-Gamma Random Variates: Alternate and Approximate Techniques.” arXiv. https://arxiv.org/abs/1405.0506.
Wojna, Zbigniew, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, and Julian Ibarz. 2017. “Attention-Based Extraction of Structured Information from Street View Imagery.” arXiv Preprint arXiv:1704.03549. https://arxiv.org/abs/1704.03549.
Wold, Herman. 1975. “Soft Modelling by Latent Variables: The Non-Linear Iterative Partial Least Squares (NIPALS) Approach.” Journal of Applied Probability 12 (S1): 117–42.
Yaari, Menahem E. 1987. “The Dual Theory of Choice Under Risk.” Econometrica 55 (1): 95–115. https://www.jstor.org/stable/1911158.
Yang, An, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, et al. 2025. “Qwen2.5-1M Technical Report.” arXiv Preprint arXiv:2501.15383. https://arxiv.org/abs/2501.15383.
Yao, Shunyu, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” arXiv. https://arxiv.org/abs/2305.10601.
Ye, Yixin, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, and Pengfei Liu. 2025. LIMO: Less Is More for Reasoning.” arXiv. https://arxiv.org/abs/2502.03387.
Zeiler, Matthew D. 2012. ADADELTA: An Adaptive Learning Rate Method.” arXiv Preprint arXiv:1212.5701. https://arxiv.org/abs/1212.5701.
Zhang, Yichi, Anirban Datta, and Sudipto Banerjee. 2018. “Scalable Gaussian Process Classification with Pólya-Gamma Data Augmentation.” arXiv Preprint arXiv:1802.06383. https://arxiv.org/abs/1802.06383.