References

Kolmogorov, A. N. 1938. “On the Analytic Methods of Probability Theory.” Rossíiskaya Akademiya Nauk, no. 5: 5–41.
Actor, Jonas. 2018. “Computation for the Kolmogorov Superposition Theorem.” MS thesis, Rice University.
Albert, Jim. 1993. “A Statistical Analysis of Hitting Streaks in Baseball: Comment.” Journal of the American Statistical Association 88 (424): 1184–88.
Altić, Mirela Slukan. 2013. “Exploring Along the Rome Meridian: Roger Boscovich and the First Modern Map of the Papal States.” In History of Cartography: International Symposium of the ICA, 2012, 71–89. Springer.
Amazon. 2021. “The History of Amazon’s Forecasting Algorithm.”
Amit, Yali, Gilles Blanchard, and Kenneth Wilder. 2000. “Multiple Randomized Classifiers: MRCL.”
Andrews, D. F., and C. L. Mallows. 1974. “Scale Mixtures of Normal Distributions.” Journal of the Royal Statistical Society. Series B (Methodological) 36 (1): 99–102.
Apley, Daniel W., and Jingyu Zhu. 2020. “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82 (4): 1059–86.
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. 2017. “Wasserstein Generative Adversarial Networks.” Proceedings of the 34th International Conference on Machine Learning, 214–23.
Armitage, Peter. 1975. Sequential Medical Trials. 2nd ed. Oxford: Blackwell Scientific Publications.
Arnol’d, Vladimir I. 2006. “Forgotten and Neglected Theories of Poincaré.” Russian Mathematical Surveys 61 (1): 1.
Ayala, Orlando, and Patrice Bechard. 2024. “Reducing Hallucination in Structured Outputs via Retrieval-Augmented Generation.” In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), 228–38. Mexico City, Mexico: Association for Computational Linguistics.
Bach, Francis. 2024. “High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections.” SIAM Journal on Mathematics of Data Science 6 (1): 26–50.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. “Neural Machine Translation by Jointly Learning to Align and Translate.” arXiv. https://arxiv.org/abs/1409.0473.
Bai, Yuntao, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, et al. 2022. “Constitutional AI: Harmlessness from AI Feedback.” arXiv. https://arxiv.org/abs/2212.08073.
Barron, Andrew R. 1993. “Universal Approximation Bounds for Superpositions of a Sigmoidal Function.” IEEE Transactions on Information Theory 39 (3): 930–45.
Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss. 1970. “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.” The Annals of Mathematical Statistics 41 (1): 164–71.
Behnia, Farnaz, Dominik Karbowski, and Vadim Sokolov. 2023. “Deep Generative Models for Vehicle Speed Trajectories.” Applied Stochastic Models in Business and Industry 39 (5): 701–19.
Behrouz, Ali, and Mohammad Pezeshki. 2025. “MIRAS: Memory as an Optimization Object.” Google Research.
Behrouz, Ali, Mohammad Pezeshki, and Rasool Fakoor. 2025. “Titans: Learning to Memorize at Test Time.” arXiv Preprint arXiv:2501.00663. https://arxiv.org/abs/2501.00663.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. “Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off.” Proceedings of the National Academy of Sciences 116 (32): 15849–54.
Bellemare, Marc G., Will Dabney, and Rémi Munos. 2017. “A Distributional Perspective on Reinforcement Learning.” Proceedings of the 34th International Conference on Machine Learning, 449–58.
Benda, Norbert, Michael Branson, Willi Maurer, and Tim Friede. 2016. “Sequential Designs with Small Samples: Evaluation and Recommendations for Normal Responses.” Statistics in Medicine 35 (19): 3215–30.
Benoit, Dries F., and Dirk Van den Poel. 2012. “Binary Quantile Regression: A Bayesian Approach Based on the Asymmetric Laplace Distribution.” Journal of Applied Econometrics 27 (7): 1174–88.
Berge, Travis, Nitish Sinha, and Michael Smolyansky. 2016. “Which Market Indicators Best Forecast Recessions?” FEDS Notes, August.
Berry, Donald A. 1985. “Interim Analyses in Clinical Trials: Classical Vs. Bayesian Approaches.” Statistics in Medicine 4 (4): 521–26.
Berry, Scott M., Bradley P. Carlin, J. Jack Lee, and Peter Müller. 2010. Bayesian Adaptive Methods for Clinical Trials. Boca Raton: CRC Press.
Bhadra, Anindya, Jyotishka Datta, Nick Polson, Vadim Sokolov, and Jianeng Xu. 2021. “Merging Two Cultures: Deep and Statistical Learning.” arXiv Preprint arXiv:2110.11561. https://arxiv.org/abs/2110.11561.
Billingsley, Patrick. 1995. Probability and Measure. 3rd ed. New York: John Wiley & Sons.
Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Beijing; Cambridge, MA: O’Reilly Media.
Bonfiglio, Rita, Annarita Granaglia, Raffaella Giocondo, Manuel Scimeca, and Elena Bonanno. 2021. “Molecular Aspects and Prognostic Significance of Microcalcifications in Human Pathology: A Narrative Review.” International Journal of Molecular Sciences 22 (120).
Bottou, Léon, Frank E Curtis, and Jorge Nocedal. 2018. “Optimization Methods for Large-Scale Machine Learning.” SIAM Review 60 (2): 223–311.
Braun, Heinrich, and Martin Riedmiller. 2009. “Constructive Neural Network Learning Algorithms for Pattern Classification.” IEEE Transactions on Neural Networks 20 (1): 84–97.
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.
Brier, Glenn W. 1950. “Verification of Forecasts Expressed in Terms of Probability.” Monthly Weather Review 78 (1): 1–3.
Brillinger, David R. 2012. “A Generalized Linear Model With Gaussian Regressor Variables.” In Selected Works of David Brillinger, edited by Peter Guttorp and David Brillinger, 589–606. Selected Works in Probability and Statistics. New York, NY: Springer.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” Advances in Neural Information Processing Systems 33: 1877–1901.
Bryson, Arthur E. 1961. “A Gradient Method for Optimizing Multi-Stage Allocation Processes.” In Proc. Harvard Univ. Symposium on Digital Computers and Their Applications. Vol. 72.
Bryson, Arthur E., and Yu-Chi Ho. 1969. Applied Optimal Control: Optimization, Estimation, and Control. Waltham, MA: Blaisdell Publishing Company.
Bumgarner, John M., Chad T. Lambert, Ayman A. Hussein, Daniel J. Cantillon, Bryan Baranowski, Kathy Wolski, Bruce D. Lindsay, Oussama M. Wazni, and Khaldoun G. Tarakji. 2018. “Smartwatch Algorithm for Automated Detection of Atrial Fibrillation.” Journal of the American College of Cardiology 71 (21): 2381–88.
Camerer, Colin F. 2003. Behavioral Game Theory: Experiments in Strategic Interaction. The Roundtable Series in Behavioral Economics. New York: Russell Sage Foundation; Princeton, NJ: Princeton University Press.
Campagnoli, Patrizia, Sonia Petrone, and Giovanni Petris. 2009. Dynamic Linear Models with R. New York, NY: Springer.
Cannon, Alex J. 2018. “Non-Crossing Nonlinear Regression Quantiles by Monotone Composite Quantile Regression Neural Network, with Application to Rainfall Extremes.” Stochastic Environmental Research and Risk Assessment 32 (11): 3207–25.
Carlin, Bradley P, Nicholas G Polson, and David S Stoffer. 1992. “A Monte Carlo Approach to Nonnormal and Nonlinear State-Space Modeling.” Journal of the American Statistical Association 87 (418): 493–500.
Carter, Chris K, and Robert Kohn. 1994. “On Gibbs Sampling for State Space Models.” Biometrika 81 (3): 541–53.
Carvalho, Carlos M, Hedibert F Lopes, Nicholas G Polson, and Matt A Taddy. 2010. “Particle Learning for General Mixtures.” Bayesian Analysis 5 (4): 709–40.
Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika, asq017.
Chen, Charlie, Sebastian Borgeaud, Jean-Baptiste Alayrac, Eliza Buchatskaya, Sebastian Bodnariu, Benoit Steiner, Junteng Jia, et al. 2023. “Accelerating Large Language Model Decoding with Speculative Sampling.” arXiv Preprint arXiv:2302.01318. https://arxiv.org/abs/2302.01318.
Chen, Cong, Naitee Li, Shuai Yuan, Zoran Antonijevic, Wei Guo, et al. 2022. “Application of Bayesian Methods to Accelerate Rare Disease Drug Development: Scopes and Hurdles.” Orphanet Journal of Rare Diseases 17: 186.
Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. New York, NY, USA: ACM.
Chernozhukov, Victor, Iván Fernández-Val, and Alfred Galichon. 2010. “Quantile and Probability Curves Without Crossing.” Econometrica: Journal of the Econometric Society 78 (3): 1093–1125.
Chib, Siddhartha. 1998. “Estimation and Comparison of Multiple Change-Point Models.” Journal of Econometrics 86 (2): 221–41.
Choi, Hee Min, and James P Hobert. 2013. “Uniform Ergodicity of the Polya-Gamma Gibbs Sampler.” Electronic Journal of Statistics 7: 2054–64.
Chroma Research. 2024. “Evaluating Chunking Strategies for Retrieval.”
Chung, Hyung Won, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, et al. 2022. “Scaling Instruction-Finetuned Language Models.” arXiv. https://arxiv.org/abs/2210.11416.
Clark, Kevin, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. “What Does BERT Look at? An Analysis of BERT’s Attention.” In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 276–86. Association for Computational Linguistics.
Cootner, Paul H. 1967. The Random Character of Stock Market Prices. MIT press.
Coppejans, Mark. 2004. “On Kolmogorov’s Representation of Functions of Several Variables by Functions of One Variable.” Journal of Econometrics 123 (1): 1–31.
Cover, T., and P. Hart. 1967. “Nearest Neighbor Pattern Classification.” IEEE Transactions on Information Theory 13 (1): 21–27.
Cover, Thomas M., and Joy A. Thomas. 2006. Elements of Information Theory. John Wiley & Sons.
Craven, Mark, and Jude W. Shavlik. 1996. “Extracting Tree-Structured Representations of Trained Networks.” In Advances in Neural Information Processing Systems, 8:24–30. MIT Press.
Dabney, Will, Georg Ostrovski, David Silver, and Rémi Munos. 2018. “Implicit Quantile Networks for Distributional Reinforcement Learning.” arXiv. https://arxiv.org/abs/1806.06923.
Dao, Tri. 2023. “FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning.” arXiv Preprint arXiv:2307.08691. https://arxiv.org/abs/2307.08691.
Das, Payel, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, et al. 2024. “Larimar: Large Language Models with Episodic Memory Control.” In Proceedings of the 41st International Conference on Machine Learning (ICML).
Davison, Anthony Christopher. 2003. Statistical Models. Vol. 11. Cambridge university press.
de Finetti, Bruno. 1937. “Foresight: Its Logical Laws, Its Subjective Sources.” In Studies in Subjective Probability, edited by Henry E. Kyburg and Howard E. Smokler, 93–158. New York: Wiley.
———. 1940. “Il Problema Dei Pieni.” Giornale Dell’Istituto Italiano Degli Attuari 11 (1): 1–88.
Dean, Jeffrey, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, et al. 2012. “Large Scale Distributed Deep Networks.” In Advances in Neural Information Processing Systems, 1223–31.
DeGroot, Morris H. 1974. “Reaching a Consensus.” Journal of the American Statistical Association 69 (345): 118–21.
Dembo, Amir. 2021. “A Note on the Universal Approximation Capability of Deep Neural Networks.” arXiv Preprint arXiv:2104.xxxxx.
DeMets, David L., and K. K. Gordon Lan. 1994. “Interim Analysis: The Alpha Spending Function Approach.” Statistics in Medicine 13 (13-14): 1341–52.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–86. Minneapolis, Minnesota: Association for Computational Linguistics.
Devroye, Luc. 1986. Non-Uniform Random Variate Generation. Springer Science & Business Media.
Diaconis, Persi, and Frederick Mosteller. 1989. “Methods for Studying Coincidences.” Journal of the American Statistical Association 84 (408): 853–61.
Diaconis, Persi, and David Freedman. 1987. “A Dozen de Finetti-style Results in Search of a Theory.” In Annales de l’IHP Probabilités Et Statistiques, 23:397–423.
Diaconis, Persi, and Mehrdad Shahshahani. 1981. “Generating a Random Permutation with Random Transpositions.” Probability Theory and Related Fields 57 (2): 159–79.
———. 1984. “On Nonlinear Functions of Linear Combinations.” SIAM Journal on Scientific and Statistical Computing 5 (1): 175–91.
Diaconis, P., and D. Ylvisaker. 1983. “Quantifying Prior Opinion.”
Dietterich, Thomas G. 2000. “Ensemble Methods in Machine Learning.” In Multiple Classifier Systems, 1–15. Berlin, Heidelberg: Springer.
Dixon, Mark J., and Stuart G. Coles. 1997. “Modelling Association Football Scores and Inefficiencies in the Football Betting Market.” Journal of the Royal Statistical Society Series C: Applied Statistics 46 (2): 265–80.
Dixon, Matthew F, Nicholas G Polson, and Vadim O Sokolov. 2019. “Deep Learning for Spatio-Temporal Modeling: Dynamic Traffic Flows and High Frequency Trading.” Applied Stochastic Models in Business and Industry 35 (3): 788–807.
Dreyfus, Stuart. 1962. “The Numerical Solution of Variational Problems.” Journal of Mathematical Analysis and Applications 5 (1): 30–45.
———. 1973. “The Computational Solution of Optimal Control Problems with Time Lag.” IEEE Transactions on Automatic Control 18 (4): 383–85.
Duchi, John, Elad Hazan, and Yoram Singer. 2011. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” Journal of Machine Learning Research 12 (61): 2121–59.
Duflo, Esther, Michael Greenstone, Rohini Pande, and Nicholas Ryan. 2013. “Truth-Telling by Third-Party Auditors and the Response of Polluting Firms: Experimental Evidence from India.” The Quarterly Journal of Economics 128 (4): 1499–1545.
Edwards, Ward, Harold Lindman, and Leonard J. Savage. 1963. “Bayesian Statistical Inference for Psychological Research.” Psychological Review 70 (3): 193–242.
Efron, Bradley, and Carl Morris. 1975. “Data Analysis Using Stein’s Estimator and Its Generalizations.” Journal of the American Statistical Association 70 (350): 311–19.
———. 1977. “Stein’s Paradox in Statistics.” Scientific American 236 (5): 119–27.
Enikolopov, Ruben, Vasily Korovkin, Maria Petrova, Konstantin Sonin, and Alexei Zakharov. 2013. “Field Experiment Estimate of Electoral Fraud in Russian Parliamentary Elections.” Proceedings of the National Academy of Sciences 110 (2): 448–52.
Fawcett, Tom. 2006. “An Introduction to ROC Analysis.” Pattern Recognition Letters 27 (8): 861–74.
Fedus, William, Barret Zoph, and Noam Shazeer. 2022. “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.” Journal of Machine Learning Research 23 (120): 1–39.
Fefferman, Charles L. 2006. “Existence and Smoothness of the Navier–Stokes Equation.” The Millennium Prize Problems, 57–67.
Feller, William. 1971. An Introduction to Probability Theory and Its Applications. Wiley.
Feng, Guanhao, Nicholas G. Polson, and Jianeng Xu. 2016. “The Market for English Premier League (EPL) Odds.” Journal of Quantitative Analysis in Sports 12 (4). https://arxiv.org/abs/1604.03614.
Fredholm, Ivar. 1903. “Sur Une Classe d’équations Fonctionnelles.” Acta Mathematica 27: 365–90.
Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 1189–1232.
Friedman, Jerome H., and Werner Stuetzle. 1981. “Projection Pursuit Regression.” Journal of the American Statistical Association 76 (376): 817–23.
Frühwirth-Schnatter, Sylvia, and Rudolf Frühwirth. 2007. “Auxiliary Mixture Sampling with Applications to Logistic Models.” Computational Statistics & Data Analysis 51 (April): 3509–28.
———. 2010. “Data Augmentation and MCMC for Binary and Multinomial Logit Models.” In Statistical Modelling and Regression Structures: Festschrift in Honour of Ludwig Fahrmeir, 111–32.
Frühwirth-Schnatter, Sylvia, Rudolf Frühwirth, Leonhard Held, and Håvard Rue. 2008. “Improved Auxiliary Mixture Sampling for Hierarchical Models of Non-Gaussian Data.” Statistics and Computing 19 (4): 479.
Frühwirth-Schnatter, Sylvia, and Helga Wagner. 2010. “Stochastic Model Specification Search for Gaussian and Partial Non-Gaussian State Space Models.” Journal of Econometrics 154: 85–100.
Fukushima, Kunihiko. 1980. “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position.” Biological Cybernetics 36 (4): 193–202.
Gal, Yarin, and Zoubin Ghahramani. 2016. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In International Conference on Machine Learning, 1050–59.
Galton, Francis. 1907. “Vox Populi.” Nature 75: 450–51.
Galushkin, A. I. 1973. “Synthesis of Multilayer Systems of Pattern Recognition.” Neurocomputers and Their Application.
Gan, Link, and Alan Fritzler. 2016. “How to Become an Executive.”
Gao, Luyu, Xueguang Ma, Jimmy Lin, and Jamie Callan. 2022. “Precise Zero-Shot Dense Retrieval Without Relevance Labels.” arXiv Preprint arXiv:2212.10496. https://arxiv.org/abs/2212.10496.
García-Arenzana, Nicolás, Eva María Navarrete-Muñoz, Virginia Lope, Pilar Moreo, Carmen Vidal, Soledad Laso-Pablos, Nieves Ascunce, et al. 2014. “Calorie Intake, Olive Oil Consumption and Mammographic Density Among Spanish Women.” International Journal of Cancer 134 (8): 1916–25.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. Boca Raton: Chapman and Hall/CRC.
Gleick, James. 1992. Genius: The Life and Science of Richard Feynman. New York: Pantheon Books.
Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association 102 (477): 359–78.
Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65.
Gramacy, Robert B., and Nicholas G. Polson. 2012. “Simulation-Based Regularized Logistic Regression.” arXiv. https://arxiv.org/abs/1005.3430.
Griewank, Andreas, Kshitij Kulshreshtha, and Andrea Walther. 2012. “On the Numerical Stability of Algorithmic Differentiation.” Computing. Archives for Scientific Computing 94 (2-4): 125–49.
Guan, Xinyu, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang. 2025. “rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking.” arXiv. https://arxiv.org/abs/2501.04519.
Gupte, Mihir, Eshan Dixit, Muhammad Tayyab, and Arun Adiththan. 2025. “What Works for ‘Lost-in-the-Middle’ in LLMs? A Study on GM-Extract and Mitigations.” arXiv Preprint arXiv:2511.13900. https://arxiv.org/abs/2511.13900.
Hahn, P. Richard, Jared S. Murray, and Carlos M. Carvalho. 2020. “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion).” Bayesian Analysis 15 (3): 965–1056.
Halevy, Alon, Peter Norvig, and Fernando Pereira. 2009. “The Unreasonable Effectiveness of Data.” IEEE Intelligent Systems 24 (2): 8–12.
Hardt, Moritz, Ben Recht, and Yoram Singer. 2016. “Train Faster, Generalize Better: Stability of Stochastic Gradient Descent.” In International Conference on Machine Learning, 1225–34. PMLR.
Hastie, Trevor, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. 2022. “Surprises in High-Dimensional Ridgeless Least Squares Interpolation.” The Annals of Statistics 50 (2): 949–86.
Heath, David, and William Sudderth. 1976. “De Finetti’s Theorem on Exchangeable Variables.” The American Statistician 30 (4): 188–89.
Heaton, J. B., and N. G. Polson. 2012. “Smart Money, Dumb Money: Learning Type from Price.” Working Paper.
Heaton, J. B., N. G. Polson, and Jan Hendrik Witte. 2016. “Deep Learning for Finance: Deep Portfolios.” Applied Stochastic Models in Business and Industry.
Held, Leonhard, and Chris C. Holmes. 2006. “Bayesian Auxiliary Variable Models for Binary and Multinomial Regression.” Bayesian Analysis 1 (1): 145–68.
Hilgers, Ralf-Dieter, Kit Roes, and Nigel Stallard. 2016. “Efficient Ways Exist to Obtain the Optimal Sample Size in Clinical Trials in Rare Diseases.” Journal of Clinical Epidemiology 80: 68–76.
Hinton, Geoffrey, Nitish Srivastava, and Kevin Swersky. 2012. “Neural Networks for Machine Learning, Lecture 6a: Overview of Mini-Batch Gradient Descent.” Coursera Lecture.
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. 2015. “Distilling the Knowledge in a Neural Network.” arXiv Preprint arXiv:1503.02531. https://arxiv.org/abs/1503.02531.
Hou, Zhen, Hao Liu, Jiang Bian, Xing He, and Yan Zhuang. 2025. “Enhancing Medical Coding Efficiency Through Domain-Specific Fine-Tuned Large Language Models.” Npj Health Systems 2 (1): 14.
Hsieh, Cheng-Yu, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, and Tomas Pfister. 2023. “Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes.” arXiv Preprint arXiv:2305.02301. https://arxiv.org/abs/2305.02301.
Huang, Gao, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, and Kilian Q. Weinberger. 2017. “Snapshot Ensembles: Train 1, Get M for Free.” In International Conference on Learning Representations.
Huber, Peter J. 1985. “Projection Pursuit.” The Annals of Statistics 13 (2): 435–75.
Hyndman, Rob J., and George Athanasopoulos. 2021. Forecasting: Principles and Practice. 3rd ed. Melbourne, Australia: OTexts.
Igelnik, B., and N. Parikh. 2003. “Kolmogorov’s Spline Network.” IEEE Transactions on Neural Networks 14 (4): 725–33.
Immer, Alexander, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch, and Khan Mohammad Emtiyaz. 2021. “Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning.” In International Conference on Machine Learning, 4563–73. PMLR.
Irwin, Neil. 2016. “How to Become a C.E.O.? The Quickest Path Is a Winding One.” The New York Times, September.
Jacobs, Robert A., Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. “Adaptive Mixtures of Local Experts.” Neural Computation 3 (1): 79–87.
Jacquier, Eric, and Nicholas G. Polson. 2013. “Asset Allocation in Finance: A Bayesian Perspective.” In Bayesian Theory and Applications, edited by Paul Damien, Petros Dellaportas, Nicholas G. Polson, and David A. Stephens, 501–15. Oxford University Press.
Januschowski, Tim, Yuyang Wang, Kari Torkkola, Timo Erkkilä, Hilaf Hasson, and Jan Gasthaus. 2022. “Forecasting with Trees.” International Journal of Forecasting, Special Issue: M5 competition, 38 (4): 1473–81.
Jeffreys, Harold. 1998. Theory of Probability. 3rd ed. Oxford Classic Texts in the Physical Sciences. Oxford, New York: Oxford University Press.
Jiang, Huiqiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. 2023. “LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models.” arXiv Preprint arXiv:2310.05736. https://arxiv.org/abs/2310.05736.
Jiang, Wenxin, and Martin A. Tanner. 1999a. “Hierarchical Mixtures-of-Experts for Generalized Linear Models: Some Results on Denseness and Consistency.” In Proceedings of the Sixteenth International Conference on Machine Learning, 214–22. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
———. 1999b. “On the Identifiability of Mixtures-of-Experts.” Neural Networks 12 (9): 1253–58.
Jiménez-Luna, José, Francesca Grisoni, Nils Weskamp, and Gisbert Schneider. 2020. “DrugEx V2: De Novo Design of Drug Molecule by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology.” Journal of Cheminformatics 12 (1): 1–12.
Johannes, Michael S., Nick Polson, and Seung M. Yae. 2009. “Quantile Filtering and Learning.” SSRN Electronic Journal.
Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.
Kaczmarz, Stefan. 1937. “Angenäherte Auflösung von Systemen Linearer Gleichungen.” Bulletin International de l’Académie Polonaise Des Sciences Et Des Lettres 35: 355–57.
Kaggle. 2020. “M5 Forecasting - Accuracy.”
Kallenberg, Olav. 1997. Foundations of Modern Probability. 2nd ed. Springer.
Kalman, R. E., and R. S. Bucy. 1961. “New Results in Linear Filtering and Prediction Theory.” Journal of Basic Engineering 83 (1): 95–108.
Kalman, Rudolph Emil. 1960. “A New Approach to Linear Filtering and Prediction Problems.” Transactions of the ASME–Journal of Basic Engineering 82 (Series D): 35–45.
Kelly, J. L. 1956. “A New Interpretation of Information Rate.” Bell System Technical Journal 35 (4): 917–26.
Keskar, Nitish Shirish, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2016. “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.” arXiv Preprint arXiv:1609.04836. https://arxiv.org/abs/1609.04836.
Keynes, John Maynard. 1930. “Economic Possibilities for Our Grandchildren.” In Essays in Persuasion, 358–73. W. W. Norton & Company.
Khattab, Omar, and Matei Zaharia. 2020. “ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT.” In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 39–48. ACM.
Kim, Young-Hoon, Jaehyung Shim, Hyoung-Seob Park, et al. 2024. “Diagnostic Accuracy of Single-Lead Handheld ECG Devices for Atrial Fibrillation Detection.” Journal of Cardiovascular Electrophysiology 35: 614–21.
Kingma, Diederik, and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv Preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980.
Klartag, Bo’az. 2007. “A Central Limit Theorem for Convex Sets.” Inventiones Mathematicae 168 (1): 91–131.
Koenker, Roger. 2005. Quantile Regression. Econometric Society Monographs. Cambridge: Cambridge University Press.
Kolmogoroff, Andrei. 1931. “Über Die Analytischen Methoden in Der Wahrscheinlichkeitsrechnung.” Mathematische Annalen 104 (1): 415–58.
———. 1933. Grundbegriffe Der Wahrscheinlichkeitsrechnung. Vol. 2. Ergebnisse Der Mathematik Und Ihrer Grenzgebiete. Berlin: Springer.
Kolmogorov, A. N. 1956. “On the Representation of Continuous Functions of Several Variables as Superpositions of Functions of Smaller Number of Variables.” In Soviet Math. Dokl., 108:179–82.
Kolmogorov, Andrey N. 1933. Grundbegriffe Der Wahrscheinlichkeitsrechnung. Berlin: Springer.
Köppen, Mario. 2000. “The Curse of Dimensionality.” 5th Online World Conference on Soft Computing in Industrial Applications (WSC5) 1: 4–8.
LeCun, Yann, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. 2002. “Efficient Backprop.” In Neural Networks: Tricks of the Trade, 9–50. Springer.
Leviathan, Yaniv, Matan Kalman, and Yossi Matias. 2023. “Fast Inference from Transformers via Speculative Decoding.” arXiv Preprint arXiv:2211.17191. https://arxiv.org/abs/2211.17191.
Levina, Elizaveta, and Peter Bickel. 2001. “The Earth Mover’s Distance Is the Mallows Distance: Some Insights from Statistics.” In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2:251–56. IEEE.
Lim, Bryan, Sercan Ö Arık, Nicolas Loeff, and Tomas Pfister. 2021. “Temporal Fusion Transformers for Interpretable Multi-Horizon Time Series Forecasting.” International Journal of Forecasting 37 (4): 1748–64.
Lin, Zhouhan, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. “A Structured Self-attentive Sentence Embedding.” arXiv. https://arxiv.org/abs/1703.03130.
Lindgren, Georg. 1978. “Markov Regime Models for Mixed Distributions and Switching Regressions.” Scandinavian Journal of Statistics 5 (2): 81–91.
Lindley, D. V. 1961. “The Use of Prior Probability Distributions in Statistical Inference and Decisions.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, 4.1:453–69. University of California Press.
Linnainmaa, Seppo. 1970. “The Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding Errors.” Master’s Thesis (in Finnish), Univ. Helsinki, 6–7.
Liu, Hao, Matei Zaharia, and Pieter Abbeel. 2023. “Ring Attention with Blockwise Transformers for Near-Infinite Context.” arXiv Preprint arXiv:2310.01889. https://arxiv.org/abs/2310.01889.
Liu, Nelson F., Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. “Lost in the Middle: How Language Models Use Long Contexts.” Transactions of the Association for Computational Linguistics 12: 157–73.
Logunov, A. A. 2004. “Henri Poincare and Relativity Theory.” https://arxiv.org/abs/physics/0408077.
Lorentz, George G. 1976. “The 13th Problem of Hilbert.” In Proceedings of Symposia in Pure Mathematics, 28:419–30. American Mathematical Society.
Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4765–74. Curran Associates, Inc.
Mackay, Charles. 1841. Extraordinary Popular Delusions and the Madness of Crowds. London: Richard Bentley.
MacKay, David JC. 1992. “Bayesian Interpolation.” Neural Computation 4 (3): 415–47.
Maharaj, Shiva, Nick Polson, and Vadim Sokolov. 2023. “Kramnik Vs Nakamura or Bayes Vs P-Value.” SSRN Scholarly Paper. Rochester, NY.
Markowitz, Harry. 2006. “De Finetti Scoops Markowitz.” Journal of Investment Management 4 (3): 5.
Mehrasa, Nazanin, Yatao Zhong, Frederick Tung, Luke Bornn, and Greg Mori. 2017. “Learning Person Trajectory Representations for Team Activity Analysis.” arXiv Preprint arXiv:1706.00893. https://arxiv.org/abs/1706.00893.
Metropolis, Nicholas. 1987. “The Beginning of the Monte Carlo Method.” Los Alamos Science 15: 125–30.
Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” The Journal of Chemical Physics 21 (6): 1087–92.
Metropolis, Nicholas, and Stanislaw Ulam. 1949. “The Monte Carlo Method.” Journal of the American Statistical Association 44 (247): 335–41.
Microsoft Research. 2024. “GraphRAG: Unlocking LLM Discovery on Narrative Private Data.”
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv. https://arxiv.org/abs/1301.3781.
Milman, Vitali D, and Gideon Schechtman. 2009. Asymptotic Theory of Finite Dimensional Normed Spaces: Isoperimetric Inequalities in Riemannian Manifolds. Vol. 1200. Springer.
Minsky, Marvin, and Seymour Papert. 1969. Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press.
Miotto, Marilù, Nicola Rossberg, and Bennett Kleinberg. 2022. “Who Is GPT-3? An Exploration of Personality, Values and Demographics.” arXiv Preprint arXiv:2209.14338. https://arxiv.org/abs/2209.14338.
Mnih, Volodymyr, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. 2014. “Recurrent Models of Visual Attention.” Advances in Neural Information Processing Systems 27: 2204–12.
Morris, Stephen. 1994. “Trade with Heterogeneous Prior Beliefs and No-Trade Theorems.” Econometrica: Journal of the Econometric Society 62 (6): 1327–47.
———. 1996. “Speculative Trade with Rational Beliefs.” Journal of Economic Theory 70 (2): 445–72.
Nadaraya, E. A. 1964. “On Estimating Regression.” Theory of Probability & Its Applications 9 (1): 141–42.
Nakkiran, Preetum, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. 2021. “Deep Double Descent: Where Bigger Models and More Data Hurt.” Journal of Statistical Mechanics: Theory and Experiment 2021 (12): 124003.
Nareklishvili, Maria, Nicholas Polson, and Vadim Sokolov. 2022. “Deep Partial Least Squares for IV Regression.” arXiv Preprint arXiv:2207.02612. https://arxiv.org/abs/2207.02612.
———. 2023a. “Generative Causal Inference,” June. https://arxiv.org/abs/2306.16096.
———. 2023b. “Feature Selection for Personalized Policy Analysis,” July. https://arxiv.org/abs/2301.00251.
Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized Linear Models.” Royal Statistical Society. Journal. Series A: General 135 (3): 370–84.
Nesterov, Yurii. 1983. “A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²).” In Soviet Mathematics Doklady, 27:372–76.
Nicosia, Luca, Giulia Gnocchi, Ilaria Gorini, Massimo Venturini, Federico Fontana, Filippo Pesapane, Ida Abiuso, et al. 2023. “History of Mammography: Analysis of Breast Imaging Diagnostic Achievements over the Last Century.” Healthcare 11 (1596).
Novick, Melvin R., and James E. Grizzle. 1965. “A Bayesian Approach to the Analysis of Data from Clinical Trials.” Journal of the American Statistical Association 60 (309): 81–96.
Ostrovskii, G. M., Yu. M. Volin, and W. W. Borisov. 1971. “Über Die Berechnung von Ableitungen.” Wissenschaftliche Zeitschrift Der Technischen Hochschule Für Chemie, Leuna-Merseburg 13 (4): 382–84.
Ouyang, Long, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, et al. 2022. “Training Language Models to Follow Instructions with Human Feedback.” Advances in Neural Information Processing Systems 35: 27730–44.
Packer, Charles, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, and Joseph E. Gonzalez. 2023. “MemGPT: Towards LLMs as Operating Systems.” arXiv Preprint arXiv:2310.08560. https://arxiv.org/abs/2310.08560.
Pan, Zhenyu, Haozheng Luo, Manling Li, and Han Liu. 2025. “Chain-of-Action: Faithful and Multimodal Question Answering Through Large Language Models.” arXiv. https://arxiv.org/abs/2403.17359.
Park, Joon Sung, Lindsay Popowski, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2024. “Generative Agent Simulations of 1,000 People.” arXiv Preprint arXiv:2411.10109. https://arxiv.org/abs/2411.10109.
Parzen, Emanuel. 2004. “Quantile Probability and Statistical Data Modeling.” Statistical Science 19 (4): 652–62.
Petris, Giovanni. 2010. “An R Package for Dynamic Linear Models.” Journal of Statistical Software 36 (October): 1–16.
Poincaré, Henri. 1898. “La Mesure Du Temps.” Revue de Métaphysique Et de Morale 6 (1): 1–13.
———. 1952. Science and Hypothesis. New York: Dover Publications.
Polson, Nicholas. 1996. “Convergence of Markov Chain Monte Carlo Algorithms (with Discussion).” Bayesian Statistics 5: 297–321.
Polson, Nicholas G., and James G. Scott. 2012. “Good, Great, or Lucky? Screening for Firms with Sustained Superior Performance Using Heavy-Tailed Priors.” The Annals of Applied Statistics 6 (1): 161–85.
———. 2016. “Mixtures, Envelopes and Hierarchical Duality.” Journal of the Royal Statistical Society Series B: Statistical Methodology 78 (4): 701–27.
Polson, Nicholas G., James G. Scott, and Jesse Windle. 2013. “Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables.” Journal of the American Statistical Association 108 (504): 1339–49.
Polson, Nicholas G., and Steven L. Scott. 2011. “Data Augmentation for Support Vector Machines.” Bayesian Analysis 6 (1): 1–23.
Polson, Nicholas G, and James Scott. 2018. AIQ: How People and Machines Are Smarter Together. St. Martin’s Press.
Polson, Nicholas G., and Vadim Sokolov. 2023. “Generative AI for Bayesian Computation.” https://arxiv.org/abs/2305.14972.
Polson, Nicholas G, Vadim Sokolov, et al. 2017. “Deep Learning: A Bayesian Perspective.” Bayesian Analysis 12 (4): 1275–1304.
Polson, Nicholas, and Vadim Sokolov. 2020. “Deep Learning: Computational Aspects.” Wiley Interdisciplinary Reviews: Computational Statistics 12 (5): e1500.
Polson, Nicholas, Vadim Sokolov, and Jianeng Xu. 2021. “Deep Learning Partial Least Squares.” arXiv Preprint arXiv:2106.14085. https://arxiv.org/abs/2106.14085.
Polson, Nick, Fabrizio Ruggeri, and Vadim Sokolov. 2024. “Generative Bayesian Computation for Maximum Expected Utility.” Entropy. An International and Interdisciplinary Journal of Entropy and Information Studies 26 (12): 1076.
Polson, Nick, and Vadim Sokolov. 2025. “Negative Probability.” Applied Stochastic Models in Business and Industry 41 (1): e2910.
Polson, Nick, Vadim Sokolov, and Jianeng Xu. 2023. “Quantum Bayesian Computation.” Applied Stochastic Models in Business and Industry 39 (6): 869–83.
Polson, Nick, and Jan Hendrik Witte. 2014. “A Bellman View of Jesse Livermore.” arXiv. https://arxiv.org/abs/1407.2642.
Qian, Chen, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, et al. 2024. “ChatDev: Communicative Agents for Software Development.” arXiv. https://arxiv.org/abs/2307.07924.
Radford, Alec, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. “Improving Language Understanding by Generative Pre-Training.” OpenAI.
Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. 2023. “Direct Preference Optimization: Your Language Model Is Secretly a Reward Model.” arXiv. https://arxiv.org/abs/2305.18290.
Raffel, Colin, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” Journal of Machine Learning Research 21 (140): 1–67.
Ramsey, Frank P. 1926. “Truth and Probability.” History of Economic Thought Chapters. McMaster University Archive for the History of Economic Thought.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “"Why Should I Trust You?": Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. ACM.
Riquelme, Carlos, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby. 2021. “Scaling Vision with Sparse Mixture of Experts.” In Advances in Neural Information Processing Systems, 34:8583–95.
Ritter, Hippolyt, Aleksandar Botev, and David Barber. 2018. “A Scalable Laplace Approximation For Neural Networks.”
Robbins, Herbert, and Sutton Monro. 1951. “A Stochastic Approximation Method.” The Annals of Mathematical Statistics 22 (3): 400–407.
Roberts, Gareth O., and Jeffrey S. Rosenthal. 2001. “Optimal Scaling for Various Metropolis-Hastings Algorithms.” Statistical Science 16 (4): 351–67.
Rosenblatt, F. 1958. “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” Psychological Review 65 (6): 386–408.
Rubin, Donald B., Hal S. Stern, and John B. Carlin. 2015. Bayesian Data Analysis. 3rd ed. New York: Chapman and Hall/CRC.
Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533.
Salinas, David, Valentin Flunkert, and Jan Gasthaus. 2019. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” arXiv:1704.04110 [cs, stat], February. https://arxiv.org/abs/1704.04110.
Sanh, Victor, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. “DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter.” arXiv Preprint arXiv:1910.01108. https://arxiv.org/abs/1910.01108.
Sarthi, Parth, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. 2024. “RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval.” arXiv Preprint arXiv:2401.18059. https://arxiv.org/abs/2401.18059.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks: An Overview.” Neural Networks 61: 85–117.
Schmidt-Hieber, Johannes. 2021. “The Kolmogorov–Arnold Representation Theorem Revisited.” Neural Networks 137 (May): 119–26.
Schwertman, Neil C, AJ Gilks, and J Cameron. 1990. “A Simple Noncalculus Proof That the Median Minimizes the Sum of the Absolute Deviations.” The American Statistician 44 (1): 38–39.
Scott, Steven L. 2002. “Bayesian Methods for Hidden Markov Models.” Journal of the American Statistical Association 97 (457): 337–51.
———. 2015. “Multi-Armed Bandit Experiments in the Online Service Economy.” Applied Stochastic Models in Business and Industry 31 (1): 37–45.
Scott, Steven L. 2013. “Multi-Armed Bandit Experiments.”
———. 2022. “BoomSpikeSlab: MCMC for Spike and Slab Regression.”
Scott, Steven L., and Hal R. Varian. 2015. “Bayesian Variable Selection for Nowcasting Economic Time Series.” In Economic Analysis of the Digital Economy, 119–35. University of Chicago Press.
Scott, Steven, and Hal Varian. 2014. “Predicting the Present with Bayesian Structural Time Series.” Int. J. Of Mathematical Modelling and Numerical Optimisation 5 (January): 4–23.
Seidenfeld, Tedd. 1984. “Comments on Causal Decision Theory.” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1984 (2): 201–12.
Sellke, Thomas, M. J. Bayarri, and James O. Berger. 2001. “Calibration of p Values for Testing Precise Null Hypotheses.” The American Statistician 55 (1): 62–71.
Selvaraju, Ramprasaath R., Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization.” In Proceedings of the IEEE International Conference on Computer Vision, 618–26. IEEE.
Shapiro, Sam. 1988. “Selection, Follow-up, and Analysis in the Health Insurance Plan Study: A Randomized Trial with Breast Cancer Screening.” Journal of the National Cancer Institute 80 (14): 1125–32.
Shazeer, Noam, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” In International Conference on Learning Representations.
Shimony, Abner. 1955. “Coherence and the Axioms of Confirmation.” The Journal of Symbolic Logic 20 (1): 1–28.
Shinn, Noah, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. “Reflexion: Language Agents with Verbal Reinforcement Learning.” https://arxiv.org/abs/2303.11366.
Shiryayev, A. N. 1992. “On Analytical Methods in Probability Theory.” In Selected Works of A. N. Kolmogorov: Volume II, Probability Theory and Mathematical Statistics, edited by A. N. Shiryayev, 62–108. Dordrecht: Springer Netherlands.
Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. 2013. “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps.” arXiv Preprint arXiv:1312.6034. https://arxiv.org/abs/1312.6034.
Simpson, Edward. 2010. “Edward Simpson: Bayes at Bletchley Park.” Significance 7 (2): 76–80.
Singh, Pratyush Kumar, Kathryn A. Farrell-Maupin, and Danial Faghihi. 2024. “A Framework for Strategic Discovery of Credible Neural Network Surrogate Models Under Uncertainty.” arXiv. https://arxiv.org/abs/2403.08901.
Smith, A. F. M. 1975. “A Bayesian Approach to Inference about a Change-Point in a Sequence of Random Variables.” Biometrika 62 (2): 407–16.
Sokolov, Vadim. 2017. “Discussion of ‘Deep Learning for Finance: Deep Portfolios’.” Applied Stochastic Models in Business and Industry 33 (1): 16–18.
Spiegelhalter, David, and Yin-Lam Ng. 2009. “One Match to Go!” Significance 6 (4): 151–53.
Sprecher, David A. 1965. “On the Structure of Continuous Functions of Several Variables.” Transactions of the American Mathematical Society 115: 340–55.
Srivastava, Nitish, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research 15 (1): 1929–58.
Stern, Hal S. 1994. “A Brownian Motion Model for the Progress of Sports Scores.” Journal of the American Statistical Association 89 (427): 1128–34.
Stern, H., Adam Sugano, J. Albert, and R. Koning. 2007. “Inference about Batter-Pitcher Matchups in Baseball from Small Samples.” Statistical Thinking in Sports, 153–65.
Stigler, Stephen M. 1981. “Gauss and the Invention of Least Squares.” The Annals of Statistics, 465–74.
Stroud, Jonathan R., Peter Müller, and Nicholas G. Polson. 2003. “Nonlinear State-Space Models with State-Dependent Variances.” Journal of the American Statistical Association 98 (462): 377–86.
Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. 2017. “Axiomatic Attribution for Deep Networks.” In Proceedings of the 34th International Conference on Machine Learning, 3319–28. PMLR.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013. “On the Importance of Initialization and Momentum in Deep Learning.” In International Conference on Machine Learning, 1139–47.
Tabar, Laszlo, CJ Gad Fagerberg, Anders Gad, Lennart Baldetorp, Lars H Holmberg, Ove Gröntoft, Ulf Ljungquist, et al. 1985. “Reduction in Mortality from Breast Cancer After Mass Screening with Mammography: Randomised Trial from the Breast Cancer Screening Working Group of the Swedish National Board of Health and Welfare.” The Lancet 325 (8433): 829–32.
Taleb, Nassim Nicholas. 2007. The Black Swan: The Impact of the Highly Improbable. Annotated edition. New York. N.Y: Random House.
Thompson, William R. 1933. “On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples.” Biometrika 25 (3/4): 285–94.
Tiao, Louis. 2019. “Pólya-Gamma Bayesian Logistic Regression.” Blog post.
Tikhonov, Andrei N. 1963. “Solution of Incorrectly Formulated Problems and the Regularization Method.” Sov Dok 4: 1035–38.
Tikhonov, Andrey Nikolayevich et al. 1943. “On the Stability of Inverse Problems.” In Dokl. Akad. Nauk SSSR, 39:195–98.
Tsai, Yao-Hung Hubert, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. “Transformer Dissection: A Unified Understanding of Transformer’s Attention via the Lens of Kernel.” arXiv. https://arxiv.org/abs/1908.11775.
Turing, A. M. 1950. “Computing Machinery and Intelligence.” Mind; a Quarterly Review of Psychology and Philosophy 59 (236): 433–60.
U.S. Food and Drug Administration. 2010. “Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials.”
Varian, Hal R. 2010. “Computer Mediated Transactions.” American Economic Review 100 (2): 1–10.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. “Attention Is All You Need.” arXiv. https://arxiv.org/abs/1706.03762.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30: 5998–6008.
Veblen, Thorstein. 1899. The Theory of the Leisure Class: An Economic Study of Institutions. New York: Macmillan.
———. 1921. The Engineers and the Price System. New York: B. W. Huebsch.
Vecer, Jan, Frantisek Kopriva, and Tomoyuki Ichiba. 2009. “Estimating the Effect of the Red Card in Soccer: When to Commit an Offense in Exchange for Preventing a Goal Opportunity.” Journal of Quantitative Analysis in Sports 5 (1).
Villar, Sofía S., Jack Bowden, and James Wason. 2015. “Multi-Armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.” Statistical Science 30 (2): 199–215.
Ville, Jean. 1939. “Étude Critique de La Notion de Collectif.” Thèses de l’entre-Deux-Guerres. PhD thesis, Université de Paris.
Viterbi, A. 1967. “Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm.” IEEE Transactions on Information Theory 13 (2): 260–69.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harvard Journal of Law & Technology 31: 841–87.
Wald, Abraham. 1945. “Sequential Tests of Statistical Hypotheses.” The Annals of Mathematical Statistics 16 (2): 117–86.
———. 1947. Sequential Analysis. New York: John Wiley & Sons.
Wang, Shaun. 1996. “Premium Calculation by Transforming the Layer Premium Density.” ASTIN Bulletin 26 (1): 71–92.
Watanabe, Sumio. 2013. “A Widely Applicable Bayesian Information Criterion.” The Journal of Machine Learning Research 14 (1): 867–97.
Watson, Geoffrey S. 1964. “Smooth Regression Analysis.” Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 26 (4): 359–72.
Wei, Bo, Thomas M. Braun, Roy N. Tamura, and Kelley Kidwell. 2018. “A Small n Sequential Multiple Assignment Randomized Trial Design for Use in Rare Disease Research.” Statistics in Medicine 37 (26): 3836–52.
Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” arXiv. https://arxiv.org/abs/2201.11903.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.” PhD dissertation, Harvard University.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear Sensitivity Analysis.” In System Modeling and Optimization, 762–70. Springer.
West, Mike, and Jeff Harrison. 1997. Bayesian Forecasting and Dynamic Models. Springer.
Wiener, Norbert. 1950. The Human Use of Human Beings: Cybernetics and Society. Boston: Houghton Mifflin.
Windle, Jesse, Nicholas G. Polson, and James G. Scott. 2014. “Sampling Polya-Gamma Random Variates: Alternate and Approximate Techniques.” arXiv. https://arxiv.org/abs/1405.0506.
Yaari, Menahem E. 1987. “The Dual Theory of Choice Under Risk.” Econometrica: Journal of the Econometric Society 55 (1): 95–115.
Yang, An, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, et al. 2025. “Qwen2.5-1M Technical Report.” arXiv Preprint arXiv:2501.15383. https://arxiv.org/abs/2501.15383.
Yao, Shunyu, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” arXiv. https://arxiv.org/abs/2305.10601.
Ye, Tong, Shijing Si, Jianzong Wang, Ning Cheng, Zhitao Li, and Jing Xiao. 2023. “On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models.” In Proceedings of the AAAI Conference on Artificial Intelligence. https://arxiv.org/abs/2303.08606.
Ye, Yixin, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, and Pengfei Liu. 2025. “LIMO: Less Is More for Reasoning.” arXiv. https://arxiv.org/abs/2502.03387.
Yu, Gyeong-In, Joo Seong Jeong, Geon-Woo Kim, Soo-Jin Jeong, Woosung Lee, and Byung-Gon Chun. 2022. “Orca: A Distributed Serving System for Transformer-based Generative Models.” In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), 527–46.
Zhang, Wei, Katsuyuki Itoh, Jun Tanida, and Yoshiki Ichioka. 1988. “Shift-Invariant Pattern Recognition Neural Network and Its Optical Architecture.” Proceedings of Annual Conference of the Japan Society of Applied Physics.
Zhang, Yichi, Anirban Datta, and Sudipto Banerjee. 2018. “Scalable Gaussian Process Classification with Pólya-Gamma Data Augmentation.” arXiv Preprint arXiv:1802.06383. https://arxiv.org/abs/1802.06383.