References
A. N. Kolmogorov. 1938. “On the Analytic Methods of Probability
Theory.” Rossíiskaya Akademiya Nauk, no. 5:
5–41.
Acemoglu, Daron, and Pascual Restrepo. 2018. “Artificial
Intelligence, Automation and Work.” National Bureau of Economic
Research.
Actor, Jonas. 2018. “Computation for the Kolmogorov
Superposition Theorem.” MS thesis, Rice University.
Albert, Jim. 1993. “A Statistical Analysis of
Hitting Streaks in Baseball:
Comment.” Journal of the American Statistical
Association 88 (424): 1184–88. https://www.jstor.org/stable/2291255.
Altić, Mirela Slukan. 2013. “Exploring Along the Rome Meridian:
Roger Boscovich and the First Modern Map of the Papal
States.” In History of Cartography:
International Symposium of the ICA, 2012,
71–89. Springer.
Amazon. 2021. “The History of Amazon’s Forecasting
Algorithm.” Amazon Science.
https://www.amazon.science/latest-news/the-history-of-amazons-forecasting-algorithm.
Amit, Yali, Gilles Blanchard, and Kenneth Wilder. 2000. “Multiple
Randomized Classifiers: MRCL.”
Andrews, D. F., and C. L. Mallows. 1974. “Scale
Mixtures of Normal Distributions.”
Journal of the Royal Statistical Society. Series B
(Methodological) 36 (1): 99–102. https://www.jstor.org/stable/2984774.
Arnol’d, Vladimir I. 2006. “Forgotten and Neglected Theories of
Poincaré.” Russian Mathematical
Surveys 61 (1): 1.
Ayala, Orlando, and Patrice Bechard. 2024. “Reducing Hallucination
in Structured Outputs via Retrieval-Augmented
Generation.” In Proceedings of the 2024
Conference of the North American Chapter of
the Association for Computational Linguistics:
Human Language Technologies (Volume 6:
Industry Track), 228–38. Mexico City, Mexico:
Association for Computational Linguistics.
Bach, Francis. 2024. “High-Dimensional Analysis of Double Descent
for Linear Regression with Random Projections.” SIAM Journal
on Mathematics of Data Science 6 (1): 26–50.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. “Neural
Machine Translation by Jointly Learning to
Align and Translate.” arXiv. https://arxiv.org/abs/1409.0473.
Barron, Andrew R. 1993. “Universal Approximation Bounds for
Superpositions of a Sigmoidal Function.” IEEE Transactions on
Information Theory 39 (3): 930–45.
Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss. 1970.
“A Maximization Technique Occurring in the
Statistical Analysis of Probabilistic
Functions of Markov Chains.” The Annals
of Mathematical Statistics 41 (1): 164–71. https://www.jstor.org/stable/2239727.
Baylor, Denis, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo,
Zakaria Haque, Salem Haykal, et al. 2017. “TFX: A
TensorFlow-Based Production-Scale Machine Learning Platform.” In
Proceedings of the 23rd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 1387–95. ACM.
Behnia, Farnaz, Dominik Karbowski, and Vadim Sokolov. 2021. “Deep
Generative Models for Vehicle Speed Trajectories.” arXiv
Preprint arXiv:2112.08361. https://arxiv.org/abs/2112.08361.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019.
“Reconciling Modern Machine-Learning Practice and the Classical
Bias–Variance Trade-Off.” Proceedings of the National Academy
of Sciences 116 (32): 15849–54.
Benoit, Dries F., and Dirk Van den Poel. 2012. “Binary Quantile
Regression: A Bayesian Approach Based on the Asymmetric
Laplace Distribution.” Journal of Applied
Econometrics 27 (7): 1174–88.
Berge, Travis, Nitish Sinha, and Michael Smolyansky. 2016. “Which
Market Indicators Best Forecast Recessions?”
FEDS Notes, August.
Bhadra, Anindya, Jyotishka Datta, Nick Polson, Vadim Sokolov, and
Jianeng Xu. 2021. “Merging Two Cultures: Deep and Statistical
Learning.” arXiv Preprint arXiv:2110.11561. https://arxiv.org/abs/2110.11561.
Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural
Language Processing with Python:
Analyzing Text with the Natural Language
Toolkit. Beijing; Cambridge, MA: O’Reilly Media.
Bojarski, Mariusz, Davide Del Testa, Daniel Dworakowski, Bernhard
Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, et al. 2016.
“End to End Learning for Self-Driving Cars.” arXiv
Preprint arXiv:1604.07316. https://arxiv.org/abs/1604.07316.
Bonfiglio, Rita, Annarita Granaglia, Raffaella Giocondo, Manuel Scimeca,
and Elena Bonanno. 2021. “Molecular Aspects and Prognostic
Significance of Microcalcifications in Human Pathology: A
Narrative Review.” International Journal of Molecular
Sciences 22 (120).
Bottou, Léon, Frank E Curtis, and Jorge Nocedal. 2018.
“Optimization Methods for Large-Scale Machine Learning.”
SIAM Review 60 (2): 223–311.
Brillinger, David R. 2012. “A Generalized Linear Model
With ‘Gaussian’ Regressor
Variables.” In Selected Works of
David Brillinger, edited by Peter Guttorp and David
Brillinger, 589–606. Selected Works in
Probability and Statistics. New York, NY:
Springer.
Bryson, Arthur E. 1961. “A Gradient Method for Optimizing
Multi-Stage Allocation Processes.” In Proc. Harvard
Univ. Symposium on Digital Computers and Their
Applications. Vol. 72.
Campagnoli, Patrizia, Sonia Petrone, and Giovanni Petris. 2009.
Dynamic Linear Models with R. New
York, NY: Springer.
Candes, Emmanuel J, and Michael B Wakin. 2008. “An
Introduction to Compressive Sampling: A
Sensing/Sampling Paradigm That Goes Against the Common Knowledge in Data
Acquisition.” IEEE Signal Processing Magazine 25 (2): 21–30.
Cannon, Alex J. 2018. “Non-Crossing Nonlinear Regression Quantiles
by Monotone Composite Quantile Regression Neural Network, with
Application to Rainfall Extremes.” Stochastic Environmental
Research and Risk Assessment 32 (11): 3207–25.
Carlin, Bradley P, Nicholas G Polson, and David S Stoffer. 1992.
“A Monte Carlo Approach to Nonnormal and Nonlinear
State-Space Modeling.” Journal of the American Statistical
Association 87 (418): 493–500.
Carreira-Perpinán, Miguel A, and Weiran Wang. 2014. “Distributed
Optimization of Deeply Nested Systems.” In
AISTATS, 10–19.
Carter, Chris K, and Robert Kohn. 1994. “On Gibbs
Sampling for State Space Models.” Biometrika 81 (3):
541–53.
Carvalho, Carlos M, Hedibert F Lopes, Nicholas G Polson, and Matt A
Taddy. 2010. “Particle Learning for General Mixtures.”
Bayesian Analysis 5 (4): 709–40.
Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2010.
“The Horseshoe Estimator for Sparse Signals.”
Biometrika, asq017.
Chernozhukov, Victor, Iván Fernández-Val, and Alfred Galichon. 2010.
“Quantile and Probability Curves Without
Crossing.” Econometrica 78 (3): 1093–1125. https://www.jstor.org/stable/40664520.
Chib, Siddhartha. 1998. “Estimation and Comparison of Multiple
Change-Point Models.” Journal of Econometrics 86 (2):
221–41.
Chung, Hyung Won, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William
Fedus, Yunxuan Li, et al. 2022. “Scaling
Instruction-Finetuned Language Models.” arXiv. https://arxiv.org/abs/2210.11416.
Cook, R. Dennis. 2007. “Fisher Lecture: Dimension
Reduction in Regression.” Statistical Science, 1–26. https://www.jstor.org/stable/27645799.
Cootner, Paul H. 1967. The Random Character of Stock Market
Prices. MIT Press.
Coppejans, Mark. 2004. “On Kolmogorov’s
Representation of Functions of Several Variables by Functions of One
Variable.” Journal of Econometrics 123 (1): 1–31.
Cover, T., and P. Hart. 1967. “Nearest Neighbor Pattern
Classification.” IEEE Transactions on Information Theory
13 (1): 21–27.
Dabney, Will, Georg Ostrovski, David Silver, and Rémi Munos. 2018.
“Implicit Quantile Networks for Distributional
Reinforcement Learning.” arXiv. https://arxiv.org/abs/1806.06923.
Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2017.
“Distributional Reinforcement Learning with
Quantile Regression.” arXiv. https://arxiv.org/abs/1710.10044.
Davison, Anthony Christopher. 2003. Statistical Models. Vol.
11. Cambridge University Press.
Dean, Jeffrey, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark
Mao, Andrew Senior, et al. 2012. “Large Scale Distributed Deep
Networks.” In Advances in Neural Information Processing
Systems, 1223–31.
DeGroot, Morris H. 2005. Optimal Statistical Decisions. Wiley
Classics Library ed. Hoboken, NJ:
Wiley-Interscience.
Demb, Robert, and David Sprecher. 2021. “A Note on Computing with
Kolmogorov Superpositions Without Iterations.”
Neural Networks 144 (December): 438–42.
Devroye, Luc. 1986. Non-Uniform Random Variate Generation.
Springer Science & Business Media.
Diaconis, Persi, and Frederick Mosteller. 1989. “Methods for
Studying Coincidences.” Journal of the American
Statistical Association 84 (408): 853–61.
Diaconis, Persi, and David Freedman. 1987. “A Dozen de Finetti-style Results in Search of a
Theory.” In Annales de l’IHP
Probabilités Et Statistiques, 23:397–423.
Diaconis, Persi, and Mehrdad Shahshahani. 1981. “Generating a
Random Permutation with Random Transpositions.” Probability
Theory and Related Fields 57 (2): 159–79.
———. 1984. “On Nonlinear Functions of Linear Combinations.”
SIAM Journal on Scientific and Statistical Computing 5 (1):
175–91.
Diaconis, P., and D. Ylvisaker. 1983. “Quantifying Prior
Opinion.”
Dixon, Mark J., and Stuart G. Coles. 1997. “Modelling
Association Football Scores and Inefficiencies
in the Football Betting Market.” Journal of the
Royal Statistical Society Series C: Applied Statistics 46 (2):
265–80.
Dixon, Matthew F, Nicholas G Polson, and Vadim O Sokolov. 2019.
“Deep Learning for Spatio-Temporal Modeling: Dynamic Traffic Flows
and High Frequency Trading.” Applied Stochastic Models in
Business and Industry 35 (3): 788–807.
Dreyfus, Stuart. 1962. “The Numerical Solution of Variational
Problems.” Journal of Mathematical Analysis and
Applications 5 (1): 30–45.
———. 1973. “The Computational Solution of Optimal Control Problems
with Time Lag.” IEEE Transactions on Automatic Control
18 (4): 383–85.
Efron, Bradley, and Carl Morris. 1975. “Data Analysis Using
Stein’s Estimator and Its
Generalizations.” Journal of the American
Statistical Association 70 (350): 311–19.
———. 1977. “Stein’s Paradox in Statistics.” Scientific
American 236 (5): 119–27.
Enikolopov, Ruben, Vasily Korovkin, Maria Petrova, Konstantin Sonin, and
Alexei Zakharov. 2013. “Field Experiment Estimate of Electoral
Fraud in Russian Parliamentary Elections.”
Proceedings of the National Academy of Sciences 110 (2):
448–52.
Tassone, Eric, and Farzan Rohani. 2017. “Our Quest for Robust Time
Series Forecasting at Scale.”
Feller, William. 1971. An Introduction to Probability Theory and Its
Applications. Wiley.
Feynman, Richard. n.d. “Feynman :: Rules of
Chess.”
Fredholm, Ivar. 1903. “Sur Une Classe d’équations
Fonctionnelles.” Acta Mathematica 27: 365–90.
Friedman, Jerome H., and Werner Stuetzle. 1981. “Projection
Pursuit Regression.” Journal of the American
Statistical Association 76 (376): 817–23.
Frühwirth-Schnatter, Sylvia, and Rudolf Frühwirth. 2007.
“Auxiliary Mixture Sampling with Applications to Logistic
Models.” Computational Statistics & Data Analysis 51
(April): 3509–28.
———. 2010. “Data Augmentation and MCMC
for Binary and Multinomial Logit
Models.” In Statistical Modelling and
Regression Structures: Festschrift in
Honour of Ludwig Fahrmeir, 111–32.
Frühwirth-Schnatter, Sylvia, Rudolf Frühwirth, Leonhard Held, and Håvard
Rue. 2008. “Improved Auxiliary Mixture Sampling for Hierarchical
Models of Non-Gaussian Data.” Statistics and
Computing 19 (4): 479.
Gan, Link, and Alan Fritzler. 2016. “How to Become an
Executive.”
García-Arenzana, Nicolás, Eva María Navarrete-Muñoz, Virginia Lope,
Pilar Moreo, Carmen Vidal, Soledad Laso-Pablos, Nieves Ascunce, et al.
2014. “Calorie
Intake, Olive Oil Consumption and Mammographic Density Among
Spanish Women.” International Journal of
Cancer 134 (8): 1916–25.
Gramacy, Robert B., and Nicholas G. Polson. 2012.
“Simulation-Based Regularized Logistic
Regression.” arXiv. https://arxiv.org/abs/1005.3430.
Griewank, Andreas, Kshitij Kulshreshtha, and Andrea Walther. 2012.
“On the Numerical Stability of Algorithmic
Differentiation.” Computing. Archives for Scientific
Computing 94 (2-4): 125–49.
Guan, Xinyu, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu,
Fan Yang, and Mao Yang. 2025. “rStar-Math: Small LLMs Can Master Math
Reasoning with Self-Evolved Deep Thinking.”
arXiv. https://arxiv.org/abs/2501.04519.
Hahn, P. Richard, Jared S. Murray, and Carlos M. Carvalho. 2020.
“Bayesian Regression Tree Models for Causal
Inference: Regularization, Confounding,
and Heterogeneous Effects (with
Discussion).” Bayesian Analysis 15 (3):
965–1056.
Halevy, Alon, Peter Norvig, and Fernando Pereira. 2009. “The
Unreasonable Effectiveness of Data.” IEEE Intelligent
Systems 24 (2): 8–12.
Hardt, Moritz, Ben Recht, and Yoram Singer. 2016. “Train Faster,
Generalize Better: Stability of Stochastic Gradient
Descent.” In International Conference on Machine
Learning, 1225–34. PMLR.
Hastie, Trevor, Andrea Montanari, Saharon Rosset, and Ryan J.
Tibshirani. 2022. “Surprises in High-Dimensional Ridgeless Least
Squares Interpolation.” The Annals of Statistics 50 (2):
949–86.
Held, Leonhard, and Chris C. Holmes. 2006. “Bayesian Auxiliary
Variable Models for Binary and Multinomial Regression.”
Bayesian Analysis 1 (1): 145–68.
Hermann, Jeremy, and Mike Del Balso. 2017. “Meet Michelangelo:
Uber’s Machine Learning Platform.”
Hou, Zhen, Hao Liu, Jiang Bian, Xing He, and Yan Zhuang. 2025.
“Enhancing Medical Coding Efficiency Through Domain-Specific
Fine-Tuned Large Language Models.” Npj Health Systems 2
(1): 14.
Hyndman, Rob J., and George Athanasopoulos. 2021. Forecasting:
Principles and Practice. 3rd ed.
Melbourne, Australia: OTexts.
Igelnik, B., and N. Parikh. 2003. “Kolmogorov’s Spline
Network.” IEEE Transactions on Neural Networks 14 (4):
725–33.
Immer, Alexander, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch, and
Khan Mohammad Emtiyaz. 2021. “Scalable Marginal Likelihood
Estimation for Model Selection in Deep Learning.” In
International Conference on Machine Learning, 4563–73. PMLR.
Indeed. 2018. “Jobs of the Future: Emerging Trends in
Artificial Intelligence.”
Irwin, Neil. 2016. “How to Become a
C.E.O.? The Quickest Path
Is a Winding One.” The New York
Times, September.
Iwata, Shigeru. 2001. “Recentered and Rescaled Instrumental
Variable Estimation of Tobit and Probit
Models with Errors in
Variables.” Econometric Reviews 20 (3):
319–35.
Januschowski, Tim, Yuyang Wang, Kari Torkkola, Timo Erkkilä, Hilaf
Hasson, and Jan Gasthaus. 2022. “Forecasting with Trees.”
International Journal of Forecasting, Special
Issue: M5 competition, 38 (4): 1473–81.
Kaggle. 2020. “M5 Forecasting -
Accuracy.”
https://kaggle.com/competitions/m5-forecasting-accuracy.
Kallenberg, Olav. 1997. Foundations of Modern
Probability. 2nd ed. Springer.
Kalman, R. E., and R. S. Bucy. 1961. “New Results in
Linear Filtering and Prediction
Theory.” Journal of Basic Engineering 83 (1):
95–108.
Kalman, Rudolph Emil. 1960. “A New Approach to Linear Filtering
and Prediction Problems.” Transactions of the ASME–Journal of
Basic Engineering 82 (Series D): 35–45.
Keskar, Nitish Shirish, Dheevatsa Mudigere, Jorge Nocedal, Mikhail
Smelyanskiy, and Ping Tak Peter Tang. 2016. “On Large-Batch
Training for Deep Learning: Generalization Gap and Sharp
Minima.” arXiv Preprint arXiv:1609.04836. https://arxiv.org/abs/1609.04836.
Keynes, John Maynard. 1921. A Treatise on Probability.
Macmillan.
Kingma, Diederik, and Jimmy Ba. 2014. “Adam: A Method
for Stochastic Optimization.” arXiv Preprint
arXiv:1412.6980. https://arxiv.org/abs/1412.6980.
Klartag, Bo’az. 2007. “A Central Limit Theorem for Convex
Sets.” Inventiones Mathematicae 168 (1): 91–131.
Kolmogoroff, Andrei. 1931. “Über Die Analytischen
Methoden in Der
Wahrscheinlichkeitsrechnung.” Mathematische
Annalen 104 (1): 415–58.
Kolmogorov, A. N. 1942. “Definition of Center of Dispersion and
Measure of Accuracy from a Finite Number of Observations (in
Russian).” Izv. Akad. Nauk SSSR Ser. Mat.
6: 3–32.
———. 1956. “On the Representation of Continuous Functions of
Several Variables as Superpositions of Functions of Smaller Number of
Variables.” In Soviet Math.
Dokl., 108:179–82.
Kreps, David. 1988. Notes on the Theory of Choice.
Boulder: Westview Press.
Levina, Elizaveta, and Peter Bickel. 2001. “The Earth Mover’s
Distance Is the Mallows Distance: Some Insights from
Statistics.” In Proceedings Eighth IEEE
International Conference on Computer Vision. ICCV
2001, 2:251–56. IEEE.
Lin, Zhouhan, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing
Xiang, Bowen Zhou, and Yoshua Bengio. 2017. “A Structured Self-attentive Sentence
Embedding.” arXiv. https://arxiv.org/abs/1703.03130.
Lindgren, Georg. 1978. “Markov Regime Models for
Mixed Distributions and Switching
Regressions.” Scandinavian Journal of Statistics
5 (2): 81–91. https://www.jstor.org/stable/4615692.
Lindley, D. V. 1961. “The Use of Prior
Probability Distributions in Statistical Inference
and Decisions.” In Proceedings of the
Fourth Berkeley Symposium on Mathematical
Statistics and Probability, Volume 1:
Contributions to the Theory of
Statistics, 4.1:453–69. University of California
Press.
Linnainmaa, Seppo. 1970. “The Representation of the Cumulative
Rounding Error of an Algorithm as a Taylor Expansion of the
Local Rounding Errors.” Master’s Thesis (in Finnish), Univ.
Helsinki, 6–7.
Logan, John A. 1983. “A Multivariate Model for Mobility
Tables.” American Journal of Sociology 89 (2): 324–49.
Logunov, A. A. 2004. “Henri Poincare and Relativity
Theory.” https://arxiv.org/abs/physics/0408077.
Lorentz, George G. 1976. “The 13th Problem of
Hilbert.” In Proceedings of
Symposia in Pure Mathematics, 28:419–30.
American Mathematical Society.
MacKay, David JC. 1992. “Bayesian Interpolation.”
Neural Computation 4 (3): 415–47.
Maharaj, Shiva, Nick Polson, and Vadim Sokolov. 2023. “Kramnik Vs
Nakamura or Bayes Vs p-Value.” SSRN
Scholarly Paper. Rochester, NY.
Malthouse, Edward, Richard Mah, and Ajit Tamhane. 1997. “Nonlinear
Partial Least Squares.” Computers & Chemical
Engineering 12 (April): 875–90.
Mehrasa, Nazanin, Yatao Zhong, Frederick Tung, Luke Bornn, and Greg
Mori. 2017. “Learning Person Trajectory Representations for Team
Activity Analysis.” arXiv Preprint arXiv:1706.00893. https://arxiv.org/abs/1706.00893.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013.
“Efficient Estimation of Word
Representations in Vector Space.” arXiv. https://arxiv.org/abs/1301.3781.
Milman, Vitali D, and Gideon Schechtman. 2009. Asymptotic Theory of
Finite Dimensional Normed Spaces: Isoperimetric
Inequalities in Riemannian Manifolds. Vol. 1200. Springer.
Nadaraya, E. A. 1964. “On Estimating
Regression.” Theory of Probability & Its
Applications 9 (1): 141–42.
Naik, Prasad, and Chih-Ling Tsai. 2000. “Partial Least
Squares Estimator for Single-Index Models.”
Journal of the Royal Statistical Society. Series B (Statistical
Methodology) 62 (4): 763–71. https://www.jstor.org/stable/2680619.
Nakkiran, Preetum, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak,
and Ilya Sutskever. 2021. “Deep Double Descent: Where Bigger
Models and More Data Hurt.” Journal of Statistical
Mechanics: Theory and Experiment 2021 (12): 124003.
Nareklishvili, Maria, Nicholas Polson, and Vadim Sokolov. 2022.
“Deep Partial Least Squares for IV Regression.” arXiv
Preprint arXiv:2207.02612. https://arxiv.org/abs/2207.02612.
———. 2023a. “Generative Causal Inference,”
June. https://arxiv.org/abs/2306.16096.
———. 2023b. “Feature Selection for Personalized
Policy Analysis,” July. https://arxiv.org/abs/2301.00251.
Nesterov, Yurii. 1983. “A Method of Solving a Convex Programming
Problem with Convergence Rate O(1/k²).” In
Soviet Mathematics Doklady, 27:372–76.
———. 2013. Introductory Lectures on Convex Optimization:
A Basic Course. Vol. 87. Springer Science &
Business Media.
Nicosia, Luca, Giulia Gnocchi, Ilaria Gorini, Massimo Venturini,
Federico Fontana, Filippo Pesapane, Ida Abiuso, et al. 2023.
“History of Mammography: Analysis of Breast Imaging
Diagnostic Achievements over the Last Century.”
Healthcare 11 (1596).
Ostrovskii, G. M., Yu. M. Volin, and W. W. Borisov. 1971. “Über Die
Berechnung von Ableitungen.” Wissenschaftliche Zeitschrift
Der Technischen Hochschule Für Chemie, Leuna-Merseburg 13 (4):
382–84.
Ouyang, Long, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright,
Pamela Mishkin, Chong Zhang, et al. 2022. “Training Language
Models to Follow Instructions with Human Feedback.” Advances
in Neural Information Processing Systems 35: 27730–44.
Pan, Zhenyu, Haozheng Luo, Manling Li, and Han Liu. 2025.
“Chain-of-Action: Faithful and
Multimodal Question Answering Through Large Language
Models.” arXiv. https://arxiv.org/abs/2403.17359.
Parzen, Emanuel. 2004. “Quantile Probability and
Statistical Data Modeling.” Statistical
Science 19 (4): 652–62. https://www.jstor.org/stable/4144436.
Petris, Giovanni. 2010. “An R Package for
Dynamic Linear Models.” Journal of Statistical
Software 36 (October): 1–16.
Poincaré, Henri. 1898. “La Mesure Du Temps.” Revue de
métaphysique Et de Morale 6 (1): 1–13.
Polson, Nicholas G., James G. Scott, and Jesse Windle. 2013.
“Bayesian Inference for Logistic
Models Using Pólya–Gamma Latent
Variables.” Journal of the American Statistical
Association 108 (504): 1339–49.
Polson, Nicholas G, and James Scott. 2018. AIQ: How
People and Machines Are Smarter Together. St. Martin’s Press.
Polson, Nicholas G., and Vadim Sokolov. 2023. “Generative
AI for Bayesian Computation.” https://arxiv.org/abs/2305.14972.
Polson, Nicholas G, and Vadim Sokolov. 2017. “Deep
Learning: A Bayesian Perspective.”
Bayesian Analysis 12 (4): 1275–1304.
Polson, Nicholas, and Steven Scott. 2011. “Data
Augmentation for Support Vector
Machines.” Bayesian Analysis 6 (March).
Polson, Nicholas, and Vadim Sokolov. 2020. “Deep Learning:
Computational Aspects.” Wiley Interdisciplinary
Reviews: Computational Statistics 12 (5): e1500.
Polson, Nicholas, Vadim Sokolov, and Jianeng Xu. 2021. “Deep
Learning Partial Least Squares.” arXiv Preprint
arXiv:2106.14085. https://arxiv.org/abs/2106.14085.
Polson, Nick, Fabrizio Ruggeri, and Vadim Sokolov. 2024.
“Generative Bayesian Computation for Maximum
Expected Utility.” Entropy 26 (12): 1076.
Poplin, Ryan, Avinash V Varadarajan, Katy Blumer, Yun Liu, Michael V
McConnell, Greg S Corrado, Lily Peng, and Dale R Webster. 2018.
“Prediction of Cardiovascular Risk Factors from Retinal Fundus
Photographs via Deep Learning.” Nature Biomedical
Engineering 2 (3): 158.
Qian, Chen, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li,
Cheng Yang, et al. 2024. “ChatDev:
Communicative Agents for Software
Development.” arXiv. https://arxiv.org/abs/2307.07924.
Ritter, Hippolyt, Aleksandar Botev, and David Barber. 2018. “A
Scalable Laplace Approximation For Neural Networks.”
Robbins, Herbert, and Sutton Monro. 1951. “A Stochastic
Approximation Method.” The Annals of Mathematical
Statistics 22 (3): 400–407.
Rubin, Hal S. Stern, and John B. Carlin. 2015. Bayesian Data
Analysis. 3rd ed. New York: Chapman and
Hall/CRC.
Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams. 1986.
“Learning Representations by Back-Propagating Errors.”
Nature 323 (6088): 533.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks:
An Overview.” Neural Networks 61: 85–117.
Schmidt-Hieber, Johannes. 2021. “The
Kolmogorov–Arnold Representation Theorem
Revisited.” Neural Networks 137 (May): 119–26.
Schwertman, Neil C, AJ Gilks, and J Cameron. 1990. “A Simple
Noncalculus Proof That the Median Minimizes the Sum of the Absolute
Deviations.” The American Statistician 44 (1): 38–39.
Scott, Steven L. 2002. “Bayesian Methods for
Hidden Markov Models.” Journal of the American
Statistical Association 97 (457): 337–51.
———. 2015. “Multi-Armed Bandit Experiments in the Online Service
Economy.” Applied Stochastic Models in Business and
Industry 31 (1): 37–45.
———. 2022. “BoomSpikeSlab:
MCMC for Spike and Slab
Regression.”
Scott, Steven L., and Hal R. Varian. 2015. “Bayesian
Variable Selection for Nowcasting Economic Time
Series.” In Economic Analysis of the
Digital Economy, 119–35. University of Chicago Press.
Scott, Steven, and Hal Varian. 2014. “Predicting the
Present with Bayesian Structural Time
Series.” Int. J. Of Mathematical Modelling and
Numerical Optimisation 5 (January): 4–23.
Taylor, Sean J., and Ben Letham. 2017. “Prophet: Forecasting at
Scale - Meta Research.” Meta Research.
https://research.facebook.com/blog/2017/2/prophet-forecasting-at-scale/.
Shen, Changyu, Enrico G Ferro, Huiping Xu, Daniel B Kramer, Rushad
Patell, and Dhruv S Kazi. 2021. “Underperformance of Contemporary
Phase III Oncology Trials and Strategies for
Improvement.” Journal of the National Comprehensive Cancer
Network 19 (9): 1072–78.
Shiryayev, A. N. 1992. “On Analytical Methods in Probability
Theory.” In Selected Works of A. N.
Kolmogorov: Volume II Probability Theory and
Mathematical Statistics, edited by A. N. Shiryayev, 62–108.
Dordrecht: Springer Netherlands.
Silver, David, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou,
Matthew Lai, Arthur Guez, Marc Lanctot, et al. 2017. “Mastering
Chess and Shogi by Self-Play with
a General Reinforcement Learning Algorithm.” arXiv.
https://arxiv.org/abs/1712.01815.
Simpson, Edward. 2010. “Edward Simpson:
Bayes at Bletchley Park.”
Significance 7 (2): 76–80.
Singh, Pratyush Kumar, Kathryn A. Farrell-Maupin, and Danial Faghihi.
2024. “A Framework for Strategic
Discovery of Credible Neural Network Surrogate
Models Under Uncertainty.” arXiv. https://arxiv.org/abs/2403.08901.
Smith, A. F. M. 1975. “A Bayesian Approach to
Inference about a Change-Point in a
Sequence of Random Variables.”
Biometrika 62 (2): 407–16. https://www.jstor.org/stable/2335381.
Sokolov, Vadim. 2017. “Discussion of ‘Deep
Learning for Finance: Deep Portfolios’.” Applied
Stochastic Models in Business and Industry 33 (1): 16–18.
Spiegelhalter, David, and Yin-Lam Ng. 2009. “One Match to
Go!” Significance 6 (4): 151–53.
Stein, Charles. 1964. “Inadmissibility of the Usual Estimator for
the Variance of a Normal Distribution with Unknown Mean.”
Annals of the Institute of Statistical Mathematics 16 (1):
155–60.
Stern, H, Adam Sugano, J Albert, and R Koning. 2007. “Inference
about Batter-Pitcher Matchups in Baseball from Small Samples.”
Statistical Thinking in Sports, 153–65.
Stigler, Stephen M. 1981. “Gauss and the Invention of Least
Squares.” The Annals of Statistics, 465–74.
Sun, Duxin, Wei Gao, Hongxiang Hu, and Simon Zhou. 2022. “Why 90%
of Clinical Drug Development Fails and How to Improve It?”
Acta Pharmaceutica Sinica B 12 (7): 3049–62.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013.
“On the Importance of Initialization and Momentum in Deep
Learning.” In International Conference on Machine
Learning, 1139–47.
Taleb, Nassim Nicholas. 2007. The Black Swan: The
Impact of the Highly Improbable. Annotated
edition. New York. N.Y: Random House.
Tarone, Robert E. 1982. “The Use of Historical Control Information
in Testing for a Trend in Proportions.” Biometrics. Journal
of the International Biometric Society, 215–20.
Tesauro, Gerald. 1995. “Temporal Difference Learning and
TD-Gammon.” Communications of the ACM 38
(3): 58–68.
Tiao, Louis. 2019. “Pólya-Gamma Bayesian
Logistic Regression.” Blog post.
Tikhonov, Andrei N. 1963. “Solution of Incorrectly Formulated
Problems and the Regularization Method.” Sov Dok 4:
1035–38.
Tikhonov, Andrey Nikolayevich et al. 1943. “On the Stability of
Inverse Problems.” In Dokl. Akad. Nauk SSSR, 39:195–98.
Tsai, Yao-Hung Hubert, Shaojie Bai, Makoto Yamada, Louis-Philippe
Morency, and Ruslan Salakhutdinov. 2019. “Transformer
Dissection: A Unified Understanding of
Transformer’s Attention via the
Lens of Kernel.” arXiv. https://arxiv.org/abs/1908.11775.
Varian, Hal R. 2010. “Computer Mediated
Transactions.” American Economic Review 100 (2):
1–10.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion
Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023.
“Attention Is All You Need.” arXiv. https://arxiv.org/abs/1706.03762.
Vecer, Jan, Frantisek Kopriva, and Tomoyuki Ichiba. 2009.
“Estimating the Effect of the Red Card
in Soccer: When to Commit an
Offense in Exchange for
Preventing a Goal Opportunity.”
Journal of Quantitative Analysis in Sports 5 (1).
Viterbi, A. 1967. “Error Bounds for Convolutional Codes and an
Asymptotically Optimum Decoding Algorithm.” IEEE Transactions
on Information Theory 13 (2): 260–69.
Watanabe, Sumio. 2013. “A Widely Applicable Bayesian
Information Criterion.” The Journal of Machine Learning
Research 14 (1): 867–97.
Watson, Geoffrey S. 1964. “Smooth Regression
Analysis.” Sankhyā: The Indian Journal of
Statistics, Series A (1961-2002) 26 (4): 359–72. https://www.jstor.org/stable/25049340.
Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter,
Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023.
“Chain-of-Thought Prompting Elicits Reasoning in
Large Language Models.” arXiv. https://arxiv.org/abs/2201.11903.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction
and Analysis in the Behavioral Sciences.” PhD
dissertation, Harvard University.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear
Sensitivity Analysis.” In System Modeling and
Optimization, 762–70. Springer.
West, Mike, and Jeff Harrison. 1997. Bayesian Forecasting and
Dynamic Models. Springer.
Windle, Jesse. 2023. “BayesLogit:
Bayesian Logistic Regression.” R package version
2.1.
Windle, Jesse, Nicholas G. Polson, and James G. Scott. 2014.
“Sampling Polya-Gamma Random Variates: Alternate and
Approximate Techniques.” arXiv. https://arxiv.org/abs/1405.0506.
Wojna, Zbigniew, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu,
Yeqing Li, and Julian Ibarz. 2017. “Attention-Based Extraction of
Structured Information from Street View Imagery.” arXiv
Preprint arXiv:1704.03549. https://arxiv.org/abs/1704.03549.
Wold, Herman. 1975. “Soft Modelling by
Latent Variables: The Non-Linear Iterative Partial
Least Squares (NIPALS)
Approach.” Journal of Applied Probability
12 (S1): 117–42.
Yaari, Menahem E. 1987. “The Dual Theory of
Choice Under Risk.”
Econometrica 55 (1): 95–115. https://www.jstor.org/stable/1911158.
Yang, An, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang,
Jiandong Jiang, et al. 2025. “Qwen2.5-1M Technical
Report.” arXiv Preprint arXiv:2501.15383. https://arxiv.org/abs/2501.15383.
Yao, Shunyu, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths,
Yuan Cao, and Karthik Narasimhan. 2023. “Tree of
Thoughts: Deliberate Problem Solving with
Large Language Models.” arXiv. https://arxiv.org/abs/2305.10601.
Ye, Yixin, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, and Pengfei
Liu. 2025. “LIMO: Less Is
More for Reasoning.” arXiv. https://arxiv.org/abs/2502.03387.
Zeiler, Matthew D. 2012. “ADADELTA: An Adaptive
Learning Rate Method.” arXiv Preprint arXiv:1212.5701.
https://arxiv.org/abs/1212.5701.
Zhang, Yichi, Anirban Datta, and Sudipto Banerjee. 2018. “Scalable
Gaussian Process Classification with Pólya-Gamma Data
Augmentation.” arXiv Preprint arXiv:1802.06383. https://arxiv.org/abs/1802.06383.