References

Kolmogorov, A. N. 1938. “On the Analytic Methods of Probability Theory.” Rossiiskaya Akademiya Nauk, no. 5: 5–41.
Acemoglu, Daron, and Pascual Restrepo. 2018. “Artificial Intelligence, Automation and Work.” National Bureau of Economic Research.
Actor, Jonas. 2018. “Computation for the Kolmogorov Superposition Theorem.” MS thesis, Rice University.
Albert, Jim. 1993. “A Statistical Analysis of Hitting Streaks in Baseball: Comment.” Journal of the American Statistical Association 88 (424): 1184–88. https://www.jstor.org/stable/2291255.
Altić, Mirela Slukan. 2013. “Exploring Along the Rome Meridian: Roger Boscovich and the First Modern Map of the Papal States.” In History of Cartography: International Symposium of the ICA, 2012, 71–89. Springer.
Amazon. 2021. “The History of Amazon’s Forecasting Algorithm.” Amazon Science. https://www.amazon.science/latest-news/the-history-of-amazons-forecasting-algorithm.
Amit, Yali, Gilles Blanchard, and Kenneth Wilder. 2000. “Multiple Randomized Classifiers: MRCL.”
Andrews, D. F., and C. L. Mallows. 1974. “Scale Mixtures of Normal Distributions.” Journal of the Royal Statistical Society. Series B (Methodological) 36 (1): 99–102. https://www.jstor.org/stable/2984774.
Arnol’d, Vladimir I. 2006. “Forgotten and Neglected Theories of Poincaré.” Russian Mathematical Surveys 61 (1): 1.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. “Neural Machine Translation by Jointly Learning to Align and Translate.” arXiv. https://arxiv.org/abs/1409.0473.
Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss. 1970. “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.” The Annals of Mathematical Statistics 41 (1): 164–71. https://www.jstor.org/stable/2239727.
Baylor, Denis, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, et al. 2017. “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1387–95. ACM.
Behnia, Farnaz, Dominik Karbowski, and Vadim Sokolov. 2021. “Deep Generative Models for Vehicle Speed Trajectories.” arXiv Preprint arXiv:2112.08361. https://arxiv.org/abs/2112.08361.
Benoit, Dries F., and Dirk Van den Poel. 2012. “Binary Quantile Regression: A Bayesian Approach Based on the Asymmetric Laplace Distribution.” Journal of Applied Econometrics 27 (7): 1174–88.
Berge, Travis, Nitish Sinha, and Michael Smolyansky. 2016. “Which Market Indicators Best Forecast Recessions?” FEDS Notes, August.
Bertsimas, Dimitris, Angela King, and Rahul Mazumder. 2016. “Best Subset Selection via a Modern Optimization Lens.” The Annals of Statistics 44 (2): 813–52.
Bhadra, Anindya, Jyotishka Datta, Nick Polson, Vadim Sokolov, and Jianeng Xu. 2021. “Merging Two Cultures: Deep and Statistical Learning.” arXiv Preprint arXiv:2110.11561. https://arxiv.org/abs/2110.11561.
Bojarski, Mariusz, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, et al. 2016. “End to End Learning for Self-Driving Cars.” arXiv Preprint arXiv:1604.07316. https://arxiv.org/abs/1604.07316.
Bonfiglio, Rita, Annarita Granaglia, Raffaella Giocondo, Manuel Scimeca, and Elena Bonanno. 2021. “Molecular Aspects and Prognostic Significance of Microcalcifications in Human Pathology: A Narrative Review.” International Journal of Molecular Sciences 22 (120).
Bottou, Léon, Frank E Curtis, and Jorge Nocedal. 2018. “Optimization Methods for Large-Scale Machine Learning.” SIAM Review 60 (2): 223–311.
Box, George E. P., and George C. Tiao. 1992. Bayesian Inference in Statistical Analysis. New York: Wiley-Interscience.
Brillinger, David R. 2012. “A Generalized Linear Model With Gaussian Regressor Variables.” In Selected Works of David Brillinger, edited by Peter Guttorp and David Brillinger, 589–606. Selected Works in Probability and Statistics. New York, NY: Springer.
Bryson, Arthur E. 1961. “A Gradient Method for Optimizing Multi-Stage Allocation Processes.” In Proc. Harvard Univ. Symposium on Digital Computers and Their Applications. Vol. 72.
Campagnoli, Patrizia, Sonia Petrone, and Giovanni Petris. 2009. Dynamic Linear Models with R. New York, NY: Springer.
Cannon, Alex J. 2018. “Non-Crossing Nonlinear Regression Quantiles by Monotone Composite Quantile Regression Neural Network, with Application to Rainfall Extremes.” Stochastic Environmental Research and Risk Assessment 32 (11): 3207–25.
Carreira-Perpinán, Miguel A, and Weiran Wang. 2014. “Distributed Optimization of Deeply Nested Systems.” In AISTATS, 10–19.
Carvalho, Carlos M, Hedibert F Lopes, Nicholas G Polson, and Matt A Taddy. 2010. “Particle Learning for General Mixtures.” Bayesian Analysis 5 (4): 709–40.
Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika 97 (2): 465–80.
Chernozhukov, Victor, Iván Fernández-Val, and Alfred Galichon. 2010. “Quantile and Probability Curves Without Crossing.” Econometrica 78 (3): 1093–1125. https://www.jstor.org/stable/40664520.
Chib, Siddhartha. 1998. “Estimation and Comparison of Multiple Change-Point Models.” Journal of Econometrics 86 (2): 221–41.
Cook, R. Dennis. 2007. “Fisher Lecture: Dimension Reduction in Regression.” Statistical Science, 1–26. https://www.jstor.org/stable/27645799.
Cootner, Paul H. 1967. The Random Character of Stock Market Prices. MIT Press.
Coppejans, Mark. 2004. “On Kolmogorov’s Representation of Functions of Several Variables by Functions of One Variable.” Journal of Econometrics 123 (1): 1–31.
Dabney, Will, Georg Ostrovski, David Silver, and Rémi Munos. 2018. “Implicit Quantile Networks for Distributional Reinforcement Learning.” arXiv. https://arxiv.org/abs/1806.06923.
Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2017. “Distributional Reinforcement Learning with Quantile Regression.” arXiv. https://arxiv.org/abs/1710.10044.
Davison, Anthony Christopher. 2003. Statistical Models. Vol. 11. Cambridge University Press.
Dean, Jeffrey, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, et al. 2012. “Large Scale Distributed Deep Networks.” In Advances in Neural Information Processing Systems, 1223–31.
Demb, Robert, and David Sprecher. 2021. “A Note on Computing with Kolmogorov Superpositions Without Iterations.” Neural Networks 144 (December): 438–42.
Devroye, Luc. 1986. Non-Uniform Random Variate Generation. Springer Science & Business Media.
Diaconis, Persi, and David Freedman. 1987. “A Dozen de Finetti-style Results in Search of a Theory.” In Annales de l’IHP Probabilités et Statistiques, 23:397–423.
Diaconis, Persi, and Mehrdad Shahshahani. 1981. “Generating a Random Permutation with Random Transpositions.” Probability Theory and Related Fields 57 (2): 159–79.
———. 1984. “On Nonlinear Functions of Linear Combinations.” SIAM Journal on Scientific and Statistical Computing 5 (1): 175–91.
Diaconis, P., and D. Ylvisaker. 1983. “Quantifying Prior Opinion.”
Dixon, Mark J., and Stuart G. Coles. 1997. “Modelling Association Football Scores and Inefficiencies in the Football Betting Market.” Journal of the Royal Statistical Society Series C: Applied Statistics 46 (2): 265–80.
Dixon, Matthew F, Nicholas G Polson, and Vadim O Sokolov. 2019. “Deep Learning for Spatio-Temporal Modeling: Dynamic Traffic Flows and High Frequency Trading.” Applied Stochastic Models in Business and Industry 35 (3): 788–807.
Dreyfus, Stuart. 1962. “The Numerical Solution of Variational Problems.” Journal of Mathematical Analysis and Applications 5 (1): 30–45.
———. 1973. “The Computational Solution of Optimal Control Problems with Time Lag.” IEEE Transactions on Automatic Control 18 (4): 383–85.
Efron, Bradley, and Carl Morris. 1977. “Stein’s Paradox in Statistics.” Scientific American 236 (5): 119–27.
Enikolopov, Ruben, Vasily Korovkin, Maria Petrova, Konstantin Sonin, and Alexei Zakharov. 2013. “Field Experiment Estimate of Electoral Fraud in Russian Parliamentary Elections.” Proceedings of the National Academy of Sciences 110 (2): 448–52.
Tassone, Eric, and Farzan Rohani. 2017. “Our Quest for Robust Time Series Forecasting at Scale.”
Feller, William. 1971. An Introduction to Probability Theory and Its Applications. Wiley.
Feynman, Richard. n.d. “Feynman :: Rules of Chess.”
Frank, Ildiko E., and Jerome H. Friedman. 1993. “A Statistical View of Some Chemometrics Regression Tools.” Technometrics 35 (2): 109–35. https://www.jstor.org/stable/1269656.
Fredholm, Ivar. 1903. “Sur Une Classe d’équations Fonctionnelles.” Acta Mathematica 27: 365–90.
Friedman, Jerome H., and Werner Stuetzle. 1981. “Projection Pursuit Regression.” Journal of the American Statistical Association 76 (376): 817–23.
Frühwirth-Schnatter, Sylvia, and Rudolf Frühwirth. 2007. “Auxiliary Mixture Sampling with Applications to Logistic Models.” Computational Statistics & Data Analysis 51 (April): 3509–28.
———. 2010. “Data Augmentation and MCMC for Binary and Multinomial Logit Models.” In Statistical Modelling and Regression Structures: Festschrift in Honour of Ludwig Fahrmeir, 111–32.
Frühwirth-Schnatter, Sylvia, Rudolf Frühwirth, Leonhard Held, and Håvard Rue. 2008. “Improved Auxiliary Mixture Sampling for Hierarchical Models of Non-Gaussian Data.” Statistics and Computing 19 (4): 479.
Gan, Link, and Alan Fritzler. 2016. “How to Become an Executive.”
García-Arenzana, Nicolás, Eva María Navarrete-Muñoz, Virginia Lope, Pilar Moreo, Carmen Vidal, Soledad Laso-Pablos, Nieves Ascunce, et al. 2014. “Calorie Intake, Olive Oil Consumption and Mammographic Density Among Spanish Women.” International Journal of Cancer 134 (8): 1916–25.
George, Edward I., and Robert E. McCulloch. 1993. “Variable Selection via Gibbs Sampling.” Journal of the American Statistical Association 88 (423): 881–89.
Gramacy, Robert B., and Nicholas G. Polson. 2012. “Simulation-Based Regularized Logistic Regression.” arXiv. https://arxiv.org/abs/1005.3430.
Griewank, Andreas, Kshitij Kulshreshtha, and Andrea Walther. 2012. “On the Numerical Stability of Algorithmic Differentiation.” Computing. Archives for Scientific Computing 94 (2-4): 125–49.
Hahn, P. Richard, Jared S. Murray, and Carlos M. Carvalho. 2020. “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion).” Bayesian Analysis 15 (3): 965–1056.
Halevy, Alon, Peter Norvig, and Fernando Pereira. 2009. “The Unreasonable Effectiveness of Data.” IEEE Intelligent Systems 24 (2): 8–12.
Hardt, Moritz, Ben Recht, and Yoram Singer. 2016. “Train Faster, Generalize Better: Stability of Stochastic Gradient Descent.” In International Conference on Machine Learning, 1225–34. PMLR.
Held, Leonhard, and Chris C. Holmes. 2006. “Bayesian Auxiliary Variable Models for Binary and Multinomial Regression.” Bayesian Analysis 1 (1): 145–68.
Hermann, Jeremy, and Mike Del Balso. 2017. “Meet Michelangelo: Uber’s Machine Learning Platform.”
Huang, Jian, Joel L. Horowitz, and Shuangge Ma. 2008. “Asymptotic Properties of Bridge Estimators in Sparse High-Dimensional Regression Models.” The Annals of Statistics 36 (2): 587–613.
Hyndman, Rob J., and George Athanasopoulos. 2021. Forecasting: Principles and Practice. 3rd ed. Melbourne, Australia: OTexts.
Igelnik, B., and N. Parikh. 2003. “Kolmogorov’s Spline Network.” IEEE Transactions on Neural Networks 14 (4): 725–33.
Indeed. 2018. “Jobs of the Future: Emerging Trends in Artificial Intelligence.”
Irwin, Neil. 2016. “How to Become a C.E.O.? The Quickest Path Is a Winding One.” The New York Times, September.
Iwata, Shigeru. 2001. “Recentered and Rescaled Instrumental Variable Estimation of Tobit and Probit Models with Errors in Variables.” Econometric Reviews 20 (3): 319–35.
Januschowski, Tim, Yuyang Wang, Kari Torkkola, Timo Erkkilä, Hilaf Hasson, and Jan Gasthaus. 2022. “Forecasting with Trees.” International Journal of Forecasting, Special Issue: M5 competition, 38 (4): 1473–81.
Jeffreys, Harold. 1998. Theory of Probability. Third Edition. Oxford Classic Texts in the Physical Sciences. Oxford, New York: Oxford University Press.
Kaggle. 2020. “M5 Forecasting - Accuracy.” https://kaggle.com/competitions/m5-forecasting-accuracy.
Kallenberg, Olav. 1997. Foundations of Modern Probability. 2nd ed. Springer.
Keskar, Nitish Shirish, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2016. “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.” arXiv Preprint arXiv:1609.04836. https://arxiv.org/abs/1609.04836.
Keynes, John Maynard. 1921. A Treatise on Probability. Macmillan.
Kingma, Diederik, and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv Preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980.
Klartag, Bo’az. 2007. “A Central Limit Theorem for Convex Sets.” Inventiones Mathematicae 168 (1): 91–131.
Kolmogoroff, Andrei. 1931. “Über Die Analytischen Methoden in Der Wahrscheinlichkeitsrechnung.” Mathematische Annalen 104 (1): 415–58.
Kolmogorov, A. N. 1942. “Definition of Center of Dispersion and Measure of Accuracy from a Finite Number of Observations (in Russian).” Izv. Akad. Nauk SSSR Ser. Mat. 6: 3–32.
———. 1956. “On the Representation of Continuous Functions of Several Variables as Superpositions of Functions of Smaller Number of Variables.” In Soviet Math. Dokl., 108:179–82.
Kreps, David. 1988. Notes on the Theory of Choice. Boulder: Westview Press.
Levina, Elizaveta, and Peter Bickel. 2001. “The Earth Mover’s Distance Is the Mallows Distance: Some Insights from Statistics.” In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2:251–56. IEEE.
Lin, Zhouhan, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. “A Structured Self-attentive Sentence Embedding.” arXiv. https://arxiv.org/abs/1703.03130.
Lindgren, Georg. 1978. “Markov Regime Models for Mixed Distributions and Switching Regressions.” Scandinavian Journal of Statistics 5 (2): 81–91. https://www.jstor.org/stable/4615692.
Linnainmaa, Seppo. 1970. “The Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding Errors.” Master’s Thesis (in Finnish), Univ. Helsinki, 6–7.
Logan, John A. 1983. “A Multivariate Model for Mobility Tables.” American Journal of Sociology 89 (2): 324–49.
Logunov, A. A. 2004. “Henri Poincare and Relativity Theory.” https://arxiv.org/abs/physics/0408077.
Lorentz, George G. 1976. “The 13th Problem of Hilbert.” In Proceedings of Symposia in Pure Mathematics, 28:419–30. American Mathematical Society.
Maharaj, Shiva, Nick Polson, and Vadim Sokolov. 2023. “Kramnik Vs Nakamura or Bayes Vs p-Value.” SSRN Scholarly Paper. Rochester, NY.
Malthouse, Edward, Richard Mah, and Ajit Tamhane. 1997. “Nonlinear Partial Least Squares.” Computers & Chemical Engineering 12 (April): 875–90.
Mazumder, Rahul, Jerome H. Friedman, and Trevor Hastie. 2011. “SparseNet: Coordinate Descent With Nonconvex Penalties.” Journal of the American Statistical Association 106 (495): 1125–38.
Mehrasa, Nazanin, Yatao Zhong, Frederick Tung, Luke Bornn, and Greg Mori. 2017. “Learning Person Trajectory Representations for Team Activity Analysis.” arXiv Preprint arXiv:1706.00893. https://arxiv.org/abs/1706.00893.
Milman, Vitali D, and Gideon Schechtman. 2009. Asymptotic Theory of Finite Dimensional Normed Spaces: Isoperimetric Inequalities in Riemannian Manifolds. Vol. 1200. Springer.
Mitchell, T. J., and J. J. Beauchamp. 1988. “Bayesian Variable Selection in Linear Regression.” Journal of the American Statistical Association 83 (404): 1023–32. https://www.jstor.org/stable/2290129.
Nadaraya, E. A. 1964. “On Estimating Regression.” Theory of Probability & Its Applications 9 (1): 141–42.
Naik, Prasad, and Chih-Ling Tsai. 2000. “Partial Least Squares Estimator for Single-Index Models.” Journal of the Royal Statistical Society. Series B (Statistical Methodology) 62 (4): 763–71. https://www.jstor.org/stable/2680619.
Nareklishvili, Maria, Nicholas Polson, and Vadim Sokolov. 2022. “Deep Partial Least Squares for IV Regression.” arXiv Preprint arXiv:2207.02612. https://arxiv.org/abs/2207.02612.
———. 2023a. “Generative Causal Inference,” June. https://arxiv.org/abs/2306.16096.
———. 2023b. “Feature Selection for Personalized Policy Analysis,” July. https://arxiv.org/abs/2301.00251.
Nesterov, Yurii. 1983. “A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k^2).” In Soviet Mathematics Doklady, 27:372–76.
———. 2013. Introductory Lectures on Convex Optimization: A Basic Course. Vol. 87. Springer Science & Business Media.
Nicosia, Luca, Giulia Gnocchi, Ilaria Gorini, Massimo Venturini, Federico Fontana, Filippo Pesapane, Ida Abiuso, et al. 2023. “History of Mammography: Analysis of Breast Imaging Diagnostic Achievements over the Last Century.” Healthcare 11 (1596).
Ostrovskii, GM, Yu M Volin, and WW Borisov. 1971. “Über die Berechnung von Ableitungen.” Wissenschaftliche Zeitschrift der Technischen Hochschule für Chemie, Leuna-Merseburg 13 (4): 382–84.
Parzen, Emanuel. 2004. “Quantile Probability and Statistical Data Modeling.” Statistical Science 19 (4): 652–62. https://www.jstor.org/stable/4144436.
Petris, Giovanni. 2010. “An R Package for Dynamic Linear Models.” Journal of Statistical Software 36 (October): 1–16.
Poincaré, Henri. 1898. “La Mesure Du Temps.” Revue de Métaphysique et de Morale 6 (1): 1–13.
Polson, Nicholas G., and James G. Scott. 2011. “Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction.” In Bayesian Statistics 9, edited by José M. Bernardo, M. J. Bayarri, James O. Berger, A. P. Dawid, David Heckerman, Adrian F. M. Smith, and Mike West. Oxford University Press.
Polson, Nicholas G., James G. Scott, and Jesse Windle. 2013. “Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables.” Journal of the American Statistical Association 108 (504): 1339–49.
Polson, Nicholas G, and James Scott. 2018. AIQ: How People and Machines Are Smarter Together. St. Martin’s Press.
Polson, Nicholas G., and Vadim Sokolov. 2023. “Generative AI for Bayesian Computation,” June. https://arxiv.org/abs/2305.14972.
Polson, Nicholas G., and Vadim Sokolov. 2017. “Deep Learning: A Bayesian Perspective.” Bayesian Analysis 12 (4): 1275–1304.
Polson, Nicholas, and Steven Scott. 2011. “Data Augmentation for Support Vector Machines.” Bayesian Analysis 6 (March).
Polson, Nicholas, and Vadim Sokolov. 2020. “Deep Learning: Computational Aspects.” Wiley Interdisciplinary Reviews: Computational Statistics 12 (5): e1500.
Polson, Nicholas, Vadim Sokolov, and Jianeng Xu. 2021. “Deep Learning Partial Least Squares.” arXiv Preprint arXiv:2106.14085. https://arxiv.org/abs/2106.14085.
Polson, Nick, Fabrizio Ruggeri, and Vadim Sokolov. 2024. “Generative Bayesian Computation for Maximum Expected Utility.” Entropy 26 (12): 1076.
Poplin, Ryan, Avinash V Varadarajan, Katy Blumer, Yun Liu, Michael V McConnell, Greg S Corrado, Lily Peng, and Dale R Webster. 2018. “Prediction of Cardiovascular Risk Factors from Retinal Fundus Photographs via Deep Learning.” Nature Biomedical Engineering 2 (3): 158.
Robbins, Herbert, and Sutton Monro. 1951. “A Stochastic Approximation Method.” The Annals of Mathematical Statistics 22 (3): 400–407.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2015. Bayesian Data Analysis. 3rd ed. New York: Chapman and Hall/CRC.
Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks: An Overview.” Neural Networks 61: 85–117.
Schmidt-Hieber, Johannes. 2021. “The Kolmogorov-Arnold Representation Theorem Revisited.” Neural Networks 137 (May): 119–26.
Schwertman, Neil C, AJ Gilks, and J Cameron. 1990. “A Simple Noncalculus Proof That the Median Minimizes the Sum of the Absolute Deviations.” The American Statistician 44 (1): 38–39.
Scott, Steven L. 2015. “Multi-Armed Bandit Experiments in the Online Service Economy.” Applied Stochastic Models in Business and Industry 31 (1): 37–45.
Scott, Steven L. 2022. “BoomSpikeSlab: MCMC for Spike and Slab Regression.”
Scott, Steven L., and Hal R. Varian. 2015. “Bayesian Variable Selection for Nowcasting Economic Time Series.” In Economic Analysis of the Digital Economy, 119–35. University of Chicago Press.
Scott, Steven, and Hal Varian. 2014. “Predicting the Present with Bayesian Structural Time Series.” Int. J. of Mathematical Modelling and Numerical Optimisation 5 (January): 4–23.
Taylor, Sean J., and Ben Letham. 2017. “Prophet: Forecasting at Scale.” Meta Research. https://research.facebook.com/blog/2017/2/prophet-forecasting-at-scale/.
Shen, Changyu, Enrico G Ferro, Huiping Xu, Daniel B Kramer, Rushad Patell, and Dhruv S Kazi. 2021. “Underperformance of Contemporary Phase III Oncology Trials and Strategies for Improvement.” Journal of the National Comprehensive Cancer Network 19 (9): 1072–78.
Shiryayev, A. N. 1992. “On Analytical Methods in Probability Theory.” In Selected Works of A. N. Kolmogorov: Volume II Probability Theory and Mathematical Statistics, edited by A. N. Shiryayev, 62–108. Dordrecht: Springer Netherlands.
Silver, David, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, et al. 2017. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.” arXiv. https://arxiv.org/abs/1712.01815.
Simpson, Edward. 2010. “Edward Simpson: Bayes at Bletchley Park.” Significance 7 (2): 76–80.
Smith, A. F. M. 1975. “A Bayesian Approach to Inference about a Change-Point in a Sequence of Random Variables.” Biometrika 62 (2): 407–16. https://www.jstor.org/stable/2335381.
Sokolov, Vadim. 2017. “Discussion of ‘Deep Learning for Finance: Deep Portfolios’.” Applied Stochastic Models in Business and Industry 33 (1): 16–18.
Spiegelhalter, David, and Yin-Lam Ng. 2009. “One Match to Go!” Significance 6 (4): 151–53.
Stein, Charles. 1964. “Inadmissibility of the Usual Estimator for the Variance of a Normal Distribution with Unknown Mean.” Annals of the Institute of Statistical Mathematics 16 (1): 155–60.
Stern, Hal S., and Adam Sugano. 2007. “Inference about Batter-Pitcher Matchups in Baseball from Small Samples.” In Statistical Thinking in Sports, edited by Jim Albert and Ruud H. Koning, 153–65.
Stigler, Stephen M. 1981. “Gauss and the Invention of Least Squares.” The Annals of Statistics, 465–74.
Sun, Duxin, Wei Gao, Hongxiang Hu, and Simon Zhou. 2022. “Why 90% of Clinical Drug Development Fails and How to Improve It?” Acta Pharmaceutica Sinica B 12 (7): 3049–62.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013. “On the Importance of Initialization and Momentum in Deep Learning.” In International Conference on Machine Learning, 1139–47.
Taleb, Nassim Nicholas. 2007. The Black Swan: The Impact of the Highly Improbable. Annotated edition. New York, NY: Random House.
Tarone, Robert E. 1982. “The Use of Historical Control Information in Testing for a Trend in Proportions.” Biometrics, 215–20.
Tesauro, Gerald. 1995. “Temporal Difference Learning and TD-Gammon.” Communications of the ACM 38 (3): 58–68.
Tiao, Louis. 2019. “Pólya-Gamma Bayesian Logistic Regression.” Blog post.
Tsai, Yao-Hung Hubert, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. “Transformer Dissection: A Unified Understanding of Transformer’s Attention via the Lens of Kernel.” arXiv. https://arxiv.org/abs/1908.11775.
Varian, Hal R. 2010. “Computer Mediated Transactions.” American Economic Review 100 (2): 1–10.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. “Attention Is All You Need.” arXiv. https://arxiv.org/abs/1706.03762.
Vecer, Jan, Frantisek Kopriva, and Tomoyuki Ichiba. 2009. “Estimating the Effect of the Red Card in Soccer: When to Commit an Offense in Exchange for Preventing a Goal Opportunity.” Journal of Quantitative Analysis in Sports 5 (1).
Viterbi, A. 1967. “Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm.” IEEE Transactions on Information Theory 13 (2): 260–69.
Watson, Geoffrey S. 1964. “Smooth Regression Analysis.” Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 26 (4): 359–72. https://www.jstor.org/stable/25049340.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.” PhD dissertation, Harvard University.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear Sensitivity Analysis.” In System Modeling and Optimization, 762–70. Springer.
Windle, Jesse. 2023. “BayesLogit: Bayesian Logistic Regression.” R package version 2.1.
Windle, Jesse, Nicholas G. Polson, and James G. Scott. 2014. “Sampling Polya-Gamma Random Variates: Alternate and Approximate Techniques.” arXiv. https://arxiv.org/abs/1405.0506.
Wojna, Zbigniew, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, and Julian Ibarz. 2017. “Attention-Based Extraction of Structured Information from Street View Imagery.” arXiv Preprint arXiv:1704.03549. https://arxiv.org/abs/1704.03549.
Wold, Herman. 1975. “Soft Modelling by Latent Variables: The Non-Linear Iterative Partial Least Squares (NIPALS) Approach.” Journal of Applied Probability 12 (S1): 117–42.
Yaari, Menahem E. 1987. “The Dual Theory of Choice Under Risk.” Econometrica 55 (1): 95–115. https://www.jstor.org/stable/1911158.
Zeiler, Matthew D. 2012. “ADADELTA: An Adaptive Learning Rate Method.” arXiv Preprint arXiv:1212.5701. https://arxiv.org/abs/1212.5701.
Zhang, Yichi, Anirban Datta, and Sudipto Banerjee. 2018. “Scalable Gaussian Process Classification with Pólya-Gamma Data Augmentation.” arXiv Preprint arXiv:1802.06383. https://arxiv.org/abs/1802.06383.