\(\DeclareMathOperator*{\argmin}{argmin}\) \(\DeclareMathOperator*{\argmax}{argmax}\) \(\newcommand{\E}{\mathbf{E}}\) \(\newcommand{\V}{\mathbf{Var}}\) \(\newcommand{\cov}{\mathbf{Cov}}\) \(\newcommand{\P}{\mathbf{P}}\)

References

Demmler, A., and C. Reinsch. 1975. “Oscillation Matrices with Spline Smoothing.” Numerische Mathematik 24 (5): 375–82.
Deng, Henry, and Hadley Wickham. 2011. “Density Estimation with R.” https://vita.had.co.nz/papers/density-estimation.pdf.
Devroye, Luc, Laszlo Gyorfi, Adam Krzyzak, and Gabor Lugosi. 1994. “On the Strong Universal Consistency of Nearest Neighbor Regression Function Estimates.” The Annals of Statistics 22 (3): 1371–85.
Evans, Lawrence C., and Ronald F. Gariepy. 2015. Measure Theory and Fine Properties of Functions. CRC Press.
Fernandes, Kelwin, Pedro Vinagre, Paulo Cortez, and Pedro Sernadela. 2015. “Online News Popularity.” UCI Machine Learning Repository.
Fernandes, K., P. Vinagre, and P. Cortez. 2015. “A Proactive Intelligent Decision Support System for Predicting the Popularity of Online News.” In Proceedings of the 17th EPIA 2015 - Portuguese Conference on Artificial Intelligence.
Le Gall, Jean-François. 2022. Measure Theory, Probability, and Stochastic Processes. Graduate Texts in Mathematics. Springer International Publishing.
Gilks, W. R., and P. Wild. 1992. “Adaptive Rejection Sampling for Gibbs Sampling.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 41 (2): 337–48.
Goh, Gabriel. 2017. “Why Momentum Really Works.” Distill. https://doi.org/10.23915/distill.00006.
Gürbüzbalaban, M., A. Ozdaglar, and P. A. Parrilo. 2019. “Why Random Reshuffling Beats Stochastic Gradient Descent.” Mathematical Programming.
Kiefer, Nicholas M. 1978. “Discrete Parameter Variation: Efficient Estimation of a Switching Regression Model.” Econometrica 46 (2): 427–34.
Kingma, Diederik P., and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” In Proceedings of the 3rd International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1412.6980.
Knuth, Donald E. 1974. “Structured Programming with Go to Statements.” ACM Comput. Surv. 6 (4): 261–301. https://doi.org/10.1145/356635.356640.
L’Ecuyer, Pierre, and Richard Simard. 2007. “TestU01: A C Library for Empirical Testing of Random Number Generators.” ACM Trans. Math. Softw. 33 (4): 22:1–40. https://doi.org/10.1145/1268776.1268777.
Lai, Tze Leung. 2003. “Stochastic Approximation: Invited Paper.” Ann. Statist. 31 (2): 391–406.
Lauritzen, Steffen. 2023. Fundamentals of Mathematical Statistics. Chapman & Hall.
Marsaglia, George. 2003. “Xorshift RNGs.” Journal of Statistical Software 8 (14): 1–6. https://doi.org/10.18637/jss.v008.i14.
Marsaglia, George, and Wai Wan Tsang. 2000. “A Simple Method for Generating Gamma Variables.” ACM Trans. Math. Softw. 26 (3): 363–72. https://doi.org/10.1145/358407.358414.
Nocedal, Jorge, and Stephen J. Wright. 2006. Numerical Optimization. Second. Springer Series in Operations Research and Financial Engineering. New York: Springer.
O’Neill, Melissa E. 2014. “PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation.” HMC-CS-2014-0905. Claremont, CA: Harvey Mudd College.
———. 2018. “Specific Problems with Other RNGs.” https://www.pcg-random.org/other-rngs.html.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. MIT Press.
Robbins, Herbert, and Sutton Monro. 1951. “A Stochastic Approximation Method.” Ann. Math. Statist. 22 (3): 400–407.
Sheather, S. J., and M. C. Jones. 1991. “A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation.” Journal of the Royal Statistical Society. Series B (Methodological) 53 (3): 683–90.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC.
Tsybakov, A. B. 2009. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York.
Vinther, B. M., K. K. Andersen, P. D. Jones, K. R. Briffa, and J. Cappelen. 2006. “Extending Greenland Temperature Records into the Late Eighteenth Century.” Journal of Geophysical Research: Atmospheres 111 (D11).
Wickham, Hadley. 2019. Advanced R. Second. Chapman and Hall/CRC. https://doi.org/10.1201/9781351201315.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science. O’Reilly Media, Inc.
Widrow, Bernard, and Marcian E. Hoff. 1960. “Adaptive Switching Circuits.” Stanford University.