This is a draft of a book on computational statistics, which is being developed specifically for the master’s education in statistics at University of Copenhagen.

A solid mathematical background is assumed throughout, while the computer science prerequisites are more modest. To be specific, the reader is expected to have a reasonable command of mathematical analysis, linear algebra and mathematical statistics, as exemplified by maximum likelihood estimation of multivariate parameters and asymptotic properties of multivariate estimators. The reader is expected to have an understanding of what an algorithm is, how numerical computations differ from symbolic computations, and be able to write small computer programs.

The intention of the material is to serve as a pedagogical introduction to computational statistics. No claim is made that the material is comprehensive or even representative, nor does it purport computational statistics as a single coherent field with a unifying theoretical foundation. This introduction is driven by statistical examples with the unifying theme being an experimental approach to solving computational problems in statistics.

Contemporary challenges in computational statistics revolve around large scale computations, either because the amount of data is massive or because we want to apply ever more complicated and sophisticated models and methods for the analysis and visualization of data. The examples treated are all of a rather modest complexity compared to these challenges. This is deliberate! A solid understanding of how to solve simpler problems is seen as a prerequisite for solving complex problems. It is the hope that this material provides the reader with a foundation in computational statistics that will subsequently make him or her able to develop novel solutions to problems in computational statistics.

The book is based on R for several reasons. First of all the target audience of statisticians is expected to be familiar with R, and they should learn how to use their programming language in an optimal way. This includes knowledge of the infrastructure supported by R and RStudio for supporting good software development and for testing, benchmarking and profiling code. In addition, this infrastructure combined with R Markdown and bookdown makes it a bliss to write a book that systematically integrates code and software development into the theory. Finally, though the R implementation itself is a relatively slow interpreter, it is possible to write performant code by a proper use of R as a very high-level programming language.