\(\DeclareMathOperator*{\argmin}{argmin}\) \(\DeclareMathOperator*{\argmax}{argmax}\) \(\newcommand{\E}{\mathbf{E}}\) \(\newcommand{\V}{\mathbf{Var}}\) \(\newcommand{\cov}{\mathbf{Cov}}\) \(\newcommand{\P}{\mathbf{P}}\)

Preface

This book was developed for a graduate level course in computational statistics. It is used as the primary material for such a course within the MSc program in statistics at University of Copenhagen.

The book assumes a mathematical background, and the reader is expected to have a reasonable command of mathematical analysis, linear algebra and mathematical statistics – exemplified by the theory of maximum likelihood estimation of multivariate parameters and asymptotic properties of multivariate estimators. The reader is also expected to have an understanding of what an algorithm is, how numerical computations differ from symbolic computations, and to be able to write small computer programs.

The material covered is not supposed to be a comprehensive treatment of computational statistics. It is intended to be a pedagogical introduction to some core aspects of computational statistics that bridges the gap between theory and implementation. The presentation is driven by a selection of statistical examples and their computational challenges. The examples are tied together by practical and experimental approaches to solving these computational challenges.

Contemporary research in computational statistics revolves around large scale computations, either because the amount of data is massive or because we want to apply ever more complicated and sophisticated models and methods for the analysis and visualization of data. Compared to these research challenges, the examples treated in this book are of a modest complexity. They serve as a means to learn the fundamental computational craftsmanship that is needed when more complex problems are to be solved.

The book is based on R for several reasons. First of all, the target audience of statisticians is expected to be familiar with R, and they should learn how to use their programming language in an optimal way. This includes knowledge of the infrastructure offered by R and RStudio that supports good software development. In addition, this infrastructure was used extensively, in combination with R Markdown and bookdown, for writing this book, which systematically integrates code and software development with the theory. Finally, it is possible to write efficient R code by a proper use of R as a high-level programming language or by interfacing compiled code via the Rcpp package. Statisticians who program in R should master these skills.

1 Introduction