A.5 Exercises


Exercise A.1 Explain the result of evaluating the following R expression.
(0.1 + 0.1 + 0.1) > 0.3
## [1] TRUE

Exercise A.2 Write a function that takes a numeric vector x and a threshold value h as arguments and returns the vector of all values in x greater than h. Test the function on seq(0, 1, 0.1) with threshold 0.3. Have the example from Exercise A.1 in mind.

Exercise A.3 Investigate how your function from Exercise A.2 treats missing values (NA), infinite values (Inf and -Inf) and the special value “Not a Number” (NaN). Rewrite your function (if necessary) to exclude all or some of such values from x.

Hint: The functions is.na, is.nan and is.finite are useful.

Histograms with non-equidistant breaks

The following three exercises will use a data set consisting of measurements of infrared emissions from objects outside of our galax. We will focus on the variable F12, which is the total 12 micron band flux density.

infrared <- read.table("data/infrared.txt", header = TRUE)
F12 <- infrared$F12

The purpose of this exercise is two-fold. First, you will get familiar with the data and see how different choices of visualizations using histograms can affect your interpretation of the data. Second, you will learn more about how to write functions in R and gain a better understanding of how they work.

Exercise A.4 Plot a histogram of log(F12) using the default value of the argument breaks. Experiment with alternative values of breaks.

Exercise A.5 Write your own function, called my_breaks, which takes two arguments, x (a vector) and h (a positive integer). Let h have default value 5. The function should first sort x into increasing order and then return the vector that: starts with the smallest entry in x; contains every \(h\)th unique entry from the sorted x; ends with the largest entry in x.

For example, if h = 2 and x = c(1, 3, 2, 5, 10, 11, 1, 1, 3) the function should return c(1, 3, 10, 11). To see this, first sort x, which gives the vector c(1, 1, 1, 2, 3, 3, 5, 10, 11), whose unique values are c(1, 2, 3, 5, 10, 11). Every second unique entry is c(1, 3, 10), and then the largest entry 11 is concatenated.

Hint: The functions sort and unique can be useful.

Use your function to construct breakpoints for the histogram for different values of h, and compare with the histograms obtained in Exercise A.4.

Exercise A.6 If there are no ties in the data set, the function above will produce breakpoints with h observations in the interval between two consecutive breakpoints (except the last two perhaps). If there are ties, the function will by construction return unique breakpoints, but there may be more than h observations in some intervals.

The intention is now to rewrite my_breaks so that if possible each interval contains h observations.

Modify the my_breaks function with this intention and so that is has the following properties:

  • All breakpoints must be unique.
  • The range of the breakpoints must cover the range of x.
  • For two subsequent breakpoints, \(a\) and \(b\), there must be at least h observations in the interval \((a,b],\) provided h < length(x). (With the exception that for the first two breakpoints, the interval is \([a,b].\))

Functions and objects

The following exercises build on having implemented a function that computes breakpoints for a histogram either as in Exercise A.5 or as in Exercise A.6.

Exercise A.7 Write a function called my_hist, which takes a single argument h and plots a histogram of log(F12). Extend the implementation so that any additional argument specified when calling my_hist is passed on to hist. Investigate and explain what happens when executing the following function calls.
my_hist(h = 5, freq = TRUE)
my_hist(h = 0)

Exercise A.8 Modify your my_hist function so that it returns an object of class my_histogram, which is not plotted. Write a print method for objects of this class, which prints just the number of cells.

Hint: It can be useful to know about the function cat.

How can you assign a class label to the returned object so that it is printed using your new print method, but it is still plotted as a histogram when given as argument to plot?
Exercise A.9 Write a summary method that returns a data frame with two columns containing the midpoints of the cells and the counts.
Exercise A.10 Write a new plot method for objects of class my_histogram that uses ggplot2 for plotting the histogram.

Functions and environments

The following exercises assume that you have implemented a my_hist function as in Exercise A.7.

Exercise A.11 What happens if you remove that data and call my_hist subsequently? What is the environment of my_hist? Change it to a new environment, and assign (using the function assign) the data to a variable with an appropriate name in that environment. Once this is done, check what now happens when calling my_hist after the data is removed from the global environment.

Exercise A.12 Write a function that takes an argument x (the data) and returns a function, where the returned function takes an argument h (just as my_hist) and plots a histogram (just as my_hist). Because the return value is a function, we may refer to the function as a function factory.

What is the environment of the function created by the function factory? What is in the environment? Does it have any effect when calling the function whether the data is altered or removed from the global environment?

Exercise A.13 Evaluate the following function call:

tmp <- my_hist(10, plot = FALSE)

What is the type and class of tmp? What happens when plot(tmp, col = "red") is executed? How can you find help on what plot does with an object of this class? Specifically, how do you find the documentation for the argument col, which is not an argument of plot?