## A.5 Exercises

### Functions

**Exercise A.1** Explain the result of evaluating the following R expression.

`## [1] TRUE`

**Exercise A.2** Write a function that takes a numeric vector `x` and a threshold value `h` as arguments and returns the vector of all values in `x` greater than `h`. Test the function on `seq(0, 1, 0.1)` with threshold 0.3. Have the example from Exercise A.1 in mind.
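The reminder about Exercise A.1 suggests that comparisons with 0.3 deserve some care. The snippet below is a small illustration rather than a solution. It assumes the relevant point is that `seq(0, 1, 0.1)` is computed in binary floating point, so an entry that prints as `0.3` need not be exactly equal to the literal `0.3`.

```r
x <- seq(0, 1, 0.1)

# The fourth entry prints as 0.3 with the default number of digits ...
print(x[4])

# ... but printing more digits reveals a small rounding error.
print(x[4], digits = 17)

# Exact comparisons are sensitive to that error.
x[4] == 0.3
x[4] > 0.3

# A tolerance-based comparison treats the two numbers as equal.
isTRUE(all.equal(x[4], 0.3))
```

Whether a value that is nominally equal to the threshold should count as "greater than" it is a design decision for your function.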

**Exercise A.3** Investigate how your function from Exercise A.2 treats missing values (`NA`), infinite values (`Inf` and `-Inf`) and the special value “Not a Number” (`NaN`). Rewrite your function (if necessary) to exclude all or some of such values from `x`.

*Hint: The functions `is.na`, `is.nan` and `is.finite` are useful.*
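As a starting point for the investigation, the following sketch shows how the special values behave in a comparison and how the three functions from the hint classify them. The vector `x` and the threshold 0.3 are made up for illustration.

```r
x <- c(0.5, NA, NaN, Inf, -Inf, 0.1)

# Comparisons involving NA or NaN evaluate to NA rather than TRUE or FALSE.
x > 0.3

# Logical subsetting with NA in the index yields NA entries in the result.
x[x > 0.3]

# The three predicates from the hint classify the special values differently.
data.frame(
  x = x,
  is_na = is.na(x),         # TRUE for both NA and NaN
  is_nan = is.nan(x),       # TRUE for NaN only
  is_finite = is.finite(x)  # FALSE for NA, NaN, Inf and -Inf
)
```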

### Histograms with non-equidistant breaks

The following three exercises will use a data set consisting of measurements of infrared emissions from objects outside of our galaxy. We will focus on the variable `F12`, which is the total 12 micron band flux density.

The purpose of these exercises is two-fold. First, you will get familiar with the data and see how different choices of visualizations using histograms can affect your interpretation of the data. Second, you will learn more about how to write functions in R and gain a better understanding of how they work.

**Exercise A.4** Plot a histogram of `log(F12)` using the default value of the argument `breaks`. Experiment with alternative values of `breaks`.
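The `breaks` argument of `hist` accepts several kinds of values, which the sketch below runs through. Since the data set is not loaded here, the vector `F12_sim` is a simulated, hypothetical stand-in for `F12`. The calls use `plot = FALSE` so that only the histogram objects are computed; drop that argument to draw the plots.

```r
set.seed(1)
# Hypothetical stand-in for the F12 variable: simulated positive values.
F12_sim <- rlnorm(500)
y <- log(F12_sim)

h_default <- hist(y, plot = FALSE)                 # default: breaks = "Sturges"
h_number  <- hist(y, breaks = 30, plot = FALSE)    # a suggested number of cells
h_rule    <- hist(y, breaks = "FD", plot = FALSE)  # Freedman-Diaconis rule
h_vector  <- hist(y, breaks = unname(quantile(y, seq(0, 1, 0.1))), plot = FALSE)

# Number of cells resulting from each choice.
sapply(
  list(default = h_default, number = h_number, rule = h_rule, vector = h_vector),
  function(h) length(h$counts)
)

# With a vector of non-equidistant breakpoints the cells differ in width.
h_vector$equidist
```

A single number is only a suggestion that `hist` rounds to "pretty" breakpoints, whereas a vector of breakpoints is used as given.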

**Exercise A.5** Write your own function, called `my_breaks`, which takes two arguments, `x` (a vector) and `h` (a positive integer). Let `h` have default value `5`. The function should first sort `x` into increasing order and then return the vector that: starts with the smallest entry in `x`; contains every \(h\)th unique entry from the sorted `x`; ends with the largest entry in `x`.

For example, if `h = 2` and `x = c(1, 3, 2, 5, 10, 11, 1, 1, 3)` the function should return `c(1, 3, 10, 11)`. To see this, first sort `x`, which gives the vector `c(1, 1, 1, 2, 3, 3, 5, 10, 11)`, whose unique values are `c(1, 2, 3, 5, 10, 11)`. Every second unique entry is `c(1, 3, 10)`, and then the largest entry `11` is concatenated.

*Hint: The functions `sort` and `unique` can be useful.*
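The worked example can be traced step by step in R. This is only a walk-through of the example for `h = 2`, not a complete `my_breaks`; the intermediate variable names are chosen for illustration.

```r
x <- c(1, 3, 2, 5, 10, 11, 1, 1, 3)
h <- 2

x_sorted <- sort(x)           # 1 1 1 2 3 3 5 10 11
x_unique <- unique(x_sorted)  # 1 2 3 5 10 11

# Every h-th unique entry, starting from the first (smallest) one.
every_h <- x_unique[seq(1, length(x_unique), by = h)]  # 1 3 10

# Append the largest entry to obtain the breakpoints from the example.
c(every_h, max(x))  # 1 3 10 11
```

A full implementation also has to deal with the case where the largest entry is already among the selected values, so that it is not included twice.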

Use your function to construct *breakpoints* for the histogram for different values of `h`, and compare with the histograms obtained in Exercise A.4.

**Exercise A.6** If there are no ties in the data set, the function above will produce breakpoints with `h` observations in the interval between two consecutive breakpoints (except perhaps the last two). If there are ties, the function will by construction return unique breakpoints, but there may be more than `h` observations in some intervals.

*The intention is now to rewrite `my_breaks` so that, if possible, each interval contains `h` observations.*

Modify the `my_breaks` function with this intention and so that it has the following properties:

- All breakpoints must be unique.
- The range of the breakpoints must cover the range of `x`.
- For two subsequent breakpoints, \(a\) and \(b\), there must be at least `h` observations in the interval \((a,b]\), provided `h < length(x)`. (With the exception that for the first two breakpoints, the interval is \([a,b]\).)
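The three properties can be checked mechanically for any candidate vector of breakpoints. The sketch below uses a hypothetical helper, `count_per_interval`, built on `cut`, whose default intervals are of the form \((a,b]\) and whose `include.lowest = TRUE` option closes the first interval to \([a,b]\). It is applied to the breakpoints from the example in Exercise A.5.

```r
# Hypothetical helper: number of observations in each interval defined by
# the breakpoints, using (a, b] intervals with the first one closed, [a, b].
count_per_interval <- function(x, breaks) {
  as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))
}

x <- c(1, 3, 2, 5, 10, 11, 1, 1, 3)
breaks <- c(1, 3, 10, 11)  # breakpoints from the example in Exercise A.5
h <- 2

counts <- count_per_interval(x, breaks)
counts  # 6 2 1

unique_ok  <- anyDuplicated(breaks) == 0                       # all unique
range_ok   <- min(breaks) <= min(x) && max(breaks) >= max(x)   # range covered
counts_ok  <- all(counts >= h)                                 # at least h each
c(unique = unique_ok, range = range_ok, counts = counts_ok)
```

For this example the last interval contains only one observation, so the breakpoints from Exercise A.5 do not satisfy the third property.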

### Functions and objects

The following exercises build on having implemented a function that computes breakpoints for a histogram either as in Exercise A.5 or as in Exercise A.6.

**Exercise A.7** Write a function called `my_hist`, which takes a single argument `h` and plots a histogram of `log(F12)`. Extend the implementation so that any additional argument specified when calling `my_hist` is passed on to `hist`. Investigate and explain what happens when executing the following function calls.
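Passing additional arguments on to another function is done with the special argument `...`. The following is a minimal sketch under stated assumptions: `F12` is a simulated stand-in for the real variable, and `my_breaks` is one possible implementation in the spirit of Exercise A.5, included only to make the block self-contained.

```r
set.seed(1)
F12 <- rlnorm(200)  # hypothetical stand-in for the real F12 variable

# Stand-in for my_breaks from Exercise A.5 (one possible implementation).
my_breaks <- function(x, h = 5) {
  u <- unique(sort(x))
  unique(c(u[seq(1, length(u), by = h)], u[length(u)]))
}

my_hist <- function(h, ...) {
  y <- log(F12)
  # Everything captured by ... is forwarded unchanged to hist().
  hist(y, breaks = my_breaks(y, h), ...)
}

# plot = FALSE is not an argument of my_hist, but it reaches hist() via ...
res <- my_hist(20, plot = FALSE)
res$counts
```

In the same way, a call such as `my_hist(20, col = "red")` would forward `col` to `hist` and draw the histogram with red bars.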

**Exercise A.8** Modify your `my_hist` function so that it returns an object of class `my_histogram`, which is not plotted. Write a print method for objects of this class, which prints just the number of cells.

*Hint: It can be useful to know about the function `cat`.*
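The mechanics of S3 classes and print methods can be sketched as follows. The data `y` is simulated and stands in for `log(F12)`, and prepending the new class label to the existing one is just one possible way to assign the class.

```r
set.seed(1)
y <- rnorm(100)  # hypothetical data, standing in for log(F12)

# Build a histogram object without plotting and prepend a new class label.
obj <- hist(y, plot = FALSE)
class(obj) <- c("my_histogram", class(obj))

# S3 print method: print() dispatches here for objects of class my_histogram.
print.my_histogram <- function(x, ...) {
  cat("Histogram with", length(x$counts), "cells\n")
  invisible(x)  # print methods conventionally return their argument invisibly
}

print(obj)
class(obj)
```

Because the original class label is kept as the second element, generic functions without a `my_histogram` method still fall back on the methods for the original class.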

`plot`?

**Exercise A.9** Write a `summary` method that returns a data frame with two columns containing the midpoints of the cells and the counts.
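A `summary` method follows the same S3 pattern as a print method. The sketch below assumes that the `my_histogram` object stores the components `mids` and `counts`, as the objects returned by `hist` do; the data is simulated.

```r
set.seed(1)
# Hypothetical my_histogram object built from simulated data.
obj <- hist(rnorm(100), plot = FALSE)
class(obj) <- c("my_histogram", class(obj))

# S3 summary method; the first argument of the summary generic is `object`.
summary.my_histogram <- function(object, ...) {
  data.frame(mids = object$mids, counts = object$counts)
}

s <- summary(obj)
head(s, 3)
```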

**Exercise A.10** Write a new `plot` method for objects of class `my_histogram` that uses `ggplot2` for plotting the histogram.

### Functions and environments

The following exercises assume that you have implemented a `my_hist` function as in Exercise A.7.

**Exercise A.11** What happens if you remove the data and call `my_hist` subsequently? What is the environment of `my_hist`? Change it to a new environment, and assign (using the function `assign`) the data to a variable with an appropriate name in that environment. Once this is done, check what now happens when calling `my_hist` after the data is removed from the global environment.
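The functions involved in this exercise can be tried out on a simplified example. In the sketch below, `F12` is simulated, `my_hist` is a reduced version that uses `h` directly as the suggested number of cells, and the name `data_env` is chosen for illustration.

```r
set.seed(1)
F12 <- rlnorm(100)  # hypothetical stand-in for the data

# Simplified my_hist: h is used directly as the suggested number of cells.
my_hist <- function(h, ...) hist(log(F12), breaks = h, ...)

environment(my_hist)  # the environment in which my_hist was defined

# Copy the data into a new environment and make it the function's environment.
data_env <- new.env()
assign("F12", F12, envir = data_env)
environment(my_hist) <- data_env

rm(F12)  # remove the data from the environment where it was created

res <- my_hist(10, plot = FALSE)  # F12 is now looked up in data_env
ls(environment(my_hist))
```

Since `new.env()` by default has the calling environment as its parent, functions such as `hist` and `log` are still found when `my_hist` is evaluated.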

**Exercise A.12** Write a function that takes an argument `x` (the data) and returns a function, where the returned function takes an argument `h` (just as `my_hist`) and plots a histogram (just as `my_hist`). Because the return value is a function, we may refer to the function as a function factory.

What is the environment of the function created by the function factory? What is in the environment? Does it have any effect when calling the function whether the data is altered or removed from the global environment?
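A function factory can be sketched in a few lines. The names `hist_factory` and `my_hist2` are hypothetical, the data is simulated, and `h` is again used directly as the suggested number of cells to keep the example short.

```r
hist_factory <- function(x) {
  force(x)  # evaluate x now, so that its value is stored with the function
  function(h, ...) hist(log(x), breaks = h, ...)
}

set.seed(1)
F12 <- rlnorm(100)  # hypothetical stand-in for the data
my_hist2 <- hist_factory(F12)

rm(F12)  # the returned function keeps its own copy of the data

res <- my_hist2(10, plot = FALSE)
environment(my_hist2)      # the execution environment of the factory call
ls(environment(my_hist2))  # "x"
```

The call to `force` matters because R evaluates arguments lazily: without it, `x` would not be evaluated until the returned function is called for the first time.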

**Exercise A.13** Evaluate the following function call:

What is the type and class of `tmp`? What happens when `plot(tmp, col = "red")` is executed? How can you find help on what `plot` does with an object of this class? Specifically, how do you find the documentation for the argument `col`, which is not an argument of `plot`?
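The tools for answering these questions can be tried on any S3 object. The sketch below assumes, purely for illustration, that `tmp` is a histogram object as returned by `hist(..., plot = FALSE)` on simulated data; the actual `tmp` in the exercise may differ.

```r
set.seed(1)
# Assumption: tmp is a histogram object computed without plotting.
tmp <- hist(rnorm(100), plot = FALSE)

typeof(tmp)  # storage type of the object
class(tmp)   # S3 class used for method dispatch

# All plot methods known to the current session.
methods("plot")

# The method that plot() dispatches to for this class, and its arguments.
plot_method <- getS3method("plot", class(tmp)[1])
names(formals(plot_method))
```

In an interactive session, the help page of a method is opened by its full name, for example `?plot.histogram`.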