Chapter 2 R programming

R is typically called a statistical programming language because it can not only conduct existing data analysis through the exiting routines but also allow a user to write their own program to conduct a new analysis.

2.1 Function

We may view a function as the minimum unit of R. R base includes all the functions that can be used to write more complex functions as building blocks. A function can be very simple and do a trivial task. For example, the function one.plus.one below simply does 1+1 in R.

one.plus.one <- function(){
  1+1
}

one.plus.one()
## [1] 2

test <- one.plus.one()
test
## [1] 2

From this example, we can observe the following.

one.plus.one is the name of the function.
The name is followed by <- meaning assign the analysis to it.
To construct a function, we need to use the R keyword function(). Note () is required.
The task the function does is included in a pair of braces { }.
To use the function, we need to type its name and () is also required.
The function can be used like any other R base functions.

2.1.1 Function with input

We can include the input in a function. For example, take a look at the function plus.one.

plus.one <- function(x){
  x + 1
}

plus.one(4)
## [1] 5
plus.one(x=7)
## [1] 8

In this function, we have a parameter x, often called an argument. It represents a specific input.
In using the function, we need to provide an input.

Default values can be used in writing a function. In this situation, when no input is used, the default values will be used.

plus.one <- function(x = 0){
  x + 1
}

plus.one()
## [1] 1
plus.one(4)
## [1] 5

When left empty, the default value 0 was used to do the calculation.

Many inputs can be used.

# Pythagorean equation
pythagorean <- function(a=6, b=8){
  sqrt(a^2 + b^2)
}

pythagorean()
## [1] 10
pythagorean(1, 1)
## [1] 1.414214
pythagorean(b=10, a=5)
## [1] 11.18034

With multiple inputs, the default order is the order of the parameters used in the function.
With the input name, the order does not matter. The function will match the input name.

2.1.2 Return values of a function

An R function typically takes the input, does the calculation, and returns the results. In a function, we can simply return a value using its name. This is equivalent to use the print function.

# Pythagorean equation
pythagorean <- function(a=6, b=8){
  c <- sqrt(a^2 + b^2)
  c
}

pythagorean()
## [1] 10

pythagorean.print <- function(a=6, b=8){
  c <- sqrt(a^2 + b^2)
  print(c)
}

pythagorean()
## [1] 10

For fancier output, one might use the function cat().

# Pythagorean equation
pythagorean <- function(a=6, b=8){
  c <- sqrt(a^2 + b^2)
  
  ## output 1
  cat("The input is a = ", a, ", b = ", b, "\n", sep="")
  cat("The ouput is c = ", c, "\n", sep="")
  
  ## output 2
  cat("The input is a = ", a, ", b = ", b, sep="")
  cat("The ouput is c = ", c, "\n", sep="")
  
  ## Output 3
  cat("The input is a = ", a, ", b = ", b, "\n")
  cat("The ouput is c = ", c, "\n")
}

out1 <- pythagorean()
## The input is a = 6, b = 8
## The ouput is c = 10
## The input is a = 6, b = 8The ouput is c = 10
## The input is a =  6 , b =  8 
## The ouput is c =  10

The function collects the values together.
\n means changing to a new line.
sep tells how to collect the values. Here, nothing was used.

2.1.2.1 Return multiple values

To return multiple values, we can use a list.

# Pythagorean equation
pythagorean <- function(a=6, b=8){
  c <- sqrt(a^2 + b^2)
  list(ainput = a, b = b, c = c)
}

pythagorean()
## $ainput
## [1] 6
## 
## $b
## [1] 8
## 
## $c
## [1] 10

In the list, the first a is the name of the output and the second a is the value. The name can be something else.
The list can have any R object, be a scalar, vector or matrix.

We often save the output of a function to an R object.

# Pythagorean equation
pythagorean <- function(a=6, b=8){
  c <- sqrt(a^2 + b^2)
  list(a = a, b = b, c = c)
}

pythagorean <- function(a=6, b=8){
  c <- sqrt(a^2 + b^2)
  list(a, b, c)
}


calculate1 <- pythagorean()

calculate1
## [[1]]
## [1] 6
## 
## [[2]]
## [1] 8
## 
## [[3]]
## [1] 10

calculate1$a
## NULL

calculate1[[2]]
## [1] 8

Since the returned output is a list, one can use operations working for a list.
By default, the output does not print when assigning to an object.
To print, one can put () around it.

# Pythagorean equation
pythagorean <- function(a=6, b=8){
  c <- sqrt(a^2 + b^2)
  list(a = a, b = b, c = c)
}

(calculate1 <- pythagorean())
## $a
## [1] 6
## 
## $b
## [1] 8
## 
## $c
## [1] 10

2.2 Control flow or flow of control

For more sophisticated programming, control flow is needed. Simply speaking, control flows concerns the order of tasks to execute or evaluate. The most useful control flow includes

Executing a set of statements only if some condition is met, e.g., use if and else.
Executing a set of statements zero or more times, until some condition is met, e.g., through a for or while loop.

2.2.1 if … else …

We can use if to control whether to execute a statement. This is done based on a boolean condition. If it is true, the program does one thing; if it is false, the program does nothing or something else.

In R, the boolean condition is usually formed by the following relational operators.

<: less than
>: greater than
<=: less than or equal
>=: greater than or equal
==: equal
!=: no equal

The relational operators, usually expressed as conditions, will return one of two possible values: TRUE or FALSE.

a <- 3
b <- 4

a < b
## [1] TRUE
a > b
## [1] FALSE
a <= b
## [1] TRUE
a >= b
## [1] FALSE
a == b
## [1] FALSE
a != b
## [1] TRUE

Logical operators can be used on multiple conditions.

!: not
x & y and x && y: both
x | y and x || y: either
xor(x,y): one is TRUE, opposite of |

a <- TRUE
b <- FALSE

!a
## [1] FALSE
a & b
## [1] FALSE
a | b
## [1] TRUE
xor(a, b)
## [1] TRUE

The conditional control can be used in R statements or a function. We show how to use it in a function.

compare.ab <- function(a=0, b=0){
   if (a > b){
    cat('a is greater than b', "\n")
  }else{ 
    cat('a is less than or equal to b', "\n")
  }
}

compare.ab(3, 4)
## a is less than or equal to b
compare.ab(5, 5)
## a is less than or equal to b

if ... else ... can be nested.

compare.ab <- function(a=0, b=0){
  if (a > b){
    cat('a is greater than b', "\n")
  }else if (a==b){
    cat('a is equal to b', "\n")
  }else{
    cat('a is less than b', "\n")
  }
}

compare.ab(3, 4)
## a is less than b
compare.ab(5, 5)
## a is equal to b
compare.ab(10, 5)
## a is greater than b

R also has a function ifelse that can be convenient in some situations.

a <- 3
b <- 4

ifelse(a > b, "a is larger than b", "a is smaller than or equal to b")
## [1] "a is smaller than or equal to b"

2.2.1.1 Practice

Check whether a year is a leap year. A leap year is exactly divisible by 4 except for century years (years ending with 00). The century year is a leap year only if it is perfectly divisible by 400.

Note that the R operator %% (modulo operation) can be used to find the remainder after division of one number by another (sometimes called modulus).

leapyear <- function(year){
  if (! (year %% 4==0)){
    cat(year, 'is not a leap year', "\n")
  }else if (year %% 100 == 0 & (!(year %% 400==0))){
    cat (year, 'is not a leap year', "\n")
  }else{
    cat(year, 'is a leap year')
  }
}

leapyear1 <- function(year){
  if (! (year %% 4==0) ){
    cat(year, 'is not a leap year', "\n")
  }else {
    if (year %% 100 == 0){
      if (year %% 400 == 0){
        cat(year, 'is a leap year', "\n")
      }else{
        cat(year, 'is not a leap year', "\n")
      }
    }else{
      cat(year, 'is a leap year')
    }
  }
}

2.2.2 for

for or for loop can be used to repeat tasks. In R, it is used in the following way, for (var in seq){ tasks }. It means for each var in a set of values seq, R carries out the tasks in {}.

for (i in 1:500){
  cat("a")
}
## aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

for (i in 1:5){
  cat(i^2, "\n")
}
## 1 
## 4 
## 9 
## 16 
## 25


for (age in c(10,12,11,13,20)){
  cat(age*12, "\n")
}
## 120 
## 144 
## 132 
## 156 
## 240

ageyear <- c(10,12,11,13,20)

for (age in ageyear){
  cat(age*12, "\n")
}
## 120 
## 144 
## 132 
## 156 
## 240

for (i in 1:5){
  cat(ageyear[i]*12, "\n")
}
## 120 
## 144 
## 132 
## 156 
## 240


a <- NULL
for (i in 1:5){
  #cat(i, "\n")
  a <- c(a, i)
  cat(a, "\n")
}
## 1 
## 1 2 
## 1 2 3 
## 1 2 3 4 
## 1 2 3 4 5
a
## [1] 1 2 3 4 5

Multiple for loops can be nested.

for (i in 1:5){
  for (j in 1:i){
    cat(j)
  }
  cat("\n")
}
## 1
## 12
## 123
## 1234
## 12345

for (i in 1:5){
  for (j in 1:i){
    cat(j)
    
  }
  cat("\n")
}
## 1
## 12
## 123
## 1234
## 12345


for (i in 1:5){
  for (j in 1:i){
    for (k in 1:3){
      cat(i, "+", j, "+", k, "=", i+j+k, "\n")
    }
  }
}
## 1 + 1 + 1 = 3 
## 1 + 1 + 2 = 4 
## 1 + 1 + 3 = 5 
## 2 + 1 + 1 = 4 
## 2 + 1 + 2 = 5 
## 2 + 1 + 3 = 6 
## 2 + 2 + 1 = 5 
## 2 + 2 + 2 = 6 
## 2 + 2 + 3 = 7 
## 3 + 1 + 1 = 5 
## 3 + 1 + 2 = 6 
## 3 + 1 + 3 = 7 
## 3 + 2 + 1 = 6 
## 3 + 2 + 2 = 7 
## 3 + 2 + 3 = 8 
## 3 + 3 + 1 = 7 
## 3 + 3 + 2 = 8 
## 3 + 3 + 3 = 9 
## 4 + 1 + 1 = 6 
## 4 + 1 + 2 = 7 
## 4 + 1 + 3 = 8 
## 4 + 2 + 1 = 7 
## 4 + 2 + 2 = 8 
## 4 + 2 + 3 = 9 
## 4 + 3 + 1 = 8 
## 4 + 3 + 2 = 9 
## 4 + 3 + 3 = 10 
## 4 + 4 + 1 = 9 
## 4 + 4 + 2 = 10 
## 4 + 4 + 3 = 11 
## 5 + 1 + 1 = 7 
## 5 + 1 + 2 = 8 
## 5 + 1 + 3 = 9 
## 5 + 2 + 1 = 8 
## 5 + 2 + 2 = 9 
## 5 + 2 + 3 = 10 
## 5 + 3 + 1 = 9 
## 5 + 3 + 2 = 10 
## 5 + 3 + 3 = 11 
## 5 + 4 + 1 = 10 
## 5 + 4 + 2 = 11 
## 5 + 4 + 3 = 12 
## 5 + 5 + 1 = 11 
## 5 + 5 + 2 = 12 
## 5 + 5 + 3 = 13

for (i in 1:5){
  for (j in 1:i){
    cat(i, "*", j,  "=", i*j, "   ", sep="")
  }
  cat("\n")
}
## 1*1=1   
## 2*1=2   2*2=4   
## 3*1=3   3*2=6   3*3=9   
## 4*1=4   4*2=8   4*3=12   4*4=16   
## 5*1=5   5*2=10   5*3=15   5*4=20   5*5=25


for (i in sample(1:5)){
  for (j in sample(1:5)){
    cat(i, "+", j,  "=", "   ", sep="")
  }
  cat("\n")
}
## 2+5=   2+3=   2+1=   2+2=   2+4=   
## 5+3=   5+2=   5+5=   5+1=   5+4=   
## 4+4=   4+1=   4+5=   4+3=   4+2=   
## 1+2=   1+1=   1+3=   1+4=   1+5=   
## 3+1=   3+2=   3+5=   3+4=   3+3=

2.2.2.1 Practice

Generate a multiplication table.

How about a table with only the lower triangular information?

2.2.3 while

while can be kindly considered as a combination of if condition and for loop. It repeats certain tasks until a condition is met.

max.n <- 5
n <- 1
while (n <= max.n){
  cat(n, "\n")
  n <- n + 1
}
## 1 
## 2 
## 3 
## 4 
## 5

2.2.3.1 An example of bisection method

The bisection method is a way to find the root of an equation. Suppose that we are interested in finding the value of x so that \(f(x) = 3x^3 - 2x^2 - x - 4 = 0\).

The bisection method follows the algorithm below. We search a potential root in the interval [a, b] where a and b are chosen so that the function values f(a) and f(b) are of opposite sign. This ensures that at least one zero cross the interval). Now we can perform the following tasks repeatedly.

Calculate \(c\), the midpoint of the interval, \(c = (a+b)/2\).
Calculate the function value at the midpoint, \(f(c)\).
If convergence is satisfactory (e.g., |f(c)| is sufficiently close to zero), stop iterating.
Othereise, examine the sign of f(c) and replace either (a, f(a)) or (b, f(b)) with (c, f(c)) so that there is a zero crossing within the new interval.

## the f function
f <- function(x) 3*x^3 - 2*x^2 - x - 4

bisect <- function(L, U, max.n=50, tol = 0.00001){
  i <- 1
  stop.value <- 1
  while(i <= max.n & stop.value > tol){
    if (f(L)*f(U) > 0){
      stop('f(L) and f(U) have to have opposite sign')
    }else{
      x <- (L + U)/2
    }
    
    if (f(x)*f(L) <0 ){
      U <- x
    }else{
      L <- x
    }
    
    stop.value <- abs(f(x))
    i <- i + 1
  }
  
  if (stop.value > tol) cat("Warning: the result might not be accurate. max number of iterations is exceeded")
  
  list(x=x, n.iter=i, f=stop.value)
}

bisect(0, 2, max=14)
## Warning: the result might not be accurate. max number of iterations is exceeded
## $x
## [1] 1.490601
## 
## $n.iter
## [1] 15
## 
## $f
## [1] 0.001471286

2.2.4 Get the source code of R functions

The best way to learn to write an R function is to learn from R code. For many functions in R, if we type its name without () after it, its source code will be printed.

rowMeans
## standardGeneric for "rowMeans" defined from package "base"
## 
## function (x, na.rm = FALSE, dims = 1, ...) 
## standardGeneric("rowMeans")
## <bytecode: 0x000000000edd96e0>
## <environment: 0x000000000edbf490>
## Methods may be defined for arguments: x, na.rm, dims
## Use  showMethods("rowMeans")  for currently available ones.

This does not work for every function such as mean.

mean
## standardGeneric for "mean" defined from package "base"
## 
## function (x, ...) 
## standardGeneric("mean")
## <environment: 0x0000000025cb3888>
## Methods may be defined for arguments: x
## Use  showMethods("mean")  for currently available ones.

This is because mean is a generic method function. Depending on the type of input, it might do different things. To view its code, we would need to figure out which particular type of method is used.

A method function uses the same name for functions that do different tasks. Depending on the type of the data used, a method function behaves accordingly. See a very simple example below.

a <- 1:5
class(a) <- "type1"
b <- 1:5
class(b) <- "type2"

mean.type1 <- function(x){
  min(x)
}

mean.type2 <- function(x){
  max(x)
}

To find the available methods for a function, we use the methods function.

methods(mean)
##  [1] mean,ANY-method          mean,big.matrix-method   mean,Matrix-method      
##  [4] mean,sparseMatrix-method mean,sparseVector-method mean.Date               
##  [7] mean.default             mean.difftime            mean.ff                 
## [10] mean.POSIXct             mean.POSIXlt             mean.quosure*           
## [13] mean.type1               mean.type2              
## see '?methods' for accessing help and source code

Note that different mean methods are available. To view the source code, various ways can be used.

getAnywhere(mean.default)
## A single object matching 'mean.default' was found
## It was found in the following places
##   package:base
##   registered S3 method for mean from namespace base
##   namespace:base
## with value
## 
## function (x, trim = 0, na.rm = FALSE, ...) 
## {
##     if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
##         warning("argument is not numeric or logical: returning NA")
##         return(NA_real_)
##     }
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     if (!is.numeric(trim) || length(trim) != 1L) 
##         stop("'trim' must be numeric of length one")
##     n <- length(x)
##     if (trim > 0 && n) {
##         if (is.complex(x)) 
##             stop("trimmed means are not defined for complex data")
##         if (anyNA(x)) 
##             return(NA_real_)
##         if (trim >= 0.5) 
##             return(stats::median(x, na.rm = FALSE))
##         lo <- floor(n * trim) + 1
##         hi <- n + 1 - lo
##         x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
##     }
##     .Internal(mean(x))
## }
## <bytecode: 0x0000000019ec1ec8>
## <environment: namespace:base>

base:::mean.default
## function (x, trim = 0, na.rm = FALSE, ...) 
## {
##     if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
##         warning("argument is not numeric or logical: returning NA")
##         return(NA_real_)
##     }
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     if (!is.numeric(trim) || length(trim) != 1L) 
##         stop("'trim' must be numeric of length one")
##     n <- length(x)
##     if (trim > 0 && n) {
##         if (is.complex(x)) 
##             stop("trimmed means are not defined for complex data")
##         if (anyNA(x)) 
##             return(NA_real_)
##         if (trim >= 0.5) 
##             return(stats::median(x, na.rm = FALSE))
##         lo <- floor(n * trim) + 1
##         hi <- n + 1 - lo
##         x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
##     }
##     .Internal(mean(x))
## }
## <bytecode: 0x0000000019ec1ec8>
## <environment: namespace:base>

getS3method('mean', 'default')
## function (x, trim = 0, na.rm = FALSE, ...) 
## {
##     if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
##         warning("argument is not numeric or logical: returning NA")
##         return(NA_real_)
##     }
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     if (!is.numeric(trim) || length(trim) != 1L) 
##         stop("'trim' must be numeric of length one")
##     n <- length(x)
##     if (trim > 0 && n) {
##         if (is.complex(x)) 
##             stop("trimmed means are not defined for complex data")
##         if (anyNA(x)) 
##             return(NA_real_)
##         if (trim >= 0.5) 
##             return(stats::median(x, na.rm = FALSE))
##         lo <- floor(n * trim) + 1
##         hi <- n + 1 - lo
##         x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
##     }
##     .Internal(mean(x))
## }
## <bytecode: 0x0000000019ec1ec8>
## <environment: namespace:base>

The new R packages use the S4 method for programming. For example, let’s try to view the source code of the function chol2inv of the package Matrix.

library(Matrix)
chol2inv
## standardGeneric for "chol2inv" defined from package "base"
## 
## function (x, ...) 
## standardGeneric("chol2inv")
## <bytecode: 0x000000000ed1ce90>
## <environment: 0x000000000ed16810>
## Methods may be defined for arguments: x
## Use  showMethods("chol2inv")  for currently available ones.

Based on the information, we find the available methods.

showMethods("chol2inv")
## Function: chol2inv (package base)
## x="ANY"
## x="CHMfactor"
## x="denseMatrix"
## x="diagonalMatrix"
## x="dtrMatrix"
## x="sparseMatrix"

To see the code, we can use

getAnywhere(sparseMatrix)
## A single object matching 'sparseMatrix' was found
## It was found in the following places
##   package:Matrix
##   namespace:Matrix
## with value
## 
## function (i = ep, j = ep, p, x, dims, dimnames, symmetric = FALSE, 
##     triangular = FALSE, index1 = TRUE, giveCsparse = TRUE, check = TRUE, 
##     use.last.ij = FALSE) 
## {
##     if ((m.i <- missing(i)) + (m.j <- missing(j)) + (m.p <- missing(p)) != 
##         1) 
##         stop("exactly one of 'i', 'j', or 'p' must be missing from call")
##     if (!m.p) {
##         p <- as.integer(p)
##         if ((lp <- length(p)) < 1 || p[1] != 0 || any((dp <- p[-1] - 
##             p[-lp]) < 0)) 
##             stop("'p' must be a non-decreasing vector (0, ...)")
##         ep <- rep.int(seq_along(dp), dp)
##     }
##     i1 <- as.logical(index1)[1]
##     i <- as.integer(i + !(m.i || i1))
##     j <- as.integer(j + !(m.j || i1))
##     dims.min <- suppressWarnings(c(max(i), max(j)))
##     if (anyNA(dims.min)) 
##         stop("NA's in (i,j) are not allowed")
##     if (missing(dims)) {
##         dims <- if (symmetric || triangular) 
##             rep(max(dims.min), 2)
##         else dims.min
##     }
##     else {
##         stopifnot(all(dims >= dims.min))
##         dims <- as.integer(dims)
##     }
##     if (symmetric && triangular) 
##         stop("Both 'symmetric' and 'triangular', i.e. asking for diagonal matrix.  Use 'Diagonal()' instead")
##     sx <- if (symmetric) {
##         if (dims[1] != dims[2]) 
##             stop("symmetric matrix must be square")
##         "s"
##     }
##     else if (triangular) {
##         if (dims[1] != dims[2]) 
##             stop("triangular matrix must be square")
##         "t"
##     }
##     else "g"
##     isPat <- missing(x)
##     kx <- if (isPat) 
##         "n"
##     else .M.kind(x)
##     r <- new(paste0(kx, sx, "TMatrix"))
##     r@Dim <- dims
##     if (symmetric && all(i >= j)) 
##         r@uplo <- "L"
##     else if (triangular) {
##         r@uplo <- if (all(i >= j)) 
##             "L"
##         else if (all(i <= j)) 
##             "U"
##         else stop("triangular matrix must have all i >= j or i <= j")
##     }
##     if (!isPat) {
##         if (kx == "d" && !is.double(x)) 
##             x <- as.double(x)
##         if (length(x) != (n <- length(i))) {
##             if (length(x) != 1 && n%%length(x) != 0) 
##                 warning("length(i) is not a multiple of length(x)")
##             x <- rep_len(x, n)
##         }
##         if (use.last.ij && (id <- anyDuplicated(cbind(i, j), 
##             fromLast = TRUE))) {
##             i <- i[-id]
##             j <- j[-id]
##             x <- x[-id]
##             if (any(idup <- duplicated(cbind(i, j), fromLast = TRUE))) {
##                 ndup <- -which(idup)
##                 i <- i[ndup]
##                 j <- j[ndup]
##                 x <- x[ndup]
##             }
##         }
##         r@x <- x
##     }
##     r@i <- i - 1L
##     r@j <- j - 1L
##     if (!missing(dimnames)) 
##         r@Dimnames <- .fixupDimnames(dimnames)
##     if (check) 
##         validObject(r)
##     if (giveCsparse) 
##         as(r, "CsparseMatrix")
##     else r
## }
## <bytecode: 0x000000001d1bdf90>
## <environment: namespace:Matrix>

getMethod("chol2inv", "diagonalMatrix")
## Method Definition:
## 
## function (x, ...) 
## {
##     chk.s(..., which.call = -2)
##     tcrossprod(solve(x))
## }
## <bytecode: 0x000000001b9b1cc0>
## <environment: namespace:Matrix>
## 
## Signatures:
##         x               
## target  "diagonalMatrix"
## defined "diagonalMatrix"

2.3 Packages

A package consists of R functions for carrying out particular tasks. With the R function, it is very easy to pack it as a package.

We now use the function for the bisection method as an example to build a package. There are additional utilities available for building R package. But I will use the base R to demonstrate this.

2.3.1 Set a new package structure.

The function package.skeleton automates some of the setup for a new package. It creates directories, saves functions, data, and R code files to appropriate places, and creates skeleton help files and a ‘Read-and-delete-me’ file describing further steps in packaging. For example, we first save the R code to a file called bisect.R.

package.skeleton(name = "bisection", 
                 code_files = 'bisect.R')

By running the code, a folder bisection was created within which there are following information:

DESCRIPTION file: the basic information about the R package
man folder: the manual for each R function
- bisect.Rd: manual for the bisect function
- f.Rd: manual for the f function
- bisection-package.Rd: general information about the package
NAMESPACE file:
R folder: include all the R functions
- bisect.R
Read-and-delete-me: provides basic instruction and should be removed before building the package.

2.3.2 Edit the files accordingly

In this step, we open each file and edit/add information accordingly.

2.3.3 Build and check the package

After all the edits, we can compress all the files and folders into a single R source file. This is done in the console using the command R CMD build bisection. For our example, it is called bisection_1.0.tar.gz.

$ R CMD build bisection
* checking for file 'bisection/DESCRIPTION' ... OK
* preparing 'bisection':
* checking DESCRIPTION meta-information ... OK
* installing the package to process help pages
* saving partial Rd database
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'bisection_1.0.tar.gz'

To check potential errors or compatible problems, we can use R CMD check bisection_1.0.tar.gz. If there is any error or warning information, we should correct them.

$ R CMD check bisection_1.0.tar.gz
* using log directory 'G:/My Drive/Classes/RDataScience/2 R programming/bisection.Rcheck'
* using R version 3.5.2 (2018-12-20)
* using platform: x86_64-w64-mingw32 (64-bit)
* using session charset: ISO8859-1
* checking for file 'bisection/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'bisection' version '1.0'
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking whether package 'bisection' can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking examples ... OK
* checking PDF version of manual ... OK
* DONE

Status: OK

2.3.4 Submit the package to R CRAN

After testing to ensure everything to work as expected, we can then request to publish the package on R CRAN. To do so, one can use the web form here: https://cran.r-project.org/submit.html.

2.4 R coding styles

Using a consistent coding style helps not only the communication of the code but also the author better organize own code. For R, there are no general requirements on the style of coding. But over the years, some have suggested some common styles to use such as those recommended by bioconductor or Hadley Wickhams. Based on these recommendations, I compiled the following guidelines.

2.4.1 Naming

In R, we frequently have to name an R object, be the name of a function or just a scalar. First of all, a name can be a combination of letters, digits, period (.) and underscore (_). It must start with a letter or a period. If it starts with a period, it cannot be followed by a digit. For example, some valid names are a,.var, another.variable, day5, and V_3.

In naming something, it is important to be informative and meaningful. Therefore, often we would use multiple words in a name. Because space is not allowed in a name, how to connect the words is important.

Basically, there are 5 naming conventions to choose from:

alllowercase: e.g. adjustcolor
period.separated: e.g. plot.new
underscore_separated: e.g. numeric_version
lowerCamelCase: e.g. addTaskCallback
UpperCamelCase: e.g. SignatureMethod

Earlier versions of R used underscore (_) as an assignment operator. Therefore, the period (.) was used extensively in variable names having multiple words. Newer versions of R support underscore as a valid identifier and therefore, it is increasingly used now, e.g., in tidyverse packages. bioconductor, on the other hand, has more rules for naming:

variable names: use the lowerCamelCase, initial lowercase, then alternate case between words.
function names: use the lowerCamelCase. For non-exported functions, start with a ‘.’.
class names: use the UpperCamelCase

2.4.2 Indentation

It is typically suggested to use 2 or 4 spaces instead of a tab for indenting. Also, indent at the same level as surrounding code.

2.4.3 Space

Always use space after a comma such as a, b, c=5.
No space around “=” when using named arguments to functions such as bisect(L=0, U=2).
Space around all relational and logical operators such as a == b.

Practical Data Processing for Social and Behavioral Research Using R