How environments work in R and what lazy evaluation is

08 January 2018

Knowing how R evaluates expressions is crucial if you want to avoid hours of staring at the screen and unexpected, hard-to-find bugs.

We’ll start with an example of an issue I came across a few months ago when using the purrr::map function. Simplified, it looked like this:


makePrintFunction <- function(index) {
  function() {
    print(index)
  }
}

printFunctions <- lapply(1:3, function(i) makePrintFunction(i))
printFunctions[[2]]()
# 2

printFunctions <- purrr::map(1:3,
                             function(i) makePrintFunction(i))
printFunctions[[2]]()
# 3

Since I came across the issue, purrr::map has changed, and this example no longer reproduces it. To simulate the old behaviour, let’s use a simplified implementation of the map function. You should be able to copy-paste the code in this article and run it as is:

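map <- function(range, functionToApply) {
  # Simplified: the values in range are also used as list indices,
  # so this only works for ranges like 1:n
  result <- vector("list", length(range))
  for (i in range) {
    result[[i]] <- functionToApply(i)
  }
  return(result)
}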

makePrintFunction <- function(index) {
  function() {
    print(index)
  }
}

printFunctions <- lapply(1:3, function(i) makePrintFunction(i))
printFunctions[[2]]()
# 2

printFunctions <- map(1:3, function(i) makePrintFunction(i))
printFunctions[[2]]()
# 3

How to fix that?

If you don’t already know how to fix this issue, you’ll quickly find out. It is quite a common problem, and the solution is to use the force function as follows:

makePrintFunction <- function(index) {
  force(index)
  
  function() {
    print(index)
  }
}

printFunctions <- lapply(1:3, function(i) makePrintFunction(i))
printFunctions[[2]]()
# 2

printFunctions <- map(1:3, function(i) makePrintFunction(i))
printFunctions[[2]]()
# 2

It works! But … why?

This could be a great moment to just carry on – the problem is solved. You’ve heard about lazy evaluation and know that force() is useful in fixing such issues. But then again, what does lazy evaluation mean in this context?

Let’s take a look at the magical force function. Its whole definition fits on two lines:


force
# function (x) 
# x
# <bytecode: 0x18e0920>
# <environment: namespace:base>

Wait, what’s going on here? Does this mean that I can simply write index instead of force(index) and it will still work?

makePrintFunction <- function(index) {
  index
  
  function() {
    print(index)
  }
}

printFunctions <- lapply(1:3, function(i) makePrintFunction(i))
printFunctions[[2]]()
# 2

printFunctions <- map(1:3, function(i) makePrintFunction(i))
printFunctions[[2]]()
# 2
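Because force() simply returns its argument, calling any function that evaluates its argument should work just as well. The sketch below checks this with a hypothetical helper, myForce, defined the same way as base::force. To make the result easy to check, it also uses a hypothetical makeGetter that returns index instead of printing it, and repeats the simplified map so the block runs on its own.

```r
# Simplified map from above: it does not force the arguments it passes on.
map <- function(range, functionToApply) {
  result <- vector("list", length(range))
  for (i in range) {
    result[[i]] <- functionToApply(i)
  }
  result
}

# Hypothetical helper with the same definition as base::force.
myForce <- function(x) x

# Hypothetical variant of makePrintFunction that returns index instead of printing it.
makeGetter <- function(index) {
  myForce(index)  # evaluating the argument once fixes its value
  function() index
}

getters <- map(1:3, function(i) makeGetter(i))
getters[[2]]()  # 2
```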

Let’s get to the bottom of this

There are two factors that cause the issue we are facing. The first one is lazy evaluation. The second is the way environments work in R.

Lazy evaluation

R doesn’t evaluate a function argument until the argument is actually used. Let’s take a look at an example from Hadley Wickham’s book Advanced R (http://adv-r.had.co.nz/Functions.html):

f <- function(x) {
  42
}

f(stop("This is an error!"))
# 42

f <- function(x) {
  force(x)
  42
}

f(stop("This is an error!"))
# Error in force(x): This is an error!

Another example shows that an argument is evaluated at the moment it is first used, not when the function is called:

printLabel <- function(x, label = toupper(x)) {
  x <- "changed"
  print(label)
}

printLabel("original")
# CHANGED

printLabel <- function(x, label = toupper(x)) {
  force(label)

  x <- "changed"
  print(label)
}

printLabel("original")
# ORIGINAL

Please note that the promises mentioned here are different from the promises package, which is used for asynchronous programming.

These semantics are described in the R Language Definition:

The mechanism is implemented via promises. When a function is being evaluated the actual expression used as an argument is stored in the promise together with a pointer to the environment the function was called from. When (if) the argument is evaluated the stored expression is evaluated in the environment that the function was called from. Since only a pointer to the environment is used any changes made to that environment will be in effect during this evaluation. The resulting value is then also stored in a separate spot in the promise. Subsequent evaluations retrieve this stored value (a second evaluation is not carried out).
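The two properties described in this quote, evaluation in the caller’s environment and caching of the result, can be observed directly. The following sketch uses made-up function names (useTwice, delayed, caller) purely for illustration:

```r
# A promise is evaluated at most once: the second use of x returns the cached value.
useTwice <- function(x) {
  first <- x   # forces the promise: the argument expression runs here
  second <- x  # reuses the stored value, no second evaluation
  c(first, second)
}

counter <- 0
useTwice({ counter <- counter + 1; counter })  # 1 1
counter  # 1, so the argument expression ran only once

# The promise is evaluated in the caller's environment, so changes made there
# before the argument is first used are visible.
delayed <- function(x) {
  assign("y", 100, envir = parent.frame())  # modify the caller's y before forcing x
  x
}
caller <- function() {
  y <- 1
  delayed(y * 2)
}
caller()  # 200
```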

How environments work

Every function object gets an environment when it is created: the environment it was defined in. Let’s call it environment A. Each time the function is invoked, a new environment is created for that call, and the parent of this new environment is environment A.

a <- 1
f <- function(a) {
  a <- a + 1
  a                 # <-- pause here (e.g. with browser()) and inspect
}

f(5)

# Browse[2]> environment(f)
# <environment: R_GlobalEnv>
# 
# Browse[2]> environment(f)[["a"]]
# [1] 1
#
# Browse[2]> environment()
# <environment: 0x3fa2db0>
#
# Browse[2]> environment()[["a"]]
# [1] 6

This is what the environment hierarchy looks like at this point:

Figure: Environment hierarchy
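If you prefer not to step through the debugger, the relationships from the f example can also be checked in code. In this sketch f returns what it sees instead of pausing; parent.env() gives the parent of the call’s environment, which should be the environment f was created in. The list element names are made up for illustration.

```r
a <- 1
f <- function(a) {
  a <- a + 1
  callEnv <- environment()  # the new environment created for this call
  list(
    parentIsDefiningEnv = identical(parent.env(callEnv), environment(f)),
    localA = callEnv[["a"]],           # a inside the call
    definingA = environment(f)[["a"]]  # a in the environment f was created in
  )
}

res <- f(5)
res$parentIsDefiningEnv  # TRUE
res$localA               # 6
res$definingA            # 1
```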

How does our example work without force

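Environment 0x3fa2db2, created when printFunctions[[2]]() is called, inherits from mpfEnv, the environment of the makePrintFunction call that created the closure, and finds index through it; index itself is stored in 0x3fa2db0. Because of lazy evaluation, index is still an unevaluated promise at that point. Its expression, i, is evaluated only when print(index) first uses it, and by then the loop inside map has finished and i equals 3.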
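To see that index really is an unevaluated promise, you can ask substitute() for it: when its env argument is an environment, substitute() replaces a symbol bound to a promise with the promise’s expression. The sketch repeats the simplified map and uses a hypothetical makeGetter that returns index instead of printing it.

```r
# Simplified map from above: it does not force the arguments it passes on.
map <- function(range, functionToApply) {
  result <- vector("list", length(range))
  for (i in range) {
    result[[i]] <- functionToApply(i)
  }
  result
}

# Hypothetical variant of makePrintFunction that returns index instead of printing it.
makeGetter <- function(index) {
  function() index
}

getters <- map(1:3, function(i) makeGetter(i))

# In the closure's enclosing environment, index is a promise for the expression `i`.
substitute(index, environment(getters[[2]]))  # i

# Forcing it now follows the chain of promises back to map's loop variable,
# which already holds its final value.
getters[[2]]()  # 3
```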

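How does our example work with force

With force(index), the promise for index is evaluated while makePrintFunction is still running, so index is fixed to the value i has during that iteration of the loop. Each closure’s environment then holds an already evaluated index, and later changes to i no longer affect it.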
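A quick way to check the forced version is to look at all three closures at once. The sketch repeats the simplified map and uses a hypothetical makeGetter that returns index instead of printing it. Each closure should return its own value, and that value is stored in the environment of the makeGetter call that created the closure.

```r
# Simplified map from above: it does not force the arguments it passes on.
map <- function(range, functionToApply) {
  result <- vector("list", length(range))
  for (i in range) {
    result[[i]] <- functionToApply(i)
  }
  result
}

# Hypothetical variant of makePrintFunction that returns index instead of printing it.
makeGetter <- function(index) {
  force(index)  # evaluate the promise now, while i still has this iteration's value
  function() index
}

getters <- map(1:3, function(i) makeGetter(i))

# Each closure now returns its own value.
vapply(getters, function(getter) getter(), integer(1))  # 1 2 3

# The value lives in the environment of the makeGetter call that created the closure.
get("index", envir = environment(getters[[2]]))  # 2
```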

In current versions, you shouldn’t come across this issue with most higher-order functions, because they force the arguments of the functions they apply:

R 3.2.0 (2015) changelog:

Higher-order functions such as the apply functions and Reduce()
now force arguments to the functions they apply in order to
eliminate undesirable interactions between lazy evaluation and
variable capture in closures. This resolves PR#16093.
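Base R also exposes this idea as forceAndCall(n, FUN, ...), which forces the first n arguments before calling FUN; its help page describes it as a helper for writing higher-order functions like apply. The sketch below shows how the simplified map could use it (the name mapForced is hypothetical), with lapply for comparison. It assumes R 3.2.0 or later.

```r
# Hypothetical variant of the simplified map that forces the argument it passes on.
mapForced <- function(range, functionToApply) {
  result <- vector("list", length(range))
  for (i in range) {
    # force the first argument of functionToApply before calling it
    result[[i]] <- forceAndCall(1, functionToApply, i)
  }
  result
}

# Hypothetical variant of makePrintFunction, deliberately without force().
makeGetter <- function(index) {
  function() index
}

mapForced(1:3, function(i) makeGetter(i))[[2]]()  # 2
lapply(1:3, function(i) makeGetter(i))[[2]]()     # 2
```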

The corresponding purrr issue was fixed in March 2017: https://github.com/tidyverse/purrr/issues/191

I hope this knowledge will save you some time if you stumble upon such issues in the future.

Until next time!