Old dog and new tricks
Learning to {purrr}

Tom Smith
Insight Manager
Nottingham University Hospitals NHS Trust

purrr for functional programming

Ahhh, a friendly cat…

Scary functional programming

Run away now!

Aims

  • Don’t be scary!
  • Practical intro to purrr
  • A “mental model” which helped me
  • A helpful chart-making example

Functional programming is for another day (and another speaker!)

I wouldn’t start from here

From the tidyverse documentation:

“the best place to start is the family of map() functions which allow you to replace many for loops with code that is both more succinct and easier to read.”

and:

“The best place to learn about the map() functions is the iteration chapter in R for Data Science.”

Loops vs. Map

Why should I learn how to use map()?

Loop example

# we need a vector to iterate over
food <- c("croissant", "baked potato")

# we need to create a place to put the results (the right size)
result <- vector("character", length(food))

# here's the loop
for(i in seq_along(food)){
  
  result[[i]] <- paste("Hot", food[[i]])
  
}

result
[1] "Hot croissant"    "Hot baked potato"

Map example

# we still need a vector to iterate over
food <- c("croissant", "baked potato")

# we create a function to do the "work"
heat_the_food <- function(food){
  
  paste("Hot", food)

}

# here's the map
result <- purrr::map(food, heat_the_food)

result
[[1]]
[1] "Hot croissant"

[[2]]
[1] "Hot baked potato"
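The loop gave back a character vector, but map() always gives back a list. Here is a minimal sketch of purrr's typed variant map_chr(), which returns a character vector just like the loop did. It reuses the food example above.

```r
food <- c("croissant", "baked potato")

# the same "work" function as above
heat_the_food <- function(food){
  paste("Hot", food)
}

# map() always returns a list...
as_list <- purrr::map(food, heat_the_food)

# ...while map_chr() returns a character vector, like the loop's result
as_vector <- purrr::map_chr(food, heat_the_food)
```

The other typed variants, such as map_dbl(), map_int() and map_lgl(), follow the same pattern. Each one fails with an error if .f returns something that cannot be stored as that type.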

Comparison

loop


# we need a vector to iterate over
food <- c("croissant", "baked potato")

# we need to create a place to put the results (the right size)
result <- vector("character", length(food))

# here's the loop
for(i in seq_along(food)){
  
  result[[i]] <- paste("Hot", food[[i]])
  
}

result

A loop needs more “boilerplate” code, and explanations of what it does end up in comments (which might not exist)

Comparison

map


# we need a vector to iterate over
food <- c("croissant", "baked potato")

# we create a function to do the "work"
heat_the_food <- function(food){
  
  paste("Hot", food)

}

# here's the map
result <- purrr::map(food, heat_the_food)

result

The code is easier to read, and function names, rather than comments, describe what it’s doing

Arguments

purrr::map(.x, .f)

# .x A list or atomic vector
# .f A function
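In the food example, .f is a function we wrote and named first, but that is not required. This minimal sketch shows two other options. It assumes R 4.1 or later for the `\(x)` anonymous-function shorthand, which arrived in the same version as the `|>` pipe used later in this deck.

```r
food <- c("croissant", "baked potato")

# .f as an anonymous function, written with R's \(x) shorthand
hot_food <- purrr::map(food, \(x) paste("Hot", x))

# .f as an existing function, passed by name (no brackets)
shouty_food <- purrr::map(food, toupper)
```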

Mental model

purrr::map(food, heat_the_food)

purrr::map(subjects, action)

purrr::map(noun, verb)

purrr::map(raw_material, instructions)

purrr::map(ingredients, recipe)

Cakes!

purrr::map(.x, .f)

[Pictures: .x, .f = result]

Cars!

purrr::map(.x, .f)

[Pictures: .x, .f = result]

Ikea!

purrr::map(.x, .f)

[Pictures: .x, .f = result]

Real-world example

Some data

set.seed(100)
# make some timeseries data for 5 metrics
data_long <- data.frame(
  metric = paste0("Metric ", rep(c(1,2,3,4,5), each = 45)),
  date = rep(seq.Date(from = as.Date("2020-01-01"), to = as.Date("2023-09-01"), by = "month"), times = 5),
  values = rnorm(225)
)

head(data_long)
    metric       date      values
1 Metric 1 2020-01-01 -0.50219235
2 Metric 1 2020-02-01  0.13153117
3 Metric 1 2020-03-01 -0.07891709
4 Metric 1 2020-04-01  0.88678481
5 Metric 1 2020-05-01  0.11697127
6 Metric 1 2020-06-01  0.31863009

A function to make a plot

make_chart <- function(name, data){

  # filter the dataset down to metric of interest
  filtered_data <- data |>
    dplyr::filter(
      metric == name
    )

  # make a plot (needs library(ggplot2) to have been run before the function is called)
  p <- ggplot(
    filtered_data,
    aes(date, values)
  ) +
    geom_line() +
    geom_point() +
    labs(
      title = name,
      subtitle = "An important subtitle"
    )

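  # save the plot (the img/ folder must already exist);
  # ggsave() invisibly returns the file path, which is what map() collects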
  ggsave(paste0("img/", name, ".png"), p, units = "px", width = 600, height = 300, scale = 3)

}

Map

library(ggplot2)

purrr::map(
  c("Metric 1", "Metric 2", "Metric 3", "Metric 4", "Metric 5"), 
  make_chart, 
  data_long
)
[[1]]
[1] "img/Metric 1.png"

[[2]]
[1] "img/Metric 2.png"

[[3]]
[1] "img/Metric 3.png"

[[4]]
[1] "img/Metric 4.png"

[[5]]
[1] "img/Metric 5.png"
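In the call above, data_long comes after make_chart, and map() passes it on to every call of make_chart through its `...` argument. The sketch below shows the same idea in a self-contained form. It uses a hypothetical stand-in for make_chart() that returns a mean instead of saving a plot, a small made-up data frame, and metric names taken from the data with unique() instead of typed out.

```r
# a small, made-up stand-in for data_long
data_small <- data.frame(
  metric = rep(c("Metric 1", "Metric 2"), each = 3),
  values = c(1, 2, 3, 4, 5, 6)
)

# hypothetical stand-in for make_chart(): returns a mean instead of saving a plot
mean_of_metric <- function(name, data){
  mean(data$values[data$metric == name])
}

# take the metric names from the data rather than typing them out
metrics <- unique(data_small$metric)

# arguments after .f (here data = data_small) are passed to every call of .f
metric_means <- purrr::map_dbl(metrics, mean_of_metric, data = data_small)
```

Recent purrr documentation generally recommends an anonymous function, such as `\(name) mean_of_metric(name, data_small)`, over passing constant arguments through `...`. Both forms work.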

Charts

What about SPC!?

# we make minor alterations to the plotting function
make_spc <- function(name, data){

  # filter the dataset down to metric of interest
  filtered_data <- data |>
    dplyr::filter(
      metric == name
    )

  # make a plot using {NHSRplotthedots}
  p <- NHSRplotthedots::ptd_spc(
    filtered_data,
    values,
    date
  ) |>
    NHSRplotthedots::ptd_create_ggplot() +
    labs(
      title = name,
      subtitle = "A better, and more insightful subtitle"
    )

  # save the plot
  ggsave(paste0("img/", name, "_spc.png"), p, units = "px", width = 600, height = 300, scale = 4)

}

purrr::map(
  c("Metric 1", "Metric 2", "Metric 3", "Metric 4", "Metric 5"), 
  make_spc, 
  data_long
)
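make_chart() and make_spc() are run for their side effect (saving a file), so the list of file names that map() prints is mostly noise. purrr::walk() is designed for this case. It calls .f for each element, prints nothing, and invisibly returns its input. This self-contained sketch uses a hypothetical write_note() function that writes small text files into a temporary folder, so nothing needs plotting or cleaning up.

```r
metrics <- c("Metric 1", "Metric 2")

# a temporary folder stands in for img/
out_dir <- tempdir()

# hypothetical stand-in for make_spc(): writes a text file instead of a chart
write_note <- function(name, dir){
  writeLines(paste("Notes for", name), file.path(dir, paste0(name, ".txt")))
}

# walk() is used only for the side effect; it returns metrics invisibly
returned <- purrr::walk(metrics, write_note, dir = out_dir)
```

Swapping `purrr::map(` for `purrr::walk(` in the chart examples should keep the saved charts the same and drop the printed list.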

SPC charts

In summary

Some things we haven’t covered

  • Passing other arguments in
    • e.g. the data in the charts example
  • The shape of the output
    • Getting a vector back instead of a list
  • map() vs. walk()
    • Returned values vs. side-effects
  • Mapping over more than one variable
    • map2() for 2 variables
    • pmap() and a dataframe of variables for more
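For the last item, here is a minimal sketch that extends the food example. The temperature and side values are made up for illustration. map2_chr() steps through two vectors in parallel. pmap_chr() steps through the rows of a data frame and matches each column to the function argument with the same name.

```r
food <- c("croissant", "baked potato")
temperature <- c("Warm", "Hot")  # made-up second variable

# map2() walks along two vectors in parallel, passing them as .x and .y
heated <- purrr::map2_chr(temperature, food, \(temp, item) paste(temp, item))

# pmap() takes a list of vectors; a data frame is a list of columns,
# and each column is matched to the argument with the same name
orders <- data.frame(
  temperature = c("Warm", "Hot"),
  food = c("croissant", "baked potato"),
  side = c("jam", "beans")  # made-up third variable
)
menu <- purrr::pmap_chr(orders, function(temperature, food, side){
  paste(temperature, food, "with", side)
})
```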

Further online material

Tom Jemmett’s video

Hadley Wickham’s video

The iteration chapter of R for Data Science:
https://r4ds.had.co.nz/iteration.html#iteration

Docs: https://purrr.tidyverse.org

Cheatsheet: https://rstudio.github.io/cheatsheets/purrr.pdf

Open-source

Online presentation: https://thomuk.github.io/2023-NHSR-Conf-Presentation

Presentation code: https://github.com/ThomUK/2023-NHSR-Conf-Presentation

Other repos: https://github.com/ThomUK

Thank you