Ahhh, a friendly cat…
Run away now!
Functional programming is for another day (and another speaker!)
From the tidyverse documentation:
“the best place to start is the family of map() functions which allow you to replace many for loops with code that is both more succinct and easier to read.”
and:
“The best place to learn about the map() functions is the iteration chapter in R for data science.”
Why should I learn how to use map()?
[1] "Hot croissant" "Hot baked potato"
[[1]]
[1] "Hot croissant"
[[2]]
[1] "Hot baked potato"
loop
# we need a vector to iterate over
food <- c("croissant", "baked potato")
# we need to create a place to put the results (the right size)
result <- vector("character", length(food))
# here's the loop
for (i in seq_along(food)) {
  result[[i]] <- paste("Hot", food[[i]])
}
result
There is more “boilerplate” code in a loop, and code explanations end up in comments (which might not exist)
map
# we need a vector to iterate over
food <- c("croissant", "baked potato")
# we create a function to do the "work"
heat_the_food <- function(food) {
  paste("Hot", food)
}
# map() does the looping for us: it calls heat_the_food() once per element
result <- purrr::map(food, heat_the_food)
result
The code is easier to read, and function names, rather than comments, describe what it is doing
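One difference between the two versions is visible in the output shown earlier: the loop filled a character vector, while map() returned a list. A minimal sketch, assuming {purrr} is installed, of how the typed variant map_chr() gives back a character vector instead:

```r
food <- c("croissant", "baked potato")

heat_the_food <- function(food) {
  paste("Hot", food)
}

# map() always returns a list, with one element per input
as_list <- purrr::map(food, heat_the_food)

# map_chr() returns a character vector, like the loop version did
as_vector <- purrr::map_chr(food, heat_the_food)
as_vector
```

map_chr() raises an error if the function does not return a single string for each input, which can be a useful check in its own right.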
purrr::map(.x, .f)
.x
.f
=
result
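As a small illustration of the pieces named above, here is a sketch in which `inputs` plays the role of .x and `times_ten` plays the role of .f. Both names are made up for this example and are not part of the talk.

```r
# .x: the vector (or list) to iterate over
inputs <- c(1, 2, 3)

# .f: the function applied to each element in turn
times_ten <- function(value) {
  value * 10
}

# the result is a list with one element per input
result <- purrr::map(inputs, times_ten)
result
```

Base R's lapply(inputs, times_ten) follows the same pattern; the map() family adds typed variants and a consistent argument order.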
set.seed(100)
# make some timeseries data for 5 metrics
data_long <- data.frame(
  metric = paste0("Metric ", rep(c(1,2,3,4,5), each = 45)),
  date = rep(seq.Date(from = as.Date("2020-01-01"), to = as.Date("2023-09-01"), by = "month"), times = 5),
  values = rnorm(225)
)
head(data_long)
    metric       date      values
1 Metric 1 2020-01-01 -0.50219235
2 Metric 1 2020-02-01  0.13153117
3 Metric 1 2020-03-01 -0.07891709
4 Metric 1 2020-04-01  0.88678481
5 Metric 1 2020-05-01  0.11697127
6 Metric 1 2020-06-01  0.31863009
# ggplot(), aes(), geom_*(), labs() and ggsave() come from {ggplot2}
library(ggplot2)

make_chart <- function(name, data) {
  # filter the dataset down to metric of interest
  filtered_data <- data |>
    dplyr::filter(
      metric == name
    )
  # make a plot
  p <- ggplot(
    filtered_data,
    aes(date, values)
  ) +
    geom_line() +
    geom_point() +
    labs(
      title = name,
      subtitle = "An important subtitle"
    )
  # save the plot (assumes the img/ folder already exists)
  ggsave(paste0("img/", name, ".png"), p, units = "px", width = 600, height = 300, scale = 3)
}
# we make minor alterations to the plotting function
make_spc <- function(name, data) {
  # filter the dataset down to metric of interest
  filtered_data <- data |>
    dplyr::filter(
      metric == name
    )
  # make a plot using {NHSRplotthedots}
  p <- NHSRplotthedots::ptd_spc(
    filtered_data,
    values,
    date
  ) |>
    NHSRplotthedots::ptd_create_ggplot() +
    ggplot2::labs(
      title = name,
      subtitle = "A better, and more insightful subtitle"
    )
  # save the plot (assumes the img/ folder already exists)
  ggplot2::ggsave(paste0("img/", name, "_spc.png"), p, units = "px", width = 600, height = 300, scale = 4)
}
purrr::map(
  c("Metric 1", "Metric 2", "Metric 3", "Metric 4", "Metric 5"),
  make_spc,
  data_long
)
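In the call above, data_long is not iterated over: anything placed after the function is passed on to it unchanged on every call. The sketch below shows the same pattern without needing any plotting packages. `describe_metric()` and `toy_data` are hypothetical stand-ins for make_spc() and data_long, and the anonymous-function form (which needs R 4.1 or later, like the `|>` pipe) is an alternative spelling rather than the approach used in the talk.

```r
# hypothetical stand-in for make_spc(): returns a label instead of saving a file
describe_metric <- function(name, data) {
  rows <- sum(data$metric == name)
  paste0("img/", name, "_spc.png (", rows, " rows)")
}

toy_data <- data.frame(
  metric = rep(c("Metric 1", "Metric 2"), each = 3),
  values = 1:6
)
metric_names <- unique(toy_data$metric)

# style used above: extra arguments after the function are passed on to it
via_dots <- purrr::map(metric_names, describe_metric, toy_data)

# same calls written with an anonymous function, which names the argument explicitly
via_lambda <- purrr::map(metric_names, \(name) describe_metric(name, data = toy_data))

# walk() runs the function purely for its side effects and returns its input invisibly
walked <- purrr::walk(metric_names, \(name) describe_metric(name, data = toy_data))
```

When the function is called only to save files, as make_spc() is, walk() avoids printing a list of return values to the console.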
Tom Jemmett’s video
Hadley Wickham’s video
The iteration chapter of R for Data Science:
https://r4ds.had.co.nz/iteration.html#iteration
Docs: https://purrr.tidyverse.org
Cheatsheet: https://rstudio.github.io/cheatsheets/purrr.pdf
Online presentation: https://thomuk.github.io/2023-NHSR-Conf-Presentation
Presentation code: https://github.com/ThomUK/2023-NHSR-Conf-Presentation
Other repos: https://github.com/ThomUK