R and Docker

Warwick R User Group

James Tripp

Senior Research Software Engineer, IT Services (University of Warwick)

Who am I?

James Tripp

Research Software Engineer

Plan

  • Why?
  • Docker?
  • Next Steps

Why?

Developers

  • Software developers sometimes wrote code that only ran on their own machine ("it works on my machine")
  • That is a problem when the same code has to run elsewhere, such as on a production server

Developers

  • Developers could build and update images
  • Images are stored in a registry
  • Production servers download images
  • Containers, the running instances created from these images, are used in production

Researchers

How does this help research?

  • Allows others to run your code without having to install lots of dependencies (Eglen et al. 2017)

Docker

What?

R images on DockerHub

  • rocker/r-ver - alternative to r-base; each tag fixes the R version
  • rocker/rstudio - R and RStudio Server
  • rocker/tidyverse - R, RStudio Server and the tidyverse packages
  • rocker/shiny - R with Shiny Server built in
  • many others
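
Because rocker/r-ver images are tagged by R version, naming a tag pins the R version a container uses. A minimal sketch, assuming a POSIX shell; the function run_r_version and the DOCKER variable (which lets you swap docker for echo to print the command instead of running it) are hypothetical additions, not part of Docker or Rocker.

```shell
#!/bin/sh
# Hypothetical hook: set DOCKER=echo to print commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: start an interactive R session with a pinned R version.
# Usage: run_r_version 4.2.1
run_r_version() {
  # rocker/r-ver tags are R version numbers, e.g. rocker/r-ver:4.2.1
  "$DOCKER" run --rm -ti "rocker/r-ver:$1"
}
```

Leaving out the tag (plain rocker/r-ver) uses the latest tag, which changes as new R versions are released.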

Base image

To use the official r-base image:

docker run --rm -ti r-base

  1. The r-base image is downloaded (only the first time)

  2. A container is created from the image

  3. Your terminal is attached to the container, which starts an interactive R session

The command line options remove the container when it exits (--rm) and give you an interactive terminal (-t allocates a terminal, -i keeps input open, combined as -ti).
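
Without -ti no interactive session is opened, which suits one-off commands: anything written after the image name replaces the image's default command (R for r-base). A sketch, assuming a POSIX shell; r_eval and the DOCKER override are hypothetical names.

```shell
#!/bin/sh
# Hypothetical hook: set DOCKER=echo to print commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: evaluate one R expression in a throwaway r-base container.
# No -ti: nothing is typed into the container, we only read its output.
r_eval() {
  "$DOCKER" run --rm r-base Rscript -e "$1"
}
```

For example, r_eval 'R.version.string' prints the R version inside the image, and --rm removes the container afterwards.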

RStudio image

docker run --rm -ti -e DISABLE_AUTH="true" -p 8787:8787 rocker/rstudio

  • Downloads the image (if needed), creates a container and attaches your terminal to it

  • RStudio Server in the container is available in your web browser at http://localhost:8787

  • You now have an isolated RStudio container running

The command line options remove the container on exit (--rm), attach an interactive terminal (-ti), set an environment variable (-e; DISABLE_AUTH="true" turns off the RStudio login) and publish a port (-p 8787:8787), so that connecting to port 8787 on your machine reaches port 8787 in the container.

Custom images

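  • Docker images are built from Dockerfiles. A sample Dockerfile is below.
# Start from the official R image
FROM r-base
# Copy the current folder (including myscript.R) into the image
COPY . /usr/local/src/myscripts
# Make that folder the working directory
WORKDIR /usr/local/src/myscripts
# Default command: run the script when a container starts
CMD ["Rscript", "myscript.R"]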
  • To build the image, go to the folder containing the Dockerfile and run:
docker build -t jamestripp/myimage .
  • Finally, create a container from the image. It runs myscript.R and, because of --rm, is removed when the script finishes.
docker run --rm -ti jamestripp/myimage
  • You can publish images to the Docker Hub (or another registry) for others to use (see instructions here and here)
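
The cycle described on the Developers slides (build, push to a registry, pull, run) can be sketched as two shell functions. This assumes you have already run docker login and that the image name starts with your Docker Hub username, as in jamestripp/myimage; publish_image, run_published_image and the DOCKER override are hypothetical names.

```shell
#!/bin/sh
# Hypothetical hook: set DOCKER=echo to print commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Build the image from the Dockerfile in the current folder, then upload it.
# Assumes `docker login` has already been run for the registry.
publish_image() {
  "$DOCKER" build -t "$1" . &&
    "$DOCKER" push "$1"
}

# What someone else would do: download the image and run it once.
run_published_image() {
  "$DOCKER" pull "$1" &&
    "$DOCKER" run --rm "$1"
}
```

docker run pulls a missing image on its own, so the explicit pull mainly ensures that the most recently published version is used rather than an older local copy.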

Local files

  • Files can be copied into the image when it is built, so they are baked into the image.
FROM r-base
COPY . /usr/local/src/myscripts
WORKDIR /usr/local/src/myscripts
CMD ["Rscript", "myscript.R"]
  • Volumes (bind mounts created with -v) make folders on your local file system accessible inside a running container
# macOS and Linux
docker run --rm -ti -e DISABLE_AUTH="true" -p 8787:8787 -v "$(pwd)":/home/rstudio/data rocker/rstudio
# Windows
docker run --rm -ti -e DISABLE_AUTH="true" -p 8787:8787 -v absolute_path:/home/rstudio/data rocker/rstudio
# May work in the Windows Command Prompt (cmd.exe)
docker run --rm -ti -e DISABLE_AUTH="true" -p 8787:8787 -v %cd%:/home/rstudio/data rocker/rstudio

Where absolute_path is the full Windows path to the folder you want to share. Inside the container, the folder appears at /home/rstudio/data.

Example

Fortune

myscript.R

library(fortunes)

print(fortune())
print(fortune())

Dockerfile

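FROM r-base
# Install the fortunes package from CRAN (install2.r comes with r-base via littler)
# Installing before COPY means editing myscript.R does not trigger a reinstall
RUN ["install2.r", "fortunes"]
COPY . /usr/local/src/myscripts
WORKDIR /usr/local/src/myscripts
CMD ["Rscript", "myscript.R"]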

Build and run it

docker build -t jamestripp/fortune .
docker run --rm -ti jamestripp/fortune

Other examples

To consider

  • Small images containing only the software you need may be preferable, as they are easier to maintain (Gruening et al. 2019a)

Next steps

Learning materials

  • Work through the useR! 2022 workshop by rsangole (repo, blog post)

Papers

  • Some R packages for working with Docker are mentioned in Nüst, Eddelbuettel, et al. (2020). I was not able to get these packages working:
    • Stevedore - Sends commands to Docker. Required reticulate and the Python docker module. Received the error ‘Did not find required python module “docker”’. The issue was reported and has not been addressed in the past year…

    • Dockyard - Aims to help you create and run a container. The last commit was 3 years ago and the example code on the GitHub page does not work…

    • Dockermachine - Last updated 5 years ago.

  • Nüst, Sochat, et al. (2020) outline ten simple rules for writing Dockerfiles, discuss good practice and signpost other tools
  • Peikert and Brandmaier (2021) offer a workflow for rendering R Markdown documents that uses a Makefile to track dependencies

  • Not an exhaustive list

Tools

Binder

Builds a Docker image from a public repository (e.g. on GitHub) and runs it as a container on a remote server. You can place a launch button in your repository's README.md.

ShinyProxy

Starts Shiny apps in containers, with one container per user (see the R-bloggers post)

Questions?

References

Boettiger, Carl. 2015. “An Introduction to Docker for Reproducible Research.” ACM SIGOPS Operating Systems Review 49 (1): 71–79. https://doi.org/10.1145/2723872.2723882.
Boettiger, Carl, and Dirk Eddelbuettel. 2017. “An Introduction to Rocker: Docker Containers for R.” The R Journal 9 (2): 527–36. https://journal.r-project.org/archive/2017/RJ-2017-065/index.html.
Eglen, Stephen J., Ben Marwick, Yaroslav O. Halchenko, Michael Hanke, Shoaib Sufi, Padraig Gleeson, R. Angus Silver, et al. 2017. “Toward Standard Practices for Sharing Computer Code and Programs in Neuroscience.” Nature Neuroscience 20 (6): 770–73. https://doi.org/10.1038/nn.4550.
“Get Started with Binder: Binder 0.1b Documentation.” n.d. https://mybinder.readthedocs.io/en/latest/introduction.html.
Gruening, Bjorn, Olivier Sallou, Pablo Moreno, Felipe da Veiga Leprevost, Hervé Ménager, Dan Søndergaard, Hannes Röst, et al. 2019a. “Recommendations for the Packaging and Containerizing of Bioinformatics Software.” https://doi.org/10.12688/f1000research.15140.2.
Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1): 80–88. https://doi.org/10.1080/00031305.2017.1375986.
Mölder, Felix, Kim Philipp Jablonski, Brice Letcher, Michael B. Hall, Christopher H. Tomkins-Tinch, Vanessa Sochat, Jan Forster, et al. 2021. “Sustainable Data Analysis with Snakemake.” F1000Research 10 (April): 33. https://doi.org/10.12688/f1000research.29032.2.
Nüst, Daniel, Dirk Eddelbuettel, Dom Bennett, Robrecht Cannoodt, Dav Clark, Gergely Daróczi, Mark Edmondson, et al. 2020. “The Rockerverse: Packages and Applications for Containerisation with R.” The R Journal 12 (1): 437–61. https://journal.r-project.org/archive/2020/RJ-2020-007/index.html.
Nüst, Daniel, Vanessa Sochat, Ben Marwick, Stephen J. Eglen, Tim Head, Tony Hirst, and Benjamin D. Evans. 2020. “Ten Simple Rules for Writing Dockerfiles for Reproducible Data Science.” PLOS Computational Biology 16 (11): e1008316. https://doi.org/10.1371/journal.pcbi.1008316.
Peikert, Aaron, and Andreas M. Brandmaier. 2021. “A Reproducible Data Analysis Workflow With R Markdown, Git, Make, and Docker.” Quantitative and Computational Methods in Behavioral Sciences, May, 1–27. https://doi.org/10.5964/qcmb.3763.
Pittard, W. Stephen, and Shuzhao Li. 2020. “The Essential Toolbox of Data Science: Python, R, Git, and Docker.” In, edited by Shuzhao Li, 265–311. Methods in Molecular Biology. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-0239-3_15.
Smith, David. 2022. “Easy R Tutorials with Dev Containers | R-bloggers.” https://www.r-bloggers.com/2022/08/easy-r-tutorials-with-dev-containers/.