class: center, middle, inverse, title-slide .title[ # R Markdown for Reproducible Reports. ] .subtitle[ ## White Rose BBSRC DTP Training: An Introduction to Reproducible Analyses in R. ] .author[ ### Emma Rand ] .institute[ ### University of York, UK ] --- <style> div.blue { background-color:#b0cdef; border-radius: 5px; padding: 20px;} div.grey { background-color:#d3d3d3; border-radius: 0px; padding: 0px;} </style> # Aims To introduce you to R Markdown for creating reproducible reports in a variety of output formats. You will also get more practice organising analyses and importing, summarising and plotting data. π¬ An instruction to do something --- # Learning outcomes The successful student will be able to: * explain what R Markdown is * appreciate the role of the YAML header * set default code chunk behaviour and that for individual chunks * use headings, simple text formatting and special characters * add citations and references with a .bib file * use inline code to report results in text * create automatically numbered tables and figures and cross reference them in text --- # Goal! By the end of this session you'll be able to do something like this: <!-- standard markdown report in pdf, word or html --> <!-- include a table, inline code, a figure, --> demo..... --- # Reproducible Reports * What is your process for getting your summary data, statistical results, tables and figures in to your report? -- * What do you do when you get additional data that increases your sample sizes? -- * What do you do if you wrote in Word formatted for one journal and now have to submit in PDF formatted for another? --- # Reproducible Reports How do you work? Typically people analyse, plot and write up in different programs. Graphs are saved to files and copied and pasted into the final report. This process relies on manual labour. If the data changes, the author must repeat the entire process to update the report. --- background-image: url(pics/rmarkdown.png) background-position: 100% 85% background-size: 180px # Reproducible Reports The brilliance of R Markdown (Allaire Xie, et al., 2019) is that you can use a **single R Markdown file** to: * save and execute code -- * do all your data processing, analysis and plotting -- * generate high quality reports that can be shared with an audience -- This is a time-efficient and reproducible way to write! π --- # Formats Many output formats are supported! .pull-left[  ] .pull-right[ * Word * Webpages * PDF * specific journal formats * webslides * powerpoint * books * blogs * posters * web applications ] --- # Organising your work You are going to work with some made-up data on the myoglobin content in the skeletal muscle of three species of seal -- Create a new project called `seals` -- Inside `seals`, make folders called `data-raw` and `data-processed` -- I strongly recommend avoiding spaces in names of files, folders and variable names. R can cope with them.... but they can often confuse humans! -- Save a copy of [seal.txt](data-raw/seal.txt) to the `data-raw` folder --- # Live demo π¬ Just watch for a while.... --- # Key points from demo * mixes text and code * human readable * YAML header between the `---` * code chunk options control whether the code and its output end up in your 'knitted' document * comments * in a code chunk `#` * in the text `<!-- a comment -->` * but use Ctrl+Shift+C * `#` in the text indicate headings --- # Your own R markdown π¬ File | New File | R Markdown -- π¬ Delete everything except the YAML header and the first code chunk -- π¬ Add your name, and a title You could copy and paste a title from [paper.html](https://3mmarand.github.io/seals/paper.html) -- π¬ Save the Rmd as `report.Rmd` --- # YAML π¬ Add `bookdown` (Xie, 2023; Xie, 2016) output types: .code60[ ``` --- title: "Variation in myoglobin content of skeletal muscle of seal species." author: "Emma Rand" output: bookdown::pdf_document2: default bookdown::word_document2: default bookdown::html_document2: default bibliography: reference.bib --- ``` ] I recommend using the `bookdown` package output types because they handle cross referencing well. --- # Default code chunk behaviour π¬ Set some **default** code chunk options. I recommend these: ```` ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, fig.retina = 3) ``` ```` -- `echo = FALSE` code will not be included in output `warning = FALSE` and `message = FALSE` R messages and warnings will not be included `fig.retina = 3` for improving the appearance of R figures in HTML documents --- # Add a code chunk For package loading The first two code chunks are usually for the default code chunk options (which I tend to call `setup`) and for package loading. π¬ Use Insert | R to add a code chunk (Ctrl-Alt-i): .pull-left[ ```` ```{r packages} library(tidyverse) ``` ```` ] .pull-right[ * `r`indicates it is an R code chunk * `packages` is just a name for the chunk. Naming chunks makes debugging easier. ] --- # References References and citations can be added by: - creating a .bib file containing references in BibTeX format and - adding another line to the YAML header -- π¬ Add `bibliography: reference.bib` to the YAML header.<sup>1</sup> .footnote[ .font70[ 1. There are other ways of including references such as using `RefManageR` package which is what I am using in these slides. ] ] --- # Make a .bib file π¬ Do File | New file | Text file π¬ Save as `reference.bib` π¬ Add BibTeX entries `citation("package")` in the console will give packages references in BibTeX format. -- BibTeX format is also available through most referencing software (e.g., PaperPile). --- # References Citations are added using: * blah blah blah `[@xaringan]` for blah blah blah (Xie, 2019). * Xie `[-@xaringan]` said blah blah blah for Xie (2019) said blah blah blah. -- Every citation used results in the reference being added to a list at the bottom of the output. --- # Add text π¬ Add a header for the Introduction and text - again you could copy and paste. ``` # Introduction Aquatic and marine mammals are able to dive underwater................................. ...investigated whether the concentration of myoglobin differed between species. ``` --- # Your turn π¬ Include at least one reference in your introduction, such as this one: https://doi.org/https://doi.org/10.1016/S1095-6433(00)00182-3. -- π¬ Save your Rmd --- # Your turn Add a section of text for Methods π¬ Add a heading for the Methods section including at least one reference to R or packages. -- π¬ Hit Knit! π§Ά This should render as something like: We used R (R Core Team, 2022) with tidyverse packages (Wickham Averick, et al., 2019) for all analyses. --- # Add a code chunk π¬ Insert a code chunk for importing the data ```` ```{r import} # Data import # the data are organised in to three columns, one for each species. seal <- read_table("data-raw/seal.txt") ``` ```` -- Notice than you can, and should, comment your code. --- # Run code interactively You can run the code interactively to check your progress and the output will be shown in the Rmd document. π¬ Run the whole code chunk using the green arrow button on the right of the code chunk: --- # Run code interactively ```` ```{r import} # Data import # the data are organised in to three columns, one for each species. seal <- read_table("data-raw/seal.txt") str(seal) ``` ```` ```` 'data.frame': 30 obs. of 3 variables: $ Harbour : num 49.7 51 41.6 45.6 39.4 ... $ Weddell : num 55.4 40.1 46.3 29.8 52.5 ... $ Bladdernose: num 56.2 48.4 37.8 42.8 27 ... ```` -- Individual lines of code can be run by placing your cursor on the line and doing ctl-enter. --- # Add a code chunk π¬ Insert a code chunk for tidying the data: ```` ```{r tidy} seal <- seal %>% pivot_longer(names_to = "species", values_to = "myoglobin", cols = everything()) ``` ```` --- # Add a code chunk Add a code chunk to summarise π¬ Insert a code chunk for summarising the data: ```` ```{r summarise} sealsummary <- seal2 %>% group_by(species) %>% summarise(mean = mean(myoglobin), std = sd(myoglobin), n = length(myoglobin), se = std/sqrt(n)) ``` ```` --- # Run code interactively What is in `sealsummary`? π¬ Select a section of code and and run (ctrl-enter) -- π¬ Knit your file π§Ά --- # Inline code Inline code is how you include a variable value, like a mean, in a section of text. -- In fact, any code output can be inserted directly into the text of a .Rmd file using inline code. -- Inline code goes between ` `r` and ` ` ` . For example by writing: The squareroot of 2 is ` `r` `sqrt(2)` ` ` ` you will get: The squareroot of 2 is 1.4142136 --- # Reproducible Reports Inline code π¬ Add a sentence to your Methods section which uses inline code to report how many individuals were examined. --- # Inline code Suppose you want to report the highest mean in the text. We could add: The highest mean is ` `r` `max(sealsummary$mean)` ` ` ` to our text. When we knit the document we would get The highest mean is 49.01033 --- # Inline code π¬ Insert a code chunk to extract values from the summary to make inline reporting easier. This helps us avoid long pieces of inline code Don't worry too much about completely understanding the code. .code80[ ```` ```{r extract} # extract values for inline reporting highestmean <- max(sealsummary$mean) highestse <- sealsummary$se[sealsummary$mean == highestmean] highestspp <- sealsummary$species[sealsummary$mean == highestmean] ``` ```` ] <span style=" font-weight: bold; color: #fdf9f6 !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span> If you have spare time, also extract the lowest mean, its s.e. and the species to which it belongs. <!-- lowestmean <- min(sealsummary$mean) --> <!-- lowestse <- sealsummary$se[sealsummary$mean == lowestmean] --> <!-- lowestspp <- sealsummary$species[sealsummary$mean == lowestmean] --> --- # Your turn. π¬ Make a start on a results section by mixing text and inline coding. π¬ Knit your document. π§Ά --- # Special characters You can include special characters in a markdown document using LaTeX markup. This has `$` signs on the outside and uses backslashes and curly braces to indicate that what follows should be interpreted as a special character or special formatting. For example, to get `\(\bar{x} \pm s.e.\)` you write `$\bar{x} \pm s.e.$` π¬ Improve your results section by using special characters. --- # Adding a table the `knitr` package's `kable()` function is a useful way to include tables. `kable()` takes a dataframe as an argument and has options to format the contents as a table. -- For example, `digits` will round the number of decimal places used. --- # Adding a table π¬ Add a code chunk for a table of the summarised data: ```` ```{r summary-table} knitr::kable(sealsummary, digits = 2, caption = 'A summary of the data.', row.names = FALSE) ``` ```` --- # Cross-referencing a table ```` ```{r summary-table} knitr::kable(sealsummary, digits = 2, caption = 'A summary of the data.', row.names = FALSE) ``` ```` For tables to be automatically numbered, and to allow you to cross-reference them in the text: 1. the chunk must be labelled. I've used `summary-table` 2. the table must have a caption as set by the `caption` argument -- β The caption does **not** have to include "Table 1." --- # Cross-referencing a table To refer to this table in the text, we use `\@ref(tab:summary-table)` The general syntax for cross-references is `\@ref(label)` where label can be: - `tab:name-of-chunk-that-makes-a-table` - `fig:name-of-chunk-that-makes-a-figure` - `ref:name-of-section` - `eq:name-of-equation` --- # Cross-referencing a table π¬ Add a sentence to your results referring to the reader to the summary table such as: See table \\@ref(tab:summary-table) -- π¬ Hit knit! π§Ά --- # Statstical testing An appropriate analysis for this scenario is a one-way ANOVA followed by a Tukey Honest Significant Difference post-hoc test. -- We will first build the ANOVA model and extract the results. -- π¬ Insert a code chunk and build the model: ```` ```{r testing} mod <- aov(data = seal2, myoglobin ~ species) ``` ```` -- π¬ A dataframe containing the ANOVA table can be obtained with: ```` res <- summary(mod)[[1]] ```` --- # Your turn π¬ Extract the *F*, *d.f.* and *p* values from the dataframe and report them in the results section by mixing text and inline coding. --- # Statstical testing: Tukey π¬ Insert a code chunk an run a Tukey HSD on the model: ```` ```{r posthoc, include=FALSE} TukeyHSD(mod) ``` ```` Note: I have added the `include=FALSE` argument to the chunk. This suppresses the output in the knitted document. --- # Statstical testing: Tukey π¬ Run the Tukey code interactively: .code80[ ```` Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = myoglobin ~ species, data = seal2) $species diff lwr upr p adj Harbour-Bladdernose -2.835357 -7.968374 2.297660 0.3889209 Weddell-Bladdernose 4.474286 -0.658731 9.607302 0.1001324 Weddell-Harbour 7.309643 2.176626 12.442660 0.0029812 ```` ] It's the highest (Weddell) and lowest (Harbour) means that differ. --- # Adding a figure .pull-left[ This is what we are aiming for: <img src="pics/seals.png" width="350px" height="317px" /> ] .pull-right[ For figures to be automatically numbered, and to allow you to cross-reference them in the text: 1. the chunk must be labelled. I've used `myo-fig` 2. the figure must have a caption *but* this time it is set in the code chunk options using `fig.cap` To refer to this figure in the text, we will use `\@ref(fig:myo-fig)` ] --- # Adding a figure We will go through the items that need to be in the markdown with a basic plot. π¬ Add a code chunk for the figure: .code70[ ```` ```{r myo-fig, fig.cap="Mean Myoglobin content of skeletal muscle"} ggplot() + geom_point(data = seal, aes(x = species, y = myoglobin), position = position_jitter(width = 0.1, height = 0)) ``` ```` ] I have labelled the chunk `myo-fig` and set the caption with `fig.cap`. --- # Adding a figure π¬ Now add a sentence to your results such as: See Figure \\@ref(fig:myo-fig) -- π¬ Hit knit! π§Ά You should find the figure and the cross reference to it are automatically numbered. --- # Figure legends 2 Figure legends are often long and contain special characters -- Including long text in the code chunk option makes the code a bit hard to read and special characters are difficult. -- We can instead create a 'reference' for the legend. π¬Add this text to your document: (ref:myo-fig) Mean Myoglobin content of skeletal muscle. Error bars are `\(\pm 1 s.e.\)` --- # Reproducible Reports Adding a figure π¬And edit your chunk options like this: .code70[ ```` ```{r myo-fig, fig.cap="(ref:myo-fig)"} ggplot() + geom_point(data = seal, aes(x = species, y = myoglobin), position = position_jitter(width = 0.1, height = 0)) ``` ```` ] π¬ Hit knit! π§Ά --- # Your turn Suggestions: * customise this figure (see the Seal example) * add an additional figure * add an image as a figure * Try out a template from `rticles`: File | New File | R Markdown and choose Fr * Switch to Quarto! * integrate with zotero * Use the visual editor --- # Resources * R Markdown cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf * RStudioβs guide http://rmarkdown.rstudio.com/index.html * R Markdown: The Definitive Guide (Xie Allaire, et al., 2018) https://bookdown.org/yihui/rmarkdown/ * bookdown: Authoring Books and Technical Documents with R Markdown (Xie, 2023) https://bookdown.org/yihui/bookdown/ * [Quarto](https://quarto.org/) --- # Summary * R Markdown interweaves analysis code and reporting and is human readable * metadata about the document is given YAML header * there are many formats for outputs and several packages available * code chunk behaviour such as visibility and output size is set with options * `#` is a heading in the text and a comment (as normal), in a code chunk * LateX can be included * citations and references are added with a .bib file * use inline code to report results in text * automatically numbered tables and figures and cross references to them can be included --- # References .footnote[ .font60[ Slides made with with xaringan (Xie, 2019) and xaringanExtra (Aden-Buie, 2020) ] ] .font60[ <p><cite>Aden-Buie, G. (2020). <em>xaringanExtra: Extras And Extensions for Xaringan Slides</em>. R package version 0.2.3.9000. URL: <a href="https://github.com/gadenbuie/xaringanExtra">https://github.com/gadenbuie/xaringanExtra</a>.</cite></p> <p><cite>Allaire, J., Y. Xie, et al. (2019). <em>rmarkdown: Dynamic Documents for R</em>. R package version 1.16. URL: <a href="https://github.com/rstudio/rmarkdown">https://github.com/rstudio/rmarkdown</a>.</cite></p> <p><cite>R Core Team (2022). <em>R: A Language and Environment for Statistical Computing</em>. R Foundation for Statistical Computing. Vienna, Austria. URL: <a href="https://www.R-project.org/">https://www.R-project.org/</a>.</cite></p> <p><cite>Wickham, H., M. Averick, et al. (2019). “Welcome to the tidyverse”. In: <em>Journal of Open Source Software</em> 4.43, p. 1686. DOI: <a href="https://doi.org/10.21105/joss.01686">10.21105/joss.01686</a>.</cite></p> <p><cite>Xie, Y. (2016). <em>bookdown: Authoring Books and Technical Documents with R Markdown</em>. ISBN 978-1138700109. Boca Raton, Florida: Chapman and Hall/CRC. URL: <a href="https://bookdown.org/yihui/bookdown">https://bookdown.org/yihui/bookdown</a>.</cite></p> <p><cite>Xie, Y. (2019). <em>xaringan: Presentation Ninja</em>. R package version 0.12. URL: <a href="https://CRAN.R-project.org/package=xaringan">https://CRAN.R-project.org/package=xaringan</a>.</cite></p> <p><cite>Xie, Y. (2023). <em>bookdown: Authoring Books and Technical Documents with R Markdown</em>. R package version 0.32. URL: <a href="https://github.com/rstudio/bookdown">https://github.com/rstudio/bookdown</a>.</cite></p> <p><cite>Xie, Y., J. Allaire, et al. (2018). <em>R Markdown: The Definitive Guide</em>. ISBN 9781138359338. Boca Raton, Florida: Chapman and Hall/CRC. URL: <a href="https://bookdown.org/yihui/rmarkdown">https://bookdown.org/yihui/rmarkdown</a>.</cite></p> ] --- # Intro to Repro in R Emma Rand [emma.rand@york.ac.uk](mailto:emma.rand@york.ac.uk) Twitter: [@er13_r](https://twitter.com/er13_r) GitHub: [3mmaRand](https://github.com/3mmaRand) blog: https://buzzrbeeline.blog/ <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. Rand, E. (2023). White Rose BBSRC DTP Training: An Introduction to Reproducible Analyses in R (Version v1.2). https://doi.org/10.5281/zenodo.3859818