class: center, middle, inverse, title-slide

.title[
# Introduction and Principles of reproducibility
]
.subtitle[
## White Rose BBSRC DTP Training: An Introduction to Reproducible Analyses in R
]
.author[
### Emma Rand
]
.institute[
### University of York, UK
]

---
class: inverse

# Programme Overview

---

# What this training *is* and *is not*

The topics have been chosen because they are foundational, widely applicable and conceptually transferable.

.pull-left[

.font90[

**It is**
* An introduction to R for those without previous experience
* About using RStudio Projects, and good practice in documenting and organising code and projects
* An introduction to the tidyverse and Quarto

]

]

.pull-right[

.font90[

**It is not**
* An introduction to statistics  
* Magic

]

]

---
# Programme overview

The programme is made up of modules, delivered through tutor-led or supported study. Which modules you take will depend on your previous experience.

.font70[

[1. Introduction and Principles of reproducibility](01_intro_and_principles_of_repro.html)

Audience: Everyone

* Rationale for scripting  
* Why R?

[2. A. Introduction to R and working with data](02_intro_to_r_and_working_with_data.html)

Audience: Those without previous experience of R

* Finding your way round RStudio  
* Typing in data, doing some calculations on it, plotting it  
* Importing data: working directories and paths  
* Summarising and visualising with the [`tidyverse`](https://www.tidyverse.org/)  
* Installing and loading packages

]

---
# Have experience?

.font70[

[2. B. Tidying data and the tidyverse including the pipe](04_tidying_data_and_the_tidyverse.html)

Audience: Those with previous experience of R

* Using the tidyverse, including the pipe to link operations together  
* Carrying out some common data tidying tasks such as reshaping, renaming and recoding variables and cleaning cell contents

]

---
# Programme overview

.font70[

[3. Project-oriented workflow](03_rstudio_projects.html)

Audience: Everyone

* Organising your work in a logical, consistent and reproducible way
* Working directories and paths    
* Using RStudio Projects

[4. Quarto for Reproducible Reports](xxx)

Audience: Those with previous experience of R, such as having completed "Introduction to R and working with data"

]

---
class: inverse

# Reproducibility

---

# Reproducibility

.font60[
How The Turing Way defines reproducible research
]

---
# Why does it matter?

![futureself, CC-BY-NC, by Julen
Colomb](pics/future_you.png){fig-alt="Person working at a computer with an offstage person asking 'How is the analysis going?' The person at the computer replies 'Can't understand the data...and the data collector does not answer my emails or calls' Person offstage: 'That's terrible! So cruel! Who did collect the data? I will sack them!' Person at the computer: 'um...I did, 3 years ago.'"
width="400"}

---
# Who cares?

* Many high-profile cases of work that did not reproduce, e.g. the research of Anil Potti, unravelled by Baggerly and Coombes (2009)

* "Five selfish reasons to work reproducibly" (Markowetz, 2015); alternatively, watch the [talk](https://youtu.be/yVT07Sukv9Q)

* Reproducibility **will** become standard in science and publishing, e.g. "Building digital workforce capacity and skills for data-intensive science" (OECD Global Science Forum, 2020)

---

# How to achieve reproducibility

-   Scripting

-   Project-oriented workflows

-   Documentation at code and project level

-   Literate programming

Workflows for computational projects and the data analysis and reporting of other work can, and should, be 100% reproducible!

---
class: inverse

# Why R?

---
# Why R?

Open source and free

--
... but so is Python
  
--

R has a reputation for catering to users who do not see themselves as programmers, and for allowing them to slide gradually into programming.

![biologists](pics/biologist1.png)

R is designed for data analysis and graphics, which means those tasks are often easier to achieve in R than in a general-purpose programming language.

---
# Why R?

The R community is one of R's greatest assets, being vibrant, inclusive and supportive of users at all levels.

.pull-left[

-   [#rstats](https://twitter.com/hashtag/rstats?lang=en) on Twitter has been very active
-   [RForwards](https://forwards.github.io/about/): the widening participation task force <sup>1</sup>
-   [R-Ladies](https://rladies.org/): promoting gender diversity
-   [Hey! You there! You are welcome here](https://ropensci.org/blog/2017/06/23/community/)

.font70[
.footnote[
1. I am a member of the Core Team for Forwards
]
]

]

.pull-right[
<img src="pics/welcome_to_rstats_twitter.png" width="400px" />

.font70[
.footnote[
Artwork by @allison_horst 
]
]

]

---
class: inverse

# Summary

---
# Summary

-   The course is:
    - an introduction to reproducible analyses rather than statistics  
    - not enough on its own: you need to practise!  
    - made up of modules so you can opt out where you already have the skills

-   Scripting makes your work reproducible

-   The focus is on R, but the principles of reproducibility are widely applicable; use Python if you prefer

---
class: inverse

# Further Reading

---
# Further Reading

-   "Data Organization in Spreadsheets" (Broman and Woo, 2018)

-   "Ten simple rules for reproducible computational research" (Sandve, Nekrutenko, et al., 2013)

-   "Best practices for scientific computing" (Wilson, Aruliah, et al., 2014)

-   "Good enough practices in scientific computing" (Wilson, Bryan, et al., 2017)

-   "Excuse Me, Do You Have a Moment to Talk About Version Control?" (Bryan, 2018)

---
# References
.footnote[
.font60[
Slides made with xaringan (Xie, 2019) and xaringanExtra (Aden-Buie, 2020)
]
]
.font60[
Aden-Buie, G. (2020). _xaringanExtra: Extras And Extensions for
Xaringan Slides_. R package version 0.2.3.9000. URL:
[https://github.com/gadenbuie/xaringanExtra](https://github.com/gadenbuie/xaringanExtra).

Baggerly, K. A. and K. R. Coombes (2009). "Deriving chemosensitivity
from cell lines: Forensic bioinformatics and reproducible research in
high-throughput biology". In: _Ann. Appl. Stat._ 3.4, pp. 1309-1334.

Broman, K. W. and K. H. Woo (2018). "Data Organization in
Spreadsheets". In: _Am. Stat._ 72.1, pp. 2-10. DOI:
[10.1080/00031305.2017.1375989](https://doi.org/10.1080/00031305.2017.1375989).

Bryan, J. (2018). "Excuse Me, Do You Have a Moment to Talk About
Version Control?" In: _Am. Stat._ 72.1, pp. 20-27.

Markowetz, F. (2015). "Five selfish reasons to work reproducibly".
In: _Genome Biol._ 16, p. 274.

OECD Global Science Forum (2020). _Building digital workforce capacity
and skills for data-intensive science_. OECD.

Sandve, G. K., A. Nekrutenko, et al. (2013). "Ten simple rules for
reproducible computational research". In: _PLoS Comput. Biol._
9.10, p. e1003285.

Wilson, G., D. A. Aruliah, et al. (2014). "Best practices for
scientific computing". In: _PLoS Biol._ 12.1, p. e1001745.

Wilson, G., J. Bryan, et al. (2017). "Good enough practices in
scientific computing". En. In: _PLoS Comput. Biol._ 13.6, p. e1005510.

Xie, Y. (2019). _xaringan: Presentation Ninja_. R package version 0.12.
URL:
[https://CRAN.R-project.org/package=xaringan](https://CRAN.R-project.org/package=xaringan).
]

---
# Intro to Repro in R

Emma Rand  
[emma.rand@york.ac.uk](mailto:emma.rand@york.ac.uk)  
Twitter: [@er13_r](https://twitter.com/er13_r)   
GitHub: [3mmaRand](https://github.com/3mmaRand)  
blog: https://buzzrbeeline.blog/

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</span>

Rand, E. (2023). White Rose BBSRC DTP Training: An Introduction to Reproducible Analyses in R (Version v1.2). https://doi.org/10.5281/zenodo.3859818