class: center, middle, inverse, title-slide

.title[
# Introduction and Principles of reproducibility.
]
.subtitle[
## White Rose BBSRC DTP Training: An Introduction to Reproducible Analyses in R.
]
.author[
### Emma Rand
]
.institute[
### University of York, UK
]

---

<style>
div.blue { background-color:#b0cdef; border-radius: 5px; padding: 20px;}
div.grey { background-color:#d3d3d3; border-radius: 0px; padding: 0px;}
</style>

---
class: inverse

# Programme Overview

---

# What this training *is* and *is not*

Chosen topics are foundational, widely applicable, and conceptually transferable.

.pull-left[
.font90[
**It is**

* An introduction to R for those without previous experience
* About using RStudio Projects and good practice for code and project documentation and organisation
* An introduction to the tidyverse and Quarto
]
]

--

.pull-right[
.font90[
**It is not**

* An introduction to statistics
* Magic
]
]

---

# Programme overview

Modules are either tutor-led or supported study. The modules you take will depend on your previous experience.

.font70[
[1. Introduction and Principles of reproducibility](01_intro_and_principles_of_repro.html)

Audience: Everyone

* Rationale for scripting
* Why R?

[2. A. Introduction to R and working with data](02_intro_to_r_and_working_with_data.html)

Audience: Those without previous experience of R

* Finding your way round RStudio
* Typing in data, doing some calculations on it, plotting it
* Importing data: working directories and paths
* Summarising and visualising with the [`tidyverse`](https://www.tidyverse.org/)
* Installing and loading packages
]

---

# Have experience?

.font70[
[2. B. Tidying data and the tidyverse including the pipe](04_tidying_data_and_the_tidyverse.html)

Audience: Those with previous experience of R

* Using the tidyverse, including the pipe, to link operations together
* Carrying out some common data tidying tasks such as reshaping, renaming and recoding variables, and cleaning cell contents
]

---

# Programme overview

.font70[
[3. Project-oriented workflow](03_rstudio_projects.html)

Audience: Everyone

* Organising your work in a logical, consistent and reproducible way
* Working directories and paths
* Using RStudio Projects

[4. Quarto for Reproducible Reports](xxx)

Audience: Those with previous experience of R, such as having completed "Introduction to R and working with data"
]

---
class: inverse

# Reproducibility

---

# Reproducibility

<img src="pics/reproducible-matrix.jpg" width="700px" />

.font60[
How The Turing Way defines reproducible research
]

---

# Why does it matter?

![futureself, CC-BY-NC, by Julen Colomb](pics/future_you.png){fig-alt="Person working at a computer with an offstage person asking 'How is the analysis going?' The person at the computer replies 'Can't understand the data... and the data collector does not answer my emails or calls' Person offstage: 'That's terrible! So cruel! Who did collect the data? I will sack them!' Person at the computer: 'um...I did, 3 years ago.'" width="400"}

---

# Who cares?

* There are many high-profile cases of work that did not reproduce, e.g., the Anil Potti case, unravelled by Baggerly and Coombes (2009)
* Five selfish reasons to work reproducibly (Markowetz, 2015). Alternatively, see the [talk](https://youtu.be/yVT07Sukv9Q)
* Reproducibility **will** become standard in science and publishing, e.g., _Building digital workforce capacity and skills for data-intensive science_ (OECD Global Science Forum, 2020)

---

# How to achieve reproducibility

- Scripting
- Project-oriented workflows
- Documentation at code and project level
- Literate programming

Workflows for computational projects, and for the data analysis and reporting of other work, can and should be 100% reproducible!

---
class: inverse

# Why R?

---

# Why R?
Open source and free

--

...but so is Python

--

R has a reputation for catering to users who do not see themselves as programmers, and for allowing them to slide gradually into programming.

![biologists](pics/biologist1.png)

--

R was designed for data analysis and graphics, so those tasks are often easier in R than in a general-purpose programming language.

---

# Why R?

The R community is one of R's greatest assets. It is vibrant, inclusive and supportive of users at all levels.

.pull-left[
- [#rstats](https://twitter.com/hashtag/rstats?lang=en) on Twitter has been very active
- [RForwards](https://forwards.github.io/about/): the widening participation task force <sup>1</sup>
- [RLadies](https://rladies.org/): gender diversity promotion
- [Hey! You there! You are welcome here](https://ropensci.org/blog/2017/06/23/community/)

.font70[
.footnote[
1. I am a member of the Core Team for Forwards
]
]
]

.pull-right[
<img src="pics/welcome_to_rstats_twitter.png" width="400px" />

.font70[
.footnote[
Artwork by @allison_horst
]
]
]

---
class: inverse

# Summary

---

# Summary

- The course is:
  - an introduction to reproducible analyses rather than to statistics
  - not enough on its own; you need to practise!
  - made up of modules, so you can opt out where you already have the skills
- Scripting makes your work reproducible
- The focus is on R, but the principles of reproducibility are widely applicable; use Python if you prefer

---
class: inverse

# Further Reading

---

# Further Reading

- "Data Organization in Spreadsheets" (Broman and Woo, 2018)
- "Ten simple rules for reproducible computational research" (Sandve, Nekrutenko et al., 2013)
- "Best practices for scientific computing" (Wilson, Aruliah et al., 2014)
- "Good enough practices in scientific computing" (Wilson, Bryan et al., 2017)
- "Excuse Me, Do You Have a Moment to Talk About Version Control?"
(Bryan, 2018)

---

# References

.footnote[
.font60[
Slides made with xaringan (Xie, 2019) and xaringanExtra (Aden-Buie, 2020)
]
]

.font60[
Aden-Buie, G. (2020). _xaringanExtra: Extras And Extensions for Xaringan Slides_. R package version 0.2.3.9000. URL: [https://github.com/gadenbuie/xaringanExtra](https://github.com/gadenbuie/xaringanExtra).

Baggerly, K. A. and K. R. Coombes (2009). "Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology". In: _Ann. Appl. Stat._ 3.4, pp. 1309-1334.

Broman, K. W. and K. H. Woo (2018). "Data Organization in Spreadsheets". In: _Am. Stat._ 72.1, pp. 2-10. DOI: [10.1080/00031305.2017.1375989](https://doi.org/10.1080/00031305.2017.1375989).

Bryan, J. (2018). "Excuse Me, Do You Have a Moment to Talk About Version Control?" In: _Am. Stat._ 72.1, pp. 20-27.

Markowetz, F. (2015). "Five selfish reasons to work reproducibly". In: _Genome Biol._ 16, p. 274.

OECD Global Science Forum (2020). _Building digital workforce capacity and skills for data-intensive science_. OECD.

Sandve, G. K., A. Nekrutenko, et al. (2013). "Ten simple rules for reproducible computational research". In: _PLoS Comput. Biol._ 9.10, p. e1003285.

Wilson, G., D. A. Aruliah, et al. (2014). "Best practices for scientific computing". In: _PLoS Biol._ 12.1, p. e1001745.

Wilson, G., J. Bryan, et al. (2017). "Good enough practices in scientific computing". In: _PLoS Comput. Biol._ 13.6, p. e1005510.

Xie, Y. (2019). _xaringan: Presentation Ninja_. R package version 0.12. URL: [https://CRAN.R-project.org/package=xaringan](https://CRAN.R-project.org/package=xaringan).
]

---

# Intro to Repro in R

Emma Rand

[emma.rand@york.ac.uk](mailto:emma.rand@york.ac.uk)

Twitter: [@er13_r](https://twitter.com/er13_r)

GitHub: [3mmaRand](https://github.com/3mmaRand)

blog: https://buzzrbeeline.blog/

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</span>

Rand, E. (2023). White Rose BBSRC DTP Training: An Introduction to Reproducible Analyses in R (Version v1.2). https://doi.org/10.5281/zenodo.3859818