background-image: url(pics/Royal_Society_of_Biology.png) background-position: 50% 65% background-size: 400px class: inverse # Reproducible Analyses in R ### Emma Rand <br> emma.rand@york.ac.uk <br> University of York <br> 2019-12-09 .footnote[ Made with xaringan (Xie, 2019) ] --- # Goal! By the end of today you'll be able to do something like this: <!-- standard markdown report in pdf, word or html --> <!-- include a table, inline code, a figure, --> ## demo..... --- # What this workshop is and is not .pull-left[ Is an introduction to * R * some universal and highly transferable concepts * the skills for organising reproducible analyses  ] -- .pull-right[ It is not * An introduction to statistics<sup>1</sup> * Magic [1] but ... ] --- # Workshop overview
--- background-image: url(pics/tidyverse_logo.png) background-position: 100% 0% background-size: 180px # Before Lunch ## Introduction 1000 - 1115 * Rationale for scripting * Why use R to script * Finding your way round RStudio * Typing in data, doing some calculations on it, plotting it * Understanding the manual ## Working with Data 1115 - 1230 * Importing data: working directories and paths * Summarising and visualising with the [`tidyverse`](https://www.tidyverse.org/) * Installing and loading packages * The concept of 'tidy' data (Wickham, 2014) --- class: inverse, # Introduction ### 1000 - 1115 --- background-image: url(pics/rationale1.png) background-position: 50% 35% background-size: 600px # Rationale for scripting analysis <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> .pull-left[ Generating the results ] .pull-right[ Analysing and reporting them ] --- background-image: url(pics/rationale2.png) background-position: 50% 35% background-size: 600px # Rationale for scripting analysis --- background-image: url(pics/rationale3.png) background-position: 50% 35% background-size: 600px # Rationale for scripting analysis --- background-image: url(pics/rationale5.png) background-position: 50% 35% background-size: 800px # Rationale for scripting analysis --- # Rationale for scripting analysis ## Why R? * Open source and free -- .......But so is Python -- * R caters to users who do not see themselves as programmers, but then allows them to slide gradually into programming  -- * Language for data analysis and graphics -- * Community - welcoming, inclusive. [RForwards](https://forwards.github.io/), [RLadies](https://rladies.org/) -- * R Markdown R's "killer feature" .......Super-charged reproducibility! --- # Finding your way round RStudio ## Live demonstration Overview [Larger](http://www-users.york.ac.uk/~er13/RStudio%20Anatomy.svg) <img src="http://www-users.york.ac.uk/~er13/RStudio%20Anatomy.svg" width="600px" /> There is an [RStudio cheatsheet](rstudio-ide.pdf) which covers more advanced RStudio features. --- # Finding your way round RStudio ## Live demonstration * the panels * making yourself comforatable * typing in the console sending commands * using R as a calculator * assigning values * where to see objects * using a script - typing doesn't make it happen you need to execute * comments \# * data types and structures * functions `c()`, `class()` and `str()` * types of of R files: .R, .RData . RHistory --- # Finding your way round RStudio ## Summary of live demonstration: <mark style="background-color: #e6f0ff">The panels</mark> .pull-left[ Top left * Script - where you write, edit and save code and comments you want to keep --- Bottom left * console - commands get executed here and can be typed here ] .pull-right[ Top right * Environment - where to see your ojects * History - a list of every command you sent --- Bottom right * Files - a file explorer * Packages - the packages you have installed and a method of installing packages * Help - the manual * Plots - where your plots go ] --- # Finding your way round RStudio ## Summary of Live demonstration: <mark style="background-color: #e6f0ff">Types of R files</mark> * .R a script file: code (to save) and comments * .RData a environment file also know as a workspace: objects but no code * .RHistory (everything you typed, mostly wrong!) ### Using a script * any R code can be executed from a script * code can be (should be!) commented * comments start with a `#` --- # Finding your way round RStudio ## Summary of Live demonstration: <mark style="background-color: #e6f0ff">Data types and structures</mark> These are the most commonly needed but there are others .pull-left[ Types * numeric * integer * logical * character ] .pull-right[ Structures * vectors * factors * dataframes ] --- # We are here ## Introduction 1000 - 1115 * Rationale for scripting * Why use R to script * Finding your way round RStudio * <mark style="background-color: #e6f0ff">Typing in data, doing some calculations on it, plotting it</mark> * Understanding the manual -- For slides purposes called... ### <mark style="background-color: #e6f0ff">Typing in data, calcs, plots</mark> --- # Typing in data, calcs, plots ## The goal We will work some data on the number of males in 64 bird nests with a clutch size of 5. -- .pull-left[ You are going to make this data, find the mean number of males per nest... <table> <thead> <tr> <th style="text-align:right;"> No. males </th> <th style="text-align:right;"> No. nests </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 4 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 13 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 15 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 13 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> </tr> </tbody> </table> ] -- .pull-right[ ... and plot it <!-- --> ] --- # Typing in data, calcs, plots ## Getting set up In RStudio do File | New project | New directory Be purposeful about where you create it and what you name it. -- Make a [new script](pics/newscript.png) file called RBS_am.R to carry out the rest of the work. --- # Typing in data, calcs, plots ## Putting the data into R data structures Make a vector `n` that holds the numbers 0 to 5. ```r # the number of males in a clutch of five n <- c(0, 1, 2, 3, 4, 5) ``` * Write your command in the script file -- * Notice I have used a comment -- * Make sure you have your cursor on the line you want to execute -- * Run it using the  button or doing Ctrl+Enter<sup>1</sup>. .footnote[ [1] Ctrl+Enter is the default shortcut to run the current line or selection. I prefer to set to Ctrl+R] --- # Typing in data, calcs, plots ## Putting the data into R data structures Let's take a look at it using `str()` (structure) and `class()`<sup>1</sup>: .footnote[ [1] These commands are useful to check what data type you're working with. ] -- ```r class(n) ``` ``` ## [1] "numeric" ``` -- ```r str(n) ``` ``` ## num [1:6] 0 1 2 3 4 5 ``` It's a numeric vector. --- # Typing in data, calcs, plots ## Putting the data into R data structures Now create a vector called `freq` containing the numbers of nests with 0 to 5 males. ```r # the number of nests with 0 to 5 males freq <- c(4, 13, 14, 15, 13, 5) ``` -- We can use `sum(freq)` to check we get the total number of nests expected, 64 ```r # the total number of nests sum(freq) ``` ``` ## [1] 64 ``` --- # Typing in data, calcs, plots ## Finding the mean Our data is in the form of a frequency table so to find the mean number of males per nest we need find the total number of males like this: .pull-left[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> No. males </th> <th style="text-align:left;"> No. nests </th> <th style="text-align:left;"> No. males *No. nests </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 0 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 0 </td> </tr> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 13 </td> <td style="text-align:left;"> 13 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 14 </td> <td style="text-align:left;"> 28 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 15 </td> <td style="text-align:left;"> 45 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 13 </td> <td style="text-align:left;"> 52 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;background-color: #C0C2C9 !important;"> Total </td> <td style="text-align:left;font-weight: bold;background-color: #C0C2C9 !important;"> 64 </td> <td style="text-align:left;font-weight: bold;background-color: #C0C2C9 !important;"> 163 </td> </tr> </tbody> </table> ] .pull-right[ `$$\frac{163}{64} = 2.55$$` ] --- # Typing in data, calcs, plots ## Finding the mean Find the total number of nests ```r # total number of nests total_nests <- sum(freq) ``` -- Find the total number of males ```r # total number of males total_males <- sum(n * freq) ``` -- Find the mean number of males per nest ```r # mean number males/nest total_males/total_nests ``` ``` ## [1] 2.546875 ``` --- # Typing in data, calcs, plots ## Finding the mean R works 'elementwise' unlike most programming lanaguges. ` n * freq` gives `$$\begin{bmatrix}0\\1\\2\\3\\4\\5\end{bmatrix}\times\begin{bmatrix}4\\13\\14\\15\\13\\5\end{bmatrix}=\begin{bmatrix}0\\13\\28\\45\\52\\25\end{bmatrix}$$` It was designed to make it easy to work with data. --- # Typing in data, calcs, plots ## Plotting .pull-left[ Now for our first figure using the base<sup>1</sup> plotting function `barplot()`. Making default figures is often easy - useful if you just want to quickly explore your data. .footnote[ [1] the 'base' plotting system is part of the base distribution, _i.e.,_ all the packages that come with the basic installation of R. ] ] -- .pull-right[ Create simple barplot like this: ```r barplot(freq) ``` <!-- --> ] --- # Using the help manual The default barplot doesn't have: * labels to say what each bar represents * axis labels -- 'Arguments' can be added to the `barplot()` command. Commands do something with/to an object and their arguments specify: * what object to do it to * how exactly to do it -- Many commands have defaults so you need only supply an object. -- Open the manual page using: ```r ?barplot ``` --- # Using the help manual ## Key points * **Description** an overview of what the command does * **Usage** lists argument * format: argument name = default value * some arguments MUST be supplied, they have no defaults * ... essentially, etc * **Arguments** gives the detail on how the arguments work * **Details** decribes how the command works in more detial * **Value** gives the output of the command * Don't be too perturbed by not fully understanding the information --- # Typing in data, calcs, plots ## Plotting .pull-left[ ```r barplot(freq, names.arg = n) ``` <!-- --> ] -- .pull-right[ We can add names using the `names.arg` argument. The names of the bars are the numbers of males in a nest. These are in our variable `n` Arguments are separted by commas `command(argument1, argument2, argument3, ...)` ] --- # Typing in data, calcs, plots ## Plotting .pull-left[ Can you use the manual to add labels for both axes? Skip to the next slide if you want more challenge. ] -- .pull-right[ ```r barplot(freq, names.arg = n, xlab = "Number of males", ylab = "Number of nests") ``` <!-- --> ] --- # Typing in data, calcs, plots ## Plotting .pull-left[ Can you use the manual to add labels for both axes? Skip to the next slide if you want more challenge. <style> div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;} div.grey { background-color:#d3d3d3; border-radius: 0px; padding: 0px;} </style> <div class = "blue"> **Top Tip** Make your code easier to read by using white space and new lines * put spaces around `=` , `->` and after `,` * use a newline after every comma in a command with lots of arguments ] </div> .pull-right[ ```r barplot(freq, names.arg = n, xlab = "Number of males", ylab = "Number of nests") ``` <!-- --> ] --- # Plotting - extra challenges .pull-left[ Make all the bars red. Colours can be given by their name: "red" Or by their hexadecimal code: "#FF3030". You can look these up as follows: [By colour name](pics/colournames.pdf) [By hexadecimal code](pics/colourhex.pdf) Skip to the next slide if you want more challenge. ] -- .pull-right[ ```r barplot(freq, names.arg = n, xlab = "Number of males", ylab = "Number of nests", col = "#FF3030") ``` <!-- --> ] --- # Plotting - extra challenges .pull-left[ Make the bars alternating red, blue, and yellow. To solve this you will need to put together the concept of setting an argument with the concept of collecting several things together to make one thing. ] -- .pull-right[ ```r barplot(freq, names.arg = n, xlab = "Number of males", ylab = "Number of nests", col = c("red", "blue", "yellow")) ``` <!-- --> ] --- background-image: url(pics/ggplot.jpg) background-position: 100% 0% background-size: 100px # Typing in data, calcs, plots ## Plotting using ggplot2 The `barplot()` command is part the 'base' plotting system. Base packages (collections of commands) always come with R. -- Other packages, such as ggplot2 (Wickham, 2016)<sup>1</sup> need to be added ('installed'). .footnote[ [1] A fantastic plotting system which has a common conceptual framework for visualisation based on [The Grammar of Graphics](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448/ref=as_li_ss_tl?ie=UTF8&qid=1477928463&sr=8-1&keywords=the+grammar+of+graphics&linkCode=sl1&tag=ggplot2-20&linkId=f0130e557161b83fbe97ba0e9175c431) ] --- # Packages ## Installing - you only do once `ggplot2` is one of the `tidyverse` (Wickham, 2016) packages. I recommend installing all of `tidyverse` ```r install.packages("tidyverse") ``` -- ## Loading - you do once each R session To use a package which has already been installed you need to 'add it to your library' with the `library()` command. ```r library(tidyverse) ``` -- `tidyverse` includes `ggplot2` as well as `dplyr` and `tidyr` functions from which we will also use. --- background-image: url(pics/ggplot.jpg) background-position: 100% 0% background-size: 100px # Plotting using ggplot2 ## `ggplot2` or `ggplot()` `ggplot2` is the name of the package `ggplot()` is its most important command -- ## Compared to the base plotting system Using `ggplot2` might seem more complex than the base plotting system at first. However, graphs only need to get very slightly more involved for it to become much easier to use. -- `barplot()` needs a vector of bar heights `ggplot()` needs a dataframe --- background-image: url(pics/ggplot.jpg) background-position: 100% 0% background-size: 100px # Plotting using ggplot2 ## `ggplot()` takes a dataframe for an argument First, make sure you have a dataframe. We need to make one from `n` and `freq` ```r nest_data <- data.frame(n, freq) ``` -- And check ```r str(nest_data) ``` ``` ## 'data.frame': 6 obs. of 2 variables: ## $ n : num 0 1 2 3 4 5 ## $ freq: num 4 13 14 15 13 5 ``` -- `nest_data` is just a name we have given the dataframe Click on `nest_data` in the Environment pane to see a spreadsheet-like view of it. --- background-image: url(pics/ggplot.jpg) background-position: 100% 0% background-size: 100px # Plotting using ggplot2 ## A barplot Create a simple barplot using `ggplot` like this: ```r ggplot(data = nest_data, aes(x = n, y = freq)) + geom_col() ``` .pull-left[ <!-- --> ] --- background-image: url(pics/ggplot.jpg) background-position: 100% 0% background-size: 100px # Plotting using ggplot2 ## A barplot `ggplot()` alone creates a blank plot. -- `ggplot(data = nest_data)` looks the same. -- `aes()` gives the 'Aesthetic mappings'. How variables (columns) are mapped to visual properties (aesthetics) e.g., axes, colour, shapes. -- `ggplot(data = nest_data, aes(x = n, y = freq))` produces a plot with axes -- `geom_col` A 'Geom' (Geometric object) gives the visual representations of the data: points, lines, bars, boxplots etc. --- background-image: url(pics/ggplot.jpg) background-position: 100% 0% background-size: 100px # Plotting using ggplot2 You can add an axis label like this: .pull-left[ ```r ggplot(data = nest_data, aes(x = n, y = freq)) + geom_col() + xlab("Number of males") ``` <!-- --> ] -- .pull-right[ <br><br> Each 'layer' is added to the `ggplot()` command with a '+' <div class = "blue"> **We can customise much more and will return to this later.....** <small>[ggplot cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) if you can't wait</small> ] </div> --- class: inverse, # Working with Data ### 1115 - 1230 --- # Working with Data ## The goal <table> <thead> <tr> <th style="border-bottom:hidden" colspan="2"></th> <th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; "> Interorbital distance</div></th> </tr> <tr> <th style="text-align:left;"> Population </th> <th style="text-align:right;"> N </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> SE </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 11.2400 </td> <td style="text-align:right;"> 0.1170032 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 11.7025 </td> <td style="text-align:right;"> 0.0906167 </td> </tr> </tbody> </table> .pull-left[ <!-- --> ] .pull-right[ <!-- --> ] --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: single variable ## Importing The Data: Measurements of interorbital width (in mm) in 40 domestic pigeons measured to the nearest 0.1 mm Save a copy [pigeon.txt](data/pigeon.txt) -- Import the data in to R with `read.table()`: ```r pigeon <- read.table("pigeon.txt", header = TRUE) ``` -- `header = TRUE` is an argument in the `read.table()` command which indicates that the datafile has a header, _i.e._, the first value in the file is the variable name -- You should see an object `pigeon` in the Environment window. --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: single variable ## Importing Examine the structure ```r str(pigeon) ``` ``` ## 'data.frame': 40 obs. of 1 variable: ## $ interorbital: num 12.2 12.9 11.8 11.9 11.6 11.1 12.3 12.2 11.8 11.8 ... ``` The object created by `read.table()` is a dataframe. That's handy! --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: single variable ## Understanding a dataframe A dataframe is made of columns and rows The columns are the variables; the rows are the observations -- To refer a column inside a dataframe (a single column data) we need to use the dollar notation: `dataframe$columnname` -- So to output all of the values of one column we can use: ```r pigeon$interorbital ``` ``` ## [1] 12.2 12.9 11.8 11.9 11.6 11.1 12.3 12.2 11.8 11.8 10.7 11.5 11.3 11.2 11.6 ## [16] 11.9 13.3 11.2 10.5 11.1 12.1 11.9 10.4 10.7 10.8 11.0 11.9 10.2 10.9 11.6 ## [31] 10.8 11.6 10.4 10.7 12.0 12.4 11.7 11.8 11.3 11.1 ``` --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: single variable ## Summarising single group data R provides many functions to allow you to make calculations on your data. -- For example, the mean ```r mean(pigeon$interorbital) ``` ``` ## [1] 11.48 ``` -- Which functions will calculate the standard deviation and the number of cases, _i.e._, the length of the column? --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: single variable ## Summarising single group data Standard deviation ```r sd(pigeon$interorbital) ``` ``` ## [1] 0.6917833 ``` -- Number of cases ```r length(pigeon$interorbital) ``` ``` ## [1] 40 ``` --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: single variable ## Plotting single group data .pull-left[ ```r ggplot(data = pigeon, aes(x = interorbital)) + geom_histogram() + xlab("Interorbital distance (mm)") ``` <!-- --> ] --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: single variable ## Plotting single group data .pull-left[ ```r ggplot(data = pigeon, aes(x = interorbital)) + geom_histogram(bins = 10, * col = "black") + xlab("Interorbital distance (mm)") ``` <!-- --> ] -- .pull-right[ <br> <br> <br> We will cover more figure options as we go through the afternoon. ] --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: two variables ## Importing 2 This section will teach you about three concepts: -- 1. 'working directories', 'paths' and 'relative paths' -- 2. dealing with data in more than group -- 3. the concept of 'tidy' data -- We will work with the interorbital distances of domestion pigeons in two different populations: A and B --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: two variables ## Importing 2 It is good practice to organise your files into a folder structure. For example, I often use:  --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: two variables ## Importing 2 Make a folder in your Project directory (this is also your working directory) called 'data' The easiest way to do this is in RStudio - see the bottom right Files panel -- Forgotten where you are? Use ```r getwd() ``` -- Save a copy of [pigeon2.txt](data/pigeon2.txt) to the 'data' folder --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: two variables ## Importing To read the data in to R you need to use the 'relative path' to the file in the `read.table()` command: ```r pigeon2 <- read.table("data/pigeon2.txt", header = TRUE) ``` -- The `data/` part is the 'relative path' to the file. -- It says where the file is **relative to your working directory** pigeon2.txt is inside a folder (directory) called 'data' which is in your working directory. --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: two variables ## Importing Look at the structure of the dataframe pigeon2 using `str()` ```r str(pigeon2) ``` ``` ## 'data.frame': 40 obs. of 2 variables: ## $ A: num 12.4 11.2 11.6 12.3 11.8 10.7 11.3 11.6 12.3 10.5 ... ## $ B: num 12.6 11.3 12.1 12.2 11.8 11.5 11.2 11.9 11.2 12.1 ... ``` -- It's a dataframe with two variables (columns) called A and B The datatype of both variables is numeric. --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data: two variables ## Summarising multi-group data ### Find the mean for each population. Remember, to refer a column inside a dataframe (a single column data) we need to use the dollar notation: `dataframe$columnname` -- The mean of column $A ```r mean(pigeon2$A) ``` ``` ## [1] 11.24 ``` The mean of column $B ```r mean(pigeon2$B) ``` ``` ## [1] 11.7025 ``` --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Tidy format Instead of having a population in each column, we very often have, **and want**, data organised so the measurements are all in one column and a second column gives the group. -- This format is described as 'tidy' (Wickham, Averick, Bryan, Chang, McGowan, François, Grolemund, Hayes, Henry, Hester, Kuhn, Pedersen, Miller, Bache, Müller, Ooms, Robinson, Seidel, Spinu, Takahashi, Vaughan, Wilke, Woo, and Yutani, 2019). -- Tidy data has a variable in each column and only one observation (case) per row. --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Tidy format .pull-left[ Tidy data has a variable in each column and only one observation (case) per row. Suppose we had just 3 individuals in each population: **NOT TIDY!** <table> <thead> <tr> <th style="text-align:right;"> A </th> <th style="text-align:right;"> B </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 12.4 </td> <td style="text-align:right;"> 12.6 </td> </tr> <tr> <td style="text-align:right;"> 11.2 </td> <td style="text-align:right;"> 11.3 </td> </tr> <tr> <td style="text-align:right;"> 11.6 </td> <td style="text-align:right;"> 12.1 </td> </tr> </tbody> </table> ] .pull-right[ <br> <br> <br> **TIDY!** <table> <thead> <tr> <th style="text-align:left;"> population </th> <th style="text-align:right;"> distance </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 12.4 </td> </tr> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 11.2 </td> </tr> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 11.6 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 12.6 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 11.3 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 12.1 </td> </tr> </tbody> </table> ] --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Tidy format We can make the data tidy with `gather()`<sup>1</sup>. `gather()` needs 3 arguments: .footnote[ [1] `gather()` is a function from a package called `tidyr` which is one of the `tidyverse` packages. ] -- * the dataframe -- * a `key` which will become the name of the grouping variable -- * a `value` the measurement variable. -- ```r pigeon3 <- gather(data = pigeon2, key = population, value = distance) str(pigeon3) ``` ``` ## 'data.frame': 80 obs. of 2 variables: ## $ population: chr "A" "A" "A" "A" ... ## $ distance : num 12.4 11.2 11.6 12.3 11.8 10.7 11.3 11.6 12.3 10.5 ... ``` --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Summarising multi-group data Now we have a dataframe in tidy format which will make it easier to summarise, analyse and visualise. -- How can we summarise now? -- If we use ```r mean(pigeon3$distance) ``` ``` ## [1] 11.47125 ``` We will get the mean of the two populations combined. -- To get the mean of each population we can use `tapply()`. --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Summarising multi-group data `tapply()` takes three arguments: -- * the measurement variable -- * the grouping variable -- * function to be applied: `tapply(measurement, grouping, function)` --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Summarising multi-group data ### Using `tapply()` ```r tapply(pigeon3$distance, pigeon3$population, mean) ``` ``` ## A B ## 11.2400 11.7025 ``` -- Find the number of cases in each group .footnote[ <div class = "blue"> Finshed and need more challenge? Go to the next slide </div> ] -- The number of cases ```r tapply(pigeon3$distance, pigeon3$population, length) ``` ``` ## A B ## 40 40 ``` --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Summarising multi-group data ### Extra challenges using `tapply()` Find the variance in each population .footnote[ <div class = "blue"> Finshed and need more challenge? Go to the next slide </div> ] -- ```r tapply(pigeon3$distance, pigeon3$population, var) ``` ``` ## A B ## 0.5475897 0.3284551 ``` --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Summarising multi-group data ### Extra challenges using `tapply()` Calculate the standard errors for each group. Standard errors are given by $$s.e. = \frac{s.d.}{\sqrt{n}} $$ where `\(s.d.\)` is the standard deviation of a sample and `\(n\)` is the size of the sample. -- You will need to * calculate and assign the standard deviations to variable * calculate and assign the number of cases to variable * find out what function finds the squareroot of its argument * apply what you know about the elementwise way R works --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Summarising multi-group data ### Extra challenges using `tapply()` Calculate the standard errors for each group. ```r sds <- tapply(pigeon3$distance, pigeon3$population, sd) ns <- tapply(pigeon3$distance, pigeon3$population, length) ses <- sds / sqrt(ns) ses ``` ``` ## A B ## 0.11700318 0.09061666 ``` -- To round ```r round(ses, 3) ``` ``` ## A B ## 0.117 0.091 ``` --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data .pull-left[ ```r ggplot(data = pigeon3, aes(x = distance)) + geom_histogram(bins = 10, col = "black") + xlab("Interorbital distance (mm)") ``` <!-- --> ] .pull-right[ <br> <br> <br> <br> Plots the whole data set without separating the populations ] --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data ### Using the `aes()` 'map' the population variable with the `fill` aesthetic ```r *ggplot(data = pigeon3, aes(x = distance, fill = population)) + geom_histogram(bins = 10, col = "black") + xlab("Interorbital distance (mm)") ``` <!-- --> --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data ### Using the `aes()` Aesthetics include: * x * y * color * fill * shape * size * alpha --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data ### Changing the `geom` From `geom_histogram()` to `geom_density()` ```r ggplot(data = pigeon3, aes(x = distance, fill = population)) + * geom_density(col = "black") + xlab("Interorbital distance (mm)") ``` <!-- --> --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data ### And altering the transparancy using `alpha` ```r ggplot(data = pigeon3, aes(x = distance, fill = population)) + * geom_density(col = "black", alpha = 0.3) + xlab("Interorbital distance (mm)") ``` <!-- --> --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data ### Themes ```r ggplot(data = pigeon3, aes(x = distance, fill = population)) + geom_density(col = "black", alpha = 0.3) + xlab("Interorbital distance (mm)") + * theme_classic() ``` <!-- --> --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data ### A different kind of plot Instead of plotting the distribution, we can plot the population on the _x_-axis and the distances on the _y_-axis **as points** --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data ### A different kind of plot ```r *ggplot(data = pigeon3, aes(x = population, y = distance)) + * geom_point() + xlab("Population") + ylab("Interorbital distance (mm)") ``` <!-- --> --- background-image: url(pics/interorbital.png) background-position: 100% 0% background-size: 180px # Working with Data ## Plotting multi-group data ### Another plot ```r ggplot(data = pigeon3, aes(x = population, y = distance)) + * geom_violin() + xlab("Population") + ylab("Interorbital distance (mm)") + theme_classic() ``` <!-- --> --- class: inverse, # Lunch! --- # Workshop overview
--- class: inverse, # Reproducible Reports: R Markdown ### 1315 - 1430 --- # Reproducible Reports: R Markdown <iframe src="https://forms.gle/EqQKx4nLQQemQZWb8" width="500" height="700" frameborder="0" marginheight="0" marginwidth="0">Loading...</iframe> --- # Reproducible Reports: R Markdown ## How do you work? * What program do you analyse your data in? * What program do you plot your data in? * What program do you use to write up the results to submit to a journal (or similar)? * What is your process for getting your summary data, statistical results, tables and figures in to your report? -- * What do you do when you get additional data that increases your sample sizes? -- * What do you do if you wrote in Word formated for one journal and now have to submit in PDF formated for another? --- # Reproducible Reports: R Markdown ## How do you work? Typically people analyse, plot and write up in different programs. Graphs are saved to files and copied and pasted into the final report. This process relies on manual labour. If the data changes, the author must repeat the entire process to update the report. --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown The brilliance of R Markdown (Allaire, Xie, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2019; Xie, Allaire, and Grolemund, 2018) is that you can use a **single R Markdown file** to: * save and execute code * do all your data processing, analysis and plotting * generate high quality reports that can be shared with an audience This is a time-efficient and reproducible way to write!  --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Many output formats are supported! .pull-left[  ] .pull-right[ * Word * Webpages - many styles and themes * PDF * journal article formats for many journals * webslides * powerpoint * books * blogs * posters * web applications including interactive ] --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Let's get started ### Organising your work You are going to work with some made-up data on the myoglobin content in the skeletal muscle of three species of seal -- Create a new project called 'seals' -- Inside 'seals', make folders called 'raw_data' and 'processed_data' -- I strongly recommend avoiding spaces in names of files, folders and variable names. R can cope with them.... but they can often confuse humans! -- Save a copy of [seal.txt](seals/raw_data/seal.txt) to the 'raw_data' folder --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Live demo Just watch for a while.... --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Key points from the demo * mixes text and code * human readable * YAML header between the \-\-\- * code chunk options control whether the code and its output end up in your 'knitted' document * comments * in a code chunk the \# is still used for comments * in the text a comment is written like this <!\-\- a comment \-\-> * but use Ctrl+Shift+C * \# in the text indicate headings --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Your own R markdown file File | New File | R Markdown -- Delete everything except the YAML header the first code chunk -- Add your name, and a title You could copy and paste a title from [paper.html](seals/paper.html) --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Your own R markdown file ### Default code chunk behaviour. Set some **default** code chunk options. I recommend these: ```` ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) ``` ```` `echo = FALSE` means the code will not be included by default - this is normally what you want in a report. Any output is included by default. --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Your own R markdown file ### References References and citations can be added by creating a .bib file containing references in BibTeX format and another line to the YAML header. -- `bibliography: mybibfile.bib` must be added to the YAML header -- Citations are added using: * blah blah blah `[@xaringan]` for blah blah blah (Xie, 2019). * Xie `[-@xaringan]` said blah blah blah for Xie (2019) said blah blah blah. -- Every citation used results in the reference being added to a list at the bottom of the output. --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Your own R markdown file ### References #### Make a .bib file Do File | New file | Text file Save as `mybibfile.bib` #### Add BibTeX entries `citation("package")` in the console will give packages references in BibTeX format. BibTeX format is also available through most referencing software (e.g., PaperPile). --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Add a section of text ### Introduction Add a header for the introductions and text - again you could copy and paste. ``` # Introduction Aquatic and marine mammals are able to dive underwater................................. ...investigated whether the concentration of myoglobin differed between species. ``` -- Save your file. --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Add a code chunk ### Packages The first two code chunks are usually for the default code chunk options (which I tend to call `setup`) and for package loading.Use Insert | R to add a code chunk Load the tidyverse .pull-left[ ```` ```{r packages} library(tidyverse) ``` ```` ] .pull-right[ * `r`indicates it is an R code chunk * `packages` is just a name for the chunk. Naming chunks makes debugging easier. ] --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Add a section of text ### Methods Add a heading for the Methods section including a reference to R. ``` # Methods We used R [@R-core] with tidyverse packages [@tidyverse] for all analyses. ``` -- Hit Knit! This should render as: We used R (R Core Team 2019) with tidyverse packages (Wickham 2017) for all analyses. --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Add a code chunk ### Importing data Insert a code chunk for importing the data ```` ```{r import} # Data import # the data are organised in to three columns, one for each species. seal <- read.table("raw_data/seal.txt", header = TRUE) ``` ```` --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Run a code chunk ### Interactively.. Note you can run the code interactively to check your progress and the output will be shown in the Rmd document. ```` ```{r import} # Data import # the data are organised in to three columns, one for each species. seal <- read.table("raw_data/seal.txt", header = TRUE) str(seal) ``` ```` ```` 'data.frame': 30 obs. of 3 variables: $ Harbour : num 49.7 51 41.6 45.6 39.4 ... $ Weddell : num 55.4 40.1 46.3 29.8 52.5 ... $ Bladdernose: num 56.2 48.4 37.8 42.8 27 ... ```` --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Add a code chunk ### Tidying data Insert a code chunk for tidying the data and writing it out to a file ```` ```{r tidy} # put the data in tidy format seal2 <- gather(data = seal, species, myoglobin) # write tidy data to processed write.table(seal2, "processed_data/seal2.txt", row.names = FALSE) ``` ```` --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Add a code chunk ### Summarising data Insert a code chunk for summarising the data You may need to install `Rmisc` first. -- And add it to your `package`chunk. ```` ```{r packages} library(tidyverse) library(Rmisc) ``` ```` --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Add a code chunk ### Summarising data ```` ```{r summarise} sealsummary <- summarySE(data = seal2, measurevar = "myoglobin", groupvars = "species") ``` ```` --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## What is in `sealsummary`? ### Select a section of code And run (ctrl-enter) <img src="pics/sealsummary.png" width="700px" /> -- Knit your file --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Inline code ### Example Code results can be inserted directly into the text of a .Rmd file using inline code. Inline code goes between ` `r` and ` ` ` . For example by writing: The squareroot of 2 is ` `r` `sqrt(2)` ` ` ` you will get The squareroot of 2 is 1.4142136 --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Inline code ### Discuss the highest mean We could add The highest mean is ` `r` `max(sealsummary$myoglobin)` ` ` ` to our text. When we knit the document we would get The highest mean is 49.01033 --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Insert a code chunk ### To access object contents more easily Insert a code chunk extracting values to make inline reporting easier Don't worry too much about completely understanding the code ```` ```{r extract} # extract values for inline reporting highestmean <- max(sealsummary$myoglobin) highestse <- sealsummary$se[sealsummary$myoglobin == highestmean] highestspp <- sealsummary$species[sealsummary$myoglobin == highestmean] lowestmean <- min(sealsummary$myoglobin) lowestse <- sealsummary$se[sealsummary$myoglobin == lowestmean] lowestspp <- sealsummary$species[sealsummary$myoglobin == lowestmean] ``` ```` --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Your turn. ### Add a results section --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown ## Adding text ### Special characters You can include special characters in a markdown document using LaTeX markup. This has \$ signs on the outside and uses backslashes and curly braces to indicate that what follows should be interpreted as a special character of special formatting. For example, to get `\(\bar{x} \pm s.e.\)` you write `$\bar{x} \pm s.e.$` --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # Reproducible Reports: R Markdown Go to live demo of document --- background-image: url(pics/rmarkdown.png) background-position: 100% 0% background-size: 100px # R Markdown Resources ## R Markdown cheatsheet https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf ## R Markdown: The Definitive Guide Yihui Xie, J. J. Allaire, Garrett Grolemund https://bookdown.org/yihui/rmarkdown/ ## RStudio’s guide http://rmarkdown.rstudio.com/index.html --- # References Allaire, J, Y. Xie, J. McPherson, et al. (2019). _rmarkdown: Dynamic Documents for R_. R package version 1.16. URL: [https://github.com/rstudio/rmarkdown](https://github.com/rstudio/rmarkdown). Wickham, H. (2014). "Tidy Data". In: _Journal of Statistical Software, Articles_ 59.10, pp. 1-23. Wickham, H. (2016). _ggplot2: Elegant Graphics for Data Analysis_. Springer-Verlag New York. ISBN: 978-3-319-24277-4. URL: [https://ggplot2.tidyverse.org](https://ggplot2.tidyverse.org). Wickham, H, M. Averick, J. Bryan, et al. (2019). "Welcome to the Tidyverse". In: _JOSS_ 4.43, p. 1686. Xie, Y. (2019). _xaringan: Presentation Ninja_. R package version 0.12. URL: [https://CRAN.R-project.org/package=xaringan](https://CRAN.R-project.org/package=xaringan). Xie, Y, J. Allaire, and G. Grolemund (2018). _R Markdown: The Definitive Guide_. ISBN 9781138359338. Boca Raton, Florida: Chapman and Hall/CRC. URL: [https://bookdown.org/yihui/rmarkdown](https://bookdown.org/yihui/rmarkdown). --- background-image: url(pics/Royal_Society_of_Biology.png) background-position: 0.5% 0.5% background-size: 350px class: inverse # Congratulations! Keep practising! <br> <br> <br> <br> Emma Rand emma.rand@york.ac.uk Twitter: [@er13_r](https://twitter.com/er13_r) github: [3mmaRand](https://github.com/3mmaRand) blog: https://buzzrbeeline.blog/ <br> .footnote[ <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Royal Society of Biology CPD: An Introduction to Reproducible Analyses in R</span> by <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">Emma Rand</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. ]