Introduction to R and working with data.

class: center, middle, inverse, title-slide

.title[
# Introduction to R and working with data.
]
.subtitle[
## White Rose BBSRC DTP Training: An Introduction to Reproducible Analyses in R.
]
.author[
### Emma Rand
]
.institute[
### University of York, UK
]

---

<style>
div.blue { background-color:#b0cdef; border-radius: 5px; padding: 20px;}
div.grey { background-color:#d3d3d3; border-radius: 0px; padding: 0px;}
</style>

# Overview

* Finding your way round RStudio  
* Typing in data and plotting it
* Understanding the manual  
* Working with data  
  * Importing data: working directories and paths  
  * Summarising and visualising with the [`tidyverse`](https://www.tidyverse.org/)

<img src="pics/tidyverse_logo.png" width="160px" style="display: block; margin: auto 0 auto auto;" />

---

class: inverse

#  Finding your way round RStudio

---

# RStudio: live demonstration

Overview [Larger](http://www-users.york.ac.uk/~er13/RStudio%20Anatomy.svg). **Will be followed by a recap**

<img src="http://www-users.york.ac.uk/~er13/RStudio%20Anatomy.svg" width="600px" />

There is an [RStudio cheatsheet](http://www-users.york.ac.uk/~er13/rstudio-ide.pdf) which covers more advanced RStudio features.

---
# RStudio: Recap

* the panels
* making yourself comfortable
* typing commands in the console
* using R as a calculator
* assigning values
* where to see objects
* using a script - make sure to execute
* comments \#
* data types and structures
* functions `c()`, `class()` and `str()`
* types of R files: .R, .RData, .Rhistory

---
# RStudio: Recap

.pull-left[
Top left Panel
* Script - write and edit code and comments to keep  
 
---
Bottom left Panel 
* Console - where commands get executed and can be typed  
]

.pull-right[
Top right Panel
* Environment - see your objects  
* History - of commands  
--- 
Bottom right Panel
* Files - a file explorer  
* Packages - those installed and a method of installing  
* Help - the manual  
* Plots
]

---
# RStudio: Recap
Types of file
* .R: a script file. Code and comments
* .RData: an environment file, also known as a workspace. Objects but no code
* .Rhistory: everything you typed, mostly wrong!
 
Using a script 
* any R code can be executed from a script
* code can be (should be!) commented  
* comments start with a `#`
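
A minimal sketch of what a short commented script might look like. The object name `total_cats` and the calculation are illustrative assumptions, not part of the course materials.

```r
# use R as a calculator
2 + 3

# assign a value to an object with <-
# (total_cats is a hypothetical name chosen for this example)
total_cats <- 62

# use the object in a later calculation:
# the proportion of cats that are black if 23 of them are
prop_black <- 23 / total_cats
prop_black
```

Each line can be sent to the console from the script, and the objects then appear in the Environment panel.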

---
# RStudio: Recap
Data types and structures
These are the most commonly needed but there are others
.pull-left[
Types
* numeric
* integer
* logical
* character
]
.pull-right[
Structures
* vectors
* factors
* dataframes
]
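
The types and structures above can be explored with `c()`, `class()` and `str()`. This is a small sketch; all object names and values are made up for illustration.

```r
# a numeric vector
masses <- c(2.5, 3.1, 4.0)
class(masses)       # "numeric"

# an integer vector: the L suffix asks for whole numbers
counts <- c(23L, 15L, 8L)
class(counts)       # "integer"

# a logical vector
is_adult <- c(TRUE, FALSE, TRUE)
class(is_adult)     # "logical"

# a character vector
coat_colours <- c("black", "white", "black")
class(coat_colours) # "character"

# a factor stores categories and their levels
coat_factor <- factor(coat_colours)
levels(coat_factor) # "black" "white"

# a dataframe holds vectors of the same length as columns
cats <- data.frame(coat_factor, masses)
str(cats)
```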

---

class: inverse

# Typing in data and plotting it

---
# Typing in data and plots

## The goal

We will work with some data on the coat colour of 62 cats. You are going to type the data into R, then summarise and plot it.

The data are given as a frequency table:
--
.pull-left[

<table style="width:30%; font-size: 16px; margin-left: auto; margin-right: auto;" class="table">
<caption style="font-size: initial !important;">Frequency of coat colours in 62 cats</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Coat colour </th>
   <th style="text-align:right;"> No. cats </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> black </td>
   <td style="text-align:right;"> 23 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> white </td>
   <td style="text-align:right;"> 15 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tabby </td>
   <td style="text-align:right;"> 8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ginger </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tortoiseshell </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> calico </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>
]
.pull-right[

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-4-1.png" width="288" />
]

---
# Typing in data and plots

## Getting set up

In RStudio do File | New project | New directory

Be purposeful about where you create it and what you name it.

I suggest `dtp-training`

--

Make a new script ![new script](pics/newscript.png) and save it as `cats.R` to carry out the rest of the work.

---
# Typing in data and plots

## Creating the data

Start by making a vector called `coat` that holds coat colours:

```r
# coat colours
coat <- c("black", 
          "white", 
          "tabby", 
          "ginger", 
          "tortoiseshell", 
          "calico")
```

-   Put your cursor on the line you want to execute and press Ctrl+Enter

---
# Typing in data and plots

## Creating the data

Create a vector called `freq` containing the numbers of cats with each coat colour.

--

```r
# numbers of cats with each coat colour
freq <- c(23, 15, 8, 10, 5, 1)
```

---
# Typing in data and plots

## Total number of cats

We can use `sum(freq)` to check that the total number of cats is 62:

```r
# the total number of cats
sum(freq)
```

```
## [1] 62
```

---

class: inverse

# Plotting the data with ggplot()

---
background-image: url(pics/ggplot2.png)
background-position: 90% 75%
background-size: 200px

# Typing in data and plots

Commands like `c()`, `sum()`, and `str()` are part of the 'base' R system.

Base packages (collections of commands) always come with R.

--

Other packages, such as `ggplot2` (Wickham, 2016) need to be added.

`ggplot2` is one of the `tidyverse`  packages.

---
background-image: url(pics/tidyverse.png)
background-position: 90% 75%
background-size: 200px
# Typing in data, calcs, plots

You should have already installed `tidyverse`, but we need to load it with `library()` before we can use it in a session.

```r
library(tidyverse)
```

--

Later we will also use functions from `dplyr` and `tidyr`, which are also `tidyverse` packages.

--

`ggplot2` is the name of the package

`ggplot()` is its most important command

---
# Plotting using ggplot2

## Data structure for `ggplot()`

`ggplot()` takes a dataframe as an argument

Make a dataframe called `coat_data`:

```r
coat_data <- data.frame(coat, freq)
```

Click on `coat_data` in the Environment to open a spreadsheet-like view of it.
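
As well as clicking on the object, the dataframe can be checked with code. This sketch repeats the two vectors from the earlier slides so that it runs on its own.

```r
# the two vectors created earlier
coat <- c("black", "white", "tabby", "ginger", "tortoiseshell", "calico")
freq <- c(23, 15, 8, 10, 5, 1)

# the vectors must be the same length to become columns
length(coat) == length(freq)

# the column names are taken from the vector names
coat_data <- data.frame(coat, freq)
names(coat_data)

# str() gives a compact summary of the structure
str(coat_data)
```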

---
# Plotting using ggplot2

## A barplot

Create a simple barplot using `ggplot` like this:

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col()
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-10-1.png" width="288" />

---
# Plotting using ggplot2

## A barplot

`ggplot()` alone creates a blank plot.

--

`ggplot(data = coat_data)` looks the same.

--

`aes()` gives the 'aesthetic mappings': how variables (columns) are mapped to visual properties (aesthetics), e.g., axes, colour, shapes.

Thus...
---
# Plotting using ggplot2

## A barplot

`ggplot(data = coat_data, aes(x = coat, y = freq))` produces a plot with axes

--

`geom_col()` is a 'geom' (geometric object). Geoms give the visual representation of the data: points, lines, bars, boxplots etc.
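
One way to see how the pieces fit together is to build the plot in steps and store each step in an object. This is a sketch; the object names `p` and `p_bars` are chosen here for illustration.

```r
library(ggplot2)

# the cat data from the earlier slides
coat_data <- data.frame(
  coat = c("black", "white", "tabby", "ginger", "tortoiseshell", "calico"),
  freq = c(23, 15, 8, 10, 5, 1)
)

# step 1: data and aesthetic mappings only, which gives axes but no bars
p <- ggplot(data = coat_data, aes(x = coat, y = freq))

# step 2: add a geom to draw the data as bars
p_bars <- p + geom_col()

# typing the name of a plot object draws it
p_bars
```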

---

class: inverse

# Using the help manual

---
# Using the help manual

'Arguments' can be added to the `geom_col()` command.

Commands do something

Their arguments go in brackets and can specify:
* what object to do it to  
* how exactly to do it

--

Many commands have defaults so you need only supply an object.

--

Open the manual page using:

```r
?geom_col()
```

## Demonstration
---
# Using the help manual: Recap

* **Description** an overview of what the command does  
* **Usage** lists the arguments    
  * form: argument name = default value  
  * some arguments MUST be supplied; others have defaults
  * `...` means further arguments can be supplied
* **Arguments** gives the detail about the arguments 
* **Details** describes how the command works in more detail  
* **Value** gives the output of the command 
* Don't be too perturbed by not fully understanding the information
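
Defaults can be seen in action with a base R function. This sketch uses `round()`, which is not part of the cat example, because its `digits` argument has a default of 0.

```r
# args() shows a function's arguments and any defaults
args(round)

# digits is not supplied, so the default (0) is used
default_result <- round(3.14159)
default_result   # 3

# supplying digits overrides the default
two_dp <- round(3.14159, digits = 2)
two_dp           # 3.14
```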

---
# Using manual: Alter a ggplot

Change the fill of the bars using `fill`:

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue")
```
.pull-left[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-12-1.png" width="288" />
]
--
.pull-right[
Colours can be given by their name, "lightblue" or code, "#ADD8E6".

Look up by [name](pics/colournames.pdf) or [code](pics/colourhex.pdf)

]

---
# Using manual: Alter a ggplot

`fill` is an aesthetic.

We can set the `fill` aesthetic to a particular colour inside `geom_col()`, or map it to a variable inside `aes()` instead.
---
# Using manual: Alter a ggplot

```r
ggplot(data = coat_data, aes(x = coat, y = freq, fill = freq)) +
  geom_col()
```
.pull-left[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-13-1.png" width="288" />
]
--
.pull-right[

Mapping `fill` to a variable means the colour varies with the value of that variable, here `freq`. 
]
---
# Using manual: Alter a ggplot

Can you use the manual to put the bars next to each other?

.footnote[
<br>
<span style=" font-weight: bold;    color: #f6fafd !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span>  Change the colour of the lines around each bar to black.]
--

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", width = 1)
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-14-1.png" width="288" />

---
# Using manual: Alter a ggplot

<span style=" font-weight: bold;    color: #f6fafd !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span>  Change the colour of the lines around each bar to black.

.pull-left[

```r
ggplot(data = coat_data,
       aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", 
           width = 1, 
           colour = "black")
```
]

.pull-right[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-15-1.png" width="288" />
]
---
# Top Tip

<div class = "blue">
.font100[
Make your code easier to read by using white space and new lines

* put spaces around `=` and `<-`, and after `,`  
* use a newline after every comma in a command with lots of arguments ] 
</div>

---
# Alter a ggplot: axes

We can make changes to the axes using:

-   Changes to a discrete x axis: `scale_x_discrete()`
-   Changes to a continuous y axis: `scale_y_continuous()`

`ggplot` automatically extends the axes slightly. You can turn this behaviour off with the `expand` argument in `scale_x_discrete()` and `scale_y_continuous()`.

--

Each 'layer' is added to the ggplot() command with a `+`

---
# Alter a ggplot: axes

Remove the gap between the axes and the data:

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", 
           width = 1, 
           colour = "black") +
  scale_x_discrete(expand = c(0, 0)) + 
  scale_y_continuous(expand = c(0, 0)) 
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-16-1.png" width="288" />

.pull-right[
.footnote[
<span style=" font-weight: bold;    color: #fdf9f6 !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span>  Look up `scale_x_discrete` in the manual and work out how to change the axis title from "coat" to "Coat colour"]
]

---
# Alter a ggplot: axes

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", 
           width = 1, 
           colour = "black") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") + 
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats") 
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-17-1.png" width="288" />

---

# Alter a ggplot: themes

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(width = 1, 
           colour = "black",
           fill = "lightblue") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") + 
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats",
                     limits = c(0, 25)) +
  theme_classic()
```

---
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-18-1.png" width="288" />

---

# Alter a ggplot: order of bars

Reorder the categories in `coat` by the value in `freq`:

```r
ggplot(data = coat_data, 
       aes(x = reorder(coat, freq, decreasing = TRUE), 
           y = freq)) +
  geom_col(width = 1, 
           colour = "black",
           fill = "lightblue") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") + 
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats",
                     limits = c(0, 25)) +
  theme_classic()
```
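
To see what `reorder()` does outside of a plot, it can be applied to the vectors directly. This sketch reorders by `-freq`, an alternative way of getting decreasing order that does not rely on the `decreasing` argument.

```r
# the cat data from the earlier slides
coat <- c("black", "white", "tabby", "ginger", "tortoiseshell", "calico")
freq <- c(23, 15, 8, 10, 5, 1)

# reorder() returns a factor whose levels are sorted by the second argument
# using -freq puts the most common coat colour first
coat_ordered <- reorder(coat, -freq)

# the order of the levels is the order of the bars on the x axis
levels(coat_ordered)
# "black" "white" "ginger" "tabby" "tortoiseshell" "calico"
```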

---
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-19-1.png" width="288" />

---

class: inverse

# Working with imported data

---
# The goal
.pull-left[
Summarise
<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
 <thead>
<tr>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; "> Interorbital distance</div></th>
</tr>
  <tr>
   <th style="text-align:left;"> Population </th>
   <th style="text-align:right;"> N </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> SE </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> A </td>
   <td style="text-align:right;"> 40 </td>
   <td style="text-align:right;"> 11.24 </td>
   <td style="text-align:right;"> 0.12 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> B </td>
   <td style="text-align:right;"> 40 </td>
   <td style="text-align:right;"> 11.70 </td>
   <td style="text-align:right;"> 0.09 </td>
  </tr>
</tbody>
</table>
]
.pull-right[
Plot

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-21-1.png" width="288" />
]
---
background-image: url(pics/interorbital.png)
background-position: 100% 0%
background-size: 180px

# Working with Data 
 
## Importing

This section will teach you about three concepts:

--

1. 'working directories', 'paths' and 'relative paths'

--

2. Tidy data

--

3. dealing with data in more than one group

--

We will work with the interorbital distances of domestic pigeons in two different populations: A and B

---
# Working with Data 
## Organising

Make a folder in your Project directory (this is also your working directory) called `data-raw`.

The easiest way to do this is in RStudio  - see the bottom right Files panel

--

Save a copy of [pigeon.txt](data-raw/pigeon.txt) to the `data-raw` folder

---
# Working with Data

## Start coding

Make a new script called `pigeons.R`

--

Add this code:

```r
# load packages
library(tidyverse)
```

We need to load the `tidyverse` packages for several of the commands we will use.

---
# Working with Data 
## Importing

To read the data into R you need to use the 'relative path' to the file:

```r
pigeon <- read_table("data-raw/pigeon.txt")
```

--

`data-raw/pigeon.txt` is the 'relative path' to the file.

--
It says where the file is **relative to your working directory**

`pigeon.txt` is inside a folder (directory) called `data-raw` which is in your working directory.

.footnote[
.font60[
More  on File systems and paths: [Computational Analysis for Bioscientists](https://3mmarand.github.io/comp4biosci/file_systems.html) (Rand, 2023).
]
]
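
A short sketch of how to check a relative path before importing. It uses only base R functions; the object name `pigeon_path` is an assumption made for this example.

```r
# print the working directory: relative paths start from here
getwd()

# build the relative path from its parts
pigeon_path <- file.path("data-raw", "pigeon.txt")
pigeon_path   # "data-raw/pigeon.txt"

# TRUE if the file is where R expects it, FALSE otherwise
file.exists(pigeon_path)
```

If `file.exists()` returns `FALSE`, the file is not in a `data-raw` folder inside the working directory, and reading it with this path will fail.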

---
# Working with Data

## Understanding the dataframe
A dataframe is made of columns and rows

The columns are the variables; the rows are the observations

---
# Working with Data

## Tidy format

Instead of having a population in each column, we often have, **and want**, all measurements in one column with a second column giving the group.

--

This format is described as 'tidy'.

--

Has one variable in each column and only one observation (case) per row.

--

Captures the structure of data and allows you to specify the role of variables in analyses and visualisations.
---
# Data Organisation

## What is tidy data?

One response per row.

Tidy data adhere to a consistent structure which makes it easier to manipulate, model and visualize them. The structure is defined by:

1. Each variable has its own column.  
2. Each observation has its own row.  
3. Each value has its own cell.

---
# Data Organisation

## What is tidy data?

The term 'tidy data' was popularised by Wickham (2014).

Closely allied to the relational algebra of relational databases (Codd, 1990). Underlies the enforced rectangular formatting in SPSS, STATA and R's dataframe.

--

There may be more than one potential tidy structure.

---
# Working with Data

## Tidy format

Suppose we had just 3 individuals in each of two populations:

.pull-left[
NOT TIDY!
<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> A </th>
   <th style="text-align:right;"> B </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 12.4 </td>
   <td style="text-align:right;"> 12.6 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11.2 </td>
   <td style="text-align:right;"> 11.3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11.6 </td>
   <td style="text-align:right;"> 12.1 </td>
  </tr>
</tbody>
</table>
]

.pull-right[

TIDY!
<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> population </th>
   <th style="text-align:right;"> distance </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> A </td>
   <td style="text-align:right;"> 12.4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> A </td>
   <td style="text-align:right;"> 11.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> A </td>
   <td style="text-align:right;"> 11.6 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> B </td>
   <td style="text-align:right;"> 12.6 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> B </td>
   <td style="text-align:right;"> 11.3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> B </td>
   <td style="text-align:right;"> 12.1 </td>
  </tr>
</tbody>
</table>
]

---
# Working with Data 
We can make the data tidy with `pivot_longer()`.

.footnote[
.font60[
`pivot_longer()` is a function from a package called `tidyr` which is one of the `tidyverse` packages.
]
]
--

`pivot_longer()` collects the values from specified columns (`cols`) into a single column (`values_to`) and creates a column to indicate the group (`names_to`).

---
# Working with Data 
.scroll-output-width[

```r
pigeon2 <- pivot_longer(data = pigeon, 
                        cols = everything(), 
                        names_to = "population", 
                        values_to = "distance")
str(pigeon2)
```

```
## tibble [80 × 2] (S3: tbl_df/tbl/data.frame)
##  $ population: chr [1:80] "A" "B" "A" "B" ...
##  $ distance  : num [1:80] 12.4 12.6 11.2 11.3 11.6 12.1 12.3 12.2 11.8 11.8 ...
```
]

A 'tibble' `\(\approx\)` dataframe
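
The same reshaping can be tried on the three-individual example from the earlier slide. This is a sketch; the object names `wide` and `tidy` are chosen here for illustration.

```r
library(tidyr)

# the NOT TIDY version: one column per population
wide <- data.frame(A = c(12.4, 11.2, 11.6),
                   B = c(12.6, 11.3, 12.1))

# gather both columns into one column of values
# and one column saying which population each value came from
tidy <- pivot_longer(data = wide,
                     cols = everything(),
                     names_to = "population",
                     values_to = "distance")
tidy
```

The rows come out in the order A, B, A, B, ... because `pivot_longer()` works through the original table one row at a time. The data are still tidy.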

---
# Working with Data

Now we have a dataframe in tidy format which *will* make it easier to summarise, analyse and visualise.

--

To summarise data in this format we use the `group_by()` and `summarise()` functions.

--

We will also use the pipe operator: ` |> `

---
# Working with Data

To summarise multiple group data in tidy form:

```r
pigeon2 |> 
  group_by(population) |> 
  summarise(mean = mean(distance))
```

--

This can be read as:
  - take pigeon2 *and then*  
  - group it by population *and then*  
  - summarise it by calculating the mean

i.e., the mean is done for each population.

--

The `mean` before the `=` is just a name.
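
To show that the name before the `=` is a free choice, this sketch uses a hypothetical column name, `mean_distance`, on a small made-up dataframe with three individuals per population.

```r
library(dplyr)

# a small tidy dataframe (illustrative values)
toy <- data.frame(population = rep(c("A", "B"), each = 3),
                  distance = c(12.4, 11.2, 11.6, 12.6, 11.3, 12.1))

# mean_distance is the name we choose for the new column
# mean() is the function that does the calculation
toy_summary <- toy |> 
  group_by(population) |> 
  summarise(mean_distance = mean(distance))

toy_summary
```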

---
# Working with Data 
The result:

```
## # A tibble: 2 × 2
##   population  mean
##   <chr>      <dbl>
## 1 A           11.2
## 2 B           11.7
```
---
# Working with Data

We can add the number of pigeons in each group to the summary using the `length()` function.

```r
pigeon2 |> 
  group_by(population) |> 
  summarise(mean = mean(distance),
*           n = length(distance))
```

.footnote[
<span style=" font-weight: bold;    color: #fdf9f6 !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span>  Add a column for the standard deviation

<span style=" font-weight: bold;    color: #fdf9f6 !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span>  Add a column for the standard error given by `\(\frac{s.d.}{\sqrt{n}}\)`

]
---
# Working with Data

The result:

```
## # A tibble: 2 × 3
##   population  mean     n
##   <chr>      <dbl> <int>
## 1 A           11.2    40
## 2 B           11.7    40
```

---
# Working with Data

```r
pigeon2 |> 
  group_by(population) |> 
  summarise(mean = mean(distance),
            n = length(distance),
*           sd = sd(distance),
*           se = sd/sqrt(n))
```

```
## # A tibble: 2 × 5
##   population  mean     n    sd     se
##   <chr>      <dbl> <int> <dbl>  <dbl>
## 1 A           11.2    40 0.740 0.117 
## 2 B           11.7    40 0.573 0.0906
```
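
The standard error column can be checked by hand using the rounded values of `sd` and `n` shown in the output above. The object names are chosen for this example.

```r
# values taken from the summary table
sd_a <- 0.740
sd_b <- 0.573
n <- 40

# standard error = standard deviation / square root of sample size
se_a <- sd_a / sqrt(n)
se_b <- sd_b / sqrt(n)

round(se_a, 3)   # 0.117
round(se_b, 4)   # 0.0906
```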

---
# Working with Data 
To plot this data as a histogram:

```r
ggplot(data = pigeon2, aes(x = distance)) +
* geom_histogram(bins = 10,
                 col = "black") +
  scale_x_continuous(name = "Interorbital distance (mm)")
```
.pull-left[

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-34-1.png" width="288" />
]

--
.pull-right[

`geom_histogram()` is the `geom` and `bins` gives the number of bars.

This is the whole data set, not separated by population!

]

---
# Working with Data

To plot multiple group data in tidy form we map the population variable to the `fill` aesthetic

```r
*ggplot(data = pigeon2, aes(x = distance, fill = population)) +
  geom_histogram(bins = 10, 
                 col = "black") +
  scale_x_continuous(name = "Interorbital distance (mm)")
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-35-1.png" width="288" />
.pull-right[
.footnote[
<span style=" font-weight: bold;    color: #fdf9f6 !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span>  Make the axes cross at (0,0)]
]
---
# Working with Data

<span style=" font-weight: bold;    color: #fdf9f6 !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span>  Make the axes cross at (0,0)

```r
ggplot(data = pigeon2, aes(x = distance, fill = population)) + 
  geom_histogram(bins = 10, 
                 col = "black") +
  scale_x_continuous(name = "Interorbital distance (mm)",
*                    expand = c(0, 0)) +
  scale_y_continuous(name = "Frequency",
*                    expand = c(0, 0))
```
Result on next slide.
---
# Working with Data 
The result:

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-36-1.png" width="288" />

---
# Working with Data

`geom_density()` can also be used when `distance` is mapped to `x`; the `y` axis then gives a measure of occurrence (density).

```r
ggplot(data = pigeon2, aes(x = distance, fill = population)) +
* geom_density(col = "black") +
  scale_x_continuous(name = "Interorbital distance (mm)",
                     expand = c(0, 0)) + 
  scale_y_continuous(name = "Density",
                     expand = c(0, 0)) 
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-37-1.png" width="288" />

---
# Working with Data

Alter the transparency using `alpha`:

```r
ggplot(data = pigeon2, aes(x = distance, fill = population)) +
* geom_density(col = "black", alpha = 0.3) +
  scale_x_continuous(name = "Interorbital distance (mm)",
                     expand = c(0, 0)) + 
  scale_y_continuous(name = "Density",
                     expand = c(0, 0)) 
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-38-1.png" width="288" />

---
# Working with Data

Formatting figures for inclusion in reports?

All [elements can be customised individually](https://ggplot2.tidyverse.org/reference/theme.html) but `theme_classic()` takes care of many options you are likely to desire.
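
A minimal sketch of adjusting individual elements after applying `theme_classic()`. The toy data and the particular settings (text size, text colour, hiding the legend) are illustrative assumptions, not recommendations from the course.

```r
library(ggplot2)

# a small tidy dataframe (illustrative values)
toy <- data.frame(population = rep(c("A", "B"), each = 3),
                  distance = c(12.4, 11.2, 11.6, 12.6, 11.3, 12.1))

themed_plot <- ggplot(data = toy, 
                      aes(x = population, y = distance, fill = population)) +
  geom_boxplot() +
  theme_classic() +
  # theme() changes individual elements on top of the complete theme
  theme(axis.title = element_text(size = 14),
        axis.text = element_text(colour = "black"),
        legend.position = "none")

themed_plot
```

Because `theme()` is added after `theme_classic()`, its settings override the corresponding ones in the complete theme.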

---
# Working with Data

```r
ggplot(data = pigeon2, aes(x = distance, fill = population)) +
  geom_density(col = "black", alpha = 0.3) +
  scale_x_continuous(name = "Interorbital distance (mm)",
                     expand = c(0, 0)) + 
  scale_y_continuous(name = "Density",
                     expand = c(0, 0)) +
* theme_classic()
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-39-1.png" width="288" />

---
# Working with Data

A different kind of plot:

.pull-left[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-40-1.png" width="288" />
]

.pull-right[

Note: We need to change the `aes()` as well as the `geom` because this figure has population on the x axis.

]
---
# Working with Data

```r
ggplot(data = pigeon2, aes(x = population, y = distance)) + 
  geom_boxplot() +                            
  scale_x_discrete(name = "Population") +
  scale_y_continuous(name = "Interorbital distance (mm)",
                     expand = c(0, 0),
                     limits = c(0, 15)) +
  theme_classic()
```

.footnote[
<span style=" font-weight: bold;    color: #fdf9f6 !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: #25496b !important;" >Extra exercise:</span>  Can you (gratuitously) colour the boxes by population too?
]

---
# Working with Data

```r
ggplot(data = pigeon2, aes(x = population, y = distance, fill = population)) + 
  geom_boxplot() +                            
  scale_x_discrete(name = "Population") +
  scale_y_continuous(name = "Interorbital distance (mm)",
                     expand = c(0, 0),
                     limits = c(0, 15)) +
  theme_classic()
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-42-1.png" width="288" />

---

# Summary
-   Use a script and comment it
-   organise analyses and use relative paths
-   shortcuts: `<-` is Alt+minus; `|>` is Ctrl+Shift+M
-   objects are seen in the Environment window
-   data are read from files into R as dataframes
-   the dataframe is a common data structure
-   you'll eventually get used to the manual!
-   a `ggplot` has a `data` argument and aesthetic mappings given with `aes()`; layers are added with a `+`; `geoms` determine how the data are plotted

---

class: inverse

# 🥳 Congratulations! Keep practising! 🎈

---
# References
.footnote[
.font60[
Slides made with xaringan (Xie, 2019) and xaringanExtra (Aden-Buie, 2020)
]
]
.font60[
Aden-Buie, G. (2020). _xaringanExtra: Extras And Extensions for
Xaringan Slides_. R package version 0.2.3.9000. URL:
[https://github.com/gadenbuie/xaringanExtra](https://github.com/gadenbuie/xaringanExtra).

Codd, E. F. (1990). _The Relational Model for Database Management:
Version 2_. Boston, MA, USA: Addison-Wesley Longman Publishing Co.,
Inc.

Rand, E. (2023). _Computational Analysis for Bioscientists_. URL:
[https://3mmarand.github.io/comp4biosci/](https://3mmarand.github.io/comp4biosci/).

Wickham, H. (2014). "Tidy Data". In: _Journal of Statistical Software,
Articles_ 59.10, pp. 1-23.

Wickham, H. (2016). _ggplot2: Elegant Graphics for Data Analysis_.
Springer-Verlag New York. ISBN: 978-3-319-24277-4. URL:
[https://ggplot2.tidyverse.org](https://ggplot2.tidyverse.org).

Xie, Y. (2019). _xaringan: Presentation Ninja_. R package version 0.12.
URL:
[https://CRAN.R-project.org/package=xaringan](https://CRAN.R-project.org/package=xaringan).
]

---
# Intro to Repro in R

Emma Rand  
[emma.rand@york.ac.uk](mailto:emma.rand@york.ac.uk)  
Twitter: [@er13_r](https://twitter.com/er13_r)   
GitHub: [3mmaRand](https://github.com/3mmaRand)  
blog: https://buzzrbeeline.blog/

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

Rand, E. (2023). White Rose BBSRC DTP Training: An Introduction to Reproducible Analyses in R (Version v1.2). https://doi.org/10.5281/zenodo.3859818