1  About this book

First draft

You are reading a work in progress. This page is a first draft but should be readable.

1.1 Who is this book for?

This book is primarily being written to support bioscience students at the University of York. The ultimate aim is to cover the full spectrum of computational skills that a bioscience undergraduate or postgraduate at York, and elsewhere, might need. It is a work in progress, and the content included so far is described in the Overview of contents section below.

It is being written in the open so that anyone who finds it useful can use it, and so that anyone can contribute to it.

1.2 Approach of this book

  • explanations followed by worked examples

1.3 Overview of contents

The book is organised into three parts.

Part 1: What they forgot to teach you about computers

This chapter tries to teach the computer skills you might have missed if you have mainly used mobile devices. I focus on the knowledge gaps that often appear when people are learning computational data analysis. These are primarily to do with finding and organising files and folders in a file system.

Part 2: Getting started with data

This part covers your first steps in analysing data with R. The first chapter covers important concepts about data: whether they are discrete or continuous, and how we summarise them using descriptive statistics. The second chapter introduces you to R and RStudio for the first time. We start by exploring the layout and appearance of RStudio, then move on to coding. The third chapter describes some useful workflow patterns and tools for organising your work in RStudio. Using these will make learning R easier. Finally, we will go through a complete workflow, from importing data from a file to saving a figure for reporting.

Part 3: Statistical Analysis

This part is a first course in statistical inference, the process of inferring the characteristics of populations from samples using data analysis. In this first course we take what is called a frequentist, or classical, approach to statistical inference. This is the approach most commonly taught in introductory statistics courses. We will learn about the logic of hypothesis testing and confidence intervals. You will also get an introduction to statistical models: what a statistical model is and, in particular, the linear model.

1.4 Conventions used in the book

I use some conventions, most of which I hope are intuitive. I have tried to articulate them here. If you recognise conventions I have used that are not listed here, please let me know.

Code and any output appears in blocks formatted like this:

# import the chaff data
chaff <- read_table("data-raw/chaff.txt")
glimpse(chaff)
## Rows: 40
## Columns: 2
## $ subspecies <chr> "coelebs", "coelebs", "coelebs", "coelebs", "coelebs", "coe…
## $ mass       <dbl> 18.3, 22.1, 22.4, 18.5, 22.2, 19.3, 17.8, 20.2, 22.1, 16.6,…

Lines of output start with ## to distinguish them from code comments, which begin with a single #. You will learn more about comments in the Using Scripts section of First Steps in RStudio.
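As a small illustration of this convention (my own sketch, not an example from a later chapter), the block below uses the first three mass values shown in the glimpse() output above. The lines starting with a single # are comments, which R ignores; the last line, starting with ##, is the output you would see when the code is run. The object name mass is only illustrative.

```r
# This is a comment: R ignores everything after the #
mass <- c(18.3, 22.1, 22.4)  # a comment can also follow code on the same line
mean(mass)
## [1] 20.93333
```

If you copy this block and run it, the ## line is treated as a comment too, so it does no harm.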

Within the text:

  • packages are indicated in bold code font like this: ggplot2
  • functions are indicated in code font with brackets after their name like this: ggplot()
  • R objects are indicated in code font like this: stag

The content of a code block can be copied using the icon in its top right corner.

I use packages from the tidyverse (Wickham et al. 2019), including ggplot2 (Wickham 2016), dplyr (Wickham et al. 2023), tidyr (Wickham, Vaughan, and Girlich 2023) and readr (Wickham, Hester, and Bryan 2023), throughout the book. All the code assumes you have loaded the core tidyverse packages with:

library(tidyverse)

If you run examples and get an error like this:

# Error in read_table("data-raw/stag.txt") : 
#  could not find function "read_table"

It is likely you need to load the tidyverse as shown above.
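If you are not sure whether the tidyverse is missing from your computer or simply not loaded in your current session, the sketch below checks both using only base R, so it runs before anything else is loaded. It relies on requireNamespace(), which reports whether a package is installed without attaching it, and search(), which lists the packages currently attached. The variable names are illustrative.

```r
# Is the tidyverse installed on this computer?
# requireNamespace() returns TRUE or FALSE and does not attach the package
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

# Is it attached in this session? library(tidyverse) adds "package:tidyverse"
# to the search path, which is where R looks for functions like read_table()
tidyverse_attached <- "package:tidyverse" %in% search()

if (!tidyverse_installed) {
  message("Not installed: run install.packages(\"tidyverse\") once on this computer.")
} else if (!tidyverse_attached) {
  message("Installed but not loaded: run library(tidyverse) in this session.")
} else {
  message("The tidyverse is loaded, so read_table() should be found.")
}
```

Installing is needed once per computer, whereas loading with library() is needed in every new R session.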

All other packages will be loaded explicitly with library() statements where needed.

“🎬 Your turn!” indicates that you might want to code along with examples or that there is an opportunity to check your understanding by answering a question. Questions are answered in words or with a piece of code. The answers are given in collapsed sections so you can try to answer them before checking the answer. For example, a question answered in words looks like this:

🎬 Your turn! Use the file system above to answer these questions.

  • What is the absolute path for the document doc4.txt on a Mac computer?
  • /home/user1/docs/data/doc4.txt

And a question answered with a piece of code looks like this:

🎬 Your turn! Assign the value of 4 to a variable called y:

y <- 4

1.5 Annotating this book

Annotation with Hypothesis is enabled on this page. Hypothesis allows you to annotate this book with your own private notes or with notes shared with friends. You need to create a free personal account. You can make annotations that are public, private to you, or shared with a private group. Please follow the code of conduct in your annotations.

1.6 Code of Conduct

We are dedicated to providing a welcoming and supportive learning environment for all readers, regardless of background or identity. As such, we do not tolerate comments that are disrespectful to fellow learners or that exclude, intimidate, or cause discomfort to others. The following bullet points set out explicitly what we hope you will consider to be appropriate community guidelines:

  • Be respectful of different viewpoints and experiences. Do not use homophobic, racist, transphobic, ageist, ableist, sexist, or otherwise exclusionary language.

  • Use welcoming and inclusive language. Do not address others in an angry, intimidating, or demeaning manner. Be considerate of the ways the words you choose may impact others. Be patient and respectful of the fact that English is a second (or third or fourth!) language for many.

  • Respect the privacy and safety of others. Do not share their information without their express permission.

  • As an overriding general rule, please be intentional in your actions and humble in your mistakes.

1.7 Contributing

This book is being written in the open so that anyone can contribute to it. If you find a mistake or have a suggestion for improvement, you can create an issue.

1.8 License

This work is licensed under CC BY-NC 4.0. This license requires that reusers give credit to the creator. It allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, for noncommercial purposes only.

1.9 Please cite as

Please cite this book as:

Rand, E. (2023). Computational Analysis for Bioscientists (Version 0.1) https://3mmarand.github.io/comp4biosci/

1.10 Credits

This book is written with R (R Core Team 2023), Quarto (Allaire et al. 2022), knitr (Xie 2022) and kableExtra (Zhu 2021). My R session information is shown below:

sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 
## [2] LC_CTYPE=English_United Kingdom.utf8   
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.utf8    
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
##  [1] patchwork_1.1.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.0    
##  [5] purrr_1.0.1     readr_2.1.4     tidyr_1.3.0     tibble_3.2.0   
##  [9] ggplot2_3.4.1   tidyverse_1.3.2
## 
## loaded via a namespace (and not attached):
##  [1] utf8_1.2.3          generics_0.1.3      renv_0.17.0        
##  [4] xml2_1.3.5          stringi_1.7.12      hms_1.1.2          
##  [7] digest_0.6.34       magrittr_2.0.3      timechange_0.2.0   
## [10] evaluate_0.23       grid_4.3.1          fastmap_1.1.0      
## [13] cellranger_1.1.0    jsonlite_1.8.8      backports_1.4.1    
## [16] DBI_1.1.3           googledrive_2.0.0   rvest_1.0.3        
## [19] httr_1.4.4          fansi_1.0.4         scales_1.2.1       
## [22] modelr_0.1.10       cli_3.4.1           crayon_1.5.2       
## [25] rlang_1.1.1         dbplyr_2.3.0        reprex_2.0.2       
## [28] ellipsis_0.3.2      munsell_0.5.0       withr_2.5.0        
## [31] yaml_2.3.7          tools_4.3.1         tzdb_0.3.0         
## [34] gargle_1.3.0        colorspace_2.1-0    broom_1.0.3        
## [37] assertthat_0.2.1    vctrs_0.5.2         R6_2.5.1           
## [40] lubridate_1.9.2     lifecycle_1.0.3     fs_1.6.1           
## [43] pkgconfig_2.0.3     pillar_1.8.1        gtable_0.3.1       
## [46] glue_1.6.2          haven_2.5.1         xfun_0.37          
## [49] tidyselect_1.2.0    rstudioapi_0.14     knitr_1.42         
## [52] htmltools_0.5.4     googlesheets4_1.0.1 rmarkdown_2.20     
## [55] compiler_4.3.1      readxl_1.4.2
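If your output differs from what is shown in the book, differing versions of R or of the packages are one possible cause worth checking. The sketch below puts the versions listed above next to the ones on your own computer, using the base R functions getRversion() and packageVersion(). The choice of packages and the object names (book_versions, comparison) are my own and only illustrative.

```r
# Versions used to build the book, copied from the sessionInfo() output above
book_versions <- c(R = "4.3.1", ggplot2 = "3.4.1", dplyr = "1.1.0",
                   readr = "2.1.4", tidyr = "1.3.0")

# The version of R you are running
your_r <- as.character(getRversion())

# The installed version of each package, or NA if it is not installed
your_pkgs <- vapply(names(book_versions)[-1], function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(packageVersion(pkg))
  } else {
    NA_character_
  }
}, character(1))

# Put the book's versions and yours side by side
comparison <- data.frame(
  item = names(book_versions),
  book = unname(book_versions),
  yours = c(your_r, unname(your_pkgs))
)
comparison
```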