Data Analysis in R for BABS 2
Welcome!
Welcome to an exciting six-week journey into data analysis with R! In this part of the module, we’ll explore key concepts that are essential for any budding bioscientist. These include the logic of hypothesis testing, which is a foundation for making informed decisions based on data, and statistical models for analysing patterns and relationships across multiple variables.
This module builds on your skills from BABS1 step-by-step, helping you gain confidence in applying statistical tools to real-world bioscience problems. By the end, you’ll have a solid understanding of how to analyse data to go with your existing skills in summarising and plotting it.
Module Learning Objectives
The BABS2 Module Learning outcomes that relate to the Data Analysis in R content are:
Think creatively to address a Grand Challenge by designing investigations with testable hypotheses and rigorous controls
Appropriately select classical univariate statistical tests and some non-parametric equivalents to a given scenario and recognise when these are not suitable
Use R to perform these analyses, reproducibly, on data in a variety of formats and present the results graphically
Communicate research in scientific reports and via oral presentation
How this part of the module is organised
This module is designed to help you learn as you go — meaning you won’t need to spend hours revising. Instead, you’ll build skills gradually and steadily through a structured weekly schedule that combines contact time and guided independent study. The goal is to make learning manageable, effective, and enjoyable.
Each week has:
An “About” page which gives a topic summary, the Learning Objectives, and the instructions for the week. You should read this first.
Some independent study on the “Prepare!” page to help you get ready for the workshop. This will be reading from the course book, watching a video, or doing some coding or set up. The preparation is designed to take about 30-45 minutes on average. To make it more fun and more productive, study with other people.
A two-hour workshop using R to apply concepts you are learning. We usually start with a demonstration of an example from the “Prepare!” tasks. You will spend most of the workshop diving into exercises and building on previous weeks’ skills while being introduced to new concepts. Anything you have not done before is explained and guided, but you will also have the opportunity to solve problems using the skills gained in previous workshops, on your own or with others. I often remind you to take care of future you by making notes so you can look up your previous work, but you can also search the R4BABS site (search is top right).
The “Consolidate!” page has independent exercises to reinforce what you’re learning. These are similar to the workshop activities but with less guidance to encourage independent thinking. Occasionally, there will also be some reading. It is designed to take about 30-45 mins on average but may be quicker if you understood the workshop very well or slower if you need to revisit parts.
Tips for Learning R:
Learning R is a bit like learning a new language, picking up a musical instrument, or mastering a sport — you can’t cram it all at once. Consistency is key. Even small, regular practice sessions are more effective than trying to do it all in one go.
A little bit of engagement and practice is always better than none, so celebrate each bit of effort!
If you fall behind, don’t stress. Just pick up where you left off; there’s no need to skip ahead. It is fine to work on a previous week’s workshop!
Content
The logic of hypothesis testing and CIs
This week we will consider the logic of hypothesis testing and type 1 and type 2 errors. We will also find out what the sampling distribution of the mean and the standard error are, and how to calculate confidence intervals.
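To give a flavour of the calculation, here is a minimal sketch of a standard error and a 95% confidence interval for a mean. The data are simulated and the variable names (`mass`, `ci`) are made up for illustration; they are not taken from the workshop materials.

```r
# Hypothetical sample: 10 measurements simulated from a normal distribution
set.seed(42)
mass <- rnorm(10, mean = 50, sd = 5)

n <- length(mass)
m <- mean(mass)
se <- sd(mass) / sqrt(n)         # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # t value leaving 2.5% in each tail

# 95% confidence interval: mean plus or minus t * standard error
ci <- c(lower = m - t_crit * se, upper = m + t_crit * se)
ci
```

The same interval is reported by `t.test(mass)$conf.int`, which is a handy way to check a calculation done by hand.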
Introduction to statistical models: Single linear regression
This week, you’ll learn about statistical models, which are mathematical representations of data relationships. Specifically, you’ll explore the general linear model (GLM), a broad framework for analysing data patterns.
Your first GLM will be simple linear regression, which fits a straight line to data to predict a response variable (outcome) based on an explanatory variable (predictor). We’ll examine the two key parameters estimated in this model: the slope (which shows how the predictor influences the outcome) and the intercept (the value when the predictor is zero). We’ll also assess whether these values are significantly different from zero.
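As a rough illustration of what this looks like in R, the sketch below fits a straight line with `lm()`. It uses the built-in `cars` dataset (stopping distance against speed) as a stand-in, which is an assumption made here and not necessarily the data you will meet in the workshop.

```r
# Fit a simple linear regression: response ~ explanatory variable
mod <- lm(dist ~ speed, data = cars)

# The two estimated parameters: the intercept and the slope
coef(mod)

# The summary table includes a test of whether each
# parameter differs significantly from zero
summary(mod)
```

In the summary output, the row labelled `(Intercept)` is the intercept and the row labelled `speed` is the slope.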
Two-sample tests
This week you will learn how to use and interpret the general linear model when the explanatory (or x) variable is categorical with two possible values. These tests are also known as t-tests. Just as with single linear regression, the response variable is continuous, and the model puts a line of best fit through the data and has two parameters called the intercept and the slope. These have the same interpretation as they do in linear regression. The intercept is one of the group means, and the slope is the difference between that mean and the other group mean. You will also learn about the non-parametric tests we use when the assumptions of the general linear model are not met.
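The link between the model parameters and the group means can be seen in a short sketch. It uses the built-in `ToothGrowth` dataset, where `supp` has two levels (`OJ` and `VC`). Both the dataset and the choice of the Wilcoxon rank-sum test as the non-parametric alternative are assumptions made for illustration.

```r
# Two-sample comparison as a general linear model
mod <- lm(len ~ supp, data = ToothGrowth)
coef(mod)  # intercept = mean of OJ; slope = mean of VC minus mean of OJ

# Group means, for comparison with the parameters above
means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
means

# The classical two-sample t-test (equal variances) gives the same p-value
tt <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
tt$p.value

# One common non-parametric alternative: the Wilcoxon rank-sum test
# (exact = FALSE because the data contain ties)
wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)
```

R takes the first factor level (`OJ` here) as the reference group, which is why the intercept is the `OJ` mean.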
One-way ANOVA and Kruskal-Wallis
Last week you learnt how to use and interpret the general linear model when the x variable was categorical with two groups. You will now extend that to situations when there are more than two groups. This is often known as the one-way ANOVA (analysis of variance). You will also learn about the Kruskal-Wallis test which can be used when the assumptions of the general linear model are not met.
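A minimal sketch of both approaches is given below, assuming the built-in `PlantGrowth` dataset (plant weight for a control group and two treatment groups) as example data.

```r
# One-way ANOVA: the same lm() call, now with three groups
mod <- lm(weight ~ group, data = PlantGrowth)

# ANOVA table: tests whether group explains a
# significant amount of the variation in weight
aov_table <- anova(mod)
aov_table

# Kruskal-Wallis test for when the model assumptions are not met
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw
```

With three groups the model has three parameters: the intercept (the mean of the reference group) and two differences from that mean.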
Two-way ANOVA
So far you have learnt how to use and interpret the general linear model when:
- the x variable was continuous
- the x variable was categorical with two groups
- the x variable was categorical with two or more groups
This week we will extend our understanding by learning how to include two categorical explanatory variables in a general linear model. This model is often known as the two-way ANOVA. It allows us to design experiments to test whether a response is influenced by two variables and whether those two variables act independently on the response or interact in some way.
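A model with two categorical explanatory variables and their interaction can be sketched as follows. The example assumes the built-in `ToothGrowth` dataset, with dose converted to a factor. The names `tooth` and `dose_f` are made up for this illustration.

```r
# Copy the built-in data and treat dose as a categorical variable
tooth <- ToothGrowth
tooth$dose_f <- factor(tooth$dose)

# supp * dose_f expands to supp + dose_f + supp:dose_f,
# i.e. both main effects plus their interaction
mod <- lm(len ~ supp * dose_f, data = tooth)

# One row per main effect, one for the interaction, one for residuals
aov_table <- anova(mod)
aov_table
```

A significant `supp:dose_f` row would suggest the effect of one variable depends on the level of the other, so the two do not act independently.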
Association: Correlation and Contingency
This week you will learn how to test whether there is an association between two categorical variables using the chi-squared contingency test and how to test whether there is an association between two continuous variables using the correlation test.
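Both tests are available in base R. The sketch below uses invented counts for the contingency test (the labels `habitat` and `species` are hypothetical) and the built-in `cars` dataset for the correlation.

```r
# Hypothetical 2 x 2 table of counts for two categorical variables
counts <- matrix(c(30, 10, 20, 40), nrow = 2,
                 dimnames = list(habitat = c("woodland", "grassland"),
                                 species = c("A", "B")))

# Chi-squared contingency test of association
chi <- chisq.test(counts)
chi
chi$expected  # expected counts if there were no association

# Correlation test between two continuous variables
ct <- cor.test(cars$speed, cars$dist)
ct
```

By default `cor.test()` uses Pearson's correlation; `method = "spearman"` gives a rank-based alternative.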