Statistical Analysis - Part 1

Just needs proofreading

You are reading a work in progress. This page is complete but needs final proofreading.

What this section covers

This section introduces the fundamentals of statistical inference, the process of drawing conclusions about a population’s characteristics based on sample data. In this first course, we focus on the frequentist (or classical) approach to statistical inference, which is the most commonly taught method in introductory statistics courses.

Why do we need statistical inference?

In biosciences - and many other fields - we need statistical inference because it is usually impossible to study and measure entire populations. Instead, we take a random sample and use it to draw conclusions about the population of interest. Statistical inference provides the tools to draw these conclusions reliably, accounting for the variability and uncertainty inherent in sampling.

What tools will we cover?

We begin by explaining the logic of hypothesis testing and its role in making statistical inferences. Following this, we explore confidence intervals and provide an introduction to statistical models, with a particular focus on linear models.

The remaining chapters delve into regression, t-tests, and ANOVA, which are special cases of a broader statistical framework known as the General Linear Model (GLM). While t-tests and ANOVA are often taught using the t.test() and aov() functions in R, respectively, this book approaches them using the lm() function. This emphasizes their shared foundation within the GLM framework and helps you understand that regression, t-tests, and ANOVA are all variations of the same underlying model.
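As a minimal sketch of this shared foundation, the example below fits the same two-group comparison with t.test(), aov() and lm() and collects the three p-values. The data are simulated, and the names dat, group and mass are hypothetical, chosen only for illustration. The t-test is run with var.equal = TRUE because the linear model assumes equal variances; the default Welch t-test would give a slightly different result.

```r
# Simulated data: all names and values are made up for illustration
set.seed(1)
dat <- data.frame(
  group = factor(rep(c("control", "treated"), each = 10)),
  mass  = c(rnorm(10, mean = 20, sd = 2), rnorm(10, mean = 23, sd = 2))
)

# Classical two-sample t-test, assuming equal variances
tt <- t.test(mass ~ group, data = dat, var.equal = TRUE)

# The same comparison expressed as a linear model
mod   <- lm(mass ~ group, data = dat)
coefs <- summary(mod)$coefficients

# One-way ANOVA with aov() on the same data
av <- summary(aov(mass ~ group, data = dat))

# Extract the p-value for the group effect from each approach
p_ttest <- tt$p.value
p_lm    <- coefs["grouptreated", "Pr(>|t|)"]
p_aov   <- av[[1]][["Pr(>F)"]][1]

# The three p-values should agree
c(t.test = p_ttest, lm = p_lm, aov = p_aov)
```

With only two groups, the F statistic from aov() is the square of the t statistic from lm(), which is why the p-values match.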

By learning these concepts through the lm() function, you will become familiar with the terminology and language of statistical modelling. This approach also makes it easier to build upon your knowledge, as the output of lm() is representative of the structure and results of statistical modelling functions in R more broadly.

The GLM framework makes some assumptions about the data in order to calculate probabilities. These are known as parametric assumptions, and tests based on the GLM are parametric tests. As we apply each test, we will check that the assumptions are met. You will also learn how to apply non-parametric tests for use when the parametric assumptions are not met. These are applied with wilcox.test() and kruskal.test().
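The following sketch shows roughly what this workflow might look like. It uses simulated, deliberately skewed data, and the names dat, group and y are hypothetical. Using shapiro.test() on the model residuals is only one possible way to examine normality and is an assumption here, not necessarily the method used in later chapters.

```r
# Simulated skewed data for three groups: names and values are made up
set.seed(2)
dat <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 12)),
  y     = c(rexp(12, rate = 1), rexp(12, rate = 0.5), rexp(12, rate = 0.25))
)

# Fit the parametric model first so its residuals can be examined
mod <- lm(y ~ group, data = dat)

# One possible check on the normality of the residuals (assumed method)
shapiro.test(residuals(mod))

# Non-parametric comparison of two groups: keep only groups a and b
two <- droplevels(subset(dat, group != "c"))
wt  <- wilcox.test(y ~ group, data = two)

# Non-parametric comparison of three groups
kt <- kruskal.test(y ~ group, data = dat)

wt
kt
```

wilcox.test() compares two groups, so it plays the role of the two-sample t-test, while kruskal.test() handles two or more groups and plays the role of one-way ANOVA.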