Independent Study to consolidate this week
Two-way ANOVA
Set up
If you have just opened RStudio you will want to load the tidyverse
and emmeans
packages.
Exercises
- 💻 Researchers suspect there may be regional differences between two species of butterfly, F.concocti and F.flappa. They catch 8 randomly chosen individuals of each species in two regions (North and South) and measure their wing lengths in mm. The data are in butterf.txt. What do you conclude about the size of these species in the North and South?
Answer - don’t look until you have tried!
butter <- read_table("data-raw/butterf.txt")
str(butter)
Answer - don’t look until you have tried!
# create a rough plot of the data
ggplot(data = butter,
aes(x = region, y = winglen, fill = spp)) +
geom_violin()
Answer - don’t look until you have tried!
# There seems to be little difference between the two species in the
# south but F.concocti
# is larger than F.flappa in the North.
Answer - don’t look until you have tried!
# The data seem to be continuous so it is likely that a parametric
# test will be fine
# we will check the other assumptions after we have run the lm
# build the statistical model
mod <- lm(data = butter, winglen ~ spp * region)
# examine it
summary(mod)
# the line starting (Intercept) is β0
# the line starting sppF.flappa is β1
# the line starting regionsouth is β2
# the line starting sppF.flappa:regionsouth is β3
# F.concocti-north mean is β0 i.e., 32.275
# F.Flappa-north mean is β0 + β1 i.e., 32.275 - 7.875 = 24.4
# F.concocti-south is β0 + β2 i.e., 32.275 - 8.638 = 23.637
# F.Flappa--south is β0 + β1 + β2 + β3
# i.e., 32.275 - 7.875 - 8.638 + 7.713 = 23.475
# The model of spp and region overall explains a significant
# amount (49%) of the variation in wing lengths
# (F = 8.1; d.f. = 3, 28; p = 0.0002). To see which of the three
# effects are significant we can use the `anova()` function on our
# model.
Answer - don’t look until you have tried!
anova(mod)
# There was a significant effect of species (F = 8.1;
# d.f. = 1, 28; p = 0.008) and region (F = 8.1;
# d.f. = 1, 28; p = 0.008) on wing length and these effects interact
# (F = 7.5; d.f. = 1, 28; p = 0.01)
# It is important to consider how the interaction effect is
# influencing our interpretation of the main effects.
# Judging by the means and the rough plot, the significant
# effect of both species and region is due to the fact that
# F.concocti is larger than F.flappa in the North.
# In other words, we only have a significant effect of species
# in the North.
# the emmeans package will let us see exactly what comparisons
# are significant
Answer - don’t look until you have tried!
emmeans(mod, ~ spp * region) |> pairs()
# As we suspected all the difference are between F.concocti north
# and the other three groups.
# F.concocti north - F.flappa north p = 0.0026
# F.concocti north - F.concocti south p = 0.0010
# F.concocti north - F.flappa south p = 0.0008
# There was a signifcant interaction between the effects of species
# and region (F = 7.5; d.f. = 1, 28; p = 0.01) with Northern
# F.concocti being significantly larger than F.concocti in the
# south (Tukey HSD: p = 0.0010) and F.flappa in the
# north (p = 0.0026) and south (p = 0.0008). This creates a
# significant difference between species averaged over region
# (F = 8.1; d.f. = 1, 28; p = 0.008) and region averaged over species
Answer - don’t look until you have tried!
# There is a significant effect of species (F = 8.1;
# d.f. = 1, 28; p = 0.008) and region (F = 11.5; d.f. = 1, 28;
# p = 0.002) on wing length. These effect are
Answer - don’t look until you have tried!
# let's check the assumptions
plot(mod, which = 1)
Answer - don’t look until you have tried!
# we're looking for the variance in the residuals to be the same in both groups.
# This looks OK. Maybe a bit higher in the wild plants (with the higher mean)
hist(mod$residuals)
Answer - don’t look until you have tried!
shapiro.test(mod$residuals)
# On balance the use of lm() is probably justifiable The variance isn't quite equal
# and the histogram looks a bit off normal but the normality test is NS and the
# effect (in the figure) is clear.
Answer - don’t look until you have tried!
# A figure
butter_fig <- ggplot() +
geom_point(data = butter,
aes(x = region,
y = winglen,
shape = spp),
position = position_jitterdodge(dodge.width = 1,
jitter.width = 0.2,
jitter.height = 0),
size = 2,
colour = "gray50") +
geom_errorbar(data = butter_summary,
aes(x = region,
ymin = mean - se,
ymax = mean + se,
group = spp),
width = 0.4,
position = position_dodge(width = 1)) +
geom_errorbar(data = butter_summary,
aes(x = region,
ymin = mean,
ymax = mean,
group = spp),
width = 0.3,
position = position_dodge(width = 1) ) +
scale_x_discrete(name = "region") +
scale_y_continuous(name = "Wing length (mm)",
expand = c(0, 0),
limits = c(0, 55)) +
scale_shape_manual(values = c(19, 1),
name = NULL,
labels = c(bquote(italic("F.concocti")),
bquote(italic("F.flappa")))) +
# F.concocti north - F.flappa north p = 0.0026
annotate("segment",
x = 0.75, xend = 1.25,
y = 40, yend = 40,
colour = "black") +
annotate("text",
x = 1, y = 42,
label = "p = 0.0026") +
# F.concocti north - F.concocti south p = 0.0010
annotate("segment",
x = 0.75, xend = 1.75,
y = 44, yend = 44,
colour = "black") +
annotate("text", x = 1.25, y = 46,
label = "p = 0.0010") +
# F.concocti north - F.flappa south p = 0.0008
annotate("segment",
x = 0.75, xend = 2.25,
y = 48, yend = 48,
colour = "black") +
annotate("text", x = 1.5, y = 50,
label = "p = 0.0008") +
theme_classic() +
theme(legend.title = element_blank(),
legend.position = c(0.2, 0.12))
# save figure to figures/butter.png
ggsave("figures/butter.png",
plot = butter_fig,
width = 3.5,
height = 3.5,
units = "in",
dpi = 300)