Independent Study to consolidate this week

Two-way ANOVA

Set up

If you have just opened RStudio you will want to load the tidyverse and emmeans packages.

Exercises

  1. 💻 Researchers suspect there may be regional differences between two species of butterfly, F.concocti and F.flappa. They catch 8 randomly chosen individuals of each species in two regions (North and South) and measure their wing lengths in mm. The data are in butterf.txt. What do you conclude about the size of these species in the North and South?
Answer - don’t look until you have tried!
butter  <-  read_table("data-raw/butterf.txt")
str(butter)
Answer - don’t look until you have tried!
# create a rough plot of the data  
ggplot(data = butter,
       aes(x = region, y = winglen, fill = spp)) +
  geom_violin()
Answer - don’t look until you have tried!
# There seems to be little difference between the two species in the
# south but F.concocti
# is larger than F.flappa in the North.
Answer - don’t look until you have tried!
# create a summary of the data
butter_summary <- butter %>%
  group_by(region, spp) %>%
  summarise(mean = mean(winglen),
            median = median(winglen),
            sd = sd(winglen),
            n = length(winglen),
            se = sd/sqrt(n))
Answer - don’t look until you have tried!
# The data seem to be continuous so it is likely that a parametric
# test will be fine
# we will check the other assumptions after we have run the lm

# build the statistical model
mod <- lm(data = butter, winglen ~ spp * region)


# examine it
summary(mod)


# the line starting (Intercept) is  β0  
# the line starting sppF.flappa is β1
# the line starting regionsouth is β2
# the line starting sppF.flappa:regionsouth is β3

# F.concocti-north mean is β0 i.e., 32.275
# F.Flappa-north mean is β0 + β1 i.e., 32.275 - 7.875 = 24.4
# F.concocti-south is β0 + β2  i.e., 32.275 - 8.638 = 23.637
# F.Flappa--south is β0 + β1 + β2 + β3 
#           i.e., 32.275 - 7.875 - 8.638 + 7.713 = 23.475

# The model of spp and region overall explains a significant 
# amount (49%) of the variation in wing lengths
# (F = 8.1; d.f. = 3, 28; p = 0.0002). To see which of the three 
# effects are significant we can use the `anova()` function on our
# model.
Answer - don’t look until you have tried!
anova(mod)

# There was a significant effect of species (F = 8.1; 
# d.f. = 1, 28; p = 0.008) and region (F = 8.1; 
# d.f. = 1, 28; p = 0.008) on wing length and these effects interact 
# (F = 7.5; d.f. = 1, 28; p = 0.01)

# It is important to consider how the interaction effect is
# influencing our interpretation of the main effects.
# Judging by the means and the rough plot, the significant
# effect of both species and region is due to the fact that
# F.concocti is larger than F.flappa in the North.
# In other words, we only have a significant effect of species
# in the North. 


# the emmeans package will let us see exactly what comparisons
# are significant
Answer - don’t look until you have tried!
emmeans(mod, ~ spp * region) |> pairs()

# As we suspected all the difference are between  F.concocti north
# and the other three groups. 
 # F.concocti north - F.flappa north    p = 0.0026
 # F.concocti north - F.concocti south  p = 0.0010
 # F.concocti north - F.flappa south    p = 0.0008

# There was a signifcant interaction between the effects of species 
# and region (F = 7.5; d.f. = 1, 28; p = 0.01) with Northern
# F.concocti being significantly larger than F.concocti in the 
# south (Tukey HSD: p = 0.0010) and F.flappa in the 
# north (p = 0.0026) and south (p = 0.0008). This creates a 
# significant difference between species averaged over region 
# (F = 8.1; d.f. = 1, 28; p = 0.008) and region averaged over species
Answer - don’t look until you have tried!
# There is a significant effect of species (F = 8.1; 
# d.f. = 1, 28; p = 0.008) and region (F = 11.5; d.f. = 1, 28; 
# p = 0.002) on wing length. These effect are 
Answer - don’t look until you have tried!
# let's check the assumptions
plot(mod, which = 1) 
Answer - don’t look until you have tried!
# we're looking for the variance in the residuals to be the same in both groups.
# This looks OK. Maybe a bit higher in the wild plants (with the higher mean)
 
hist(mod$residuals)
Answer - don’t look until you have tried!
shapiro.test(mod$residuals)
# On balance the use of lm() is probably justifiable  The variance isn't quite equal 
# and the histogram looks a bit off normal but the normality test is NS and the 
# effect (in the figure) is clear.
Answer - don’t look until you have tried!
# A figure 
butter_fig <- ggplot() +
  geom_point(data = butter, 
             aes(x = region,
                 y = winglen,
                 shape = spp),
             position = position_jitterdodge(dodge.width = 1,
                                             jitter.width = 0.2,
                                             jitter.height = 0),
             size = 2,
             colour = "gray50") +
  geom_errorbar(data = butter_summary, 
                aes(x = region, 
                    ymin = mean - se, 
                    ymax = mean + se, 
                    group = spp),
                width = 0.4, 
                position = position_dodge(width = 1)) +
  geom_errorbar(data = butter_summary, 
                aes(x = region, 
                    ymin = mean, 
                    ymax = mean, 
                    group = spp),
                width = 0.3, 
                position = position_dodge(width = 1) ) +
  scale_x_discrete(name = "region") +
  scale_y_continuous(name = "Wing length (mm)",
                     expand = c(0, 0),
                     limits = c(0, 55)) +
  scale_shape_manual(values = c(19, 1),
                     name = NULL,
                     labels = c(bquote(italic("F.concocti")),
                                bquote(italic("F.flappa")))) +
  # F.concocti north - F.flappa north    p = 0.0026
  annotate("segment",
           x = 0.75, xend = 1.25,
           y = 40, yend = 40,
           colour = "black") +
  annotate("text",
           x = 1,  y = 42,
           label = "p = 0.0026") +
  # F.concocti north - F.concocti south  p = 0.0010
  annotate("segment",
           x = 0.75, xend = 1.75,
           y = 44, yend = 44,
           colour = "black") +
  annotate("text", x = 1.25,  y = 46,
           label = "p = 0.0010") +
  # F.concocti north - F.flappa south    p = 0.0008
  annotate("segment",
           x = 0.75, xend = 2.25,
           y = 48, yend = 48,
           colour = "black") +
  annotate("text", x = 1.5,  y = 50,
           label = "p = 0.0008") +
  theme_classic() +
  theme(legend.title = element_blank(),
        legend.position = c(0.2, 0.12)) 

# save figure to figures/butter.png
ggsave("figures/butter.png",
       plot = butter_fig,
       width = 3.5,
       height = 3.5,
       units = "in",
       dpi = 300)