Independent Study to consolidate this week

Summarising data in with several variables and the role of variables in analysis

Set up

If you have just opened RStudio you will want to load the packages and import the data.

  1. 💻 Summarise and plot the pigeons dataframe appropriately.
Answer - don’t look until you have tried!
# import
pigeons <- read_table("data-raw/pigeon.txt")

# reformat to tidy
pigeons <- pivot_longer(data = pigeons, 
                        cols = everything(), 
                        names_to = "population", 
                        values_to = "distance")

# sumnmarise
pigeons_summary <- pigeons %>%
  group_by(population) %>%
  summarise(mean = mean(distance),
            std = sd(distance),
            n = length(distance),
            se = std/sqrt(n))
# plot
ggplot() +
  geom_point(data = pigeons, aes(x = population, y = distance),
             position = position_jitter(width = 0.1, height = 0),
             colour = "gray50") +
  geom_errorbar(data = pigeons_summary, 
                aes(x = population, ymin = mean - se, ymax = mean + se),
                width = 0.3) +
  geom_errorbar(data = pigeons_summary, 
                aes(x = population, ymin = mean, ymax = mean),
                width = 0.2) +
  scale_y_continuous(name = "Interorbital distance (mm)", 
                     limits = c(0, 14), 
                     expand = c(0, 0)) +
  scale_x_discrete(name = "Population") +
  theme_classic()
  1. 💻 The data in blood.csv are measurements of several blood parameters from fifty people with Crohn’s disease, a lifelong condition where parts of the digestive system become inflamed. Twenty-five of people are in the early stages of diagnosis and 25 have started treatment. The variables in the dataset are:

    • sodium - Sodium concentration in umol/L, the average of 5 technical replicates
    • potassium - Potassium concentration in umol/L, the average of 5 technical replicates
    • B12 Vitamin - B12 in pmol/L, the average of 5 technical replicates
    • wbc - White blood cell count in 10^9 /L, the average of 5 technical replicates
    • rbc count - Red blood cell count in 10^12 /L, the average of 5 technical replicates
    • platlet count - platlet count in 10^9 /L, the average of 5 technical replicates
    • inflammation marker - the presence or absence of a marker of inflammation, either 0 or 1
    • status - whether the individual is before or after treatment.

    Your task is to summarise and plot these data in any suitable way. Create a complete RStudio Project for an analysis of these data. You will need to:

    • Make a new project
    • Make folders for data and for figures
    • Import the data
    • Summarise and plot variables of your choice. It doesn’t matter what you chose - the goal is the practice the project workflow and selecting appropriate plotting and summarising methods for particular data sets.