Workshop

Transcriptomics 3: Visualising and Interpreting

Author

Emma Rand

Published

18 September, 2024

Introduction

Session overview

In the workshop, you will learn how to merge gene information into our results, conduct and plot a Principle Component Analysis (PCA) as well as how to create a nicely formatted Volcano plot and heatmap.

Set up

Either:

🎬 Open 🐸 frogs-88H Project and the cont-fgf-s30.R script.

Or

🎬 Open 🐭 mice-88H Project and the hspc-prog.R script.

🎬 Make a new folder figures in the project directory. You can use the New Folder button in the Files pane but here I have used the fs (Hester, Wickham, and CsÑrdi 2023) package

fs::dir_create("figures")

This is where we will save our figure files.

🎬 Load tidyverse (Wickham et al. 2019) and conflicted (Wickham 2023). You most likely have this code at the top of cont-fgf-s30.R or hspc-prog.R already.

── Attaching core tidyverse packages ─────────────────────────────────────────────── tidyverse 2.0.0 ──
βœ” dplyr     1.1.3     βœ” readr     2.1.4
βœ” forcats   1.0.0     βœ” stringr   1.5.0
βœ” ggplot2   3.4.3     βœ” tibble    3.2.1
βœ” lubridate 1.9.3     βœ” tidyr     1.3.0
βœ” purrr     1.0.2     
── Conflicts ───────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
βœ– dplyr::filter() masks stats::filter()
βœ– dplyr::lag()    masks stats::lag()
β„Ή Use the conflicted package to force all conflicts to become errors

🎬 Use the dplyr version of filter() by default:

conflict_prefer("filter", "dplyr")

🐸 Analysis

We will carry out several steps

  1. Import data and merge statistical results with normalised values
  2. Add gene information from Xenbase (Fisher et al. 2023)
  3. log2 transform the data
  4. Write the significant genes to file
  5. View the relationship between samples using PCA
  6. Visualise the expression of the most significant genes using a heatmap
  7. Visual all the results with a volcano plot

Import

We need to import both the normalised counts and the statistical results. We will need all of these for the visualisation and interpretation.

🎬 Import files saved from last week from the results folder: S30_normalised_counts.csv and S30_results.csv. I used the names s30_count_norm and s30_results for the dataframes.

🎬 Remind yourself what is in the rows and columns and the structure of the dataframes (perhaps using glimpse())

It is useful to have this information in a single dataframe to which we will add the gene information from xenbase. Having all the information together will make it easier to interpret the results and select genes of interest.

🎬 Merge the two dataframes:

# merge the results with the normalised counts
s30_results <- s30_count_norm |>
  left_join(s30_results, by = "xenbase_gene_id")

This means you have the counts for each sample along with the statistical results for each gene.

🎬 Import the metadata that maps the sample names to treatments:

# Import metadata that maps the sample names to treatments
meta <- read_table("meta/frog_meta_data.txt")
row.names(meta) <- meta$sample_id
# We only need the s30
meta_s30 <- meta |>
  dplyr::filter(stage == "stage_30")

log2 transform the data

We use the normalised counts for data visualisations so that the comparisons are meaningful. Since the fold changes are given is log2 it is useful to log2 transform the normalised counts too. We will add columns to the dataframe with these transformed values. Since we have some counts of 0 we will add a tiny amount to avoid -Inf values.

🎬 log2 transform the normalised counts:

# log2 transform the counts plus a tiny amount to avoid log(0)
s30_results <- s30_results |>
  mutate(across(starts_with("s30"), 
                \(x) log2(x + 0.001),
                .names = "log2_{.col}"))

This is a wonderful bit or R wizardry. We are using the across() function to apply a transformation to multiple columns. We have selected all the columns that start with s30. The \(x) is an β€œanonymous” function that takes the value of the column and adds 0.001 to it before applying the log2() function. The .names = "log2_{.col}" argument tells across() to name the new columns with the prefix log2_ followed by the original column name. You can read more about across() and anonymous functions from my posit::conf(2023) workshop

I recommend viewing the dataframe to see the new columns.

We now have dataframe with all the information we need: normalised counts, log2 normalised counts, statistical comparisons with fold changes and p values, and information about the gene other than just the id.

Write the significant genes to file

We will create dataframe of the significant genes and write them to file. These are the files you want to examine in more detail along with the visualisations to select your genes of interest.

🎬 Create a dataframe of the genes significant at the 0.01 level:

s30_results_sig0.01 <- s30_results |> 
  filter(padj <= 0.01)

🎬 Write the dataframe to file

🎬 Create a dataframe of the genes significant at the 0.05 level and write to file:

❓How many genes are significant at the 0.01 and 0.05 levels?

View the relationship between samples using PCA

We have 10,136 genes in our dataset. PCA will allow us to plot our samples in the β€œgene expression” space so we can see if FGF-treated sample cluster together and control samples cluster together as we would expect. We do this on the log2 transformed normalised counts.

Our data have genes in rows and samples in columns which is a common organisation for gene expression data. However, PCA expects samples in rows and genes, the variables, in columns. We can transpose the data to get it in the correct format.

🎬 Transpose the log2 transformed normalised counts:

s30_log2_trans <- s30_results |> 
  select(starts_with("log2_")) |>
  t() |> 
  data.frame()

We have used the select() function to select all the columns that start with log2_. We then use the t() function to transpose the dataframe. We then convert the resulting matrix to a dataframe using data.frame(). If you view that dataframe you’ll see it has default column name which we can fix using colnames() to set the column names to the Xenbase gene ids.

🎬 Set the column names to the Xenbase gene ids:

colnames(s30_log2_trans) <- s30_results$xenbase_gene_id

🎬 Perform PCA on the log2 transformed normalised counts:

pca <- s30_log2_trans |>
  prcomp(rank. = 4) 

The rank. argument tells prcomp() to only calculate the first 4 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.

summary(pca)
Importance of first k=4 (out of 6) components:
                           PC1     PC2     PC3     PC4
Standard deviation     64.0124 47.3351 38.4706 31.4111
Proportion of Variance  0.4243  0.2320  0.1532  0.1022
Cumulative Proportion   0.4243  0.6562  0.8095  0.9116

The Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.4243 of the variance, the second 0.2320, and the third 0.1532. Together the first three components explain nearly 81% of the total variance in the data. Plotting PC1 against PC2 will capture about 66% of the variance which is likely much better than we would get plotting any two genes against each other. To plot the PC1 against PC2 we will need to extract the PC1 and PC2 score from the pca object and add labels for the samples.

🎬 Remove log2 from the row names:

sample_id <- row.names(s30_log2_trans) |> str_remove("log2_")

🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the sample ids:

pca_labelled <- data.frame(pca$x,
                           sample_id)

🎬 Merge with the metadata so we can label points by treatment and sibling pair:

pca_labelled <- pca_labelled |> 
  left_join(meta_s30, 
            by = "sample_id")

Since the metadata contained the sample ids, it was especially important to remove the log2_ from the row names so that the join would work. The dataframe should look like this:

PC1 PC2 PC3 PC4 sample_id stage treatment sibling_rep
-76.38391 0.814699 -60.728327 -5.820669 S30_C_5 stage_30 control five
-67.02571 25.668563 51.476835 28.480254 S30_C_6 stage_30 control six
-14.02772 -78.474054 15.282058 -9.213076 S30_C_A stage_30 control A
47.60726 49.035510 -19.288753 20.928290 S30_F_5 stage_30 FGF five
26.04954 32.914201 20.206072 -55.752818 S30_F_6 stage_30 FGF six
83.78054 -29.958919 -6.947884 21.378020 S30_F_A stage_30 FGF A

🎬 Plot PC1 against PC2 and colour by sibling pair and shape by treatment:

pca <- pca_labelled |> 
  ggplot(aes(x = PC1, y = PC2, 
             colour = sibling_rep,
             shape = treatment)) +
  geom_point(size = 3) +
  scale_colour_viridis_d(end = 0.95, begin = 0.15,
                         name = "Sibling pair",
                         labels = c("A", ".5", ".6")) +
  scale_shape_manual(values = c(21, 19),
                     name = NULL,
                     labels = c("Control", "FGF-Treated")) +
  theme_classic()
pca

There is a good separation between treatments on PCA1. The sibling pairs do not seem to cluster together.

🎬 Save the plot to file:

ggsave("figures/frog-s30-pca.png",
       plot = pca,
       height = 3, 
       width = 4,
       units = "in",
       device = "png")

Visualise the expression of the most significant genes using a heatmap

A heatmap is a common way to visualise gene expression data. Often people will create heatmaps with thousands of genes but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level.

We are going to create an interactive heatmap with the heatmaply (Galili et al. 2017) package. heatmaply takes a matrix as input so we need to convert a dataframe of the log2 values to a matrix. We will also set the rownames to the Xenbase gene symbols.

🎬 Convert a dataframe of the log2 values to a matrix:

mat <- s30_results_sig0.01 |> 
  select(starts_with("log2_")) |>
  as.matrix()

🎬 Set the rownames to the Xenbase gene symbols:

rownames(mat) <- s30_results_sig0.01$xenbase_gene_symbol

You might want to view the matrix by clicking on it in the environment pane.

🎬 Load the heatmaply package:

We need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the treatments.

🎬 Set the number of clusters for the treatments and genes:

n_treatment_clusters <- 2
n_gene_clusters <- 2

🎬 Create the heatmap:

heatmaply(mat, 
          scale = "row",
          k_col = n_treatment_clusters,
          k_row = n_gene_clusters,
          fontsize_row = 7, fontsize_col = 10,
          labCol = str_remove(colnames(mat), pattern = "log2_"),
          labRow = rownames(mat),
          heatmap_layers = theme(axis.line = element_blank()))

On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are samples. We can see that the FGF-treated samples cluster together and the control samples cluster together. We can also see two clusters of genes; one of these shows genes upregulated (more yellow) in the FGF-treated samples (the pink cluster) and the other shows genes down regulated (more blue, the blue cluster) in the FGF-treated samples.

The heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can β€œShow in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bars is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE, in the heatmaply() function.

Visualise all the results with a volcano plot

colour the points if padj < 0.05 and log2FoldChange > 1

s30_results <- s30_results |> 
  mutate(log10_padj = -log10(padj),
         sig = padj < 0.05,
         bigfc = abs(log2FoldChange) >= 2) 
vol <- s30_results |> 
  ggplot(aes(x = log2FoldChange, 
             y = log10_padj, 
             colour = interaction(sig, bigfc))) +
  geom_point() +
  geom_hline(yintercept = -log10(0.05), 
             linetype = "dashed") +
  geom_vline(xintercept = 2, 
             linetype = "dashed") +
  geom_vline(xintercept = -2, 
             linetype = "dashed") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_colour_manual(values = c("gray", 
                                 "pink",
                                 "gray30",
                                 "deeppink")) +
  geom_text_repel(data = subset(s30_results, 
                                bigfc & sig),
                  aes(label = xenbase_gene_symbol),
                  size = 3,
                  max.overlaps = 50) +
  theme_classic() +
  theme(legend.position = "none")
vol
Error in `geom_text_repel()`:
! Problem while computing aesthetics.
β„Ή Error occurred in the 5th layer.
Caused by error:
! object 'xenbase_gene_symbol' not found
ggsave("figures/frog-s30-volcano.png",
       plot = vol,
       height = 4.5, 
       width = 4.5,
       units = "in",
       device = "png")
Error in `geom_text_repel()`:
! Problem while computing aesthetics.
β„Ή Error occurred in the 5th layer.
Caused by error:
! object 'xenbase_gene_symbol' not found

🐭 Analysis

We will carry out several steps

  1. Import data and merge statistical results with normalised values
  2. Add gene information from the NCBI using biomaRt
  3. Write the significant genes to file
  4. View the relationship between cells using PCA
  5. Visualise the expression of the most significant genes using a heatmap
  6. Visual all the results with a volcano plot

Import

We need to import both the normalised counts and the statistical results. We will need all of these for the visualisation and interpretation.

🎬 Import the normalised counts for the Prog and HSPC cell types. I used the names prog and hspc for the dataframes.

🎬 Combine the two dataframes (minus one set of gene ids) into one dataframe called prog_hspc:

# combine into one dataframe dropping one of the gene id columns
prog_hspc <- bind_cols(prog, hspc[-1])

🎬 Import the statistical results in results/prog_hspc_results.csv. I used the name prog_hspc_results for the dataframe.

🎬 Remind yourself what is in the rows and columns and the structure of the dataframe (perhaps using glimpse())

It is useful to have this information in a single dataframe to which we will add the gene information from Ensembl Having all the information together will make it easier to interpret the results and select genes of interest.

🎬 Merge the two dataframes:

# merge stats results with normalise values
prog_hspc_results <- prog_hspc_results |> 
  left_join(prog_hspc, by = "ensembl_gene_id")

This means you have the counts for each sample along with the statistical results for each gene.

Write the significant genes to file

We will create dateframe of the signifcant genes and write them to file. These are the files you want to examine in more detail along with the visualisations to select your genes of interest.

🎬 Create a dataframe of the genes significant at the 0.01 level:

prog_hspc_results_sig0.01 <- prog_hspc_results |> 
  filter(FDR <= 0.01)

🎬 Write the dataframe to file

🎬 Create a dataframe of the genes significant at the 0.05 level and write to file:

❓How many genes are significant at the 0.01 and 0.05 levels?

View the relationship between cells using PCA

We have 280 genes in our dataset. PCA will allow us to plot our cells in the β€œgene expression” space so we can see if Prog cells cluster together and HSPC cells cluster together as we would expect. We do this on the log2 transformed normalised counts.

Our data have genes in rows and samples in columns which is a common organisation for gene expression data. However, PCA expects cells in rows and genes, the variables, in columns. We can transpose the data to get it in the correct format.

🎬 Transpose the log2 transformed normalised counts:

prog_hspc_trans <- prog_hspc_results |> 
  dplyr::select(starts_with(c("Prog_", "HSPC_"))) |>
  t() |> 
  data.frame()

We have used the select() function to select all the columns that start with Prog_ or HSPC_. We then use the t() function to transpose the dataframe. We then convert the resulting matrix to a dataframe using data.frame(). If you view that dataframe you’ll see it has default column name which we can fix using colnames() to set the column names to the gene ids.

🎬 Set the column names to the gene ids:

colnames(prog_hspc_trans) <- prog_hspc_results$ensembl_gene_id

perform PCA using standard functions

pca <- prog_hspc_trans |>
  prcomp(rank. = 15) 

The rank. argument tells prcomp() to only calculate the first 15 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.

summary(pca)
Importance of first k=15 (out of 280) components:
                           PC1     PC2     PC3     PC4     PC5     PC6     PC7
Standard deviation     12.5612 8.36646 5.98988 5.41386 4.55730 4.06142 3.84444
Proportion of Variance  0.1099 0.04874 0.02498 0.02041 0.01446 0.01149 0.01029
Cumulative Proportion   0.1099 0.15861 0.18359 0.20400 0.21846 0.22995 0.24024
                           PC8     PC9   PC10    PC11    PC12    PC13    PC14
Standard deviation     3.70848 3.66899 3.5549 3.48508 3.44964 3.42393 3.37882
Proportion of Variance 0.00958 0.00937 0.0088 0.00846 0.00829 0.00816 0.00795
Cumulative Proportion  0.24982 0.25919 0.2680 0.27645 0.28473 0.29290 0.30085
                          PC15
Standard deviation     3.33622
Proportion of Variance 0.00775
Cumulative Proportion  0.30860

The Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.1099 of the variance, the second 0.04874, and the third 0.2498. Together the first three components explain 18% of the total variance in the data. Plotting PC1 against PC2 will capture about 16% of the variance. This is not that high but it likely better than we would get plotting any two genes against each other. To plot the PC1 against PC2 we will need to extract the PC1 and PC2 score from the pca object and add labels for the cells.

🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the cell ids:

pca_labelled <- data.frame(pca$x,
                           cell_id = row.names(prog_hspc_trans))

It will be helpful to add a column for the cell type so we can label points. One way to do this is to extract the information in the cell_id column into two columns.

🎬 Extract the cell type and cell number from the cell_id column (keeping the cell_id column):

pca_labelled <- pca_labelled |> 
  extract(cell_id, 
          remove = FALSE,
          c("cell_type", "cell_number"),
          "([a-zA-Z]{4})_([0-9]{3})")

"([a-zA-Z]{4})_([0-9]{3})" is a regular expression - or regex. [a-zA-Z] means any lower or upper case letter, {4} means 4 of them, and [0-9] means any number, {3} means 3 of them. The brackets around the two parts of the regex mean we want to extract those parts. The first part goes into cell_type and the second part goes into cell_number. The _ between the two patterns matches the underscore and the fact it isn’t in a bracket means we don’t want to keep it.

We can now plot the PC1 and PC2 scores.

🎬 Plot PC1 against PC2 and colour the points by cell type:

pca <- pca_labelled |> 
  ggplot(aes(x = PC1, y = PC2, 
             colour = cell_type)) +
  geom_point(alpha = 0.4) +
  scale_colour_viridis_d(end = 0.8, begin = 0.15,
                         name = "Cell type") +
  theme_classic()
pca

Fairly good separation of cell types but plenty of overlap

🎬 Save the plot to file:

ggsave("figures/prog_hspc-pca.png",
       plot = pca,
       height = 3, 
       width = 4,
       units = "in",
       device = "png")

Visualise the expression of the most significant genes using a heatmap

A heatmap is a common way to visualise gene expression data. Often people will create heatmaps with thousands of genes but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level.

We are going to create an interactive heatmap with the heatmaply (Galili et al. 2017) package. heatmaply takes a matrix as input so we need to convert a dataframe of the log2 values to a matrix. We will also set the rownames to the gene names.

🎬 Convert a dataframe of the log2 values to a matrix. I have used sample() to select 70 random columns so the heatmap is generated quickly:

mat <- prog_hspc_results_sig0.01 |> 
  dplyr::select(starts_with(c("Prog", "HSPC"))) |>
  dplyr::select(sample(1:1499, size = 70)) |>
  as.matrix()

🎬 Set the row names to the gene names:

rownames(mat) <- prog_hspc_results_sig0.01$external_gene_name

You might want to view the matrix by clicking on it in the environment pane.

🎬 Load the heatmaply package:

We need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the cell types to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the cell types.

n_cell_clusters <- 2
n_gene_clusters <- 2

🎬 Create the heatmap:

heatmaply(mat, 
          scale = "row",
          k_col = n_cell_clusters,
          k_row = n_gene_clusters,
          fontsize_row = 7, fontsize_col = 10,
          labCol = colnames(mat),
          labRow = rownames(mat),
          heatmap_layers = theme(axis.line = element_blank()))

It will take a minute to run and display. On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are cells. We can see that cells of the same type don’t cluster that well together. We can also see two clusters of genes but the pattern of gene is not as clear as it was for the frogs and the correspondence with the cell clusters is not as strong.

The heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can β€œShow in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bars is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE, in the heatmaply() function.

Using all the cells is worth doing but it will take a while to generate the heatmap and then show in the viewer so do it sometime when you’re ready for a coffee break.

Visualise all the results with a volcano plot

colour the points if FDR < 0.05 and prog_hspc_results > 1

prog_hspc_results <- prog_hspc_results |> 
  mutate(log10_FDR = -log10(FDR),
         sig = FDR < 0.05,
         bigfc = abs(summary.logFC) >= 2) 
vol <- prog_hspc_results |> 
  ggplot(aes(x = summary.logFC, 
             y = log10_FDR, 
             colour = interaction(sig, bigfc))) +
  geom_point() +
  geom_hline(yintercept = -log10(0.05), 
             linetype = "dashed") +
  geom_vline(xintercept = 2, 
             linetype = "dashed") +
  geom_vline(xintercept = -2, 
             linetype = "dashed") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_colour_manual(values = c("gray",
                                 "pink",
                                 "deeppink")) +
  geom_text_repel(data = subset(prog_hspc_results, 
                                bigfc & sig),
                  aes(label = external_gene_name),
                  size = 3,
                  max.overlaps = 50) +
  theme_classic() +
  theme(legend.position = "none")
vol
Error in `geom_text_repel()`:
! Problem while computing aesthetics.
β„Ή Error occurred in the 5th layer.
Caused by error:
! object 'external_gene_name' not found
ggsave("figures/prog-hspc-volcano.png",
       plot = vol,
       height = 4.5, 
       width = 4.5,
       units = "in",
       device = "png")
Error in `geom_text_repel()`:
! Problem while computing aesthetics.
β„Ή Error occurred in the 5th layer.
Caused by error:
! object 'external_gene_name' not found

πŸ€— Look after future you!

🎬 Go through your script (cont-fgf-s30.R or hspc-prog.R) and tidy up. I would suggest restarting R and trying to run the full pipeline from start to finish. You might need to :

  • collect together library statements at the top of the script
  • remove code that you needed to start today but which wouldn’t be needed running the script from the top (e.g., importing)
  • edit your comments for clarity
  • rename variables for consistency or clarity
  • remove house keeping or exploratory code
  • restyle code, add code section headers etc

πŸ₯³ Finished

Well Done!

Independent study following the workshop

Consolidate

The Code file

These contain all the code needed in the workshop even where it is not visible on the webpage.

The workshop.qmd file is the file I use to compile the practical. Qmd stands for Quarto markdown. It allows code and ordinary text to be interleaved to produce well-formatted reports including webpages. Right-click on the link and choose Save-As to download. You will be able to open the Qmd file in RStudio. Alternatively, View in Browser. Coding and thinking answers are marked with #---CODING ANSWER--- and #---THINKING ANSWER---

Pages made with R (R Core Team 2024), Quarto (Allaire et al. 2024), knitr [Xie (2024); knitr2; knitr3], kableExtra (Zhu 2021)

References

Allaire, J. J., Charles Teague, Carlos Scheidegger, Yihui Xie, and Christophe Dervieux. 2024. β€œQuarto.” https://doi.org/10.5281/zenodo.5960048.
Fisher, Malcolm, Christina James-Zorn, Virgilio Ponferrada, Andrew J Bell, Nivitha Sundararaj, Erik Segerdell, Praneet Chaturvedi, et al. 2023. β€œXenbase: Key Features and Resources of the Xenopus Model Organism Knowledgebase.” Genetics 224 (1): iyad018. https://doi.org/10.1093/genetics/iyad018.
Galili, Tal, O’Callaghan, Alan, Sidi, Jonathan, Sievert, and Carson. 2017. β€œHeatmaply: An r Package for Creating Interactive Cluster Heatmaps for Online Publishing.” Bioinformatics. https://doi.org/10.1093/bioinformatics/btx657.
Hester, Jim, Hadley Wickham, and GΓ‘bor CsΓ‘rdi. 2023. β€œFs: Cross-Platform File System Operations Based on ’Libuv’.”
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley. 2023. Conflicted: An Alternative Conflict Resolution Strategy. https://conflicted.r-lib.org/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain FranΓ§ois, Garrett Grolemund, et al. 2019. β€œWelcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui. 2024. Knitr: A General-Purpose Package for Dynamic Report Generation in r. https://yihui.org/knitr/.
Zhu, Hao. 2021. β€œkableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax.” https://CRAN.R-project.org/package=kableExtra.