Independent Study to prepare for workshop

Transcriptomics 1: 👋 Hello data!

Emma Rand

17 December, 2024

Overview

  • Concise summary of the experimental design and aims

  • What the raw data consist of

  • What has been done to the data so far

  • What steps we will take in the workshop

The Data

There are 4 transcriptomic datasets:

  • 🐸 bulk RNA-seq from Xenopus laevis embryos

  • 🎄 bulk RNA-seq from Arabidopsis thaliana

  • 💉 bulk RNA-seq from Leishmania mexicana

  • 🐭 single cell RNA-seq from mouse stem cells

Experimental design

🐸 Experimental design

Schematic of frog development experiment

  • 3 fertilisations. These are the replicates, 1, 2, 3

  • 2 siblings from each fertilisation: one control, one FGF-treated. The treatments are paired

  • sequenced at 3 time points. S14, S20, S30

  • 3 x 2 x 3 = 18 samples
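The sample arithmetic above can be checked by enumerating the design. This is only a sketch: the factor levels come from the slide, but the sample-naming scheme is a hypothetical convention, and pairing is assumed to apply within each stage (control and FGF-treated siblings from the same fertilisation).

```python
from itertools import product

# Factor levels taken from the slide.
fertilisations = [1, 2, 3]        # replicates
treatments = ["control", "FGF"]   # paired siblings
stages = ["S14", "S20", "S30"]    # time points

# Every combination of stage x treatment x fertilisation is one sample.
samples = [
    {"stage": s, "treatment": t, "replicate": r,
     "name": f"{s}_{t}_{r}"}      # hypothetical naming convention
    for s, t, r in product(stages, treatments, fertilisations)
]
print(len(samples))

# Assumed pairing: within a stage, the two siblings from one fertilisation.
pairs = {}
for sample in samples:
    key = (sample["stage"], sample["replicate"])
    pairs.setdefault(key, []).append(sample["name"])

for key, members in pairs.items():
    print(key, members)
```

Each of the 9 pairs holds one control and one FGF-treated sample, which is why a paired analysis is appropriate.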

🐸 Aim

  • Find genes that are “differentially expressed” between control-treated and FGF-treated siblings

  • Differentially expressed means the expression in one group is significantly higher than in the other
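To make "higher in one group than the other" concrete, the sketch below computes a log2 fold change for one hypothetical gene. The expression values are made up for illustration and are not taken from the dataset. A fold change on its own does not show significance; that needs the statistical analysis covered in the workshops.

```python
import math

# Made-up normalised expression values for ONE hypothetical gene at one stage,
# keyed by fertilisation (replicate). These are NOT real data.
control = {1: 100.0, 2: 120.0, 3: 90.0}
fgf = {1: 210.0, 2: 230.0, 3: 200.0}

# Siblings are paired, so compare within each fertilisation.
log2_fc = {rep: math.log2(fgf[rep] / control[rep]) for rep in control}
mean_log2_fc = sum(log2_fc.values()) / len(log2_fc)

for rep, value in log2_fc.items():
    print(f"fertilisation {rep}: log2 fold change = {value:.2f}")
print(f"mean log2 fold change = {mean_log2_fc:.2f}")
```

A log2 fold change of 1 means expression is doubled in the FGF-treated sibling; a value of -1 means it is halved.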

🐸 Guided analysis

  • The workshops will take you through comparing the control and FGF-treated siblings at S30

  • You will make other comparisons independently

  • You will be guided to carefully document your work so you can apply the same methods to other comparisons

  • Do the independent study before and after the workshop!

🎄 Experimental design

Schematic of arabidopsis experiment

  • 2 plant genotypes: wildtype and spl7 mutant. This is the genotype treatment

  • 2 copper conditions: sufficient and deficient. This is the Cu treatment

  • 2 plants. These are the replicates

  • 2 x 2 x 2 = 8 samples

🎄 Aim

  • Find genes that are “differentially expressed” between plant genotypes or copper conditions, e.g. between wildtype plants grown under copper-sufficient and copper-deficient conditions

  • Differentially expressed means the expression in one group is significantly higher than in the other
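One way to see which samples enter a given comparison is to enumerate the design and filter it. This is a sketch only: the factor levels come from the slides, while the tuple layout and the `select` helper are hypothetical.

```python
from itertools import product

# Factor levels taken from the slides.
genotypes = ["wildtype", "spl7"]
copper = ["sufficient", "deficient"]
replicates = [1, 2]

# One sample per genotype x copper condition x replicate plant.
samples = list(product(genotypes, copper, replicates))
print(len(samples))

def select(genotype, cu):
    """Return the samples for one genotype under one copper condition."""
    return [s for s in samples if s[0] == genotype and s[1] == cu]

# The example comparison: wildtype, copper sufficient vs copper deficient.
group_sufficient = select("wildtype", "sufficient")
group_deficient = select("wildtype", "deficient")
print(group_sufficient)
print(group_deficient)
```

Any other comparison is made the same way by changing the arguments to `select`.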

🎄 Guided analysis

  • The workshops will take you through comparing the wildtype plants grown under copper sufficient and copper deficient conditions

  • You will make other comparisons independently

  • You will be guided to carefully document your work so you can apply the same methods to other comparisons

  • Do the independent study before and after the workshop!

💉 Experimental design

Schematic of leishmania experiment

  • three stages: procyclic promastigotes, metacyclic promastigotes and amastigotes. This is the stage treatment

  • three samples. These are the replicates

  • 3 x 3 = 9 samples

💉 Aim

  • Find genes that are “differentially expressed” between stages e.g., procyclic promastigotes and the metacyclic promastigotes

  • Differentially expressed means the expression in one group is significantly higher than in the other
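With three stages there are three possible pairwise comparisons. The sketch below lists them alongside the samples; the stage labels come from the slides and the tuple layout is an assumption.

```python
from itertools import combinations, product

# Stage labels taken from the slides.
stages = ["procyclic promastigote", "metacyclic promastigote", "amastigote"]
replicates = [1, 2, 3]

# One sample per stage x replicate.
samples = list(product(stages, replicates))
print(len(samples))

# Every unordered pair of stages is one possible comparison.
comparisons = list(combinations(stages, 2))
for first, second in comparisons:
    print(f"{first} vs {second}")
```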

💉 Guided analysis

  • The workshops will take you through comparing the procyclic promastigotes and the metacyclic promastigotes

  • You will make other comparisons independently

  • You will be guided to carefully document your work so you can apply the same methods to other comparisons

  • Do the independent study before and after the workshop!

🐭 Experimental design

Schematic of stem cell experiment

  • Cells were sorted using flow cytometry on the basis of cell surface markers

  • There are three cell types: LT-HSCs, HSPCs and Progs. This is the cell “treatment”

  • Many cells of each type were sequenced. These are the replicates

  • 155 LT-HSCs, 701 HSPCs, 798 Progs

🐭 Aim

  • Find genes that are “differentially expressed” between at least two cell types

  • Differentially expressed means the expression in one group is significantly higher than in the other

🐭 Guided analysis

  • The workshops will take you through comparing the HSPC and Prog cells

  • You will make other comparisons independently

  • You will be guided to carefully document your work so you can apply the same methods to other comparisons

  • Do the independent study before and after the workshop!

Where do the data come from?

Raw Sequence data

  • The raw data are “reads” from a sequencing machine in FASTQ files

  • A read is a sequence of RNA which is shorter than the whole transcriptome

  • The length of the reads depends on the type of sequencing machine

    • Short-read technologies (e.g. Illumina) have higher base accuracy but are harder to align
    • Long-read technologies (e.g. Nanopore) have lower base accuracy but are easier to align
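Each read in a FASTQ file is stored as a four-line record: an identifier line starting with `@`, the sequence, a separator line starting with `+`, and a quality string with one character per base. The sketch below parses a single made-up record. It assumes Phred+33 quality encoding, and the `parse_fastq_record` helper is hypothetical rather than part of any library.

```python
# A made-up FASTQ record, built line by line so the lengths are explicit.
sequence = "GATTACAGATTACA"
quality = "I" * 12 + "#!"   # one quality character per base
record = "\n".join(["@read_1", sequence, "+", quality]) + "\n"

def parse_fastq_record(text):
    """Split one four-line FASTQ record into name, sequence and Phred scores."""
    header, seq, separator, qual = text.strip().split("\n")
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Assumed Phred+33 encoding: score = ASCII code - 33.
    scores = [ord(ch) - 33 for ch in qual]
    return header[1:], seq, scores

name, seq, scores = parse_fastq_record(record)

# A Phred score Q corresponds to an error probability of 10^(-Q/10).
error_probabilities = [10 ** (-q / 10) for q in scores]
print(name, seq)
print(scores)
```

High scores mean a low probability that the base call is wrong; the low scores at the end of this made-up read are the kind of bases that quality trimming removes.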

Optional

You can read more about sequencing technologies in Statistically useful experimental design (Rand and Forrester 2022)

What has been done to the data so far

General steps

  • Reads are filtered and trimmed on the basis of a quality score

  • They are then aligned/pseudo-aligned to a reference genome/transcriptome (or assembled de novo)

  • And then counted to quantify the expression

  • Counts need to be normalised to account for differences in sequencing depth and transcript length before, or as part of, statistical analysis.
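The sketch below illustrates what normalising for sequencing depth and transcript length means, using counts per million (CPM) and transcripts per million (TPM). The counts and gene lengths are made up, and these two measures are chosen only for illustration; the analysis tools used in the workshops may normalise differently.

```python
# Made-up counts: sample_B was sequenced twice as deeply as sample_A.
counts = {
    "sample_A": {"gene1": 100, "gene2": 300, "gene3": 600},
    "sample_B": {"gene1": 200, "gene2": 600, "gene3": 1200},
}
# Made-up transcript lengths in kilobases.
lengths_kb = {"gene1": 1.0, "gene2": 2.0, "gene3": 4.0}

def cpm(sample_counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(sample_counts.values())
    return {gene: count / total * 1e6 for gene, count in sample_counts.items()}

def tpm(sample_counts, lengths):
    """Transcripts per million: corrects for transcript length, then depth."""
    rates = {gene: count / lengths[gene] for gene, count in sample_counts.items()}
    total_rate = sum(rates.values())
    return {gene: rate / total_rate * 1e6 for gene, rate in rates.items()}

cpm_a, cpm_b = cpm(counts["sample_A"]), cpm(counts["sample_B"])
tpm_a = tpm(counts["sample_A"], lengths_kb)
print(cpm_a)
print(cpm_b)
print(tpm_a)
```

After depth normalisation the two samples look the same even though the raw counts in sample_B are doubled, and after length normalisation the longest gene no longer dominates simply because it is long.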

🐸 Data

🎄 Data

💉 Data

🐭 Data

  • Published in Nestorowa et al. (2016)

  • Expression for a subset of genes, the surfaceome

  • Expression values are log2 normalised

  • The statistical analysis method we will use is scran (Lun, McCarthy, and Marioni 2016) and it requires normalised values

Workshops

  • Transcriptomics 1: Hello data. Getting to know the data. Checking the distributions of values overall, and across rows and columns, to check things are as we expect and to detect rows/columns that need to be removed.

  • Transcriptomics 2: Statistical Analysis. Identifying which genes are differentially expressed between treatments. This is the main analysis step. We will use different methods for bulk and single cell data.

  • Transcriptomics 3: Visualising. Principal Component Analysis (PCA) and volcano plots to visualise the results of the analysis.
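The kind of row and column checks described for Transcriptomics 1 can be sketched on a tiny made-up count table, with genes as rows and samples as columns. The gene names, sample names and values are all hypothetical, and removing genes with zero counts in every sample is just one example of a row that might need to be removed.

```python
# Made-up count table: one list of counts per gene, one position per sample.
sample_names = ["ctrl_1", "ctrl_2", "ctrl_3", "fgf_1", "fgf_2", "fgf_3"]
counts = {
    "geneA": [10, 12, 9, 30, 28, 35],
    "geneB": [0, 0, 0, 0, 0, 0],
    "geneC": [5, 0, 7, 6, 4, 0],
}

# Column check: total counts per sample (very low totals suggest a problem sample).
column_totals = dict(zip(sample_names, (sum(col) for col in zip(*counts.values()))))

# Row check: total counts per gene.
row_totals = {gene: sum(values) for gene, values in counts.items()}

# Genes with no counts in any sample carry no information for this comparison.
all_zero = [gene for gene, total in row_totals.items() if total == 0]
filtered = {gene: values for gene, values in counts.items() if gene not in all_zero}

print(column_totals)
print(row_totals)
print(all_zero)
```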

References

Pages made with R (R Core Team 2024), Quarto (Allaire et al. 2024), knitr (Xie 2024, 2015, 2014), kableExtra (Zhu 2021)

Allaire, J. J., Charles Teague, Carlos Scheidegger, Yihui Xie, and Christophe Dervieux. 2024. “Quarto.” https://doi.org/10.5281/zenodo.5960048.
Bernal, María, David Casero, Vasantika Singh, Grandon T. Wilson, Arne Grande, Huijun Yang, Sheel C. Dodani, et al. 2012. “Transcriptome Sequencing Identifies SPL7-Regulated Copper Acquisition Genes FRO4/FRO5 and the Copper Dependence of Iron Homeostasis in Arabidopsis.” The Plant Cell 24 (2): 738–61. https://doi.org/10.1105/tpc.111.090431.
Fisher, Malcolm, Christina James-Zorn, Virgilio Ponferrada, Andrew J Bell, Nivitha Sundararaj, Erik Segerdell, Praneet Chaturvedi, et al. 2023. “Xenbase: Key Features and Resources of the Xenopus Model Organism Knowledgebase.” Genetics 224 (1): iyad018. https://doi.org/10.1093/genetics/iyad018.
Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15: 550. https://doi.org/10.1186/s13059-014-0550-8.
Lun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor.” F1000Res. 5: 2122. https://doi.org/10.12688/f1000research.9501.2.
Nestorowa, Sonia, Fiona K. Hamey, Blanca Pijuan Sala, Evangelia Diamanti, Mairi Shepherd, Elisa Laurenti, Nicola K. Wilson, David G. Kent, and Berthold Göttgens. 2016. “A Single-Cell Resolution Map of Mouse Hematopoietic Stem and Progenitor Cell Differentiation.” Blood 128 (8): e20–31. https://doi.org/10.1182/blood-2016-05-716480.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rand, Emma, and Sarah Forrester. 2022. “Statistically Useful Experimental Design.” https://cloud-span.github.io/experimental_design00-overview/.
Rogers, Matthew B., James D. Hilley, Nicholas J. Dickens, Jon Wilkes, Paul A. Bates, Daniel P. Depledge, David Harris, et al. 2011. “Chromosome and Gene Copy Number Variation Allow Major Structural Change Between Species and Strains of Leishmania.” Genome Research 21 (12): 2129–42. https://doi.org/10.1101/gr.122945.111.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2024. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Yates, Andrew D, James Allen, Ridwan M Amode, Andrey G Azov, Matthieu Barba, Andrés Becerra, Jyothish Bhai, et al. 2022. “Ensembl Genomes 2022: An Expanding Genome Resource for Non-Vertebrates.” Nucleic Acids Research 50 (D1): D996–1003. https://doi.org/10.1093/nar/gkab1007.
Zhu, Hao. 2021. “kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax.” https://CRAN.R-project.org/package=kableExtra.