Transcriptomics 1: 👋 Hello data!
6 November, 2024
Concise summary of the experimental design and aims
What the raw data consist of
What has been done to the data so far
What steps we will take in the workshop
There are 4 transcriptomic datasets
🐸 bulk RNA-seq from Xenopus laevis embryos.
🎄 bulk RNA-seq from Arabidopsis thaliana
💉 bulk RNA-seq from Leishmania mexicana
🐭 single cell RNA-seq from mouse stem cells
3 fertilisations
2 siblings from each fertilisation: one control, one FGF-treated
sequenced at 3 time points
3 x 2 x 3 = 18 samples
3 fertilisations. These are the replicates: 1, 2, 3
2 siblings from each fertilisation: one control, one FGF-treated. The treatments are paired
Sequenced at 3 time points: S14, S20, S30
3 x 2 x 3 = 18 samples
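The 3 x 2 x 3 layout above can be enumerated to check the sample count. This is a minimal sketch in Python; the sample-name pattern and the dictionary keys are made up for illustration and are not the names used in the workshop files.

```python
from itertools import product

fertilisations = [1, 2, 3]          # the replicates
treatments = ["control", "FGF"]     # paired siblings from each fertilisation
stages = ["S14", "S20", "S30"]      # the time points

# One sample for every combination of stage, treatment and fertilisation.
# The "stage_treatment_fertilisation" naming is hypothetical.
samples = [
    {
        "sample": f"{stage}_{treatment}_{fert}",
        "stage": stage,
        "treatment": treatment,
        "fertilisation": fert,
    }
    for stage, treatment, fert in product(stages, treatments, fertilisations)
]

print(len(samples))  # 18
for row in samples[:3]:
    print(row)
```

Writing the design out like this makes it easy to see which samples belong to a given comparison, for example the six S30 samples used in the workshop.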
Find genes that are “differentially expressed” between control-treated and FGF-treated siblings
Differentially expressed means the expression in one group is significantly higher than in the other
The workshops will take you through comparing the control and FGF-treated siblings at S30
You will make other comparisons independently
You will be guided to carefully document your work so you can apply the same methods to other comparisons
Do the independent study before and after the workshop!
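To build intuition for what "higher in one group" means in a paired design, the sketch below computes a log2 fold change for a single gene within each fertilisation. The counts are invented, the pseudocount of 1 is an assumption to avoid taking the log of zero, and no normalisation or significance test is done here. In the workshop that statistical work is left to DESeq2.

```python
import math

# Hypothetical raw counts for ONE gene at S30, keyed by fertilisation.
control = {1: 120, 2: 95, 3: 110}
fgf = {1: 250, 2: 180, 3: 240}

# Paired design: compare the two siblings from the same fertilisation.
# Adding 1 to each count (an assumed pseudocount) avoids log2(0).
log2_fc = {
    fert: math.log2((fgf[fert] + 1) / (control[fert] + 1))
    for fert in control
}
mean_log2_fc = sum(log2_fc.values()) / len(log2_fc)

for fert, value in log2_fc.items():
    print(f"fertilisation {fert}: log2 fold change = {value:.2f}")
print(f"mean log2 fold change = {mean_log2_fc:.2f}")
```

A positive value means higher expression in the FGF-treated sibling. Whether the difference is significant depends on the variability between replicates, which is what the statistical model estimates.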
2 plant genotypes
2 copper conditions
2 plants
2 x 2 x 2 = 8 samples
2 plant genotypes: wildtype and spl7 mutant. This is the genotype treatment
2 copper conditions: sufficient and deficient. This is the Cu treatment
2 plants. These are the replicates
2 x 2 x 2 = 8 samples
Find genes that are “differentially expressed” between plant types and copper conditions, e.g., between wildtype plants grown under copper-sufficient and copper-deficient conditions
Differentially expressed means the expression in one group is significantly higher than in the other
The workshops will take you through comparing the wildtype plants grown under copper sufficient and copper deficient conditions
You will make other comparisons independently
You will be guided to carefully document your work so you can apply the same methods to other comparisons
Do the independent study before and after the workshop!
3 stages
3 samples
3 x 3 = 9 samples
3 stages: procyclic promastigotes, metacyclic promastigotes and amastigotes. This is the stage treatment
3 samples. These are the replicates
3 x 3 = 9 samples
Find genes that are “differentially expressed” between stages, e.g., between procyclic promastigotes and metacyclic promastigotes
Differentially expressed means the expression in one group is significantly higher than in the other
The workshops will take you through comparing the procyclic promastigotes and the metacyclic promastigotes
You will make other comparisons independently
You will be guided to carefully document your work so you can apply the same methods to other comparisons
Do the independent study before and after the workshop!
Cells were sorted using flow cytometry on the basis of cell surface markers
There are 3 cell types
Many cells of each cell type were sequenced
There are three cell types: LT-HSCs, HSPCs, Progs. This is the cell “treatment”
Many cells of each type were sequenced: These are the replicates
155 LT-HSCs, 701 HSPCs, 798 Progs
Find genes that are “differentially expressed” between at least two cell types
Differentially expressed means the expression in one group is significantly higher than in the other
The workshops will take you through comparing the HSPC and Prog cells
You will make other comparisons independently
You will be guided to carefully document your work so you can apply the same methods to other comparisons
Do the independent study before and after the workshop!
The raw data are “reads” from a sequencing machine in FASTQ files
A read is a sequence of RNA which is shorter than the whole transcriptome
The length of the reads depends on the type of sequencing machine
You can read more about sequencing technologies in Statistically useful experimental design (Rand and Forrester 2022)
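For a concrete picture of a read in a FASTQ file, the sketch below parses one record and decodes its quality string. It assumes four lines per record and the Phred+33 encoding used by current Illumina output. The read itself is invented, and the function names are illustrative rather than taken from any package.

```python
def parse_fastq(text):
    """Yield (identifier, sequence, quality string) from 4-line FASTQ records."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly 4 lines each")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("Sequence and quality strings differ in length")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(character) - offset for character in quality]


# A made-up record: identifier, bases, separator, per-base quality.
example = "@read1\nGATTACA\n+\nIIIIII#\n"

for name, sequence, quality in parse_fastq(example):
    scores = phred_scores(quality)
    print(name, sequence, scores)  # read1 GATTACA [40, 40, 40, 40, 40, 40, 2]
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so the final base here (Q = 2) is very unreliable and is the kind of base that quality trimming removes.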
Reads are filtered and trimmed on the basis of a quality score
They are then aligned/pseudo-aligned to a reference genome/transcriptome (or assembled de novo)
And then counted to quantify the expression
Counts need to be normalised to account for differences in sequencing depth and transcript length before, or as part of, statistical analysis.
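To see why sequencing depth matters, here is a small sketch of counts-per-million (CPM) scaling. CPM is used here only as a simple stand-in: DESeq2 applies its own size-factor normalisation, and CPM does not correct for transcript length. The gene names, sample names and counts are invented.

```python
import math

# Hypothetical raw counts: sample_B was sequenced twice as deeply as sample_A.
counts = {
    "sample_A": {"gene1": 10, "gene2": 90, "gene3": 900},
    "sample_B": {"gene1": 20, "gene2": 180, "gene3": 1800},
}


def counts_per_million(sample_counts):
    """Scale one sample's counts so that they sum to one million."""
    total = sum(sample_counts.values())
    return {gene: n / total * 1_000_000 for gene, n in sample_counts.items()}


cpm = {sample: counts_per_million(c) for sample, c in counts.items()}

for sample, values in cpm.items():
    print(sample, {gene: round(v) for gene, v in values.items()})

# The raw counts differ two-fold, but the depth-scaled values agree.
print(math.isclose(cpm["sample_A"]["gene1"], cpm["sample_B"]["gene1"]))
```

Without this scaling every gene in sample_B would appear to be twice as highly expressed, purely because more reads were sequenced.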
Unpublished (so far!)
Expression for the whole transcriptome, X. laevis v10.1 genome assembly (Fisher et al. 2023)
Values are raw counts
The statistical analysis method we will use is DESeq2 (Love, Huber, and Anders 2014). It requires raw counts and performs the normalisation itself.
Based on PRJNA132271 (Bernal et al. 2012)
Expression for the whole transcriptome, ENSEMBL Arabidopsis TAIR10 (Yates et al. 2022)
Values are raw counts
The statistical analysis method we will use is DESeq2 (Love, Huber, and Anders 2014). It requires raw counts and performs the normalisation itself.
Brand spanking new!
Expression for the whole transcriptome, L. mexicana MHOM/GT/2001/U1103 (Rogers et al. 2011)
Values are raw counts
The statistical analysis method we will use is DESeq2 (Love, Huber, and Anders 2014). It requires raw counts and performs the normalisation itself.
Published in Nestorowa et al. (2016)
Expression for a subset of genes, the surfaceome
Values are log2 normalised values
The statistical analysis method we will use is scran (Lun, McCarthy, and Marioni 2016). It requires normalised values.
Transcriptomics 1: Hello data. Getting to know the data. Checking the distributions of values overall, and across rows and columns, to check things are as we expect and to detect rows/columns that need to be removed.
Transcriptomics 2: Statistical Analysis. Identifying which genes are differentially expressed between treatments. This is the main analysis step. We will use different methods for bulk and single cell data.
Transcriptomics 3: Visualising. Principal Component Analysis (PCA) and volcano plots to visualise the results of the analysis.
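The row and column checks described for Transcriptomics 1 can be sketched as below, with genes in rows and samples in columns. The table, the gene and sample names, and the "less than 10% of the median total" cut-off are all assumptions chosen for illustration, not workshop values.

```python
from statistics import median

sample_names = ["s1", "s2", "s3", "s4"]

# Hypothetical count table: one row per gene, one value per sample.
counts = {
    "geneA": [0, 0, 0, 0],
    "geneB": [5, 8, 0, 7],
    "geneC": [100, 120, 1, 110],
}

# Rows: genes with no counts in any sample carry no information.
gene_totals = {gene: sum(values) for gene, values in counts.items()}
unexpressed = [gene for gene, total in gene_totals.items() if total == 0]

# Columns: a sample with a far lower total than the others may have failed.
sample_totals = {
    name: sum(values[i] for values in counts.values())
    for i, name in enumerate(sample_names)
}
cutoff = 0.1 * median(sample_totals.values())  # assumed, arbitrary threshold
low_depth = [name for name, total in sample_totals.items() if total < cutoff]

print("genes with zero counts:", unexpressed)
print("sample totals:", sample_totals)
print("samples with unusually low totals:", low_depth)
```

Flagged rows and columns are candidates for removal, but the decision should be made after looking at the distributions rather than by applying a threshold alone.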
Pages made with R (R Core Team 2024), Quarto (Allaire et al. 2024), knitr (Xie 2024, 2015, 2014), kableExtra (Zhu 2021)