Transcriptomics 1: 👋 Hello data!
18 September, 2024
Concise summary of the experimental design and aims
What the raw data consist of
What has been done to the data so far
What steps we will take in the workshop
There are three datasets
🐸 transcriptomic data (bulk RNA-seq) from frog embryos.
🐭 transcriptomic data (single cell RNA-seq) from stem cells
🍂 ??????? Metabolomic / Metagenomic data from anaerobic digesters
3 fertilisations. These are the replicates: 1, 2, 3
two siblings from each fertilisation: one control, one FGF-treated. The treatments are paired
sequenced at three time points: S14, S20, S30
3 x 2 x 3 = 18 samples
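The design above can be enumerated in a few lines to confirm the arithmetic. This is a minimal sketch: the label format (for example `S30_FGF_2`) is a hypothetical naming scheme chosen for illustration, and the real column names in the data may differ.

```python
from itertools import product

fertilisations = [1, 2, 3]        # the replicates
treatments = ["control", "FGF"]   # paired siblings within each fertilisation
stages = ["S14", "S20", "S30"]    # the three time points

# Hypothetical labels of the form stage_treatment_replicate
samples = [
    f"{stage}_{treatment}_{rep}"
    for stage, treatment, rep in product(stages, treatments, fertilisations)
]

print(len(samples))  # number of stage x treatment x fertilisation combinations
for label in samples:
    print(label)
```

Because siblings share a fertilisation, the two labels that differ only in treatment (same stage, same replicate number) form one pair.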
find genes important in frog development
Important means the genes that are differentially expressed between the control-treated and the FGF-treated siblings
Differentially expressed means the expression in one group is significantly higher than in the other
The workshops will take you through comparing the control and FGF-treated siblings at S30
This is the “least interesting” comparison
You will be guided to carefully document your work so you can apply the same methods to other comparisons
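To make "higher in one group than in the other" concrete, the sketch below computes a log2 fold change for each sibling pair for a single gene. The expression values are made up for illustration, and the function name and pseudocount are assumptions. It shows only the size of the difference; deciding whether a difference is significant is the job of the statistical analysis covered in the workshops.

```python
import math

# Made-up normalised expression values for one gene at S30, keyed by fertilisation
control = {1: 120.0, 2: 95.0, 3: 110.0}
fgf = {1: 250.0, 2: 180.0, 3: 240.0}


def paired_log2_fold_changes(treated, untreated, pseudocount=1.0):
    """Return log2(treated / untreated) for each pair, keyed by replicate.

    A pseudocount is added to both values so that zeros do not cause
    division by zero or log(0).
    """
    return {
        rep: math.log2((treated[rep] + pseudocount) / (untreated[rep] + pseudocount))
        for rep in sorted(untreated)
    }


lfc = paired_log2_fold_changes(fgf, control)
mean_lfc = sum(lfc.values()) / len(lfc)

for rep, value in lfc.items():
    print(f"fertilisation {rep}: log2 fold change = {value:.2f}")
print(f"mean log2 fold change = {mean_lfc:.2f}")
```

A positive value means higher expression in the FGF-treated sibling, a negative value means higher expression in the control, and a value of 1 corresponds to a doubling.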
Cells were sorted using flow cytometry on the basis of cell surface markers
There are three cell types: LT-HSCs, HSPCs, Progs. These are the “treatments”
Many cells of each type were sequenced. These are the replicates
155 LT-HSCs, 701 HSPCs, 798 Progs
find genes for cell surface proteins that are important in stem cell identity
Important means genes that are differentially expressed between at least two cell types
Differentially expressed means the expression in one group is significantly higher than in the other
The workshops will take you through comparing the HSPC and Prog cells
This is the “least interesting” comparison
You will be guided to carefully document your work so you can apply the same methods to other comparisons
The raw data are “reads” from a sequencing machine.
A read is a sequence of DNA or RNA shorter than the whole genome or transcriptome
The length of the reads depends on the type of sequencing machine
Sequencing technology is constantly improving
Optional: You can read more about sequencing technologies in Statistically useful experimental design (Rand and Forrester 2022)
The RNA-seq data are from an Illumina machine, with reads of 150-300 bp
Reads are in FASTQ files
FASTQ files contain the sequence of each read and a quality score for each base
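As an illustration of what a FASTQ file holds, here is a minimal sketch of a reader. It assumes the common four-lines-per-record layout (header, sequence, `+` separator, quality string) and Phred+33 quality encoding, which is the usual encoding for current Illumina output. The function name and the example record are made up, and real pipelines use dedicated tools rather than hand-written parsers.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a FASTQ text stream.

    Assumes four lines per record and Phred+33 encoding, so each quality
    character maps to a score of ord(character) - 33.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


# A made-up record: four high-quality bases followed by one low-quality base
example = StringIO("@read1\nACGTN\n+\nIIII#\n")
records = list(read_fastq(example))

for read_id, sequence, scores in records:
    print(read_id, sequence, scores)
```

There is one quality score per base, so the quality string is always the same length as the sequence.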
Reads are filtered and trimmed on the basis of the quality score
They are then aligned/pseudo-aligned to a reference genome/transcriptome
Reads are then counted to quantify the expression
Counts will need to be normalised to account for differences in sequencing depth and gene/transcript length before, or as part of, statistical analysis.
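The two adjustments mentioned above, for sequencing depth and for gene/transcript length, can be sketched with two widely used measures: counts per million (depth only) and transcripts per million (length, then depth). The gene names, counts and lengths below are made up, and this is not necessarily the normalisation applied to the workshop data.

```python
def counts_per_million(counts):
    """Scale counts so that each sample sums to one million (depth only)."""
    total = sum(counts.values())
    return {gene: count / total * 1e6 for gene, count in counts.items()}


def transcripts_per_million(counts, lengths_bp):
    """Divide counts by length in kilobases, then scale to sum to one million."""
    rate = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rate.values())
    return {gene: value / total * 1e6 for gene, value in rate.items()}


# Made-up counts for one sample and made-up transcript lengths in base pairs
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

cpm = counts_per_million(counts)
tpm = transcripts_per_million(counts, lengths_bp)

for gene in counts:
    print(f"{gene}: CPM = {cpm[gene]:.0f}, TPM = {tpm[gene]:.0f}")
```

In this example geneB has three times the count of geneA but is also three times as long, so the two genes end up with the same TPM.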
Unpublished (so far!)
Expression for the whole transcriptome (X. laevis v10.1 genome assembly)
Values are raw counts
The statistical analysis method we will use, DESeq2 (Love, Huber, and Anders 2014), requires raw counts and performs the normalisation itself
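For intuition about why DESeq2 wants raw counts, here is a simplified sketch of the median-of-ratios idea behind its size factors: each sample gets one scaling factor, estimated from the counts themselves. This is a plain-Python illustration with made-up counts, not DESeq2's actual code, and the function name is an assumption.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios approach.

    counts maps each sample name to a list of raw counts, with genes in the
    same order for every sample. Genes with a zero in any sample are skipped
    because their geometric mean would be zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    usable = [i for i in range(n_genes) if all(counts[s][i] > 0 for s in samples)]
    # Log of the geometric mean of each usable gene across samples
    log_geo_mean = {
        i: sum(math.log(counts[s][i]) for s in samples) / len(samples)
        for i in usable
    }
    # Size factor: median ratio of a sample's counts to the geometric means
    return {
        s: math.exp(median(math.log(counts[s][i]) - log_geo_mean[i] for i in usable))
        for s in samples
    }


# Made-up raw counts: sample B was sequenced to twice the depth of sample A
raw = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
factors = size_factors(raw)
normalised = {s: [c / factors[s] for c in raw[s]] for s in raw}

print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale, which is why supplying values that are already normalised would distort the result.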
Published in Nestorowa et al. (2016)
Expression for a subset of genes, the surfaceome
Values are log2 normalised values
The statistical analysis method we will use, scran (Lun, McCarthy, and Marioni 2016), requires normalised values
Transcriptomics 1: Hello data. Getting to know the data. Checking the distributions of values overall, across samples and across genes to check things are as we expect and to detect genes/samples that need to be removed.
Transcriptomics 2: Statistical Analysis. Identifying which genes are differentially expressed between treatments. This is the main analysis step. We will use different methods for bulk and single cell data.
Transcriptomics 3: Visualising and Interpreting. Production of volcano plots and heatmaps to visualise the results of the statistical analysis. We will also look at how to interpret the results and how to find out more about the genes of interest.
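The kind of checks described for Transcriptomics 1 can be sketched on a tiny made-up count table: total counts per sample flag unusual samples, and genes with no counts in any sample are candidates for removal. The gene names, sample names and values are all hypothetical, and the workshop itself may use different tools and thresholds.

```python
# Made-up raw counts: one row per gene, one value per sample
sample_names = ["s1", "s2", "s3", "s4"]
counts = {
    "gene1": [0, 0, 0, 0],
    "gene2": [12, 15, 9, 14],
    "gene3": [0, 3, 0, 1],
    "gene4": [250, 310, 5, 280],
}

# Across samples: total counts per sample; a much lower total stands out
library_sizes = {
    name: sum(row[i] for row in counts.values())
    for i, name in enumerate(sample_names)
}

# Across genes: genes with zero counts in every sample carry no information
all_zero_genes = [gene for gene, row in counts.items() if sum(row) == 0]
kept = {gene: row for gene, row in counts.items() if sum(row) > 0}

print("library sizes:", library_sizes)
print("genes with no counts:", all_zero_genes)
print("genes kept:", list(kept))
```

Here sample `s3` has a far smaller total than the others, which is the sort of pattern worth investigating before any statistical analysis.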