Saturday, September 27, 2014

Sufficient biological replication is essential for differential expression analysis of RNA-seq

I just took part in a Twitter discussion about the trade-offs between sequencing depth and the number of independent biological replicates (per treatment group) for differential gene expression analysis. There are applications of RNA-seq where sequencing deeply (more than, say, 50 million reads for a given sample) can be important for discovery. However, most researchers I interact with are interested at some level in differential expression among groups (different genotypes, species, tissues, etc.). As with anything else that requires making estimates and quantifying the uncertainty of those estimates (the minimum necessary for differential expression), you need independent biological samples within each group. The ENCODE guidelines suggest a minimum of 2 biological replicates per treatment group (well, they do not say "biological" replicates, but I will give them the benefit of the doubt).

However, numerous studies have demonstrated, both by simulation and by rarefaction analysis, that 2 is rarely sufficient (see links below). I have no idea where ENCODE got this number from. Generally you want to aim for 4 or more for simple experimental designs. These studies also demonstrate that, on balance, beyond a certain read depth per sample (somewhere between 10 and 25 million reads) there are diminishing returns for rare transcripts (in terms of differential expression), and that it is better to do more independent biological replication (say 5 samples each at 20 million reads) rather than more depth (2 independent biological samples at 50 million reads each). The exact number depends on several factors, including biological variability (and measurement error) within groups, as well as experimental design. A number of tools have been developed to help folks figure out optimal designs.
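To make the replicates-versus-depth argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from any of the studies linked below. It assumes read counts follow a negative binomial distribution, so the squared coefficient of variation for one sample is roughly 1/mu + dispersion, and it uses a delta-method approximation for the standard error of a (natural) log fold change between two groups. The function name, the transcript fraction, and the dispersion value are all hypothetical choices for illustration only.

```python
import math


def log_fc_standard_error(n_reps, depth, transcript_fraction, dispersion):
    """Approximate SE of the log fold change between two groups.

    Assumes negative binomial counts, so for one sample
    CV^2 = 1/mu + dispersion, where mu = depth * transcript_fraction.
    """
    mu = depth * transcript_fraction      # expected read count per sample
    cv2 = 1.0 / mu + dispersion           # counting noise + biological variation
    var_log_mean = cv2 / n_reps           # delta method, per group
    return math.sqrt(2.0 * var_log_mean)  # two independent groups of equal size


# Hypothetical inputs, chosen only for illustration.
RARE_FRACTION = 1e-6  # a rare transcript: 1 read per million sequenced
DISPERSION = 0.16     # assumed biological CV of 0.4 within a group

more_reps = log_fc_standard_error(5, 20e6, RARE_FRACTION, DISPERSION)
more_depth = log_fc_standard_error(2, 50e6, RARE_FRACTION, DISPERSION)

print(f"5 reps x 20M reads: SE(log FC) ~ {more_reps:.2f}")
print(f"2 reps x 50M reads: SE(log FC) ~ {more_depth:.2f}")
```

Both designs spend the same 100 million reads per group. Under these assumptions the five-replicate design gives the smaller standard error, because extra depth only shrinks the 1/mu counting term while the dispersion term can only be averaged down by adding replicates. Real power calculations should use one of the dedicated tools and dispersion estimates from your own system.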

Here are just a few such studies (there are many more; I just wanted a handful for the moment).

http://www.ncbi.nlm.nih.gov/pubmed/24319002
http://www.ncbi.nlm.nih.gov/pubmed/25246651
http://www.ncbi.nlm.nih.gov/pubmed/22985019
http://www.ncbi.nlm.nih.gov/pubmed/22268221
http://www.ncbi.nlm.nih.gov/pubmed/23497356

Check out
http://bfg.oxfordjournals.org/content/early/2011/12/30/bfgp.elr041.full.pdf+html
for a brief and succinct discussion of these and other issues.

And yes, depending on your questions, read length (and paired-end versus single-end sequencing) also matters!
