Monday, May 10, 2010

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Similar to another paper I blogged about, this paper describes a novel method for analyzing NGS data, wrapped in a biology coating (a mouse myoblast differentiation time course). I really like this way of describing a method, since it highlights the motivation for creating the method: answering a question in biology.

This paper describes a method called Cufflinks, which relies on TopHat to find splice junctions, which in turn relies on Bowtie for the read mapping itself. This is a recursive dependency tree: before you can trust Cufflinks, you must first trust the upstream methods it is built on.

Off to the methods section. Just like any Nature paper (this one appeared in Nature Biotechnology), the methods are hidden in the supplementary materials. I am glad to see the supplementary methods written in LaTeX using the hyperref package. This makes it easy to jump around the document quickly, and LaTeX is much easier to write and format than Word or similar programs.

The main contribution is the transcript abundance calculation. I strongly recommend reading the supplementary methods, as they are a great Bayesian discussion of transcript assembly and abundance estimation. They also cover some nice computer science theorems, such as Dilworth's theorem, which turns the assembly problem into a bipartite matching problem. The basic idea is to find the minimal set of transcripts that explains the fragments, then quantify each transcript. Seems simple, right?
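
To make the two steps concrete for myself, here is a toy Python sketch. It is my own simplification, not the Cufflinks code. The first half assumes you already know which fragments can precede one another within one transcript (the hypothetical `edges` list) and finds the fewest chains covering every fragment as n minus a maximum bipartite matching, which is the Dilworth-style reduction. The second half is a plain expectation-maximization (EM) fit of the fraction of fragments coming from each transcript, assuming every start position in a transcript is equally likely (1 / effective length) and ignoring the fragment-length distribution and bias terms of the real model; I am not claiming EM is the exact optimizer Cufflinks uses. All function names and inputs are hypothetical.

```python
# Toy sketch of the two Cufflinks ideas; NOT the actual Cufflinks implementation.
# Assumptions (hypothetical): fragments are integers 0..n-1, and `edges` lists
# pairs (i, j) meaning "fragment i can precede fragment j in one transcript".
# The edges are assumed to form a DAG (no cycles).


def transitive_closure(n, edges):
    """reach[i][j] is True if i precedes j directly or through other fragments."""
    reach = [[False] * n for _ in range(n)]
    for i, j in edges:
        reach[i][j] = True
    for k in range(n):  # Warshall's algorithm
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def min_chain_cover(n, edges):
    """Fewest chains ("transcripts") covering all fragments: n - max matching."""
    reach = transitive_closure(n, edges)
    match_right = [-1] * n  # match_right[j] = i means j follows i in a chain

    def try_augment(i, seen):
        # Kuhn's augmenting-path step for left vertex i.
        for j in range(n):
            if reach[i][j] and not seen[j]:
                seen[j] = True
                if match_right[j] == -1 or try_augment(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    for i in range(n):
        try_augment(i, [False] * n)

    # Each matched pair (i, j) links i to its successor j inside one chain.
    successor = {i: j for j, i in enumerate(match_right) if i != -1}
    has_pred = {j for j, i in enumerate(match_right) if i != -1}
    chains = []
    for start in range(n):
        if start not in has_pred:  # chains start at fragments with no predecessor
            chain = [start]
            while chain[-1] in successor:
                chain.append(successor[chain[-1]])
            chains.append(chain)
    return chains


def em_abundances(compat, eff_len, n_iter=200):
    """compat[f]: transcripts fragment f is compatible with; eff_len[t]: length.

    Returns rho[t], the estimated fraction of fragments from transcript t.
    """
    num_t = len(eff_len)
    rho = [1.0 / num_t] * num_t
    for _ in range(n_iter):
        counts = [0.0] * num_t
        for ts in compat:
            # E-step: P(fragment came from t) is proportional to rho[t] / eff_len[t].
            weights = [rho[t] / eff_len[t] for t in ts]
            total = sum(weights)
            for t, w in zip(ts, weights):
                counts[t] += w / total
        # M-step: new rho is the expected share of fragments per transcript.
        rho = [c / len(compat) for c in counts]
    return rho


if __name__ == "__main__":
    print(min_chain_cover(4, [(0, 1), (1, 3), (0, 2)]))  # [[0, 1, 3], [2]]
    compat = [[0]] * 3 + [[1]] + [[0, 1]] * 4
    print(em_abundances(compat, eff_len=[1000.0, 1000.0]))
```

With these toy inputs the chain cover needs two "transcripts", and the EM splits the four shared fragments in the same 3:1 ratio as the unique ones, converging to roughly 0.75 and 0.25. The hard parts the real method handles (deciding compatibility from spliced alignments, fragment lengths, confidence intervals) are exactly what this sketch leaves out.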

Anyhow, take a look at this paper, as Cufflinks (with TopHat (with Bowtie)) is a very popular pipeline for RNA-Seq. You should probably understand what it is doing before you use it. Kudos to the authors.