Saturday, May 8, 2010

Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome


Pubmed Direct Link

This paper presents HYDRA, a new method for detecting structural variant (SV) breakpoints.  Since it is a Genome Research (GR) paper, it focuses on answering a biological question, namely detecting SVs in the mouse genome; I will focus on the method.

SVs are detected by clustering discordant matepairs.  Matepairs are pairs of reads from the two ends of the same DNA fragment, separated by some distance in bases (the insert size).  Discordant matepairs are those whose insert size falls outside the expected distribution.  HYDRA considers multiple mappings per read: each read may map to several locations, possibly with different mapping qualities or probabilities.  The method then selects the correct mapping for each read by considering all reads collectively and choosing the most parsimonious explanation.  This makes it possible to detect SVs in regions with segmental duplications, copy number changes, or other repetitive or difficult-to-map sequence, which is an extremely important feature.  A minor concern is how to store the multiple mappings, since a whole-genome human resequencing BAM file may already be hundreds of gigabytes with only one mapping per read.

HYDRA does not, however, report what type of SV occurs, only that an SV exists.  Further analysis is needed to determine the type (insertion/deletion/CNV/inversion/etc.).  The authors suggest using bedtools, their own software, for this.  It remains to be seen whether the authors will release software that automates the whole process.
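
To make the clustering idea concrete, here is a minimal Python sketch.  It is not HYDRA's actual algorithm or code, only an illustration under simplifying assumptions: a matepair is discordant when its span differs from an expected insert size by more than a fixed tolerance, clustering is a simple greedy pass over sorted mappings, and "parsimony" is approximated by keeping, for each pair, the mapping that lands in the best-supported cluster.  The names `Mapping`, `is_discordant`, `cluster_discordant`, `select_mappings`, and the parameters `expected`, `tolerance`, and `max_gap` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One candidate placement of a matepair (hypothetical structure)."""
    pair_id: str  # identifier shared by all candidate mappings of a pair
    chrom: str
    start: int    # leftmost coordinate of the first end
    end: int      # rightmost coordinate of the second end

    @property
    def span(self) -> int:
        return self.end - self.start


def is_discordant(m: Mapping, expected: int, tolerance: int) -> bool:
    # Assumption: "outside the expected distribution" is simplified to a
    # fixed window around the expected insert size.
    return abs(m.span - expected) > tolerance


def cluster_discordant(mappings, expected, tolerance, max_gap):
    """Greedily group discordant mappings whose two ends lie close together."""
    discordant = sorted(
        (m for m in mappings if is_discordant(m, expected, tolerance)),
        key=lambda m: (m.chrom, m.start),
    )
    clusters = []
    for m in discordant:
        if clusters:
            prev = clusters[-1][-1]
            if (prev.chrom == m.chrom
                    and abs(m.start - prev.start) <= max_gap
                    and abs(m.end - prev.end) <= max_gap):
                clusters[-1].append(m)
                continue
        clusters.append([m])
    return clusters


def select_mappings(clusters):
    """For each pair, keep the mapping in the cluster supported by most pairs."""
    best = {}
    for cluster in clusters:
        support = len({m.pair_id for m in cluster})
        for m in cluster:
            if m.pair_id not in best or support > best[m.pair_id][0]:
                best[m.pair_id] = (support, m)
    return {pair_id: m for pair_id, (_, m) in best.items()}


if __name__ == "__main__":
    candidates = [
        Mapping("p1", "chr1", 1000, 4000),
        Mapping("p2", "chr1", 1050, 4020),
        Mapping("p3", "chr1", 1100, 4100),    # agrees with p1 and p2
        Mapping("p3", "chr2", 5000, 9000),    # alternative mapping of p3
        Mapping("p4", "chr1", 20000, 20480),  # concordant, ignored
    ]
    clusters = cluster_discordant(candidates, expected=500, tolerance=100, max_gap=200)
    for pair_id, m in sorted(select_mappings(clusters).items()):
        print(pair_id, m.chrom, m.start, m.end)
```

In this toy input, pair `p3` has two candidate mappings; the one on `chr1` is kept because it sits in a cluster supported by three pairs, while the `chr2` mapping stands alone.  A real implementation would also have to consider read orientation, mapping quality, and a proper insert-size distribution, all of which this sketch leaves out.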