CeBiTec Colloquium (unscheduled)

 date 

Monday, July 21st 2011, 11 c.t.

 location 

G2-104, CeBiTec Building

 speaker 

Dr. Inanc Birol

Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada

title 

Haploid Assembly of Diploid Genomes

Most assembly algorithms in use implicitly assume that a sequenced nucleic acid has a haploid-like structure, or at worst that the sequence representing a genomic location is present with high similarity in the read data. This assumption holds nicely for model organisms, which are usually inbred to reduce haplotypic diversity, and is not even significantly challenged in human samples, because of an evolutionarily recent population bottleneck our species experienced. However, when an organism of interest has a diploid genome with a pronounced distance between haplotypes, or when an environmental sample representing a collection of similar species is investigated, the assumption fails.
 
Furthermore, the read lengths of popular high-throughput sequencing technologies, and even the spatial associations provided by paired-end reads, are often not enough to completely de-phase the assembled sequences. In this talk, I will discuss the particular challenges one faces during de novo assembly of such datasets, and describe how we approach their solution within the ABySS framework.
 
ABySS is a short-read assembly tool developed specifically to address large-scale assembly problems. I will introduce how it handles distance information to extend and scaffold contigs, and how it represents haplotypic information in its output. I will demonstrate its performance on the assembly of the mountain pine beetle genome and human transcriptomes.

 host 

Prof. Dr. J. Stoye