Exploring genome structure and dynamics
Genome informatics aims to bridge the gap between genomic data and their biological interpretation by developing efficient and effective computational methods. Our research spans a broad spectrum, from the low level of DNA sequence comparison up to the higher levels of comparative genomics, metagenomics and phylogenetics.
Previous and Current Research
Comparative genomics is a powerful paradigm for the analysis of genomic data, applied in several contexts, from functional annotation of genes to phylogenomics and the comparison of whole genomes. The dramatically increasing amount of available data calls for a substantial research effort: developing comparative models that are biologically sound and mathematically well understood, together with efficient algorithms and software that can handle large data sets.
To achieve these goals, various lines of research are conducted in the Genome Informatics group.
In sequence analysis, we develop index-based analysis methods for large-scale sequence comparison, pattern search, and pattern discovery. A recent research project addresses an important task in whole-genome sequencing: we develop software tools that assist in closing the gaps that remain between the contigs after a standard assembly of shotgun reads. In metagenomics, a younger branch of sequence analysis, we analyze sequencing data obtained from complex environmental samples to characterize their species composition.
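To illustrate what "index-based" sequence comparison means in its simplest form, the following minimal sketch builds a k-mer index of one sequence and looks up exact k-mer matches from a second sequence. It is only an assumption-laden toy: the function names build_kmer_index and shared_kmers are hypothetical, and a plain hash-based k-mer index stands in for whatever index structures the group's own tools use.

```python
from collections import defaultdict


def build_kmer_index(text, k):
    """Map every substring of length k in `text` to its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def shared_kmers(query, index, k):
    """Return (text_pos, query_pos) pairs where a k-mer occurs in both sequences."""
    hits = []
    for j in range(len(query) - k + 1):
        # Look up the query k-mer without inserting new keys into the index.
        for i in index.get(query[j:j + k], ()):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    idx = build_kmer_index("ACGTACGT", 4)
    print(shared_kmers("TTACGTT", idx, 4))
```

The index is built once and can then be queried with many sequences, which is the main reason index-based methods scale to large data sets.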
In whole-genome comparison, we usually consider genomes at the level of gene orders. Here, one research branch is the study of gene clusters, i.e., sets of genes that are co-localized in several genomes and might thus be functionally related. We develop mathematically sound and biologically reasonable cluster models and efficient algorithms for their detection, and we are also interested in the evolution of gene clusters in a phylogenetic context.
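One simple formal model of a gene cluster is the common interval: a set of genes that appears contiguously, in any internal order, in both of two gene orders. The sketch below enumerates common intervals with a straightforward quadratic scan. It assumes that both genomes are given as permutations of the same gene set, without duplicates, and the function name common_intervals is hypothetical. It is meant as an illustration of the idea, not as the group's model or algorithm.

```python
def common_intervals(order_a, order_b):
    """List gene sets (size >= 2) that are contiguous in both gene orders.

    Assumes order_a and order_b are permutations of the same genes.
    """
    pos_in_a = {gene: i for i, gene in enumerate(order_a)}
    clusters = []
    for i in range(len(order_b)):
        lo = hi = pos_in_a[order_b[i]]
        for j in range(i + 1, len(order_b)):
            p = pos_in_a[order_b[j]]
            lo, hi = min(lo, p), max(hi, p)
            # order_b[i..j] holds j - i + 1 genes; they are contiguous in
            # order_a exactly when their positions there span the same width.
            if hi - lo == j - i:
                clusters.append(frozenset(order_b[i:j + 1]))
    return clusters


if __name__ == "__main__":
    for cluster in common_intervals([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]):
        print(sorted(cluster))
```

Ignoring the order of genes inside a cluster is a deliberate choice of this model: it tolerates local rearrangements while still requiring that the genes stay together.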
We are also working on models and algorithms for genomic rearrangement. The Double-Cut-and-Join (DCJ) operation provides a unifying concept for well-known rearrangement events such as inversions, translocations, fissions, fusions, and transpositions. This model not only considerably simplifies the formal treatment of the different events; the DCJ operation is also an important tool with the potential to go beyond the classical questions, addressing rearrangement problems that involve either gene duplications or missing information about the actual order of genes in a genome.
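As a rough illustration of why the DCJ model is easy to treat formally, the sketch below computes the DCJ distance between two genomes with the commonly cited formula d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths in the adjacency graph. The genome encoding (a list of (signed gene list, is_circular) pairs) and the function names are assumptions made for this example, and both genomes are assumed to contain the same genes exactly once.

```python
def extremity_partners(genome):
    """Map each gene extremity to its neighbour, or None at a telomere."""
    partner = {}
    for genes, circular in genome:
        ends = []
        for g in genes:
            tail, head = (abs(g), "t"), (abs(g), "h")
            # A gene in forward orientation is read tail -> head.
            ends += [tail, head] if g > 0 else [head, tail]
        # Adjacencies join the right end of one gene to the left end of the next.
        for i in range(1, len(ends) - 1, 2):
            partner[ends[i]], partner[ends[i + 1]] = ends[i + 1], ends[i]
        if circular:
            partner[ends[-1]], partner[ends[0]] = ends[0], ends[-1]
        else:
            partner[ends[0]] = partner[ends[-1]] = None
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance d = N - (C + I/2) for genomes over the same gene set."""
    pa, pb = extremity_partners(genome_a), extremity_partners(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same genes exactly once")
    seen, cycles, odd_paths = set(), 0, 0
    # Paths: start at every extremity that is a telomere in A or in B.
    for start in pa:
        if start in seen or (pa[start] is not None and pb[start] is not None):
            continue
        x, use_a, size = start, pa[start] is not None, 0
        while x is not None:
            seen.add(x)
            size += 1
            x = (pa if use_a else pb)[x]
            use_a = not use_a
        odd_paths += size % 2  # odd number of extremities = odd path
    # Everything not on a path lies on a cycle alternating A and B adjacencies.
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x, use_a = start, True
        while x not in seen:
            seen.add(x)
            x = (pa if use_a else pb)[x]
            use_a = not use_a
    return len(pa) // 2 - (cycles + odd_paths // 2)


if __name__ == "__main__":
    a = [([1, 2, 3], True)]    # one circular chromosome
    b = [([1, -2, 3], True)]   # same genes, gene 2 inverted
    print(dcj_distance(a, b))
```

In this sketch an inversion, a fission and the circularization of a linear chromosome each count as a single DCJ operation, which reflects the unifying role of the model described above.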
Future Projects and Aims
Historically, genomic sequence analysis and genome rearrangement studies have been performed at different levels of granularity: while in sequence analysis the DNA bases are the basic entities, rearrangement studies examine the order of genes or other unique markers in the evolving genome. Unifying both pictures into one formal model is one of our goals. First fruitful attempts consider rearrangements in contig assembly. Also, in a new project, we integrate sequence similarity information into new, more sensible gene cluster models. Moreover, we believe that the mathematical theory of genome rearrangements can be considerably simplified. With our studies of the DCJ operation we have taken first steps in this direction, but we believe that much more is possible.