Previous and Current Research

Metagenomics, the direct analysis of DNA from a whole environmental community, represents a strategy for discovering genes with diverse functionality. In the past, the identification of new genes with desired activities relied primarily on relatively low-throughput, function-based screening of environmental DNA clone libraries. Current sequencing technologies can generate more than 600 Gbp of sequence data in a single experiment, allowing sequence-based metagenomic discovery of complete genes or even genomes from environmental samples with moderate microbial species complexity.

The cow rumen metagenome, sequenced at the DOE Joint Genome Institute (JGI), is one of the largest metagenomic datasets from a single sample to date (>500 Gbp). The paucity of enzymes that efficiently deconstruct plant polysaccharides represents a major bottleneck for industrial-scale conversion of cellulosic biomass into biofuels. Cow rumen microbes specialize in the degradation of cellulosic plant material and are therefore a promising target for the identification of novel carbohydrate-active genes. Datasets of this size require high-throughput computational techniques to cope with the analysis of billions of sequencing reads. In collaboration with the JGI, we develop high-throughput gene-centric and de novo assembly pipelines for metagenomic datasets. In the case of the cow rumen dataset, these pipelines allowed us to identify more than 27,000 putative carbohydrate-active genes and to assemble 15 uncultured microbial genomes.
Metagenomic sequencing identified 27,755 putative biomass-degrading enzymes in the cow rumen microbiome.

Future Projects and Aims

Although we managed to assemble a large number of genes and genomes from a metagenome as complex as that of the cow rumen, there is still a need for metagenome-specific assemblers. Current short-read assemblers were designed specifically for the assembly of isolate genomes, and metagenomic datasets pose a number of additional challenges for assembly. We are developing new tools and approaches for the metagenomic assembly problem.