Development of computational methods for the analysis of metagenome and metatranscriptome data (Dr. Martha Zakrzewski)

The fields of metagenomics and metatranscriptomics have emerged as valuable disciplines for uncovering the taxonomic composition and functional diversity of heterogeneous microbial communities in their natural habitats. Both fields have been driven mainly by advances in sequencing technologies that enable the study of microorganisms in a high-throughput manner. At the same time, these sequencing technologies pose challenges for the storage, computational processing and analysis of high-throughput datasets.
In the scope of this thesis, methods were designed and developed that allow the interpretation of metagenome and metatranscriptome data in terms of taxonomic and functional information hidden in natural microbial communities.

First, the MetaSAMS system was designed, developed and applied; it facilitates the automated storage, processing and analysis of whole metagenome shotgun datasets. MetaSAMS is accessible through a web-based user interface, which presents the functional and taxonomic annotations for specific metagenome projects in graphical and tabular form. Furthermore, the AMPLA pipeline for the analysis of the phylogenetic marker gene encoding 16S rRNA was designed and implemented; it generates a detailed taxonomic profile of the underlying community. The workflow consists of several consecutive steps, namely the processing, clustering and taxonomic characterization of the data. Finally, the metatranscriptome pipeline MeTra was designed and implemented; it captures central RNA types for the taxonomic and functional profiling of the microorganisms in a community.
This thesis demonstrates the functionality of the three pipelines on corresponding datasets obtained from a biogas plant. Knowledge of the microorganisms residing in a biogas fermenter is highly important, as biogas is a renewable and environmentally friendly energy source. Metagenome analyses performed in MetaSAMS confirmed previous findings that Firmicutes and Euryarchaeota dominate the biogas-producing community. Moreover, analyses of 16S rRNA gene sequences provided detailed insights into species diversity and highlighted that the origin of some sequences is still not well described, owing to the absence of appropriate reference sequences in databases. The metatranscriptome pipeline revealed that the most abundant species in the community also contributed the majority of the transcripts. The analysis shed light on the central processes of anaerobic biogas digestion and the associated bacteria.
Finally, a method for the discovery of industrially relevant enzymes was designed. The method was applied to the identification of novel laccase genes in metagenomes obtained from marine habitats. Laccases are important in many industrial processes; therefore, novel laccases with improved functionalities are required. The analysis demonstrated that laccases are widely distributed among bacterial species. Moreover, only 34% of the metagenome sequences encoding fragments of putative laccases could be assigned to a genus, indicating potentially novel enzymes.

The complete thesis is available online.