Comparing Organisms on the Level of Metabolism (Dr. Sebastian Oehm)
This thesis presents a fully automated approach for the comparative analysis of organisms at the functional level of metabolism. The approach classifies the analyzed organisms according to their individual metabolic pathway variants. In contrast to comparison techniques based on gene sequences, it relies on the functional annotation of genes, namely on metabolic reactions. Moreover, instead of comparing individual reactions one at a time, it compares sets of reactions that are jointly involved in the same cellular process, also known as metabolic pathways.
Data on metabolic pathways were taken from the KEGG database. These data include definitions of metabolic reactions, reaction annotations for individual organisms, and the organization of reactions into metabolic pathways. Metabolic pathways were modeled as directed, node-labeled graphs. Distance measures were developed based on the theory of edit distances on graphs. The distance measures were proven to be metrics and, where appropriate, correspondences between the implemented edit distance-based measures and previously published distance measures were shown.
The comparative analysis approach comprises the following steps. First, pairwise distances are calculated between the pathway variants of the set of organisms to be analyzed. Then, the organisms are clustered based on these distances using several clustering approaches, which yields one dendrogram per clustering method. Subsequently, each dendrogram is cut at a certain height, which partitions the analyzed organisms into groups. The number of groups is chosen as the value that maximizes the cophenetic correlation coefficient between the cophenetic matrix of the partitioning and the distance matrix. Finally, the differential reaction content is calculated for each pair of groups and can either be presented in a table or visualized on KEGG’s metabolic pathway maps. The entire functionality is implemented as a publicly accessible web-based application called Comparative Pathway Analyzer.
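The steps above can be illustrated with a minimal Python sketch. It is not the Comparative Pathway Analyzer implementation. It assumes that each pathway variant is reduced to a set of reaction IDs, that the distance is a unit-cost edit distance on these sets (the size of the symmetric difference), and that average linkage is used. The organism names, the reaction IDs and the helper functions are hypothetical placeholders, not KEGG data. The number of groups k is fixed by hand here instead of being chosen via the cophenetic correlation coefficient, and "differential reaction content" is interpreted as the reactions present in every member of one group and in no member of the other.

```python
from itertools import combinations

# Hypothetical reaction annotations for one pathway (placeholder IDs, not KEGG data).
variants = {
    "org_a": {"R1", "R2", "R3"},
    "org_b": {"R1", "R2", "R3"},
    "org_c": {"R1", "R4"},
    "org_d": {"R1", "R4", "R5"},
}

def reaction_distance(a, b):
    """Unit-cost edit distance on reaction sets: insertions plus deletions."""
    return len(a ^ b)

def pairwise_distances(variants):
    """Distance for every unordered pair of organisms."""
    return {frozenset((x, y)): reaction_distance(variants[x], variants[y])
            for x, y in combinations(sorted(variants), 2)}

def average_linkage(c1, c2, dist):
    """Mean distance between the members of two clusters."""
    total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
    return total / (len(c1) * len(c2))

def cluster(variants, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    dist = pairwise_distances(variants)
    clusters = [frozenset([name]) for name in sorted(variants)]
    while len(clusters) > k:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage(pair[0], pair[1], dist))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

def differential_reactions(group_a, group_b, variants):
    """Reactions found in all members of group_a and in no member of group_b."""
    core_a = set.intersection(*(variants[o] for o in group_a))
    any_b = set.union(*(variants[o] for o in group_b))
    return core_a - any_b

groups = cluster(variants, k=2)
for g1, g2 in combinations(groups, 2):
    print(sorted(g1), "vs", sorted(g2))
    print("  only in first: ", sorted(differential_reactions(g1, g2, variants)))
    print("  only in second:", sorted(differential_reactions(g2, g1, variants)))
```

With this toy input, org_a and org_b end up in one group and org_c and org_d in the other. A fuller version would repeat the cut for several values of k and keep the one with the highest cophenetic correlation.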
Several distance measures were implemented: reaction-based distance measures, metabolite-based distance measures, combined reaction- and metabolite-based distance measures, and distance measures that take the neighboring reactions into account when calculating the edit cost for deleting or inserting a reaction. All distance measures were evaluated against each other to find the one best suited to the given data. Because no gold standard existed, the evaluation was performed on two manually designed test scenarios. Three clustering techniques, namely average linkage and complete linkage agglomerative clustering as well as Ward clustering, were evaluated for their suitability to group organisms based on the distances between the organisms’ pathway variants. Furthermore, as an application example, five Corynebacteria were compared against each other using the newly developed approach, and the results were discussed in light of their biological relevance.
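To make the difference between the first three families of distance measures concrete, the following Python sketch compares them on toy data. The exact definitions used in the thesis are not given here, so everything in the sketch is an assumption: each measure is taken to be a unit-cost edit distance on sets, the combined measure is taken to be the plain sum of the other two, and the reaction and metabolite IDs are hypothetical placeholders. The neighbor-aware measures are not covered.

```python
# Hypothetical reaction definitions: reaction ID -> metabolites it involves.
REACTION_METABOLITES = {
    "R1": {"M1", "M2"},
    "R2": {"M2", "M3"},
    "R3": {"M3", "M4"},
    "R4": {"M2", "M5"},
}

def metabolites_of(reactions):
    """All metabolites touched by a set of reactions."""
    found = set()
    for reaction in reactions:
        found |= REACTION_METABOLITES[reaction]
    return found

def reaction_distance(a, b):
    """Number of reactions to insert or delete to turn a into b."""
    return len(a ^ b)

def metabolite_distance(a, b):
    """Number of metabolites to insert or delete between the two variants."""
    return len(metabolites_of(a) ^ metabolites_of(b))

def combined_distance(a, b):
    """Assumed combination: plain sum of reaction and metabolite distances."""
    return reaction_distance(a, b) + metabolite_distance(a, b)

def satisfies_triangle_inequality(dist, items):
    """Brute-force check of d(x, z) <= d(x, y) + d(y, z) on a finite sample."""
    return all(dist(x, z) <= dist(x, y) + dist(y, z)
               for x in items for y in items for z in items)

v1 = frozenset({"R1", "R2", "R3"})
v2 = frozenset({"R1", "R4"})
v3 = frozenset({"R1", "R2"})

for name, dist in [("reaction", reaction_distance),
                   ("metabolite", metabolite_distance),
                   ("combined", combined_distance)]:
    print(name, dist(v1, v2), dist(v1, v3),
          satisfies_triangle_inequality(dist, [v1, v2, v3]))
```

The brute-force check only spot-checks the triangle inequality on the sample and is no substitute for a proof. Note also that in this sketch the metabolite-based distance can be zero for two different reaction sets that involve the same metabolites, so on reaction sets it is only a pseudometric.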
The complete thesis is available online.
Further details about the Comparative Pathway Analyzer (CPA) have been published as follows:
Oehm, S., D. Gilbert, A. Tauch, J. Stoye, & A. Goesmann. 2008. “Comparative Pathway Analyzer - a web server for comparative analysis, clustering and visualization of metabolic networks in multiple organisms”. Nucleic Acids Research, 36 (Supplement 2: Web Server Issue 2008), W433-W437.