PhD thesis: EMMA

EMMA2 - A MAGE-Compliant System for the Analysis of Microarray Data in Integrated Functional Genomics (Dr. Michael Dondrup)

Since the first complete genomic sequences were acquired, many advances have been made in the field of functional genomics. High-throughput methods have been developed to study gene expression and metabolic pathways. Microarrays have become a highly popular method for measuring transcriptional regulation. They allow the expression levels of thousands of genes to be measured in parallel, but the measured datasets contain a certain level of technical and biological variation. Many methods for the analysis of large datasets from error-prone microarray experiments have been developed, including normalization, statistical inference, and machine learning. Attempts have also been made to standardize the annotation of microarray data, such as the Minimum Information About a Microarray Experiment (MIAME) guidelines, the MAGE-ML format for data interchange, and ontologies.
The existing software systems for microarray data analysis have only rudimentary implementations of these standards and are hard to extend. The EMMA2 software has been designed to resolve these shortcomings.

Its specification includes full support for MIAME and MAGE-ML as well as support for ontologies. Integration of genomic annotation data and other internal and external data sources has been an important requirement. The specification, design, and implementation of EMMA2 follow an object-oriented development paradigm. This is reflected in the use of object-oriented modeling tools such as the Unified Modeling Language (UML). During the design phase, the MAGE object model was taken as the core of the application to model microarray data and their annotations. Additional models were needed to complement MAGE with classes for access control and data analysis. The software has been implemented using a code-generation approach: the backend code and database definitions have been derived from the joint object model defined in UML. EMMA2 can be used via a web interface and contains a Laboratory Information Management System (LIMS) component. A flexible plug-in system for data analysis has been added, which includes methods for preprocessing, normalization, statistical tests, cluster analysis, and visualization. Integration of other functional genomics data sources has been implemented using the integration layer BRIDGE and web services. Data integration enables several new visualization components that use metabolic pathway data and functional categories.
The system was successfully applied in eight national and international projects. More than 2700 microarrays have been processed using EMMA2. Furthermore, an evaluation study has been carried out to compare the performance of inference tests for microarrays. As a result of this study, two methods (SAM and CyberT) can be recommended for experiments with very few replicates, while for larger numbers of replicates the t-test performs comparably.
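
The difference between the ordinary t-test and a SAM-style test with very few replicates can be illustrated with a minimal sketch. It is not taken from EMMA2 or from the evaluation study: the function names, the example values, and the constant `s0` are hypothetical, and the second statistic is only a simplified form of the SAM idea of adding a small positive constant to the standard error (SAM estimates that constant from the data and uses permutations to assess significance).

```python
from math import sqrt
from statistics import mean, variance


def pooled_standard_error(a, b):
    """Standard error of the difference in means, using a pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return sqrt(pooled_var * (1 / na + 1 / nb))


def t_statistic(a, b):
    """Ordinary two-sample t statistic (equal variances assumed)."""
    return (mean(a) - mean(b)) / pooled_standard_error(a, b)


def sam_like_statistic(a, b, s0):
    """Simplified SAM-style statistic: s0 > 0 damps genes with tiny variance."""
    return (mean(a) - mean(b)) / (pooled_standard_error(a, b) + s0)


# Hypothetical log-expression values, two replicates per condition.
# Gene 1: very small change, but by chance almost no measured variance.
gene1_a, gene1_b = [1.00, 1.01], [1.10, 1.11]
# Gene 2: large change with moderate variance.
gene2_a, gene2_b = [1.0, 2.0], [4.0, 5.0]

s0 = 0.2  # assumed constant, chosen only for illustration

t1, t2 = t_statistic(gene1_a, gene1_b), t_statistic(gene2_a, gene2_b)
d1, d2 = (sam_like_statistic(gene1_a, gene1_b, s0),
          sam_like_statistic(gene2_a, gene2_b, s0))

print("t-test ranks gene 1 first:  ", abs(t1) > abs(t2))
print("SAM-like ranks gene 2 first:", abs(d2) > abs(d1))
```

With only two replicates per condition, the variance estimate of gene 1 is unreliable, so its t statistic is inflated even though the change is small. Adding a constant to the denominator reduces this effect, which is one plausible reason why regularized tests behave better at low replicate numbers.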

The complete thesis is available online.

The most important aspects of this thesis have been published in the following papers:

Dondrup, M., S. Albaum, T. Griebel, K. Henckel, S. Jünemann, T. Kahlke, C.K. Kleindt, H. Kuester, B. Linke, D. Mertens, V. Mittard-Runte, H. Neuweger, K.J. Runte, A. Tauch, F. Tille, A. Pühler, & A. Goesmann. 2009. “EMMA 2 - A MAGE-compliant system for the collaborative analysis and integration of microarray data”.
BMC Bioinformatics, 10:50.

Dondrup, M., A.T. Hueser, D. Mertens, & A. Goesmann. 2009. “An evaluation framework for statistical tests on microarray data”.
Journal of Biotechnology, 140(1-2), 18-26.