CeBiTec Colloquium: Prof. Dr. Sebastian Böcker, Friedrich-Schiller-Universität Jena

2008/01/21, 5:15 p.m.

CeBiTec laboratory building, room G2-104

Cluster Editing and other hard problems in bioinformatics

Abstract :

Clustering data still represents a key step of numerous biological and medical problems. Many clustering algorithms utilize the graph-theoretic Cluster Editing formulation where a similarity matrix between objects is
transformed into a graph. Our goal is to make the fewest changes to the edge set of this graph such that the new graph is a disjoint union of cliques (clusters). Unfortunately, the Cluster Editing problem is NP-complete.

Recently, parameterized algorithms have been introduced: such algorithms allow us to find exact solutions for hard problems in provable worst-case running time. In my talk, I present a surprisingly simple parametrized algorithm for Cluster Editing that is, at the same time, the currently fastest for this problem. I also present data reduction rules that significantly cut down the size of an instance in polynomial time, but still guarantee to find the optimal solution. I will compare the performance of this approach and several well-known clustering algorithms on gene expression data for tissue classification: The new approach, while being computationally expensive, often outperforms all other methods with respect to clustering quality.

Finally, I will present other hard bioinformatics problems where parametrized algorithms can be used to find exact solutions.