CeBiTec Colloquium
Monday, November 2nd 2015, 17 c.t.
G2-104, CeBiTec Building
Dr. Johannes Söding
Research Group Quantitative and Computational Biology
Max Planck Institute for Biophysical Chemistry
Göttingen, Germany
Tools for fast protein sequence searches and for the de-novo discovery of improved models of regulatory motifs
The talk will cover results from three as yet unpublished projects. I will start with an introduction to protein sequence searching and to our software package HH-suite for very sensitive remote homology detection. I will then show how we can significantly boost the sensitivity of searches in HH-suite using a homology-enriched profile-HMM database.
Second, I will present our new software package MMseqs (Many-against-Many sequence searching) for very fast batch protein sequence searches and clustering of huge protein sequence data sets, such as UniProt or sets of predicted open reading frames from large metagenomics experiments. MMseqs achieves a sensitivity comparable to protein BLAST (blastpgp) at 400x its speed.
Third, I will present GIMMEmotif, a method for the de-novo discovery of regulatory motifs in nucleotide sequences that learns correlations between nucleotides. The method automatically learns the dependencies up to the optimum order (i.e. k-mer length) supported by the data, without risk of overtraining. GIMMEmotif improves on PWMs by ~40% in AUC on ~400 ENCODE ChIP-seq data sets and achieves similar improvements in detecting core promoter sequences, poly(A) sites, RNAP pause sites and binding sites for ~20 PAR-CLIPped RNA binding factors.
Host: Dr. Alexander Sczyrba