- Conference paper: Docking protein domains using a contact map representation (German Conference on Bioinformatics, 2006) Lise, Stefano; Jones, David; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
- Conference paper: Characterization of protein interactions (German Conference on Bioinformatics, 2006) Küffner, Robert; Duchrow, Timo; Fundel, Kartin; Zimmer, Ralf; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
  Available information on molecular interactions between proteins is currently incomplete with regard to detail and comprehensiveness. Although a number of repositories are already devoted to capturing interaction data, only a small subset of the currently known interactions can be obtained that way. Besides further experiments, knowledge on interactions can only be complemented by applying text extraction methods to the literature. Currently, information to further characterize individual interactions cannot be provided by interaction extraction approaches and is virtually nonexistent in repositories. We present an approach to not only confirm extracted interactions but also to characterize interactions with regard to four attributes such as activation vs. inhibition and protein-protein vs. protein-gene interactions. Here, training corpora with positional annotation of interacting proteins are required. As suitable corpora are rare, we propose an extensible curation protocol to conveniently characterize interactions by manual annotation of sentences so that machine learning approaches can be applied subsequently. We derived a training set by manually reading and annotating 269 sentences for 1090 candidate interactions; 439 of these are valid interactions, predicted via support vector machines at a precision of 83% and a recall of 87%. The prediction of interaction attributes from individual sentences on average yielded a precision of about 85% and a recall of 73%.
- Conference paper: Annotation-based distance measures for patient subgroup discovery in clinical microarray studies (German Conference on Bioinformatics, 2006) Lottaz, Claudio; Toedling, Joern; Spang, Rainer; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
  Background: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been suggested, but little attention has been given to the choice of the distance measure between patients. Even with the Euclidean metric, including and excluding genes from the analysis leads to different distances between the same objects, and consequently different clustering results. Methodology: We describe a novel clustering algorithm, in which gene selection is used to derive biologically meaningful clusterings of samples. Our method combines expression data and functional annotation data. According to gene annotations, candidate gene sets with specific functional characterizations are generated. Each set defines a different distance measure between patients, and consequently different clusterings. These clusterings are filtered using a novel resampling-based significance measure. Significant clusterings are reported together with the underlying gene sets and their functional definition. Conclusions: Our method reports clusterings defined by biologically focused sets of genes. In annotation-driven clusterings, we have recovered clinically relevant patient subgroups through biologically plausible sets of genes, as well as novel subgroupings. We conjecture that our method has the potential to reveal previously unknown, clinically relevant classes of patients in an unsupervised manner.
- Conference paper: Classifying permanent and transient protein interactions (German Conference on Bioinformatics, 2006) Kottha, Samatha; Schroeder, Michael; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
  Currently, much research is devoted to the characterization and classification of transient and permanent protein-protein interactions. From the literature, we take data sets consisting of 161 permanent (65 homodimers, 96 heterodimers) and 242 transient interactions. We collect over 300 interface attributes relating to size, physicochemical properties, interaction propensities, and secondary structure elements. Our major discovery is a surprisingly simple relationship not yet reported in the literature: interactions with the same molecular weight or very big interfaces are permanent and otherwise transient. We train a support vector machine and achieve the following results: Molecular weight difference alone achieves an 80% success rate. Together with the size of the buried surface, the success rate improves to 89%. Adding water at the interface and the number of hydrophobic contacts, we achieve a success rate of 97%.
- Conference paper: Ab initio prediction of molecular fragments from tandem mass spectrometry data (German Conference on Bioinformatics, 2006) Heinonen, Markus; Rantanen, Ari; Mielikäinen, Taneli; Pitkänen, Esa; Kokkonen, Juha; Rousu, Juho; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
  Mass spectrometry is one of the key enabling measurement technologies for systems biology, due to its ability to quantify molecules in small concentrations. Tandem mass spectrometers tackle the main shortcoming of mass spectrometry, the fact that molecules with an equal mass-to-charge ratio are not separated. In a tandem mass spectrometer, molecules can be fragmented and the intensities of these fragments measured as well. However, this creates a need for methods for identifying the generated fragments. In this paper, we introduce a novel combinatorial approach for predicting the structure of molecular fragments that first enumerates all possible fragment candidates and then ranks them according to the cost of cleaving a fragment from a molecule. Unlike many existing methods, our method does not rely on hand-coded fragmentation rule databases. Our method is able to predict the correct fragmentation of small-to-medium sized molecules with high accuracy.
- Conference paper: Comparison of human protein-protein interaction maps (German Conference on Bioinformatics, 2006) Futschik, Matthias E.; Chaurasia, Gautam; Wanker, Erich; Herzel, Hanspeter; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
  Large-scale mappings of protein-protein interactions have started to give us new views of the complex molecular mechanisms inside a cell. After initial projects to systematically map protein interactions in model organisms such as yeast, worm and fly, researchers have begun to focus on the mapping of the human interactome. To tackle this enormous challenge, different approaches have been proposed and pursued. While several large-scale human protein interaction maps have recently been published, their quality remains to be critically assessed. We present here a first comparative analysis of eight currently available large-scale maps with a total of over 10000 unique proteins and 57000 interactions included. They are based on literature search, orthology, or yeast two-hybrid assays. Comparison reveals only a small, but statistically significant overlap. More importantly, our analysis gives clear indications that all interaction maps suffer from selection and detection biases. These results have to be taken into account for future assembly of the human interactome.
- Conference paper: MPI-ClustDB: A fast string matching strategy utilizing parallel computing (German Conference on Bioinformatics, 2006) Hamborg, Thomas; Kleffe, Jürgen; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
  ClustDB is a tool for the identification of perfect matches in large sets of sequences. It is faster and can handle at least 8 times more data than VMATCH, the most memory-efficient exact program currently available. Still, ClustDB needs about four hours to compare all human ESTs. We therefore present a distributed and parallel implementation of ClustDB to reduce the execution time. It uses a message-passing library called MPI and runs on distributed workstation clusters with significant runtime savings. MPI-ClustDB is written in ANSI C and freely available on request from the authors.
- Conference paper: Microarray layout as quadratic assignment problem (German Conference on Bioinformatics, 2006) Carvalho Jr., Sérgio A. de; Rahmann, Sven; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
  The production of commercial DNA microarrays is based on a light-directed chemical synthesis driven by a set of masks or micromirror arrays. Because of the natural properties of light and the ever-shrinking feature sizes, the arrangement of the probes on the chip and the order in which their nucleotides are synthesized play an important role in the quality of the final product. We propose a new model called conflict index for evaluating microarray layouts, and we show that the probe placement problem is an instance of the quadratic assignment problem (QAP), which opens up the way for using QAP heuristics. We use an existing heuristic called GRASP to design the layout of small artificial chips with promising results. We compare this approach with the best known algorithm and describe how it can be combined with other existing algorithms to design the latest million-probe microarrays.
- Edited book: German Conference on Bioinformatics (2006) Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
- Conference paper: A novel, comprehensive method to detect and predict protein-protein interactions applied to the study of vesicular trafficking (German Conference on Bioinformatics, 2006) Winter, Christof; Baust, Thorsten; Hoflack, Bernard; Schroeder, Michael; Huson, Daniel; Kohlbacher, Oliver; Lupas, Andrei; Nieselt, Kay; Zell, Andreas
  Motivation. Computational methods to predict protein-protein interactions are greatly needed. They can help to formulate hypotheses, guide experimental research and serve as additional measures to assess the quality of data obtained in high-throughput interaction experiments. Here, we describe a fully automated three-step procedure to predict and confirm protein-protein interactions. By maximising the information from text mining of the biomedical literature, data from interaction databases, and from available protein structures, we aim at generating a comprehensive picture of known and novel potential interactions between a given set of proteins. Results. A recent proteomics assay to identify the protein machinery involved in vesicular trafficking between the biosynthetic and the endosomal compartments revealed 35 proteins that were found as part of membrane coats on liposomes. When applying our method to this data set, we are able to reconstruct most of the interactions known to the molecular biologist. In addition, we predict novel interactions, among these potential linkers of the AP-1 and the Arp2/3 complex to membrane-bound proteins as well as a potential GTPase-GTPase effector interaction. Conclusions. Our method allows for a comprehensive network reconstruction that can assist the molecular biologist. Predicted interactions are backed up by structural or experimental evidence and can be inferred at varying levels of confidence. Our method pinpoints existing key interactions and can facilitate the generation of hypotheses.