P235 - GCB 2014 - German Conference on Bioinformatics 2014
Recent publications
- Conference paper: Flexible database-assisted graphical representation of metabolic networks for model comparison and the display of experimental data (German Conference on Bioinformatics 2014, 2014). Tillack, Jana; Bende, Melanie; Rother, Michael; Scheer, Maurice; Ulas, Susanne; Schomburg, Dietmar. Intracellular processes in living organisms are described by metabolic models. Visualizing these models helps in interpreting data and analysis results. We introduce the visualization tool DaViMM, which creates personalized graphical representations of metabolic networks for model comparison or for displaying measurements or analysis results. The tool is coupled to a relational database containing graphical network properties such as coordinates, which ensure an intuitive network layout. Combining DaViMM, the graphical database, and available biochemical databases enables the automated creation of metabolic network maps. The flexibility of this combination is demonstrated with several application examples.
- Conference paper: A general approach for discriminative de novo motif discovery from high-throughput data (German Conference on Bioinformatics 2014, 2014). Grau, Jan; Posch, Stefan; Grosse, Ivo; Keilwagen, Jens. High-throughput techniques like ChIP-seq, ChIP-exo, and protein binding microarrays (PBMs) demand novel de novo motif discovery approaches that focus on accuracy and runtime on large data sets. While specialized algorithms have been designed for discovering motifs in in-vivo ChIP-seq/ChIP-exo or in-vitro PBM data, none of these works equally well for all of these high-throughput techniques. Here, we present Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data, which achieves competitive performance on both ChIP-seq and PBM data compared to recent approaches specifically designed for either technique. Hence, Dimont allows for investigating differences between in-vitro and in-vivo binding in an unbiased manner using a unified approach. For most transcription factors, Dimont discovers similar motifs from in-vivo and in-vitro data, but we also find notable exceptions. Scrutinizing the benefit of modeling dependencies between binding site positions, we find that more complex motif models often increase prediction performance and, hence, are a worthwhile field of research. Original paper: doi: 10.1093/nar/gkt831
- Conference paper: Efficient large-scale bicluster editing (German Conference on Bioinformatics 2014, 2014). Sun, Peng; Guo, Jiong; Baumbach, Jan. The explosion of biological data has dramatically reshaped today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as simultaneous clustering or co-clustering, has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new approach, Bi-Force. It is based on the weighted bicluster editing model and performs biclustering on arbitrary sets of biological entities, given any kind of similarity function. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol from a recent review paper by Eren et al. and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, Bimax, Spectral, xMOTIFS and ISA. To this end, a suite of synthetic data sets as well as nine large gene expression data sets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force, at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used data sets are publicly available at
- Conference paper: Characterizing metagenomic novelty with unexplained protein domain hits (German Conference on Bioinformatics 2014, 2014). Lingner, Thomas; Meinicke, Peter. In metagenomics, the discovery of functional novelty has always been pursued in a gene-centered manner. As a result, sequence-based analysis has been restricted to particular features and to sequences of sufficient length. We propose a statistical approach that is independent of the identification of single sequences and instead yields an overall characterization of a metagenome. Our method is based on the analysis of significant differences between the functional profile of a metagenome and its reconstruction from a combination of genomic profiles using the Taxy-Pro mixture model. Here, protein families with a large proportion of domain hits that cannot be explained by the model are interesting candidates for the exploration of metagenomic novelty. The results of three case studies indicate that our method is able to characterize metagenomic novelty in terms of the protein families that significantly contribute to unexplained domain counts. We found a good correspondence between our predictions and the discoveries in the original studies as well as specific indicators of functional novelty that have not yet been described.
- Conference paper: Interactive and dynamic web-based visual exploration of high dimensional bioimages with real time clustering (German Conference on Bioinformatics 2014, 2014). Rathke, Magnus; Kölling, Jan; Nattkemper, Tim W. Web browsers and web applications have become common tools in bioinformatics over the past decades. Many existing web applications revolve around server-client interaction, where heavy computational tasks are often outsourced to the server and the presentation is handled on the client side. However, more recent additions to web browser technology make it possible to handle more complex operations on the client side itself, cutting out most of the server-client interaction except for data loading. This paper explores the potential of approaches to implement and speed up computationally expensive tasks, like image cluster analysis, within a client-side web browser environment. The experimental results, which use the well-known k-means algorithm as a platform for various parallelization approaches, indicate that real-time image clustering is achievable. The results look especially promising for the available MALDI-MSI data set. Despite the good results of multithreading approaches, algorithmic approaches appear to be relevant too. Therefore, advancements in accelerating the k-means algorithm itself are considered.
- Conference paper: RNA-seq driven gene identification (German Conference on Bioinformatics 2014, 2014). Zickmann, Franziska; Lindner, Martin S.; Renard, Bernhard Y. The reliable identification of genes is a challenging and crucial part of genome research. Various methods aiming at accurate predictions have evolved that predict genes either ab initio on reference sequences or evidence-based with the help of additional information. With high-throughput RNA-Seq data reflecting currently expressed genes, a particularly meaningful source of information has become commonly available. However, a particular challenge in including RNA-Seq data is the difficult handling of ambiguously mapped reads. Therefore, we developed GIIRA, a novel gene finder that is exclusively based on RNA-Seq data and inherently includes ambiguously mapped reads. Evaluation on simulated and real data and comparison with existing methods incorporating RNA-Seq information highlight the accuracy of GIIRA in identifying the expressed genes. Further, we developed a framework to integrate GIIRA and other gene finders to obtain a verified and accurate set of gene predictions.
- Conference paper: Towards accurate transcription start site prediction: a modelling approach (German Conference on Bioinformatics 2014, 2014). Djordjevic, Marko. Promoter prediction in bacteria is a classical bioinformatics problem, where available methods for regulatory element detection exhibit a very high number of false positives. We argue here that accurate transcription start site (TSS) prediction is a complex problem, for which available methods for sequence motif discovery are not in themselves well adapted. We instead propose that the problem requires integrating a quantitative understanding of transcription initiation with a careful description of promoter sequence specificity. We review evidence for this viewpoint based on our recent work, and discuss current progress on accurate TSS detection using the example of sigma70 transcription start sites in E. coli.
- Conference paper: Protein family analysis at the domain-level (German Conference on Bioinformatics 2014, 2014). Terrapon, Nicolas; Moore, Andrew; Bornberg-Bauer, Erich. The analysis of protein domains has gained considerable attention in recent years. Many new insights into protein modular evolution, combined with improved domain detection, have paved the way for an integrated analysis of protein families from a domain-centric perspective. We recently released DoMosaics, a Java application that facilitates the interactive analysis of protein domain arrangements. DoMosaics combines guided domain annotation, a highly customisable visualization of arrangements, and a number of analysis tools. It also integrates domain-centric algorithms such as CODD, which is used for the detection of divergent domain occurrences that have escaped Pfam thresholds, as well as RADS/RAMPAGE, which provides means to search for proteins with a domain arrangement similar to a given query. RADS provides an alignment of domain strings as opposed to amino-acid sequences, while RAMPAGE produces an amino-acid alignment guided by RADS results. Hence, RADS/RAMPAGE produces fast yet accurate alignments, and an associated ranking, of proteins with similar domain arrangements. Together, these tools greatly simplify the domain-centric analysis of protein function, structure and evolution.
- Edited book
- Conference paper: Explaining gene responses by linear modeling (German Conference on Bioinformatics 2014, 2014). Poeschl, Yvonne; Grosse, Ivo; Gogol-Döring, Andreas. Increasing our knowledge about molecular processes in response to a certain treatment or infection in plants, insects, or other organisms requires the identification of the genes involved in this response. In this paper, we propose the Profile Interaction Finder (PIF), which is based on a convex linear model, to identify such genes from gene expression data, and we investigate its efficacy for two applications related to stimulus response. First, we seek to identify sets of putative regulatory genes that best explain the expression levels of a gene under different stimuli. Second, we aim at identifying genes that show a specific response to a stimulus or a combination of stimuli. For both applications, we study the expression response of two Arabidopsis species to treatment with the plant hormone auxin and of Apis mellifera to pathogen infection. The proposed approach may be of general utility for analyzing expression data with a focus on genes and gene sets that explain specific stimulus response.