German Conference on Bioinformatics

https://dl.gi.de/handle/20.500.12116/21203

1 - 10 of 156

Conference paper
2D projections of RNA folding landscapes
(German conference on bioinformatics 2009, 2009) Lorenz, Rony; Flamm, Christoph; Hofacker, Ivo L.
The analysis of RNA folding landscapes yields insights into the kinetic folding behavior not available from classical structure prediction methods. This is especially important for multi-stable RNAs whose function is related to structural changes, as in the case of riboswitches. However, exact methods such as barrier tree analysis scale exponentially with sequence length. Here we present an algorithm that computes a projection of the energy landscape into two dimensions, namely the distances to two reference structures. This yields an abstraction of the high-dimensional energy landscape that can be conveniently visualized, and can serve as the basis for estimating energy barriers and refolding pathways. With an asymptotic time complexity of O(n^7) the algorithm is computationally demanding. However, by exploiting the sparsity of the dynamic programming matrices and parallelization for multi-core processors, our implementation is practical for sequences of up to 400 nt, which includes most RNAs of biological interest.
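A minimal Python sketch of the projection coordinates, assuming structures are given in dot-bracket notation and that the distance is the base-pair distance (the abstract only says "distances"). The helper names and any energies passed in are hypothetical. The sketch enumerates structures explicitly, which is feasible only for toy inputs, whereas the abstract describes a dynamic-programming algorithm that fills the 2D grid directly.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def bp_distance(s, t):
    """Base-pair distance: number of pairs present in exactly one of s and t."""
    return len(base_pairs(s) ^ base_pairs(t))


def project(structure, ref1, ref2):
    """2D coordinates of a structure: its distances to the two reference structures."""
    return bp_distance(structure, ref1), bp_distance(structure, ref2)


def landscape_2d(structures_with_energies, ref1, ref2):
    """Keep the lowest energy seen in each (d1, d2) cell of the projection."""
    cells = {}
    for structure, energy in structures_with_energies:
        key = project(structure, ref1, ref2)
        if key not in cells or energy < cells[key]:
            cells[key] = energy
    return cells
```

For example, with the hairpin `((((....))))` and the open chain `............` as references, the structure `((......))..` lands in cell (6, 2): it shares none of the four reference pairs and has two pairs of its own.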
Conference paper
Ab initio prediction of molecular fragments from tandem mass spectrometry data
(German Conference on Bioinformatics, 2006) Heinonen, Markus; Rantanen, Ari; Mielikäinen, Taneli; Pitkänen, Esa; Kokkonen, Juha; Rousu, Juho
Mass spectrometry is one of the key enabling measurement technologies for systems biology, due to its ability to quantify molecules at low concentrations. Tandem mass spectrometers tackle the main shortcoming of mass spectrometry: molecules with an equal mass-to-charge ratio are not separated. In a tandem mass spectrometer, molecules can be fragmented and the intensities of the resulting fragments measured as well. However, this creates a need for methods that identify the generated fragments. In this paper, we introduce a novel combinatorial approach for predicting the structure of molecular fragments that first enumerates all possible fragment candidates and then ranks them according to the cost of cleaving a fragment from a molecule. Unlike many existing methods, our method does not rely on hand-coded databases of fragmentation rules. Our method is able to predict the correct fragmentation of small-to-medium sized molecules with high accuracy.
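The enumerate-then-rank idea can be illustrated with a brute-force Python sketch on a heavy-atom toy graph. It is not the authors' cost model: the bond cleavage costs, nominal masses, and mass tolerance are hypothetical, hydrogens and charges are ignored, and a fragment's cost is simply the summed cost of the bonds that must be broken to release it.

```python
from itertools import combinations


def is_connected(atoms, adj):
    """Depth-first check that the given atoms form one connected piece."""
    atoms = set(atoms)
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in atoms and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == atoms


def rank_fragments(masses, bonds, target_mass, tol=0.5):
    """Enumerate connected fragments matching target_mass, cheapest cleavage first.

    masses: atom label -> mass
    bonds:  frozenset({a, b}) -> hypothetical cost of breaking that bond
    """
    adj = {a: set() for a in masses}
    for bond in bonds:
        a, b = tuple(bond)
        adj[a].add(b)
        adj[b].add(a)
    atoms = sorted(masses)
    ranked = []
    for size in range(1, len(atoms)):  # proper, non-empty fragments only
        for frag in combinations(atoms, size):
            if abs(sum(masses[a] for a in frag) - target_mass) > tol:
                continue
            if not is_connected(frag, adj):
                continue
            inside = set(frag)
            # Cleavage cost: total cost of bonds with exactly one end in the fragment.
            cost = sum(c for bond, c in bonds.items() if len(bond & inside) == 1)
            ranked.append((cost, frag))
    ranked.sort()
    return ranked
```

Exhaustive enumeration of atom subsets grows exponentially with molecule size, so this sketch is meant only for small inputs.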
Conference paper
Aligning protein structures using distance matrices and combinatorial optimization
(German conference on bioinformatics 2009, 2009) Wohlers, Inken; Petzold, Lars; Domingues, Francisco S.; Klau, Gunnar W.
Structural alignments of proteins are used to identify structural similarities. These similarities can indicate homology or a common or similar function. Several, mostly heuristic, methods are available to compute structural alignments. In this paper, we present a novel algorithm that uses methods from combinatorial optimization to compute provably optimal structural alignments of sparse protein distance matrices. Our algorithm extends an elegant integer linear programming approach proposed by Caprara et al. for the alignment of protein contact maps. We consider two different types of distance matrices, with distances either between Cα atoms or between the two closest atoms of each residue. Via a comprehensive parameter optimization on HOMSTRAD alignments, we determine a scoring function for aligned pairs of distances. We introduce a negative score for non-structural, purely sequence-based parts of the alignment as a means to adjust the locality of the resulting structural alignments. Our approach is implemented in a freely available software tool named PAUL (Protein structural Alignment Using Lagrangian relaxation). On the challenging SISY data set of 130 reference alignments, we compare PAUL to six state-of-the-art structural alignment algorithms: DALI, MATRAS, FATCAT, SHEBA, CA, and CE. Here, PAUL reaches the highest average and median alignment accuracies of all methods and is the most accurate method for more than 30% of the alignments. PAUL is thus a competitive tool for pairwise high-quality structural alignment.
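The Python sketch below only scores a fixed alignment of two distance matrices; it does not search for the optimal alignment, which the abstract obtains via integer linear programming and Lagrangian relaxation. The 0/1 scoring rule and the thresholds `max_dist` and `tol` are hypothetical placeholders for the scoring function the authors fit on HOMSTRAD.

```python
def alignment_score(d1, d2, alignment, max_dist=8.0, tol=1.0):
    """Count pairs of aligned residue pairs whose internal distances agree.

    d1, d2:    square distance matrices (lists of lists), e.g. C-alpha distances in Angstrom
    alignment: (i, j) pairs, residue i of protein 1 aligned to residue j of protein 2
    A pair of aligned pairs (i, j), (k, l) scores +1 if d1[i][k] and d2[j][l] are both
    at most max_dist (the sparse part of the matrices) and differ by at most tol.
    """
    for (i, j), (k, l) in zip(alignment, alignment[1:]):
        if not (i < k and j < l):
            raise ValueError("alignment must be strictly increasing in both proteins")
    score = 0
    for a, (i, j) in enumerate(alignment):
        for k, l in alignment[a + 1:]:
            x, y = d1[i][k], d2[j][l]
            if x <= max_dist and y <= max_dist and abs(x - y) <= tol:
                score += 1
    return score
```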
Conference paper
Annotation-based distance measures for patient subgroup discovery in clinical microarray studies
(German Conference on Bioinformatics, 2006) Lottaz, Claudio; Toedling, Joern; Spang, Rainer
Background: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by the similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based clustering algorithms have been suggested, but little attention has been given to the choice of the distance measure between patients. Even with the Euclidean metric, including or excluding genes from the analysis leads to different distances between the same objects and, consequently, to different clustering results. Methodology: We describe a novel clustering algorithm in which gene selection is used to derive biologically meaningful clusterings of samples. Our method combines expression data and functional annotation data. Based on gene annotations, candidate gene sets with specific functional characterizations are generated. Each set defines a different distance measure between patients and, consequently, a different clustering. These clusterings are filtered using a novel resampling-based significance measure. Significant clusterings are reported together with the underlying gene sets and their functional definitions. Conclusions: Our method reports clusterings defined by biologically focused sets of genes. In annotation-driven clusterings, we have recovered clinically relevant patient subgroups through biologically plausible sets of genes, as well as novel subgroupings. We conjecture that our method has the potential to reveal as yet unknown, clinically relevant classes of patients in an unsupervised manner.
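A minimal Python sketch of how each gene set induces its own distance between patients, using the Euclidean metric mentioned in the abstract. The gene names and expression values are hypothetical; generating gene sets from annotations and the resampling-based significance filter are not shown.

```python
from math import sqrt


def patient_distances(expr, gene_set):
    """Euclidean distances between patients, restricted to the genes in gene_set.

    expr: gene -> list of expression values, one per patient (same patient order for all genes)
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("no genes from gene_set are present in expr")
    n = len(expr[genes[0]])
    dist = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            d = sqrt(sum((expr[g][p] - expr[g][q]) ** 2 for g in genes))
            dist[p][q] = dist[q][p] = d
    return dist


# Hypothetical data: three patients, two genes.
expr = {"G1": [0.0, 0.0, 3.0], "G2": [0.0, 4.0, 0.0]}
print(patient_distances(expr, ["G1"])[0][1])  # 0.0: patients 0 and 1 look identical under G1
print(patient_distances(expr, ["G2"])[0][1])  # 4.0: but are far apart under G2
```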
Conference paper
An application of latent topic document analysis to large-scale proteomics databases
(German conference on bioinformatics – GCB 2007, 2007) Klie, Sebastian; Martens, Lennart; Vizcaino, Juan Antonio; Cote, Richard; Jones, Phil; Apweiler, Rolf; Hinneburg, Alexander; Hermjakob, Henning
Since the advent of public data repositories for proteomics data, readily accessible results from high-throughput experiments have been accumulating steadily. Several large-scale projects in particular have contributed substantially to the number of identifications available to the community. Despite the considerable body of information amassed, very few successful analyses of these data have been performed and published, leaving the ultimate value of these projects far below their potential. To illustrate that these repositories should be considered sources of detailed knowledge rather than data graveyards, we present a novel way of analyzing the information contained in proteomics experiments with 'latent semantic analysis'. We apply this information retrieval approach to the peptide identification data contributed by the Plasma Proteome Project. Interestingly, this analysis is able to overcome the fundamental difficulties of analyzing the divergent and heterogeneous data that emerge from large-scale proteomics studies employing a vast spectrum of sample treatments and mass spectrometry technologies. Moreover, it yields several concrete recommendations for optimizing proteomics project planning as well as the choice of technologies used in the experiments. It is clear from these results that the analysis of large bodies of publicly available proteomics data holds great promise and is currently underexploited.
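Latent semantic analysis amounts to a truncated singular value decomposition of a term-by-document matrix. The dependency-free Python sketch below computes it with power iteration and deflation (a numerical library's SVD would normally be used instead). Treating peptides as terms and experiments as documents is an assumption about how the data are encoded, and any input matrix is a toy stand-in.

```python
import random
from math import sqrt


def _matvec(A, v):
    """Compute A v for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def _rmatvec(A, u):
    """Compute A^T u for a matrix stored as a list of rows."""
    return [sum(row[j] * ui for row, ui in zip(A, u)) for j in range(len(A[0]))]


def _normalize(v):
    """Return (v / ||v||, ||v||); a zero vector is returned unchanged."""
    n = sqrt(sum(x * x for x in v))
    return ([x / n for x in v] if n > 0 else v), n


def truncated_svd(A, k, iters=200, seed=0):
    """Top-k singular triplets of A by power iteration on A^T A with deflation."""
    rng = random.Random(seed)
    A = [list(row) for row in A]  # copy, because deflation modifies A
    sigmas, us, vs = [], [], []
    for _component in range(k):
        v = _normalize([rng.random() for _ in range(len(A[0]))])[0]
        for _step in range(iters):
            v, n = _normalize(_rmatvec(A, _matvec(A, v)))
            if n == 0:
                break
        u, sigma = _normalize(_matvec(A, v))
        if sigma < 1e-12:
            break  # the remaining matrix is numerically zero
        sigmas.append(sigma)
        us.append(u)
        vs.append(v)
        # Deflation: subtract the rank-1 component sigma * u v^T.
        A = [[a - sigma * ui * vj for a, vj in zip(row, v)] for row, ui in zip(A, u)]
    return sigmas, us, vs


def document_coordinates(A, k):
    """Latent coordinates of each column (document): sigma_r * v_r[j] for r < k."""
    sigmas, _, vs = truncated_svd(A, k)
    return [[s * v[j] for s, v in zip(sigmas, vs)] for j in range(len(A[0]))]


def cosine(x, y):
    """Cosine similarity of two vectors."""
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)
```

Documents (experiments) are then compared by cosine similarity of their latent coordinates rather than of their raw peptide counts.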
Conference paper
Are we overestimating the number of cell-cycling genes? The impact of background models for time series data
(German conference on bioinformatics – GCB 2007, 2007) Futschik, Matthias E.; Herzel, Hanspeter
Periodic processes play fundamental roles in organisms. Prominent examples are the cell cycle and the circadian clock. Microarray technology has enabled us to screen complete sets of transcripts for possible association with such fundamental periodic processes on a system-wide level. Frequently, quite a large number of genes has been detected as periodically expressed. However, the small overlap of identified genes between different studies has cast considerable doubt on the reliability of the detected periodic expression. In this study, we show that a major reason for the lack of agreement is the use of an inadequate background model for the determination of significance. We demonstrate that the choice of background model has considerable impact on the statistical significance of periodic expression. For illustration, we reanalyzed two microarray studies of the yeast cell cycle. Our evaluation strongly indicates that the results of previous analyses might have been overoptimistic and that the use of more suitable background models promises to give more realistic results.
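How the background model drives significance can be illustrated with a small Python sketch: the same periodicity score receives an empirical p-value against two different null models. The Fourier-based score, the random-permutation null, and the AR(1) null with autocorrelation `phi` are illustrative assumptions, not necessarily the models used in the study.

```python
import cmath
import random
from statistics import mean, pstdev


def fourier_score(x, period):
    """Magnitude of the Fourier component of the mean-centred series at `period` samples."""
    m = mean(x)
    return abs(sum((v - m) * cmath.exp(-2j * cmath.pi * t / period) for t, v in enumerate(x)))


def permutation_background(x, rng):
    """Null series: the same values in random order (destroys all temporal structure)."""
    y = list(x)
    rng.shuffle(y)
    return y


def ar1_background(x, rng, phi=0.7):
    """Null series: stationary AR(1) noise with the mean and SD of x (phi is hypothetical)."""
    m, s = mean(x), pstdev(x)
    innovation_sd = s * (1 - phi ** 2) ** 0.5
    y = [rng.gauss(0.0, s)]
    for _ in range(len(x) - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, innovation_sd))
    return [m + v for v in y]


def empirical_p(x, period, background, n=1000, seed=0):
    """Fraction of background series scoring at least as high as x (with +1 correction)."""
    rng = random.Random(seed)
    observed = fourier_score(x, period)
    hits = sum(fourier_score(background(x, rng), period) >= observed for _ in range(n))
    return (hits + 1) / (n + 1)
```

A permutation null destroys all temporal correlation, so smooth but non-periodic series can look significant against it; an autocorrelated null such as AR(1) is generally harder to beat.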
Conference paper
Automated bond order assignment as an optimization problem
(German conference on bioinformatics 2009, 2009) Dehof, Anna Katharina; Rurainski, Alexander; Lenhof, Hans-Peter; Hildebrandt, Andreas
Numerous applications in computational biology process molecular structures and hence require not only reliable atomic coordinates but also correct bond order information. Regrettably, this information is not always provided in molecular databases like the Cambridge Structural Database or the Protein Data Bank. A wide variety of strategies have been applied to derive bond order information, most of them relying on the correctness of the atom coordinates. We extended a different approach, proposed by Wang et al., that assigns heuristic molecular penalty scores based solely on connectivity information and tries to approximate the optimum heuristically. In this work, we present two efficient and exact solvers for the problem that replace the heuristic approximation scheme of the original approach: an ILP formulation and an A* approach. Both are integrated into the upcoming version of the Biochemical Algorithms Library BALL and have been successfully validated on the MMFF94 validation suite.
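The optimization problem can be made concrete with a brute-force exact solver in Python that tries every combination of bond orders and keeps the one with the lowest summed atom penalty. It stands in for the ILP and A* solvers of the abstract and scales exponentially with the number of bonds; the penalty table is hypothetical and is not Wang et al.'s actual scoring scheme.

```python
from itertools import product

# Hypothetical penalty table: element -> {total bond order (valence): penalty}.
# Valences that are not listed are treated as forbidden.
PENALTIES = {
    "H": {1: 0},
    "C": {4: 0, 3: 32},
    "N": {3: 0, 2: 3, 4: 1},
    "O": {2: 0, 1: 32},
}


def assign_bond_orders(elements, bonds, penalties=PENALTIES, max_order=3):
    """Exactly minimise the summed atom penalties by trying every bond-order combination.

    elements: element symbol of each atom
    bonds:    (i, j) atom-index pairs taken from the connectivity
    Returns (total_penalty, orders) for the best assignment, or None if none is valid.
    """
    best = None
    for orders in product(range(1, max_order + 1), repeat=len(bonds)):
        valence = [0] * len(elements)
        for (i, j), order in zip(bonds, orders):
            valence[i] += order
            valence[j] += order
        total = 0
        for element, val in zip(elements, valence):
            if val not in penalties[element]:
                break  # forbidden valence: reject this combination
            total += penalties[element][val]
        else:
            if best is None or total < best[0]:
                best = (total, orders)
    return best
```

For ethene (atoms C, C, H, H, H, H with bonds C0-C1, C0-H, C0-H, C1-H, C1-H), the minimum-penalty assignment makes the C-C bond a double bond.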
Conference paper
BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles
(German conference on bioinformatics 2014, 2014) Videm, Pavankumar; Rose, Dominic; Costa, Fabrizio; Backofen, Rolf
Sequence and secondary structure analysis can be used to assign putative functions to non-coding RNAs. However, sequence information is changed by post-transcriptional modifications, and secondary structure is only a proxy for the true 3D conformation of the RNA polymer. To tackle these issues, we can extract a different type of description from the processing patterns that can be observed through the traces left in small RNA-seq read data. To obtain an efficient and scalable procedure, we propose to encode expression profiles in discrete structures and to process them using fast graph-kernel techniques.
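A small Python sketch of the encoding idea: a per-position read-count profile is discretized into expression levels, runs of equal levels are collapsed into blocks, and two profiles are compared with a kernel. For simplicity it swaps in a k-mer spectrum string kernel in place of the graph kernels mentioned in the abstract; the thresholds and profiles are hypothetical.

```python
from collections import Counter
from math import sqrt


def discretize(profile, thresholds=(1, 10, 100)):
    """Map each read count to a level symbol: the number of thresholds it reaches."""
    return "".join(str(sum(c >= t for t in thresholds)) for c in profile)


def blocks(symbols):
    """Collapse runs of equal symbols, e.g. '0011100' -> '010'."""
    out = []
    for s in symbols:
        if not out or out[-1] != s:
            out.append(s)
    return "".join(out)


def spectrum(s, k):
    """Counts of all length-k substrings of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Normalised k-mer spectrum kernel in [0, 1]."""
    a, b = spectrum(s, k), spectrum(t, k)
    dot = sum(a[m] * b[m] for m in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

For example, the profile `[0, 0, 5, 50, 50, 5, 0]` discretizes to `0012210` and collapses to the block string `01210`.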
Conference paper
CASOP GS: computing intervention strategies targeted at production improvement in genome-scale metabolic networks
(German Conference on Bioinformatics 2010, 2010) Bohl, Katrin; Figueiredo, Luís F. de; Hädicke, Oliver; Klamt, Steffen; Kost, Christian; Schuster, Stefan; Kaleta, Christoph
Metabolic engineering aims to improve the production of desired biochemicals and proteins in organisms and therefore plays a central role in biotechnology. However, the design of overproducing strains is not straightforward due to the complexity of metabolic and regulatory networks. Thus, theoretical tools supporting the design of such strains have been developed. One particular method, CASOP, uses the set of elementary flux modes (EFMs) of a reaction network to propose strategies for the overproduction of a target compound. The advantage of CASOP over other approaches is that it does not consider a single specific flux distribution within the network but the whole set of possible flux distributions represented by the EFMs of the network. Moreover, its application not only identifies candidate loci that can be knocked out but additionally proposes overexpression candidates. However, the use of CASOP has so far been restricted to small and medium-scale metabolic networks, since the entire set of EFMs cannot be enumerated in genome-scale networks. This work presents an approach that allows CASOP to be used even in genome-scale networks. This approach is based on estimating the score used in CASOP from a sample of EFMs within a genome-scale network. Using EFMs from the genome-scale metabolic network gives a more reliable picture of the metabolic capabilities of an organism required for the design of overproducing strains. We applied our new method to identify strategies for the overproduction of succinate and histidine in Escherichia coli. The succinate case study, in particular, proposes engineering targets that resemble known strategies already applied in E. coli. Availability: Source code and an executable are available upon request.
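The abstract does not give CASOP's scoring formula, so the Python sketch below uses a hypothetical stand-in: the yield-weighted share of EFMs in which a reaction is active, with each EFM represented as a set of active reactions plus a target yield. It only illustrates replacing a statistic over all EFMs with the same statistic computed on a random subsample; how the authors obtain their EFM sample in a genome-scale network is not reproduced, and drawing from an explicit list is a toy substitute.

```python
import random


def weighted_participation(efms, reaction):
    """Hypothetical score: yield-weighted fraction of EFMs in which `reaction` is active.

    Each EFM is a dict with 'reactions' (set of active reactions) and 'yield' (target yield).
    """
    total = sum(e["yield"] for e in efms)
    if total == 0:
        return 0.0
    return sum(e["yield"] for e in efms if reaction in e["reactions"]) / total


def estimated_participation(efms, reaction, n, seed=0):
    """Estimate the score from a uniform random sample of n EFMs (without replacement)."""
    rng = random.Random(seed)
    return weighted_participation(rng.sample(efms, n), reaction)
```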
Conference paper
Characterization of protein interactions
(German Conference on Bioinformatics, 2006) Küffner, Robert; Duchrow, Timo; Fundel, Katrin; Zimmer, Ralf
Available information on molecular interactions between proteins is currently incomplete with regard to detail and comprehensiveness. Although a number of repositories are already devoted to capturing interaction data, only a small subset of the currently known interactions can be obtained that way. Besides further experiments, knowledge of interactions can only be complemented by applying text extraction methods to the literature. Currently, information that further characterizes individual interactions cannot be provided by interaction extraction approaches and is virtually nonexistent in repositories. We present an approach that not only confirms extracted interactions but also characterizes them with regard to four attributes, such as activation vs. inhibition and protein-protein vs. protein-gene interactions. This requires training corpora with positional annotation of interacting proteins. As suitable corpora are rare, we propose an extensible curation protocol to conveniently characterize interactions by manual annotation of sentences so that machine learning approaches can be applied subsequently. We derived a training set by manually reading and annotating 269 sentences for 1090 candidate interactions; 439 of these are valid interactions, which were predicted via support vector machines at a precision of 83% and a recall of 87%. The prediction of interaction attributes from individual sentences yielded, on average, a precision of about 85% and a recall of 73%.
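Precision and recall, the measures reported above, can be computed from sets of predicted and true interactions as in this Python sketch; the interaction identifiers are hypothetical, and the sketch does not reproduce the authors' SVM evaluation.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted positives against the true positives."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)  # correctly predicted interactions
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall
```

For instance, four predicted interactions of which two are correct, evaluated against three true interactions, give a precision of 2/4 = 0.5 and a recall of 2/3.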
