- Conference paper: Data processing effects on the interpretation of microarray gene expression experiments (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Fundel, Katrin; Küffner, Robert; Aigner, Thomas; Zimmer, Ralf; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
  Motivation: Microarray gene expression data is collected at an increasing pace, and numerous methods and tools exist for analyzing this kind of data. The aim of this study is to evaluate the effect of the basic statistical processing steps of microarray data on the final outcome of gene expression analysis; these effects are most problematic for one-channel cDNA measurements, but also affect other types of microarrays, especially when dealing with grouped samples. It is crucial to determine an appropriate combination of individual processing steps for a given dataset in order to improve the validity and reliability of expression data analysis.
  Results: We analyzed a large gene expression data set obtained from a one-channel cDNA microarray experiment conducted on 83 human samples that have been classified into four osteoarthritis-related groups. We compared different normalization methods regarding their effect on the identification of differentially expressed genes. Furthermore, we compared different methods for combining spot p-values into gene p-values, and propose Stouffer's method for this purpose. We developed several quality and robustness measures that allow estimating the number of errors made in the statistical data preparation.
  Conclusion: The apparently straightforward steps of gene expression data analysis, i.e. normalization and identification of differentially expressed genes, can be accomplished by numerous different methods. We analyzed multiple combinations of a number of methods to demonstrate the possible effects, and therefore the importance, of the individual decisions taken during data processing. An overview of these effects is essential for the biological interpretation of gene expression measurements. We give guidelines and tools for evaluating methods for normalization, spot combination and detection of differentially regulated genes.
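  As an illustration of the spot-to-gene combination step mentioned in this abstract, the following is a minimal sketch of the unweighted Stouffer Z-score method in Python. It assumes one-sided, independent spot p-values and equal weights; the abstract does not state these details, and the function name `stouffer_combine` and the clamping constant `eps` are hypothetical choices made only for this example.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def stouffer_combine(p_values, eps=1e-12):
    """Combine one-sided p-values with the unweighted Stouffer Z-score method.

    Assumption: the p-values are independent and one-sided.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    # Clamp to the open interval (0, 1) so the inverse normal CDF stays finite.
    clamped = [min(max(p, eps), 1.0 - eps) for p in p_values]
    # Convert each p-value to a z-score: a small p gives a large positive z.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in clamped]
    # A sum of k independent standard normals has variance k, so divide by sqrt(k).
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    # Map the combined z-score back to a one-sided p-value.
    return 1.0 - _STD_NORMAL.cdf(combined_z)


if __name__ == "__main__":
    spot_p_values = [0.05, 0.05]  # hypothetical spot p-values for one gene
    print(stouffer_combine(spot_p_values))
```

  With this formulation, several moderately small spot p-values that agree reinforce each other, while a single p-value is returned essentially unchanged.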
- Conference paper: Efficient mapping of large cDNA/EST databases to genomes: A comparison of two different strategies (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Wawra, Christian; Abouelhoda, Mohamed I.; Ohlebusch, Enno; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
  This paper presents a comparison of two strategies for cDNA/EST mapping: the seed-and-extend strategy and the fragment-chaining strategy. We derive theoretical results on the statistics of fragments of type maximal exact match. Moreover, we present efficient fragment-chaining algorithms that are simpler than previous ones. In experiments, we compared our implementation of the fragment-chaining strategy with the seed-and-extend strategy implemented in the software tool BLAT.
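  To make the idea of fragment chaining concrete, here is a minimal quadratic-time sketch in Python. It is not the algorithm from the paper, which is described as more efficient; it only illustrates the general technique of selecting a colinear, non-overlapping set of exact-match fragments with maximal total length. The fragment encoding `(query_start, genome_start, length)` and the function name `chain_fragments` are assumptions made for this example, and gap costs are ignored.

```python
def chain_fragments(fragments):
    """Return the highest-scoring colinear chain of exact-match fragments.

    Each fragment is (query_start, genome_start, length), a hypothetical
    encoding. Fragment a may precede fragment b only if a ends before b
    starts in both sequences. The chain score is the total matched length.
    """
    frags = sorted(fragments)  # by query start, then genome start
    n = len(frags)
    if n == 0:
        return []
    best = [length for _, _, length in frags]  # best score of a chain ending at i
    prev = [-1] * n                            # predecessor of i in that chain
    for i in range(n):
        q_i, g_i, len_i = frags[i]
        for j in range(i):
            q_j, g_j, len_j = frags[j]
            colinear = q_j + len_j <= q_i and g_j + len_j <= g_i
            if colinear and best[j] + len_i > best[i]:
                best[i] = best[j] + len_i
                prev[i] = j
    # Trace back from the fragment that ends the best chain.
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(frags[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    # Hypothetical fragments between a cDNA (query) and a genome.
    example = [(0, 100, 10), (12, 115, 8), (5, 300, 20), (30, 130, 5)]
    print(chain_fragments(example))
```

  Sorting by query start guarantees that every valid predecessor of a fragment appears earlier in the list, which is what lets the dynamic program run in a single left-to-right pass.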
- Conference paper: Composite module analyst: A fitness-based tool for prediction of transcription regulation (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Kel, Alexander; Konovalova, Tatiana; Waleev, Tagir; Cheremushkin, Evgeny; Kel-Margoulis, Olga; Wingender, Edgar; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
  Functionally related genes involved in the same molecular-genetic, biochemical, or physiological process are often regulated coordinately. Such regulation is provided by the precisely organized binding of a multiplicity of special proteins (transcription factors) to their target sites (cis-elements) in regulatory regions of genes. Cis-element combinations provide a structural basis for the generation of unique patterns of gene expression. Here we present a new approach for defining promoter models based on the composition of transcription factor binding sites and their pairs. We utilize a multicomponent fitness function to select the promoter model that best fits the observed gene expression profile. We demonstrate examples of successful application of the fitness function with the help of a genetic algorithm for the analysis of functionally related or co-expressed genes, as well as testing on simulated data.
- Conference paper: Using N-terminal targeting sequences, amino acid composition, and sequence motifs for predicting protein subcellular localization (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Höglund, Annette; Dönnes, Pierre; Blum, Torsten; Adolph, Hans-Werner; Kohlbacher, Oliver; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
  Functional annotation of unknown proteins is a major goal in proteomics. A key step in this annotation process is the definition of a protein's subcellular localization. As a consequence, numerous prediction techniques for localization have been developed over the years. These methods typically focus on a single underlying biological aspect or predict a subset of all possible subcellular localizations. There is a clear need for new methods that utilize and represent available protein-specific biological knowledge from several sources, in order to improve accuracy and localization coverage for a wide range of organisms. Here we present a novel Support Vector Machine (SVM)-based approach for predicting protein subcellular localization, which integrates information about N-terminal targeting sequences, amino acid composition, and protein sequence motifs. An important step is taken towards emulating the protein sorting process by capturing and bringing together biologically relevant information. Our novel approach has been used to develop two new prediction methods, TargetLoc and MultiLoc. TargetLoc is restricted to the analysis of proteins containing N-terminal targeting sequences, whereas MultiLoc covers all major eukaryotic subcellular localizations for animal, plant, and fungal proteins. TargetLoc performs better than similar methods. MultiLoc performs considerably better than comparable prediction methods that predict all major eukaryotic subcellular localizations, and shows better or comparable results to methods that are specialized for fewer localizations or for one organism.
- Conference paper: Invited talk: SARS molecular epidemiology and SARS-CoV evolution: Combating an emerging infectious disease with the regimen of genomics and bioinformatics (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Zhao, Guo-Ping; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
- Conference paper: Memory efficient folding algorithms for circular RNA secondary structures (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Hofacker, Ivo L.; Stadler, Peter F.; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
  A small class of RNA molecules, in particular the tiny genomes of viroids, is circular. Yet most structure prediction algorithms handle only linear RNAs. The most straightforward approach is to compute circular structures from "internal" and "external" substructures separated by a base pair. This is incompatible, however, with the memory-saving approach of the Vienna RNA Package, which builds a linear RNA structure from shorter (internal) structures only. Here we describe how circular secondary structures can be obtained without additional memory requirements as a kind of "post-processing" of the linear structures.
- Conference paper: Inferring regulatory systems with noisy pathway information (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Spieth, Christian; Streichert, Felix; Speern, Nora; Zell, Andreas; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
  With an increasing number of pathways available in public databases, inferring gene regulatory networks becomes more and more feasible. The major problem with most of these pathways is that they are very often faulty or describe only parts of a regulatory system, either because of limitations of the experimental techniques or because they focus only on a subnetwork of a larger process. To address this issue, we propose a new multi-objective evolutionary algorithm that infers gene regulatory systems from experimental microarray data by incorporating known pathways from publicly available databases. These pathways are used as an initial template for creating suitable models of the regulatory network and are then refined by the algorithm. With this approach, we were able to infer regulatory systems while incorporating pathway information that is incomplete or even faulty.
- Conference paper: Invited talk: A large-scale application of comparative genomics for biodefense (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Slezak, Tom; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
- Edited book: German Conference on Bioinformatics 2005 (GCB 2005) (2005)
  Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
- Conference paper: Planning isotopomer measurements for estimation of metabolic fluxes (German Conference on Bioinformatics 2005 (GCB 2005), 2005)
  Rantanen, Ari; Mielikäinen, Taneli; Rousu, Juho; Ukkonen, Esko; Torda, Andrew; Kurtz, Stefan; Rarey, Matthias
  Flux estimation using isotopomer information of metabolites is currently the only method that can give quantitative estimates of the activity of metabolic pathways. However, measuring the isotopomer distributions of intermediate metabolites is costly and tedious with current technologies. In this paper we study the question of finding the smallest subset of metabolites to measure that ensures an adequate level of isotopomer information. We study the computational complexity of this optimization problem in the case of so-called positional enrichment data, give exact and fast heuristic solutions, and empirically evaluate the efficacy of the proposed methods.