Using N-terminal targeting sequences, amino acid composition, and sequence motifs for predicting protein subcellular localization

Functional annotation of unknown proteins is a major goal in proteomics. A key step in this annotation process is determining a protein's subcellular localization, and numerous localization prediction techniques have consequently been developed over the years. These methods typically focus on a single underlying biological aspect or predict only a subset of all possible subcellular localizations. There is a clear need for new methods that use and represent the available protein-specific biological knowledge from several sources in order to improve accuracy and localization coverage for a wide range of organisms. Here we present a novel Support Vector Machine (SVM)-based approach for predicting protein subcellular localization, which integrates information about N-terminal targeting sequences, amino acid composition, and protein sequence motifs. By capturing and combining this biologically relevant information, the approach takes an important step towards emulating the protein sorting process. We have used it to develop two new prediction methods, TargetLoc and MultiLoc. TargetLoc is restricted to the analysis of proteins containing N-terminal targeting sequences, whereas MultiLoc covers all major eukaryotic subcellular localizations for animal, plant, and fungal proteins. TargetLoc performs better than similar methods. MultiLoc performs considerably better than comparable methods that predict all major eukaryotic subcellular localizations, and its results are better than or comparable to those of methods that are specialized for fewer localizations or for a single organism.
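To make the notion of an amino acid composition feature more concrete, the following is a minimal Python sketch of how a sequence might be turned into a numeric vector suitable as SVM input. It is an illustration only, not the encoding used by TargetLoc or MultiLoc: the function names, the 20-residue N-terminal window, and the example sequence are all assumptions made here.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> List[float]:
    """Relative frequency of each standard amino acid in seq."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty sequence or no standard residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def feature_vector(seq: str, n_term_len: int = 20) -> List[float]:
    """Concatenate whole-sequence and N-terminal composition.

    n_term_len is a hypothetical window size chosen for illustration.
    """
    return composition(seq) + composition(seq[:n_term_len])


# Made-up example sequence, used only to show the shape of the output.
example = "MLSRAVCGTSRQLAPALGYLGSRQ"
vector = feature_vector(example)
print(len(vector))  # 20 whole-sequence values followed by 20 N-terminal values
```

Treating the N-terminal region separately reflects the idea that targeting sequences sit at the start of the protein, so their composition can differ from that of the protein as a whole.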

Höglund, Annette; Dönnes, Pierre; Blum, Torsten; Adolph, Hans-Werner; Kohlbacher, Oliver (2005): Using N-terminal targeting sequences, amino acid composition, and sequence motifs for predicting protein subcellular localization. German Conference on Bioinformatics 2005 (GCB 2005). Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 3-88579-400-4. pp. 45-59. Regular Research Papers. Hamburg, 5-7 October 2005

Collections

P071 - GCB 2005 - German Conference on Bioinformatics
