An application of latent topic document analysis to large-scale proteomics databases

Since the advent of public data repositories for proteomics data, readily accessible results from high-throughput experiments have been accumulating steadily. Several large-scale projects in particular have contributed substantially to the amount of identifications available to the community. Despite the considerable body of information amassed, very few successful analysis have been performed and published on this data, levelling off the ultimate value of these projects far below their potential. In order to illustrate that these repositories should be considered sources of detailed knowledge instead of data graveyards, we here present a novel way of analyzing the information contained in proteomics experiments with a ’latent semantic analysis’. We apply this information retrieval approach to the peptide identification data contributed by the Plasma Proteome Project. Interestingly, this analysis is able to overcome the fundamental difficulties of analyzing such divergent and heterogeneous data emerging from large scale proteomics studies employing a vast spec- trum of different sample treatment and mass-spectrometry technologies. Moreover, it yields several concrete recommendations for optimizing pro- teomics project planning as well as the choice of technologies used in the experiments. It is clear from these results that the analysis of large bodies of publicly available proteomics data holds great promise and is currently underexploited.

Klie, Sebastian; Martens, Lennart; Vizcaino, Juan Antonio; Cote, Richard; Jones, Phil; Apweiler, Rolf; Hinneburg, Alexander; Hermjakob, Henning (2007): An application of latent topic document analysis to large-scale proteomics databases. German conference on bioinformatics – GCB 2007. Bonn: Gesellschaft für Informatik e. V.. PISSN: 1617-5468. ISBN: 978-3-88579-209-3. pp. 174-184. Regular Research Papers. Potsdam. September 26-28, 2007, Potsdam,

Sammlungen

P115 - GCB 2007 - German Conference on Bioinformatics 2007

Komplettanzeige

An application of latent topic document analysis to large-scale proteomics databases

Volltext URI

Dokumententyp

Dateien

Zusatzinformation

Datum

Autor:innen

Zeitschriftentitel

ISSN der Zeitschrift

Bandtitel

Quelle

Verlag

Zusammenfassung

Beschreibung

Schlagwörter

Zitierform

DOI

Tags

Sammlungen