A general paradigm for fast, adaptive clustering of biological sequences

dc.contributor.author: Reinert, Knut
dc.contributor.author: Bauer, Markus
dc.contributor.author: Döring, Andreas
dc.contributor.author: Klau, Gunnar W.
dc.contributor.author: Halpern, Aaron L.
dc.contributor.editor: Falter, Claudia
dc.contributor.editor: Schliep, Alexander
dc.contributor.editor: Selbig, Joachim
dc.contributor.editor: Vingron, Martin
dc.contributor.editor: Walther, Dirk
dc.date.accessioned: 2019-05-15T08:32:31Z
dc.date.available: 2019-05-15T08:32:31Z
dc.date.issued: 2007
dc.description.abstract: There are numerous methods that compute clusterings of biological sequences based on pairwise distances. This necessitates the computation of O(n^2) sequence comparisons. Users usually want to apply the most sensitive distance measure, which is normally the most expensive in terms of runtime. This poses a problem if the number of sequences is large or the computation of the measure is slow. In this paper we present a general heuristic to speed up distance-based clustering methods considerably while compromising little on the accuracy of the results. The speedup comes from using fast comparison methods to perform an initial ‘top-down’ split into relatively homogeneous clusters, while the slower measures are used for smaller groups. Then profiles are computed for the final groups and the resulting profiles are used in a bottom-up phase to compute the final clustering. The algorithm is general in the sense that any sequence comparison method can be employed (e.g. for DNA, RNA or amino acids). We test our algorithm using a prototypical implementation for agglomerative RNA clustering and show its effectiveness.
dc.identifier.isbn: 978-3-88579-209-3
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/22371
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e. V.
dc.relation.ispartof: German conference on bioinformatics – GCB 2007
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-115
dc.title: A general paradigm for fast, adaptive clustering of biological sequences
dc.type: Text/Conference Paper
gi.citation.endPage: 29
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 15
gi.conference.date: September 26-28, 2007, Potsdam
gi.conference.location: Potsdam
gi.conference.sessiontitle: Regular Research Papers

Files

Original bundle
Name: 15.pdf
Size: 615.48 KB
Format: Adobe Portable Document Format