Keyword extraction for text characterization
Gesellschaft für Informatik e.V.
Keywords are a valuable means of characterizing texts. To extract keywords, we propose an efficient and robust, language- and domain-independent approach based on small word parts (quadgrams). The basic algorithm can be improved by re-examining and re-ranking keywords using edit distance (i.e., Levenshtein distance) and an algorithm based on the relativistic addition of velocities (here: weights). For evaluation, we compare our approach to frequency-based keyword extraction on an exemplary text collection of 45,000 intranet documents in German and English.
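The three building blocks named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `quadgrams`, `levenshtein`, and `combine_weights` are hypothetical, the quadgrams are assumed to be overlapping four-character substrings of a lowercased word, and the weights are assumed to lie in [0, 1) so that the velocity-addition formula can be applied with c = 1. How these pieces are wired together for re-examining and re-ranking is not specified here and is left out.

```python
def quadgrams(word):
    """Return all overlapping 4-character substrings of a word (assumed definition)."""
    w = word.lower()
    return [w[i:i + 4] for i in range(len(w) - 3)]


def levenshtein(a, b):
    """Edit distance between two strings via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def combine_weights(u, v):
    """Relativistic velocity addition with c = 1, applied to weights.

    For u, v in [0, 1) the result also stays below 1, so repeated
    combination never exceeds the upper bound (assumption: weights
    are normalized to that interval).
    """
    return (u + v) / (1.0 + u * v)
```

For example, `combine_weights(0.5, 0.5)` gives 0.8 rather than 1.0, which shows why this kind of addition is attractive for accumulating evidence: combined weights grow with each contribution but saturate below the maximum.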