GI LogoGI Logo
  • Login
Digital Library
    • All of DSpace

      • Communities & Collections
      • Titles
      • Authors
      • By Issue Date
      • Subjects
    • This Collection

      • Titles
      • Authors
      • By Issue Date
      • Subjects
Digital Library Gesellschaft für Informatik e.V.
GI-DL
    • English
    • Deutsch
  • English 
    • English
    • Deutsch
View Item 
  •   DSpace Home
  • Lecture Notes in Informatics
  • Proceedings
  • INFORMATIK - Jahrestagung der Gesellschaft für Informatik e.V.
  • P275 - INFORMATIK 2017
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.
  •   DSpace Home
  • Lecture Notes in Informatics
  • Proceedings
  • INFORMATIK - Jahrestagung der Gesellschaft für Informatik e.V.
  • P275 - INFORMATIK 2017
  • View Item

Word Embeddings for Practical Information Retrieval

Author:
Galke, Lukas [DBLP] ;
Saleh, Ahmed [DBLP] ;
Scherp, Ansgar [DBLP]
Abstract
We assess the suitability of word embeddings for practical information retrieval scenarios. Thus, we assume that users issue ad-hoc short queries where we return the first twenty retrieved documents after applying a boolean matching operation between the query and the documents. We compare the performance of several techniques that leverage word embeddings in the retrieval models to compute the similarity between the query and the documents, namely word centroid similarity, paragraph vectors, Word Mover’s distance, as well as our novel inverse document frequency (IDF) re-weighted word centroid similarity. We evaluate the performance using the ranking metrics mean average precision, mean reciprocal rank, and normalized discounted cumulative gain. Additionally, we inspect the retrieval models’ sensitivity to document length by using either only the title or the full-text of the documents for the retrieval task. We conclude that word centroid similarity is the best competitor to state-of-the-art retrieval models. It can be further improved by re-weighting the word frequencies with IDF before aggregating the respective word vectors of the embedding. The proposed cosine similarity of IDF re-weighted word vectors is competitive to the TF-IDF baseline and even outperforms it in case of the news domain with a relative percentage of 15%.
  • Citation
  • BibTeX
Galke, L., Saleh, A. & Scherp, A., (2017). Word Embeddings for Practical Information Retrieval. In: Eibl, M. & Gaedke, M. (Hrsg.), INFORMATIK 2017. Gesellschaft für Informatik, Bonn. (S. 2155-2167). DOI: 10.18420/in2017_215
@inproceedings{mci/Galke2017,
author = {Galke, Lukas AND Saleh, Ahmed AND Scherp, Ansgar},
title = {Word Embeddings for Practical Information Retrieval},
booktitle = {INFORMATIK 2017},
year = {2017},
editor = {Eibl, Maximilian AND Gaedke, Martin} ,
pages = { 2155-2167 } ,
doi = { 10.18420/in2017_215 },
publisher = {Gesellschaft für Informatik, Bonn},
address = {}
}
DateienGroesseFormatAnzeige
B29-2.pdf421.0Kb PDF View/Open

Sollte hier kein Volltext (PDF) verlinkt sein, dann kann es sein, dass dieser aus verschiedenen Gruenden (z.B. Lizenzen oder Copyright) nur in einer anderen Digital Library verfuegbar ist. Versuchen Sie in diesem Fall einen Zugriff ueber die verlinkte DOI: 10.18420/in2017_215

Haben Sie fehlerhafte Angaben entdeckt? Sagen Sie uns Bescheid: Send Feedback

More Info

DOI: 10.18420/in2017_215
ISBN: 978-3-88579-669-5
ISSN: 1617-5468
xmlui.MetaDataDisplay.field.date: 2017
Language: en (en)

Keywords

  • Word embeddings
  • Document representation
  • Information retrieval
Collections
  • P275 - INFORMATIK 2017 [266]

Show full item record


About uns | FAQ | Help | Imprint | Datenschutz

Gesellschaft für Informatik e.V. (GI), Kontakt: Geschäftsstelle der GI
Diese Digital Library basiert auf DSpace.

 

 


About uns | FAQ | Help | Imprint | Datenschutz

Gesellschaft für Informatik e.V. (GI), Kontakt: Geschäftsstelle der GI
Diese Digital Library basiert auf DSpace.