Browsing by Author "Galke, Lukas"
1 - 3 of 3
- Conference paper: Inductive Learning of Concept Representations from Library-Scale Bibliographic Corpora (INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft, 2019). Galke, Lukas; Melnychuk, Tetyana; Seidlmayer, Eva; Trog, Steffen; Förstner, Konrad U.; Schultz, Carsten; Tochtermann, Klaus. Automated research analyses are becoming more and more important as the volume of research items grows at an increasing pace. We pursue a new direction for the analysis of research dynamics with graph neural networks. So far, graph neural networks have only been applied to small-scale datasets and primarily to supervised tasks such as node classification. We propose an unsupervised training objective for concept representation learning that is tailored to bibliographic data with millions of research papers and thousands of concepts from a controlled vocabulary. We have evaluated the learned representations in clustering and classification downstream tasks. Furthermore, we have conducted nearest-concept queries in the representation space. Our results show that the representations learned by graph convolution with our training objective are comparable to those learned by the DeepWalk algorithm. Our findings suggest that concept embeddings can be derived solely from the text of associated documents, without using a lookup-table embedding. Thus, graph neural networks can operate on arbitrary document collections without re-training. This property makes graph neural networks useful for the analysis of research dynamics, which is often conducted on time-based snapshots of bibliographic data. (A minimal graph-convolution sketch follows this listing.)
- Conference paper: What If We Encoded Words as Matrices and Used Matrix Multiplication as Composition Function? (INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft, 2019). Galke, Lukas; Mai, Florian; Scherp, Ansgar. We summarize our contribution to the International Conference on Learning Representations, "CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model" (2019). We construct a text encoder that learns matrix representations of words from unlabeled text while using matrix multiplication as the composition function. We show that our text encoder outperforms continuous bag-of-words representations on 9 out of 10 linguistic probing tasks and argue that the learned representations are complementary to those of vector-based approaches. Hence, we construct a hybrid model that jointly learns a matrix and a vector for each word. This hybrid model yields higher scores than purely vector-based approaches on 10 out of 16 downstream tasks in a controlled experiment with the same capacity and training data. Across all 16 tasks, the hybrid model achieves an average improvement of 1.2%. These results are promising insofar as they open up new opportunities to efficiently incorporate order awareness into word embedding models. (A sketch of matrix-based composition follows this listing.)
- Text document: Word Embeddings for Practical Information Retrieval (INFORMATIK 2017, 2017). Galke, Lukas; Saleh, Ahmed; Scherp, Ansgar. We assess the suitability of word embeddings for practical information retrieval scenarios. We assume that users issue ad-hoc short queries and that we return the first twenty retrieved documents after applying a boolean matching operation between the query and the documents. We compare the performance of several techniques that leverage word embeddings in the retrieval models to compute the similarity between the query and the documents, namely word centroid similarity, paragraph vectors, Word Mover's Distance, as well as our novel inverse document frequency (IDF) re-weighted word centroid similarity. We evaluate the performance using the ranking metrics mean average precision, mean reciprocal rank, and normalized discounted cumulative gain. Additionally, we inspect the retrieval models' sensitivity to document length by using either only the title or the full text of the documents for the retrieval task. We conclude that word centroid similarity is the best competitor to state-of-the-art retrieval models. It can be further improved by re-weighting the word frequencies with IDF before aggregating the respective word vectors of the embedding. The proposed cosine similarity of IDF re-weighted word vectors is competitive with the TF-IDF baseline and even outperforms it in the news domain by a relative margin of 15%. (A sketch of IDF re-weighted word centroid similarity follows this listing.)
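
For the first paper above, here is a minimal sketch of deriving concept embeddings via graph convolution over document-derived features. All class and function names, the dimensions, and the DeepWalk-style neighbour-prediction loss are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch, not the authors' code. Assumed setup:
#   x        : (num_concepts, in_dim) features aggregated from the text of the
#              documents annotated with each concept (e.g., averaged word vectors),
#              so no per-concept lookup-table embedding is required.
#   adj_norm : (num_concepts, num_concepts) normalized adjacency of the concept
#              graph, with self-loops.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptGCN(nn.Module):
    """One graph-convolution layer mapping document-derived concept features
    to concept embeddings."""
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, emb_dim, bias=False)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        return F.relu(adj_norm @ self.linear(x))

def neighbour_loss(emb: torch.Tensor, pos_pairs: torch.Tensor, num_neg: int = 5) -> torch.Tensor:
    """Assumed DeepWalk-style unsupervised objective: concepts connected in the
    graph should receive similar embeddings; random concepts act as negatives."""
    src, dst = emb[pos_pairs[:, 0]], emb[pos_pairs[:, 1]]
    pos = F.logsigmoid((src * dst).sum(-1)).mean()
    neg_idx = torch.randint(0, emb.size(0), (pos_pairs.size(0), num_neg))
    neg = F.logsigmoid(-(src.unsqueeze(1) * emb[neg_idx]).sum(-1)).mean()
    return -(pos + neg)
```

Because the concept features come from associated documents rather than a lookup table, the same trained layer can in principle be applied to a new document snapshot without re-training, which is the property the abstract highlights.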
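For the second paper, here is a sketch of encoding words as small square matrices and composing a sentence by matrix multiplication, alongside a CBOW-style vector average for the hybrid model. The dimensions, the near-identity initialization, and the concatenation scheme are assumptions made for illustration:

```python
# Illustrative sketch with assumed dimensions and initialization. Each word maps
# both to a vector (CBOW view, order-invariant average) and to a small square
# matrix (compositional matrix space view); a sentence's matrix representation
# is the ordered product of its word matrices, which makes it sensitive to word
# order. The hybrid encoding simply concatenates both views.
import torch
import torch.nn as nn

class HybridMatrixVectorEncoder(nn.Module):
    def __init__(self, vocab_size: int, vec_dim: int = 300, mat_dim: int = 20):
        super().__init__()
        self.mat_dim = mat_dim
        self.vectors = nn.Embedding(vocab_size, vec_dim)
        self.matrices = nn.Embedding(vocab_size, mat_dim * mat_dim)
        # Start the matrices near the identity so long products neither explode nor vanish.
        with torch.no_grad():
            eye = torch.eye(mat_dim).flatten()
            self.matrices.weight.copy_(eye + 0.01 * torch.randn_like(self.matrices.weight))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (seq_len,) indices of a single sentence
        cbow = self.vectors(token_ids).mean(dim=0)
        mats = self.matrices(token_ids).view(-1, self.mat_dim, self.mat_dim)
        product = mats[0]
        for m in mats[1:]:
            product = product @ m  # order-aware composition by matrix multiplication
        return torch.cat([cbow, product.flatten()])  # hybrid sentence representation
```

Swapping two words leaves the CBOW average unchanged but generally changes the matrix product, which is the source of the order awareness the abstract refers to.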
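For the third paper, here is a sketch of IDF re-weighted word centroid similarity: query and documents are represented as IDF-weighted averages of their word vectors and ranked by cosine similarity. The variable names and the embedding lookup are hypothetical, and the boolean pre-matching step described in the abstract is omitted:

```python
# Illustrative sketch; `word_vectors` is assumed to be a dict mapping tokens to
# pre-trained numpy vectors and `idf` a dict of inverse document frequencies.
import numpy as np

def idf_weighted_centroid(tokens, word_vectors, idf, dim=300):
    """IDF re-weighted word centroid: average of word vectors, weighted by IDF."""
    vecs, weights = [], []
    for tok in tokens:
        if tok in word_vectors and tok in idf:
            vecs.append(word_vectors[tok])
            weights.append(idf[tok])
    if not vecs:
        return np.zeros(dim)
    vecs, weights = np.asarray(vecs), np.asarray(weights)
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def rank_by_centroid_similarity(query_tokens, docs_tokens, word_vectors, idf):
    """Rank documents by cosine similarity of IDF re-weighted centroids to the query."""
    q = idf_weighted_centroid(query_tokens, word_vectors, idf)
    scores = [cosine(q, idf_weighted_centroid(d, word_vectors, idf)) for d in docs_tokens]
    return sorted(range(len(docs_tokens)), key=scores.__getitem__, reverse=True)
```

The IDF weighting down-weights frequent, uninformative words before averaging, which is how the plain word centroid similarity is improved in the abstract's conclusion.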