
The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution

dc.contributor.author: Chen, Xiao
dc.contributor.author: Campero Durand, Gabriel
dc.contributor.author: Zoun, Roman
dc.contributor.author: Broneske, David
dc.contributor.author: Li, Yang
dc.contributor.author: Saake, Gunter
dc.contributor.editor: Grust, Torsten
dc.contributor.editor: Naumann, Felix
dc.contributor.editor: Böhm, Alexander
dc.contributor.editor: Lehner, Wolfgang
dc.contributor.editor: Härder, Theo
dc.contributor.editor: Rahm, Erhard
dc.contributor.editor: Heuer, Andreas
dc.contributor.editor: Klettke, Meike
dc.contributor.editor: Meyer, Holger
dc.date.accessioned: 2019-04-11T07:21:16Z
dc.date.available: 2019-04-11T07:21:16Z
dc.date.issued: 2019
dc.description.abstract: Recently, word embedding has become a beneficial technique for diverse natural language processing tasks, especially after the successful introduction of several popular neural word embedding models, such as word2vec, GloVe, and FastText. Entity resolution (ER), i.e., the task of identifying digital records that refer to the same real-world entity, has also been shown to benefit from word embedding. However, the use of word embeddings does not lead to a one-size-fits-all solution, because it cannot provide accurate results for values without any semantic meaning, such as numerical values. In this paper, we propose combining general word embedding with traditional hand-picked similarity measures for solving ER tasks, aiming to select the most suitable similarity measure for each attribute based on its properties. We provide some guidelines on how to choose suitable similarity measures for different types of attributes and evaluate our proposed hybrid method on both synthetic and real datasets. Experiments show that a hybrid method that relies on correctly selecting the required similarity measures can outperform methods that purely adopt traditional or word-embedding-based similarity measures.
dc.identifier.doi: 10.18420/btw2019-14
dc.identifier.isbn: 978-3-88579-683-1
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/21698
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: BTW 2019
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) – Proceedings, Volume P-289
dc.subject: Entity resolution
dc.subject: Word embedding
dc.subject: Similarity measures
dc.subject: Learning-based entity resolution
dc.title: The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution
gi.citation.endPage: 224
gi.citation.startPage: 215
gi.conference.date: March 4-8, 2019
gi.conference.location: Rostock
gi.conference.sessiontitle: Scientific Contributions

Files

Original bundle
Name: B5-2.pdf
Size: 289.37 KB
Format: Adobe Portable Document Format