WannaDB: Ad-hoc SQL Queries over Text Collections
dc.contributor.author | Hättasch, Benjamin | |
dc.contributor.author | Bodensohn, Jan-Micha | |
dc.contributor.author | Vogel, Liane | |
dc.contributor.author | Urban, Matthias | |
dc.contributor.author | Binnig, Carsten | |
dc.contributor.editor | König-Ries, Birgitta | |
dc.contributor.editor | Scherzinger, Stefanie | |
dc.contributor.editor | Lehner, Wolfgang | |
dc.contributor.editor | Vossen, Gottfried | |
dc.date.accessioned | 2023-02-23T14:00:24Z | |
dc.date.available | 2023-02-23T14:00:24Z | |
dc.date.issued | 2023 | |
dc.description.abstract | In this paper, we propose a new system called WannaDB that allows users to interactively perform structured explorations of text collections in an ad-hoc manner. Extracting structured data from text is a classical problem for which a plenitude of approaches and even industry-scale systems already exist. However, these approaches lack the ability to support the ad-hoc exploration of texts using structured queries. The main idea of WannaDB is to include user interaction to support ad-hoc SQL queries over text collections using a new two-phased approach. First, a superset of information nuggets is extracted from the texts using existing extractors such as named entity recognizers. Then, based on embeddings, the extractions are interactively matched to a structured table definition as requested by the user. In our evaluation, we show that WannaDB is thus able to extract high-quality structured data from a broad range of (real-world) text collections without the need to design extraction pipelines upfront. | en |
dc.identifier.doi | 10.18420/BTW2023-08 | |
dc.identifier.isbn | 978-3-88579-725-8 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/40390 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | BTW 2023 | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-331 | |
dc.subject | interactive text exploration | |
dc.subject | text to table | |
dc.subject | matching embeddings | |
dc.title | WannaDB: Ad-hoc SQL Queries over Text Collections | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 181 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 157 | |
gi.conference.date | 6-10 March 2023 | |
gi.conference.location | Dresden, Germany | |