Conference Paper

WannaDB: Ad-hoc SQL Queries over Text Collections

Document Type

Text/Conference Paper

Date

2023

Publisher

Gesellschaft für Informatik e.V.

Abstract

In this paper, we propose a new system called WannaDB that allows users to interactively perform structured explorations of text collections in an ad-hoc manner. Extracting structured data from text is a classical problem for which a plenitude of approaches and even industry-scale systems already exist. However, these approaches lack the ability to support the ad-hoc exploration of texts using structured queries. The main idea of WannaDB is to include user interaction to support ad-hoc SQL queries over text collections using a new two-phased approach. First, a superset of information nuggets is extracted from the texts using existing extractors such as named entity recognizers. Then, based on embeddings, the extractions are interactively matched to a structured table definition as requested by the user. In our evaluation, we show that WannaDB is thus able to extract structured data in high quality from a broad range of (real-world) text collections without the need to design extraction pipelines upfront.
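
The two-phased approach described in the abstract can be illustrated with a minimal sketch. This is not WannaDB's implementation: the extractor is a toy rule (capitalized words and numbers) standing in for a named entity recognizer, the "embedding" is a bag of context words standing in for learned embeddings, and the interactive user feedback is omitted entirely. All function names (`extract_nuggets`, `embed`, `fill_row`) and the sample text are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stopword list so sentence-initial words are not treated as nuggets.
STOPWORDS = {"the", "a", "an", "in"}


def extract_nuggets(text, window=2):
    """Phase 1 (toy stand-in): over-extract candidate nuggets with their context."""
    tokens = re.findall(r"\w+", text)
    nuggets = []
    for i, tok in enumerate(tokens):
        is_candidate = tok.isdigit() or (tok[0].isupper() and tok.lower() not in STOPWORDS)
        if is_candidate:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = [tokens[j].lower() for j in range(lo, hi) if j != i]
            nuggets.append((tok, context))
    return nuggets


def embed(words):
    """Toy embedding: a bag-of-words vector (a real system would use learned embeddings)."""
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fill_row(text, attributes):
    """Phase 2 (toy stand-in): match each requested attribute to its closest nugget."""
    nuggets = extract_nuggets(text)
    row = {}
    for attr in attributes:
        attr_vec = embed([attr.lower()])
        best = max(nuggets, key=lambda n: cosine(attr_vec, embed(n[1])))
        row[attr] = best[0]
    return row


TEXT = "The airline Lufthansa reported an incident in the city Frankfurt with 12 passengers."
print(fill_row(TEXT, ["airline", "city", "passengers"]))
```

In this sketch, each document yields one row of the requested table, with the attribute names playing the role of the columns named in the user's query. The sketch always picks the single nearest nugget per attribute; the abstract indicates that WannaDB instead involves the user interactively in this matching step.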

Description

Hättasch, Benjamin; Bodensohn, Jan-Micha; Vogel, Liane; Urban, Matthias; Binnig, Carsten (2023): WannaDB: Ad-hoc SQL Queries over Text Collections. BTW 2023. DOI: 10.18420/BTW2023-08. Bonn: Gesellschaft für Informatik e.V. ISBN: 978-3-88579-725-8. pp. 157-181. Dresden, Germany. March 6-10, 2023
