Listing of BTW - Datenbanksysteme für Business, Technologie und Web by author "Abedjan, Ziawasch"
1 - 3 of 3
- Text document: Combining Programming-by-Example with Transformation Discovery from large Databases (BTW 2021, 2021)
  Özmen, Aslihan; Esmailoghli, Mahdi; Abedjan, Ziawasch
  Data transformation discovery is one of the most tedious tasks in data preparation. In particular, generating transformation programs for semantic transformations is tricky because additional sources for look-up operations are necessary. Current systems for semantic transformation discovery face two major problems: either they follow a program synthesis approach that only scales to a small set of input tables, or they rely on extracting transformation functions from large corpora, which requires identifying exact transformations in those resources and is prone to noisy data. In this paper, we try to combine both approaches to benefit from large corpora and the sophistication of program synthesis. To do so, we devise a retrieval and pruning strategy ensemble that extracts the most relevant tables for a given transformation task. The extracted resources can then be processed by a program synthesis engine to generate more accurate transformation results than the state of the art.
- Conference paper: Duplicate Table Discovery with Xash (BTW 2023, 2023)
  Koch, Maximilian; Esmailoghli, Mahdi; Auer, Sören; Abedjan, Ziawasch
  Data lakes are typically lightly curated and as such prone to data quality problems and inconsistencies. In particular, duplicate tables are common in most repositories. The goal of duplicate table detection is to identify those tables that display the same data. Comparing tables is generally quite expensive, as the order of rows and columns might differ for otherwise identical tables. In this paper, we explore the application of Xash, a hash function previously proposed for the discovery of multi-column join candidates, to the use case of duplicate table detection. With Xash, it is possible to generate a so-called super key, which acts like a Bloom filter and instantly identifies the existence of particular cell values. We show that using Xash it is possible to speed up the duplicate table detection process significantly. In comparison to other hash functions, such as SimHash and other competitors, Xash results in fewer false positive candidates.
- Text document: Explanation of Air Pollution Using External Data Sources (BTW 2019 – Workshopband, 2019)
  Esmailoghli, Mahdi; Redyuk, Sergey; Martinez, Ricardo; Abedjan, Ziawasch; Rabl, Tilmann; Markl, Volker