Discovering Multi-Dimensional Subsequence Queries from Traces -- From Theory to Practice
Gesellschaft für Informatik e.V.
Subsequence-queries with wildcards and gap-size constraints (swg-queries, for short) are an expressive model for sequence data, in which queries are described by patterns over an alphabet of variables and types, along with a global window size and a number of gap-size constraints. They are evaluated over a trace, i.e., a sequence of types, by replacing each variable by a single type such that the window and gap-size constraints are satisfied. Kleest-Meißner et al. (Proc. ICDT 2022) formalised the task of discovering an swg-query that best describes a given sample consisting of a finite number of traces, and developed a discovery algorithm solving this task. However, in practical application scenarios, traces are often multi-dimensional, i.e., a trace corresponds to a sequence of tuples of types, which renders the existing technique inapplicable.

In this paper, we lift the notion of swg-queries to such a multi-dimensional setting, thereby enlarging the applicability of the query model and the techniques for query discovery. We introduce a mapping between one-dimensional and multi-dimensional sequence data, such that a multi-dimensional trace matches a multi-dimensional query if and only if the corresponding one-dimensional trace matches the corresponding one-dimensional query. We complement our formal results with a description of our prototype implementation of query discovery for multi-dimensional sequence data. Results from evaluation experiments with real-world data indicate the feasibility of our approach.
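To make the evaluation of a one-dimensional swg-query concrete, the following is a minimal sketch of a matcher. It is not the implementation described in the paper, and it does not reproduce the paper's mapping to the multi-dimensional setting. It assumes one reading of the semantics: every occurrence of a variable is replaced by the same type, the window size bounds the number of trace positions spanned by the match, and the i-th gap-size constraint bounds the number of trace positions skipped between the i-th and (i+1)-th query symbol. The function name `matches`, its parameters, and the representation of queries as Python lists are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Sequence, Set, Tuple


def matches(
    query: Sequence[str],
    variables: Set[str],
    trace: Sequence[str],
    window: int,
    gaps: Sequence[Tuple[int, Optional[int]]],
) -> bool:
    """Return True if `trace` matches the query (assumed semantics, see lead-in).

    query     -- symbols; those listed in `variables` are variables, the rest are types
    window    -- maximum number of trace positions the match may span
    gaps[i]   -- (lo, hi) bounds on positions skipped between query[i] and query[i + 1];
                 hi = None means "no upper bound"
    """
    if not query:
        return True
    if len(gaps) != len(query) - 1:
        raise ValueError("need exactly one gap constraint per adjacent symbol pair")

    def bind(symbol: str, event: str, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
        # A type must equal the event; a variable must agree with its earlier binding.
        if symbol not in variables:
            return binding if symbol == event else None
        if symbol not in binding:
            return {**binding, symbol: event}
        return binding if binding[symbol] == event else None

    def search(qi: int, prev: int, start: int, binding: Dict[str, str]) -> bool:
        if qi == len(query):
            return True
        if qi == 0:
            candidates = range(len(trace))
        else:
            lo, hi = gaps[qi - 1]
            first = prev + 1 + lo
            last = len(trace) - 1 if hi is None else prev + 1 + hi
            # The window constraint caps how far from `start` we may go.
            last = min(last, len(trace) - 1, start + window - 1)
            candidates = range(first, last + 1)
        for pos in candidates:
            extended = bind(query[qi], trace[pos], binding)
            if extended is None:
                continue
            if search(qi + 1, pos, pos if qi == 0 else start, extended):
                return True
        return False

    return search(0, -1, 0, {})
```

For example, under these assumptions the query `x a x` with variable `x`, window size 4, and gap bounds `(0, 1)` for both gaps matches the trace `b c a b d` (with `x` replaced by `b`), but no longer matches once the window size is reduced to 3. The backtracking search is exponential in the worst case and is meant only to illustrate the semantics.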