Title: Performance Evaluation of Policy-Based SQL Query Classification for Data-Privacy Compliance
Authors: Schwab, Peter K.; Röckl, Jonas; Langohr, Maximilian S.; Meyer-Wegener, Klaus
Type: Text/Journal Article
Year: 2021
Record date: 2022-01-27
DOI: http://dx.doi.org/10.1007/s13222-021-00385-9
Handle: https://dl.gi.de/handle/20.500.12116/38046
ISSN: 1610-1995
Keywords: Data privacy; Performance; Policy rules; Query classification

Abstract: Data science must respect privacy in many situations. We have built a query repository that automatically classifies SQL queries according to data-privacy directives. It can intercept queries that violate the directives, because a JDBC proxy driver inserted between the end users' SQL tooling and the target data consults the repository about the compliance of each query. However, this slows down query processing. This paper presents two optimizations implemented to increase classification performance and describes a measurement environment that allows quantifying the induced performance overhead. We present measurement results and show that our optimized implementation significantly reduces classification latency. The query metadata (QM) is stored in both relational and graph-based databases. Whereas query classification takes a few ms on average using relational QM, a graph-based classification is orders of magnitude more expensive at 137 ms on average. However, the graphs contain more precise information, so in some cases the final decision requires checking them as well. Our optimizations considerably reduce the number of graph-based classifications and thus decrease the latency to 0.35 ms in 87% of the classification cases.
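
The two-tier decision described in the abstract (a cheap relational check first, with the expensive graph-based check only when needed) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `TwoTierClassifier`, `relational_check`, `graph_check`, and the toy rules are hypothetical. It does not reproduce the paper's repository, its JDBC proxy driver, or its two optimizations, which the abstract does not detail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    COMPLIANT = "compliant"
    VIOLATION = "violation"


@dataclass
class TwoTierClassifier:
    # Hypothetical fast check: returns a verdict, or None when it cannot decide.
    relational_check: Callable[[str], Optional[Verdict]]
    # Hypothetical slow, more precise check: always returns a verdict.
    graph_check: Callable[[str], Verdict]
    # Counts how often the expensive path was taken.
    graph_calls: int = 0

    def classify(self, sql: str) -> Verdict:
        verdict = self.relational_check(sql)
        if verdict is not None:
            return verdict  # decided on the cheap path
        self.graph_calls += 1
        return self.graph_check(sql)  # fall back to the precise path


def toy_relational_check(sql: str) -> Optional[Verdict]:
    # Invented rule: queries that never mention "salary" are trivially compliant;
    # anything else is left undecided for the graph-based check.
    if "salary" not in sql.lower():
        return Verdict.COMPLIANT
    return None


def toy_graph_check(sql: str) -> Verdict:
    # Invented rule: combining "salary" with "name" counts as a violation.
    return Verdict.VIOLATION if "name" in sql.lower() else Verdict.COMPLIANT
```

In this sketch the `graph_calls` counter makes the share of expensive classifications visible, which is the quantity the abstract says the optimizations reduce.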