An Empirical Study of Flaky Tests in Python

This is a summary of our work presented at the International Conference on Software Testing 2021 [Gr21b]. Flaky tests, i.e., tests that fail spuriously without any code change, hamper regression testing and erode trust in test results. While the prevalence and importance of flakiness are well established, prior research has focused on Java projects, raising questions about generalizability. To provide a better understanding of flakiness, we empirically study its prevalence, causes, and degree within 22 352 Python projects containing 876 186 tests. We found flakiness to be as prevalent in Python as in Java. The causes, however, differ: order dependency is the dominant problem, causing 59% of the 7 571 flaky tests we found. Another 28% were caused by test infrastructure problems, a cause of flakiness that has previously received less attention. The remaining 13% can mostly be attributed to the use of network and randomness APIs. Revealing flaky tests also requires more runs than often assumed: on average, 170 reruns are needed to be 95% confident that a passing test is not flaky. Additionally, through our investigations, we created a large dataset of flaky tests that other researchers have already started building on [MM21; Ni21].
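To give a feel for the rerun figure, the following is a minimal sketch under a simplifying assumption that is ours, not necessarily the procedure used in the study: every run of a flaky test fails independently with a fixed probability p. Under that assumption, n consecutive passes occur with probability (1 - p)^n, so the number of reruns needed for a given confidence follows from a logarithm. The function names `reruns_needed` and `implied_failure_rate` are hypothetical and chosen for illustration only.

```python
import math


def reruns_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that a test failing with per-run probability p
    fails at least once in n runs with the given confidence.

    Assumption (ours): runs are independent and p is constant.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Solve (1 - p)^n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


def implied_failure_rate(n: int, confidence: float = 0.95) -> float:
    """Per-run failure probability at which n reruns give exactly the
    requested confidence, under the same independence assumption."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    # A test that fails every second run is exposed quickly ...
    print(reruns_needed(0.5))
    # ... whereas 170 reruns correspond to a failure rate below 2% per run.
    print(round(implied_failure_rate(170), 4))
```

Under this simplified model, 170 reruns at 95% confidence correspond to a per-run failure probability of roughly 1.7%, which illustrates why a handful of reruns is not enough to rule out flakiness for rarely failing tests.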

Gruber, Martin; Lukasczyk, Stephan; Kroiß, Florian; Fraser, Gordon (2022): An Empirical Study of Flaky Tests in Python. Software Engineering 2022. DOI: 10.18420/se2022-ws-009. Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 978-3-88579-714-2. pp. 37-38. Main Scientific Program. Berlin/Virtual. February 21-25, 2022

Keywords

Flaky Test, Python, Empirical Study

DOI

10.18420/se2022-ws-009

Collections

P320 - Software Engineering 2022
