Mining Input Grammars

Authors: Rahul Gopinath, Björn Mathis, Andreas Zeller
Editors: Anne Koziolek, Ina Schaefer, Christoph Seidl
Published: 2021 (available online 2020-12-17)
ISBN: 978-3-88579-704-3
ISSN: 1617-5468
DOI: 10.18420/SE2021_13
URL: https://dl.gi.de/handle/20.500.12116/34507
Type: Conference paper
Language: English
Keywords: grammar; grammar mining; automated testing; fuzzing; input generation

To assess the behavior of a program, one needs to understand its inputs: their sources, their structure, and how they lead to individual behaviors. But because the syntax and semantics of inputs are almost never completely specified, humans and computers constantly have to figure out how to produce a particular behavior. In this abstract, we show how to automatically extract accurate, well-structured input grammars from existing programs. Such input grammars are useful for software testing: they can serve as producers of valid, high-quality inputs that easily pass through parsing and validation and thus reliably trigger the desired program behavior. Moreover, they allow testers to control which inputs are produced, in contrast to the majority of fuzzers, which operate as black boxes. Our Mimid prototype uses dynamic tainting to extract input grammars from given programs; the resulting grammars are well-structured and highly readable, even for complex recursive input formats such as JavaScript or JSON. Dedicated parser-directed test generators systematically explore the input syntax, so that grammars can be mined even without any given inputs.
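The abstract describes two steps: mining a grammar from a program's own parser, and then using that grammar as a producer of valid inputs. The Python sketch below illustrates both on a toy scale. It is not Mimid's implementation: instead of dynamic tainting, it simply attributes each consumed character to the parsing function active at that moment, turns the resulting call tree into grammar rules, and expands those rules at random. The arithmetic-expression language, the names `TracingInput`, `traced`, `parse`, `mine`, and `generate`, and the `max_depth` limit are all hypothetical choices made for illustration.

```python
import random
from typing import Dict, Set, Tuple

Grammar = Dict[str, Set[Tuple[str, ...]]]

class TracingInput:
    """Cursor that attributes each consumed char to the active parse function."""

    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0
        self.stack = [("<root>", [])]  # frames of (nonterminal, children)

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def consume(self) -> str:
        ch = self.text[self.pos]
        self.pos += 1
        self.stack[-1][1].append(ch)  # record which function read this char
        return ch

def traced(fn):
    """Give each call of a parsing function its own derivation-tree node."""
    def wrapper(inp: TracingInput) -> None:
        node = (f"<{fn.__name__}>", [])
        inp.stack.append(node)
        fn(inp)
        inp.stack.pop()
        inp.stack[-1][1].append(node)  # nest the finished node in its caller
    return wrapper

@traced
def expr(inp: TracingInput) -> None:  # expr: term ('+' term)*
    term(inp)
    while inp.peek() == "+":
        inp.consume()
        term(inp)

@traced
def term(inp: TracingInput) -> None:  # term: digit | '(' expr ')'
    if inp.peek() == "(":
        inp.consume()
        expr(inp)
        if inp.peek() != ")":
            raise SyntaxError(f"expected ')' at {inp.pos}")
        inp.consume()
    elif inp.peek().isdigit():
        inp.consume()
    else:
        raise SyntaxError(f"unexpected {inp.peek()!r} at {inp.pos}")

def parse(text: str) -> tuple:
    inp = TracingInput(text)
    expr(inp)
    if inp.pos != len(text):
        raise SyntaxError(f"trailing input at {inp.pos}")
    return inp.stack[0][1][0]  # the <expr> node below the synthetic root

def mine(tree: tuple, grammar: Grammar) -> Grammar:
    # Child nodes become nonterminals; directly consumed chars stay terminals.
    name, children = tree
    grammar.setdefault(name, set()).add(
        tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            mine(child, grammar)
    return grammar

def generate(grammar: Grammar, rng: random.Random,
             start: str = "<expr>", max_depth: int = 6) -> str:
    # Fixed point: cheapest expansion cost of each nonterminal.
    cost = {nt: float("inf") for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                c = 1 + sum(cost[s] for s in alt if s in grammar)
                if c < cost[nt]:
                    cost[nt], changed = c, True

    def expand(symbol: str, depth: int) -> str:
        if symbol not in grammar:
            return symbol  # terminal character
        alts = sorted(grammar[symbol])
        if depth >= max_depth:  # past the limit, take the cheapest alternative
            alts = [min(alts, key=lambda a: sum(cost[s] for s in a if s in grammar))]
        return "".join(expand(s, depth + 1) for s in rng.choice(alts))

    return expand(start, 0)

grammar: Grammar = {}
for sample in ["1+2", "(3)", "((4+5)+6)"]:
    mine(parse(sample), grammar)
print({nt: sorted(alts) for nt, alts in grammar.items()})

rng = random.Random(0)
generated = [generate(grammar, rng) for _ in range(5)]
for s in generated:
    parse(s)  # raises SyntaxError if the mined grammar over-generates
print(generated)
```

Because this toy miner does no generalization, the mined grammar only knows the digits and sum shapes seen in the three samples. Every string it generates is nevertheless accepted by the original parser, which is the property the abstract relies on when it describes grammars as producers of inputs that pass parsing and validation.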