Listing by keyword "Benchmarks"
1 - 3 of 3
- Conference paper: GenBenchDroid: Fuzzing Android Taint Analysis Benchmarks (Software Engineering 2023, 2023). Schott, Stefan; Pauck, Felix.
  The conventional approach to assessing the performance of Android taint analysis tools consists of applying the tool to already existing benchmarks and calculating its performance on the contained benchmark cases. Creating and maintaining a benchmark requires a lot of effort, since it needs to comprise various analysis challenges and since each benchmark case needs a well-documented ground truth; otherwise one cannot know whether a tool's analysis is accurate. This effort is further increased by the frequently changing Android API. All these factors lead to the same, usually manually created, benchmarks being reused over and over again. As a consequence, analysis tools are often over-adapted to these benchmarks. To overcome these issues, we propose the concept of benchmark fuzzing, which allows the generation of previously unknown and unique benchmarks, alongside their ground truths, at evaluation time. We implement this approach in our tool GenBenchDroid and additionally show that we are able to find analysis faults that remain uncovered when solely relying on the conventional benchmarking approach.
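The benchmark-fuzzing idea described in the abstract can be sketched in miniature: randomly compose code fragments whose taint behavior is known in advance, so that every generated case carries its ground truth by construction. The fragment names and templates below are illustrative assumptions, not GenBenchDroid's actual building blocks.

```python
import random

# Hypothetical building blocks: each pairs a code template with its known
# taint behavior (whether it produces, or leaks, sensitive data). These
# templates are illustrative only, not taken from GenBenchDroid.
SOURCES = [("getDeviceId()", True), ("constantString()", False)]
SINKS = [("sendSms(data)", True), ("log(data)", False)]

def fuzz_benchmark_case(seed):
    """Randomly combine a source and a sink into a benchmark case
    whose ground truth is known by construction."""
    rng = random.Random(seed)
    src, src_taints = rng.choice(SOURCES)
    snk, snk_leaks = rng.choice(SINKS)
    code = f"String data = {src};\n{snk};"
    # A taint flow exists only if the source yields tainted data
    # AND the sink actually leaks it.
    ground_truth = src_taints and snk_leaks
    return code, ground_truth

case_code, case_leaks = fuzz_benchmark_case(42)
```

Because the generator is seeded, each case is reproducible, yet fresh seeds yield previously unseen cases at evaluation time, which is the property the abstract highlights.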
- Workshop paper: How informative is informative? Benchmarks and optimal cut points for E-Health Websites (Mensch und Computer 2019 - Workshopband, 2019). Thielsch, Meinald T.; Thielsch, Carolin; Hirschfeld, Gerrit.
  Scores of different evaluation measures resulting from website tests are difficult to interpret without comparative data. Benchmarks and optimal cut points provide such interpretation aids. Benchmarks are usually built from test-score means based on a tested pool of comparable websites. Optimal cut points are calculated against an external criterion using receiver-operating-characteristic (ROC)-based methods applied to website evaluations. Due to the relevance and sensitivity of the topic, making the right decision based on evaluation data is of particular importance for creators and owners of websites presenting health-related information. Thus, we combined data from two studies, with a total of n = 2,614 participants, evaluating m = 33 health-related websites. Established questionnaires were applied: Web-CLIC (website content), PWU-G and UMUX-Lite (usability), VisAWI-S (aesthetics), and the trusting-belief scales of McKnight et al. [7]. We calculated overall and specific values for four categories of e-health websites. Benchmarks were quite comparable among categories, while optimal cut points differed more. In particular, cut points were high for charity websites and partly lower for the category "Personal sites & support groups". In general, user requirements for e-health websites appear to be significantly higher than available published benchmarks and cut points for websites in other areas.
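The ROC-based cut-point computation the abstract refers to can be illustrated with a minimal sketch that picks the threshold maximizing Youden's J (sensitivity + specificity - 1). Youden's J is one common ROC criterion and is an assumption here, not necessarily the authors' exact procedure; the function name and data are hypothetical.

```python
def optimal_cut_point(scores, is_positive):
    """Return the score threshold maximizing Youden's J.

    scores: website evaluation scores (e.g. questionnaire means).
    is_positive: external binary criterion per website (e.g. judged
    informative or not). Pure-Python ROC sketch, not the study's code.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sensitivity = sum(s >= t for s in pos) / len(pos)
        specificity = sum(s < t for s in neg) / len(neg)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t
```

With perfectly separated groups the returned cut point sits exactly at the lowest score of the positive group; noisier data yields the threshold with the best sensitivity/specificity trade-off.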
- Conference paper: Reproducing Taint-Analysis Results with ReproDroid (Software Engineering 2020, 2020). Pauck, Felix; Bodden, Eric; Wehrheim, Heike.
  More and more Android taint-analysis tools appear each year. Any paper proposing such a tool typically comes with an in-depth evaluation of its supported features, accuracy, and ability to be applied to real-world apps. Although the authors spend a lot of effort on these evaluations, comparability is often hindered since the description of their experimental targets is usually limited. To conduct a comparable, automatic, and unbiased evaluation of different analysis tools, we propose the framework ReproDroid. The framework enables us to precisely declare our evaluation targets; as a consequence, we refine three well-known benchmarks: DroidBench, ICCBench and DIALDroidBench. Furthermore, we instantiate this framework for six prominent taint-analysis tools, namely Amandroid, DIALDroid, DidFail, DroidSafe, FlowDroid and IccTA. Finally, we use these instances to automatically check whether different promises commonly made in the associated proposing papers are kept.
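Benchmark-based evaluation of the kind ReproDroid automates ultimately reduces to comparing the taint flows a tool reports against a benchmark's declared ground truth. A minimal, hypothetical sketch of that comparison (not ReproDroid's actual API):

```python
def score_tool(reported_flows, ground_truth_flows):
    """Compare a tool's reported taint flows against a benchmark's
    documented ground truth; returns (precision, recall).

    Flows are represented as opaque hashable identifiers here, e.g.
    (source, sink) pairs. Illustrative sketch only.
    """
    reported = set(reported_flows)
    expected = set(ground_truth_flows)
    true_positives = len(reported & expected)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall
```

A precisely declared ground truth, as the abstract argues, is what makes such scores comparable across tools: without it, neither precision nor recall is well defined.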