Title: Analysis and Optimization of Task Granularity on the Java Virtual Machine
Authors: Rosà, Andrea; Rosales, Eduardo; Binder, Walter
Editors: Felderer, Michael; Hasselbring, Wilhelm; Rabiser, Rick; Jung, Reiner
Date: 2020-02-03
Year: 2020
Type: Text/Conference Paper
Language: en
ISBN: 978-3-88579-694-7
ISSN: 1617-5468
DOI: 10.18420/SE2020_45
URL: https://dl.gi.de/handle/20.500.12116/31724
Keywords: Task granularity; task parallelism; performance analysis and optimization; vertical profiling; actionable profiles; Java Virtual Machine

Abstract: Our article published in ACM Transactions on Programming Languages and Systems (TOPLAS), which extends our work published in the proceedings of the 2018 IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2018), presents a new methodology to accurately and efficiently collect the granularity of each executed task. Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications. On the one hand, fine-grained tasks may introduce considerable parallelization overheads. On the other hand, coarse-grained tasks may not fully utilize the available CPU cores, leading to missed parallelization opportunities. We implement our methodology in tgp, a novel task-granularity profiler that collects carefully selected metrics from the whole system stack with low overhead and helps developers locate performance and scalability problems. We analyze task granularity in the DaCapo, ScalaBench, and Spark Perf benchmark suites, revealing inefficiencies related to fine-grained and coarse-grained tasks in several applications. We demonstrate that the collected task-granularity profiles are actionable by optimizing task granularity in several applications, achieving speedups of up to a factor of 5.9x. tgp is available as open source at https://github.com/fithos/tgp/