Post-Debugging in Large Scale Big Data Analytic Systems

Bergen, Eduard; Edlich, Stefan; Mitschang, Bernhard; Nicklas, Daniela; Leymann, Frank; Schöning, Harald; Herschel, Melanie; Teubner, Jens; Härder, Theo; Kopp, Oliver; Wieland, Matthias

2017-06-21; 2017-06-21; 2017

978-3-88579-660-2

Data scientists often need to fine-tune and resubmit their jobs when processing large quantities of data on big clusters because currently executing jobs fail. Data scientists also need to filter, combine, and correlate large data sets. Hence, debugging a job locally helps data scientists to find the root cause, which increases efficiency and simplifies the working process. Discovering the root cause of failures in distributed systems involves different kinds of information, such as the operating system type, the executed system applications, the execution state, and environment variables. In general, log files contain this information in a large and cryptic structure. Data scientists need to analyze all related log files to gain insight into the failure, which is cumbersome and slow. Another possibility is to use our reference architecture: we extract remote data and replay the extraction in the developer’s local debugging environment.

en

Software debugging; Bug detection, localization and diagnosis; Java Virtual Machine; JVMTI; Bytecode instrumentation; Apache Flink; Application-level failures

Text/Conference Paper

1617-5468