SCALA-Speech: An Interactive System for Finding and Analyzing Speech Content in Audio Data

Authors: Cornaggia-Urrigshardt, Alessia; Jarocky, Nikita; Kurth, Frank; Urrigshardt, Sebastian; Wilkinghoff, Kevin
Editors: Demmler, Daniel; Krupka, Daniel; Federrath, Hannes
Date: 2022-09-28
Year: 2022
ISBN: 978-3-88579-720-3
URL: https://dl.gi.de/handle/20.500.12116/39561
DOI: 10.18420/inf2022_06
ISSN: 1617-5468
Language: en
Keywords: Audio Monitoring; Speech Detection; Language Recognition; Speaker Diarization; Keyword Spotting; Deep Learning

Abstract: Audio data contains less static information than images or text, so analyzing it inherently takes more time. In monitoring applications, large quantities of the captured audio files are likely to contain no meaningful information, yet without prior knowledge investigators must listen to every file in full. This work presents a system for automatically finding and analyzing speech content in audio data. The system provides several speech processing algorithms as well as a graphical interface (SCALA) that assists investigators in audio analysis. It consists of four components: speech detection, language recognition, speaker diarization/recognition, and keyword spotting. SCALA-Speech structures audio data by recognizing speech regions, spoken languages, and speaker changes, enabling investigators to listen to audio data more efficiently. Furthermore, specific speakers and keywords can be annotated and searched for. Use of SCALA-Speech is demonstrated on audio tracks of videos linked in Twitter posts related to an exemplary topic.
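
To illustrate how structuring audio into annotated regions can reduce listening time, the following is a minimal sketch in Python. It is not the SCALA-Speech implementation: the `Segment` type, its fields, and the helper functions are hypothetical names chosen for illustration, and the annotations are assumed to have already been produced by detectors like the four components named in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One annotated region of an audio file (hypothetical structure)."""
    start: float                      # start time in seconds
    end: float                        # end time in seconds
    is_speech: bool                   # result of speech detection
    language: Optional[str] = None    # result of language recognition
    speaker: Optional[str] = None     # result of speaker diarization/recognition
    keywords: Tuple[str, ...] = ()    # results of keyword spotting


def speech_segments(
    segments: List[Segment],
    language: Optional[str] = None,
    speaker: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[Segment]:
    """Return only the speech segments that match every given filter."""
    selected = []
    for seg in segments:
        if not seg.is_speech:
            continue
        if language is not None and seg.language != language:
            continue
        if speaker is not None and seg.speaker != speaker:
            continue
        if keyword is not None and keyword not in seg.keywords:
            continue
        selected.append(seg)
    return selected


def listening_time(segments: List[Segment]) -> float:
    """Total duration in seconds of the given segments."""
    return sum(seg.end - seg.start for seg in segments)


# Illustrative, made-up annotations for a 60-second recording.
recording = [
    Segment(0.0, 10.0, is_speech=False),
    Segment(10.0, 25.0, True, language="en", speaker="spk1", keywords=("alarm",)),
    Segment(25.0, 40.0, True, language="de", speaker="spk2"),
    Segment(40.0, 60.0, is_speech=False),
]

to_review = speech_segments(recording)
english_only = speech_segments(recording, language="en")
```

With annotations of this kind, an investigator would only need to review the segments returned by `speech_segments` instead of the full recording, and could narrow the selection further by language, speaker, or keyword.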