Curricular SincNet: Towards Robust Deep Speaker Recognition by Emphasizing Hard Samples in Latent Space

Chowdhury, Labib; Kamal, Mustafa; Tasnim, Najia; Mohammed, Nabeel

dc.contributor.author	Chowdhury, Labib
dc.contributor.author	Kamal, Mustafa
dc.contributor.author	Tasnim, Najia
dc.contributor.author	Mohammed, Nabeel
dc.contributor.editor	Brömme, Arslan
dc.contributor.editor	Busch, Christoph
dc.contributor.editor	Damer, Naser
dc.contributor.editor	Dantcheva, Antitza
dc.contributor.editor	Gomez-Barrero, Marta
dc.contributor.editor	Raja, Kiran
dc.contributor.editor	Rathgeb, Christian
dc.contributor.editor	Sequeira, Ana
dc.contributor.editor	Uhl, Andreas
dc.date.accessioned	2021-10-04T08:43:51Z
dc.date.available	2021-10-04T08:43:51Z
dc.date.issued	2021
dc.description.abstract	Deep learning models have become an increasingly preferred option for biometric recognition systems, such as speaker recognition. SincNet, a deep neural network architecture, gained popularity in speaker recognition tasks due to its use of parameterized sinc functions that allow it to work directly on the speech signal. The original SincNet architecture uses the softmax loss, which may not be the most suitable choice for recognition-based tasks, as such loss functions neither impose inter-class margins nor differentiate between easy and hard training samples. Curriculum learning, particularly approaches leveraging angular margin-based losses, has proven to be very successful in other biometric applications such as face recognition. The advantage of such curriculum learning-based techniques is that they impose inter-class margins while also taking into account easy and hard samples. In this paper, we propose Curricular SincNet (CL-SincNet), an improved SincNet model in which the SincNet architecture is trained with a curricular loss function. The proposed model is evaluated on multiple datasets using intra-dataset and inter-dataset evaluation protocols. In both settings, the model performs competitively with previously published work, and in the case of inter-dataset testing, it achieves the best overall results, with a 4% reduction in error rate compared to SincNet and other published work.	en
dc.identifier.isbn	978-3-88579-709-8
dc.identifier.pissn	1617-5468
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/37471
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik e.V.
dc.relation.ispartof	BIOSIG 2021 - Proceedings of the 20th International Conference of the Biometrics Special Interest Group
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-315
dc.subject	Biometric Authentication
dc.subject	Speaker Recognition
dc.subject	Curriculum Learning
dc.title	Curricular SincNet: Towards Robust Deep Speaker Recognition by Emphasizing Hard Samples in Latent Space	en
dc.type	Text/Conference Paper
gi.citation.endPage	50
gi.citation.publisherPlace	Bonn
gi.citation.startPage	43
gi.conference.date	15.-17. September 2021
gi.conference.location	International Digital Conference
gi.conference.sessiontitle	Regular Research Papers

Files

Original bundle

Name: biosig2021_proceedings_05.pdf
Size: 107.03 KB
Format: Adobe Portable Document Format

Collections

P315 - BIOSIG 2021 - Proceedings of the 20th International Conference of the Biometrics Special Interest Group