Hernandez A, Perez Toro PA, Arias-Vergara T, Vasquez-Correa JC, Yang SH, Orozco-Arroyave JR, Maier A (2024)
Publication Type: Conference contribution
Publication year: 2024
Publisher: Springer Science and Business Media Deutschland GmbH
Book Volume: 15049 LNAI
Pages Range: 149-160
Conference Proceedings Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Event location: Brno, CZE
ISBN: 9783031705656
DOI: 10.1007/978-3-031-70566-3_14
Acquiring speech data is a crucial step in the development of speech recognition systems and related speech-based machine learning models. However, protecting privacy is an increasing concern that must be addressed. This study investigates voice conversion (VC) as a strategy for anonymizing the speech of individuals with dysarthria. We specifically focus on training a variety of VC models using self-supervised speech representations, such as Wav2Vec and its multilingual variant, Wav2Vec2.0 (XLSR). The converted voices maintain a word error rate that is within 1% of that of the original recordings. The Equal Error Rate (EER) increased significantly, from 1.52% to 41.18% on the LibriSpeech test set and from 3.75% to 42.19% on speakers from the VCTK corpus, indicating a substantial decrease in speaker verification performance. A similar trend is observed with dysarthric speech, where the EER increased from 16.45% to 43.46%. Additionally, our study includes classification experiments on dysarthric vs. healthy speech data to demonstrate that anonymized voices still yield the speech features essential for distinguishing between healthy and pathological speech. The impact of voice conversion is investigated with respect to articulation, prosody, phonation, and phonology.
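For readers unfamiliar with the metric, the EER is the operating point at which a speaker verification system's false acceptance rate equals its false rejection rate; a value approaching 50% means the system is close to guessing, which is why a higher EER after conversion indicates stronger anonymization. The following is a minimal sketch of how an EER could be approximated from similarity scores. It is not the authors' evaluation code: the function name `equal_error_rate` and the example scores are hypothetical, and the threshold sweep is a simple approximation rather than an interpolated ROC-based estimate.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER from similarity scores (higher = more likely same speaker).

    genuine_scores:  scores for trials where both utterances share a speaker.
    impostor_scores: scores for trials where the speakers differ.
    Both inputs are hypothetical; real scores would come from a verification model.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance rate: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Made-up scores for illustration only.
separable = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
overlapping = equal_error_rate([0.2, 0.8], [0.2, 0.8])
```

With well-separated scores the sketch returns an EER of 0, and with fully overlapping scores it returns 0.5, the chance-level behaviour that the post-conversion EERs reported above approach.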
APA:
Hernandez, A., Perez Toro, P.A., Arias-Vergara, T., Vasquez-Correa, J.C., Yang, S.H., Orozco-Arroyave, J.R., & Maier, A. (2024). Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation. In Elmar Nöth, Aleš Horák, Petr Sojka (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 149-160). Brno, CZE: Springer Science and Business Media Deutschland GmbH.
MLA:
Hernandez, Abner, et al. "Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation." Proceedings of the 27th International Conference on Text, Speech, and Dialogue, TSD 2024, Brno, CZE. Ed. Elmar Nöth, Aleš Horák, Petr Sojka. Springer Science and Business Media Deutschland GmbH, 2024. 149-160.