Strahl S, Müller M (2025)
Publication Type: Journal article, other
Publication year: 2025
Volume: 33
Page Range: 2622-2633
DOI: 10.1109/TASLPRO.2025.3581119
Open Access Link: https://ieeexplore.ieee.org/document/11044787
Abstract:
Fundamental frequency (F0) estimation is a critical task in audio, speech, and music processing applications, such as speech analysis and melody extraction. F0 estimation algorithms generally fall into two paradigms: classical signal processing-based methods and neural network-based approaches. Classical methods, like YIN and SWIPE, rely on explicit signal models, offering interpretability and computational efficiency, but their non-differentiable components hinder integration into deep learning pipelines. Neural network-based methods, such as CREPE, are fully differentiable and flexible but often lack interpretability and require substantial computational resources. In this paper, we propose differentiable variants of two classical algorithms, dYIN and dSWIPE, which combine the strengths of both paradigms. These variants enable gradient-based optimization while preserving the efficiency and interpretability of the original methods. Through several case studies, we demonstrate their potential: First, we use gradient descent to reverse-engineer audio signals, showing that dYIN and dSWIPE produce smoother gradients compared to CREPE. Second, we design a two-stage vocal melody extraction pipeline that integrates music source separation with a differentiable F0 estimator, providing an interpretable intermediate representation. Finally, we optimize dSWIPE's spectral templates for timbre-specific F0 estimation on violin recordings, demonstrating its enhanced adaptability over SWIPE. These case studies highlight that dYIN and dSWIPE successfully combine the flexibility of neural network-based methods with the interpretability and efficiency of classical algorithms, making them valuable tools for building end-to-end trainable and transparent systems.
APA:
Strahl, S., & Müller, M. (2025). dYIN and dSWIPE: Differentiable Variants of Classical Fundamental Frequency Estimators. IEEE/ACM Transactions on Audio, Speech and Language Processing, 33, 2622-2633. https://doi.org/10.1109/TASLPRO.2025.3581119
MLA:
Strahl, Sebastian, and Meinard Müller. "dYIN and dSWIPE: Differentiable Variants of Classical Fundamental Frequency Estimators." IEEE/ACM Transactions on Audio, Speech and Language Processing 33 (2025): 2622-2633.