dYIN and dSWIPE: Differentiable Variants of Classical Fundamental Frequency Estimators

Strahl S, Müller M (2025)

Publication Type: Journal article, other

Publication year: 2025

Journal

IEEE/ACM Transactions on Audio, Speech and Language Processing
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Volume: 33

Page Range: 2622-2633

DOI: 10.1109/TASLPRO.2025.3581119

Open Access Link: https://ieeexplore.ieee.org/document/11044787

Abstract

Fundamental frequency (F0) estimation is a critical task in audio, speech, and music processing applications, such as speech analysis and melody extraction. F0 estimation algorithms generally fall into two paradigms: classical signal processing-based methods and neural network-based approaches. Classical methods, like YIN and SWIPE, rely on explicit signal models, offering interpretability and computational efficiency, but their non-differentiable components hinder integration into deep learning pipelines. Neural network-based methods, such as CREPE, are fully differentiable and flexible but often lack interpretability and require substantial computational resources. In this paper, we propose differentiable variants of two classical algorithms, dYIN and dSWIPE, which combine the strengths of both paradigms. These variants enable gradient-based optimization while preserving the efficiency and interpretability of the original methods. Through several case studies, we demonstrate their potential: First, we use gradient descent to reverse-engineer audio signals, showing that dYIN and dSWIPE produce smoother gradients compared to CREPE. Second, we design a two-stage vocal melody extraction pipeline that integrates music source separation with a differentiable F0 estimator, providing an interpretable intermediate representation. Finally, we optimize dSWIPE's spectral templates for timbre-specific F0 estimation on violin recordings, demonstrating its enhanced adaptability over SWIPE. These case studies highlight that dYIN and dSWIPE successfully combine the flexibility of neural network-based methods with the interpretability and efficiency of classical algorithms, making them valuable tools for building end-to-end trainable and transparent systems.

Authors with CRIS profile

Sebastian Strahl, International Audio Laboratories Erlangen (AudioLabs)
Meinard Müller, Lehrstuhl für Semantische Audiosignalverarbeitung (AudioLabs)

How to cite

APA:

Strahl, S., & Müller, M. (2025). dYIN and dSWIPE: Differentiable Variants of Classical Fundamental Frequency Estimators. IEEE/ACM Transactions on Audio, Speech and Language Processing, 33, 2622-2633. https://doi.org/10.1109/TASLPRO.2025.3581119

MLA:

Strahl, Sebastian, and Meinard Müller. "dYIN and dSWIPE: Differentiable Variants of Classical Fundamental Frequency Estimators." IEEE/ACM Transactions on Audio, Speech and Language Processing 33 (2025): 2622-2633.
