
Music Source Separation with Generative Flow

Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, and Zhiyao Duan
AIRLab (University of Rochester)

Abstract

Fully supervised models for source separation are trained on parallel mixture-source data and have achieved superior performance in recent years. However, large-scale, naturally mixed parallel training data are difficult to obtain for music, and such models are hard to adapt to mixtures with new sources. Source-only supervised models, in contrast, require only clean sources for training; they learn source models and then apply them to separate the mixture. In this paper, we leverage flow-based implicit generators to train music source priors and use likelihood-based objectives to separate music mixtures. Experiments show that on singing voice separation and music separation tasks, our proposed approach achieves performance competitive with one of the fully supervised systems. We also demonstrate that our approach can separate new source tracks without retraining the separation model from scratch, as fully supervised models must.
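
Conceptually, the separation step is maximum-likelihood inference under the pretrained source priors: the source estimates are optimized so that each is likely under its own flow prior while their sum stays consistent with the observed mixture. Below is a minimal sketch of that idea, assuming PyTorch, a hypothetical `log_prob` interface on each prior, and a soft mixture-consistency penalty; it is illustrative, not the released InstGlow implementation.

```python
# Minimal sketch: likelihood-based separation with pretrained flow priors.
# `priors` is a list of generative models, one per source; `log_prob` is an
# assumed interface (torch.distributions-style), not the authors' API.
import torch

def separate(mixture, priors, n_iters=500, lr=1e-2, penalty=1e4):
    # Start each source estimate as an equal share of the mixture.
    sources = [(mixture / len(priors)).clone().requires_grad_(True)
               for _ in priors]
    opt = torch.optim.Adam(sources, lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        # Maximize the log-likelihood of each estimate under its source prior.
        nll = -sum(p.log_prob(s).sum() for p, s in zip(priors, sources))
        # Soft constraint: the estimates should sum back to the mixture.
        recon = ((sum(sources) - mixture) ** 2).sum()
        (nll + penalty * recon).backward()
        opt.step()
    return [s.detach() for s in sources]
```

One could equivalently optimize in the flows' latent spaces and decode, which keeps the estimates on the priors' learned manifolds; the sample-space version above is just the simplest formulation.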



Audio Samples

This mixture audio clip is from 'Zeno - Signs' in the MUSDB18 test partition:

Separated sources (Vocal, Bass, Drums, and Other audio for each system):

Ground Truth
Open-Unmix
Demucs (v2)
Wave-U-Net
TasNet
InstGlow (ours)


Bonus tracks

InstGlow: anyway, here's Wonderwall, and Smoke on the Water:

Mixture, Vocals, and Accompaniment audio for each track:

Wonderwall
Smoke on the Water