🎵 AudioTextHTDemucs - Text-Conditioned Stem Separation

Upload an audio file and enter a text prompt to separate specific stems from the mixture.

Example prompts:

drums - Extract drum sounds
bass - Extract bass guitar
vocals - Extract singing voice
other - Extract other instruments
Or use any natural-language description, such as "extract the guitar" or "piano sound"

Input

Upload Audio File

YouTube Video URL (optional)

Text Prompt

Click to use example prompts

Status

Input Mixture

Input Audio (Original Mix)

Input Spectrogram

Separated Output

Separated Audio

Output Spectrogram

Notes

The model works best with music audio sampled at 44.1 kHz.
Processing time grows with audio length, because the audio is processed in 6-second segments.
The model was trained on four stems: drums, bass, vocals, and other instruments.
Natural-language descriptions are supported because prompts are encoded with CLAP text embeddings.
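
To illustrate why processing time scales with audio length, here is a minimal sketch of how a waveform might be cut into 6-second segments at 44.1 kHz. It is not the app's actual code: the function names `segment_bounds` and `split_into_segments` are hypothetical, and the sketch assumes non-overlapping chunks with a shorter final chunk, whereas the real pipeline may overlap or pad segments.

```python
SAMPLE_RATE = 44_100     # Hz, the rate mentioned in the notes
SEGMENT_SECONDS = 6.0    # chunk length mentioned in the notes


def segment_bounds(num_samples, sample_rate=SAMPLE_RATE,
                   segment_seconds=SEGMENT_SECONDS):
    """Return (start, end) sample indices of consecutive chunks.

    Assumption: chunks do not overlap and the last one may be shorter.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    seg_len = int(round(sample_rate * segment_seconds))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    return [(start, min(start + seg_len, num_samples))
            for start in range(0, num_samples, seg_len)]


def split_into_segments(samples, sample_rate=SAMPLE_RATE,
                        segment_seconds=SEGMENT_SECONDS):
    """Slice a 1-D sequence of samples into chunks using segment_bounds."""
    bounds = segment_bounds(len(samples), sample_rate, segment_seconds)
    return [samples[start:end] for start, end in bounds]


if __name__ == "__main__":
    # A 15-second clip at 44.1 kHz has 661,500 samples.
    for start, end in segment_bounds(15 * SAMPLE_RATE):
        print(start, end, (end - start) / SAMPLE_RATE)
```

Under these assumptions a 15-second clip yields three chunks (6 s, 6 s, and 3 s), so the number of model passes grows roughly linearly with the clip's duration.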