🎵 AudioTextHTDemucs - Text-Conditioned Stem Separation

Upload an audio file and enter a text prompt to separate specific stems from the mixture.

Example prompts:

  • drums - Extract drum sounds
  • bass - Extract bass guitar
  • vocals - Extract singing voice
  • other - Extract other instruments
  • Any natural-language description, such as "extract the guitar" or "piano sound"


Notes

  • The model works best with music sampled at 44.1 kHz
  • Processing time grows with audio length, because the audio is processed in 6-second segments
  • The model was trained on four stems: drums, bass, vocals, and other
  • Free-form natural-language prompts work because prompts are encoded with CLAP text embeddings
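The 6-second segmentation mentioned in the notes can be sketched as follows. This is a minimal illustration, not the app's actual code. It assumes non-overlapping segments, although real HTDemucs pipelines often overlap and cross-fade segments. It also uses a hypothetical `separate_chunk` callable as a stand-in for the model call.

```python
SAMPLE_RATE = 44_100     # Hz, the rate recommended in the notes
SEGMENT_SECONDS = 6.0    # segment length stated in the notes


def segment_bounds(num_samples: int,
                   sample_rate: int = SAMPLE_RATE,
                   segment_seconds: float = SEGMENT_SECONDS) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of consecutive, non-overlapping segments.

    The final segment is shorter when the audio length is not a multiple
    of the segment length.
    """
    seg_len = int(round(segment_seconds * sample_rate))  # 264,600 samples at 44.1 kHz
    return [(start, min(start + seg_len, num_samples))
            for start in range(0, num_samples, seg_len)]


def process_in_chunks(samples, separate_chunk, sample_rate: int = SAMPLE_RATE):
    """Apply a per-segment separation function and concatenate the results.

    `separate_chunk` is a hypothetical placeholder for the model; it should
    take a list of samples and return a list of the same length.
    """
    output = []
    for start, end in segment_bounds(len(samples), sample_rate):
        output.extend(separate_chunk(samples[start:end]))
    return output
```

With this scheme, 31 seconds of audio yields six segments: five full 6-second segments and a final 1-second one. The number of model calls, and therefore the processing time, grows roughly linearly with the length of the input.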