Getting started with AudioCraft

Aug 20, 2023

What is AudioCraft?

AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.

AudioCraft GitHub

Just as GPT-4 generates text and Stable Diffusion generates images, new generative tools are emerging for audio and music. There are plenty of other music generation projects, such as Riffusion and MusicLM.

If you are familiar with these other ML mediums, you will find a lot of overlap in how the tools are emerging, as well as in the bottlenecks.

AudioCraft Plus

I found a fork of AudioCraft, aptly named “AudioCraft Plus”, which has a very good Gradio UI similar to stable-diffusion-webui.

Prompt sequencing

You might want to create 3-4 prompts and divide the track into sections, one per prompt. This gives you more control over the structure of the song.
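To make the idea of sections concrete, here is a minimal Python sketch that splits a track evenly between a handful of prompts. This is only an illustration of the concept, not how AudioCraft Plus does it internally: the `Section` class, the `plan_sections` helper, and the example prompts are all made up for this post.

```python
from dataclasses import dataclass


@dataclass
class Section:
    prompt: str
    start: float  # seconds from the beginning of the track
    end: float    # seconds from the beginning of the track


def plan_sections(prompts, total_seconds):
    """Split total_seconds evenly across the prompts, in order.

    Hypothetical helper: the even split is an assumption made for
    illustration, not the UI's actual scheduling logic.
    """
    if not prompts:
        raise ValueError("need at least one prompt")
    length = total_seconds / len(prompts)
    return [
        Section(prompt, i * length, (i + 1) * length)
        for i, prompt in enumerate(prompts)
    ]


plan = plan_sections(
    [
        "soft piano intro",
        "drums and bass build",
        "full band chorus",
        "quiet outro",
    ],
    total_seconds=120,
)

for section in plan:
    print(f"{section.start:5.1f}s to {section.end:5.1f}s: {section.prompt}")
```

With four prompts over a 2 minute track, each prompt gets a 30 second section.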

High level prompting

Add global prompts that apply to the whole track, or set the BPM or key signature.
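I don't know exactly how the UI combines these settings with each section's prompt, but since the model is conditioned on a text description, one plausible approach is plain text concatenation. The sketch below assumes that; `build_prompt` and its parameters are hypothetical names, not part of AudioCraft or AudioCraft Plus.

```python
def build_prompt(section_prompt, global_prompt="", bpm=None, key=None):
    """Join a section prompt with track-wide settings into one description.

    Hypothetical helper: the comma-separated format is an assumption,
    chosen only to show the idea.
    """
    parts = [section_prompt]
    if global_prompt:
        parts.append(global_prompt)
    if bpm is not None:
        parts.append(f"{bpm} bpm")
    if key:
        parts.append(f"key of {key}")
    return ", ".join(parts)


example = build_prompt(
    "soft piano intro",
    global_prompt="lo-fi jazz",
    bpm=90,
    key="D minor",
)
print(example)
```

Every section then carries the same style, tempo, and key, while the section prompt itself changes.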

Temperature and Config Scale

Similar to Stable Diffusion, there are plenty of knobs to tweak, and they have an interesting effect on the music. I’ve had a lot of good and bad results with different settings.
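For some intuition about what these two knobs do, here is a small pure-Python sketch. It assumes that "Config Scale" in the UI is the classifier-free guidance scale (MusicGen's `set_generation_params` exposes `temperature` and `cfg_coef`, so that seems likely, but I haven't checked how the UI wires them up). The function names and the toy numbers are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by temperature.

    Lower temperature sharpens the distribution (safer, more repetitive
    choices); higher temperature flattens it (more surprising choices).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def guided_logits(cond, uncond, scale):
    """Classifier-free guidance: push scores away from the unconditioned
    prediction and toward the prompt-conditioned one.

    scale=1.0 returns the conditioned scores unchanged; larger values
    follow the prompt more strongly.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(toy_logits, temperature=0.5)
hot = softmax_with_temperature(toy_logits, temperature=1.5)

print("temperature 0.5:", [round(p, 3) for p in cold])
print("temperature 1.5:", [round(p, 3) for p in hot])
print("guided:", guided_logits([2.0, 0.0], [1.0, 1.0], scale=3.0))
```

At the low temperature, the top candidate takes most of the probability. At the high temperature, the alternatives get a real chance, which is where both the happy accidents and the bad takes come from.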

Audio input

You can input audio from your computer and sample either its melody or the audio itself. You can edit the audio clip in the UI as well.

If you forget what your prompt was, you can always upload the track and the UI will extract it.

Performance

This is the biggest hurdle right now. AudioCraft runs on a GPU, but on Mac it is currently CPU-only. Running it on Hugging Face will cost you about 1 dollar an hour. I found 2 minutes to be the sweet spot for an interesting clip of music, and a good one usually doesn’t happen on the first take.

I’m tracking the Mac performance issues, and they come down to the same thing as most other ML problems on Mac: PyTorch. In the meantime I will be keeping an eye on GGML to see if a quantized Mac version of the models becomes available.

Training

And yes, you can train your own models. That is a mountain I haven’t climbed yet, but the GitHub project has some resources on it.

Next GarageBand?

I think this simplifies music software perhaps more than it does other mediums. I can type “miles davis guitar” and it’s going to have some influence on the result without copying him. Remember, this music is commercial-free, but who is making any money on music anyways? 😂 With a lower barrier to creating music, it will level the playing field in the music industry. More importantly, I think it will allow people greater levels of self-expression, aka good vibes!

Here is my SoundCloud, where I will be posting the best tunes I get out of MusicGen. Check it out and feel free to make some requests! https://soundcloud.com/trumpectetera

Matt’s Substack
