ML Artist Engineer Starter Pack: Image Generation
My goal with these posts is to create a simple overview of how to get started with “ML Artist Engineering”. I’m not necessarily an expert in making the art, but hopefully these guides can help lower the barrier to entry for folks!
Choose your compute style
As you get started you will see two distinct paths for running projects. Most likely you will end up doing a hybrid of both, so let’s go over them.
Cloud
This is probably the best option for getting started because there is no up-front cost and there are plenty of different services to try. From my observations I’ll point out two.
HuggingFace is essentially a service that combines GitHub and GCP/AWS into one site. You can create projects and get visibility using their Spaces (similar to GitHub Pages) and run on high-end GPUs with a pay-as-you-go structure.
Google Colab is often the easiest path to getting something to work since it’s tied to a Python notebook. This means the build environment is usually not a concern, and you can run the code in your browser. They offer a plan with GPU credits you can use up. I haven’t done much of a cost breakdown, but the most expensive option is around $50 per month currently.
Bare Metal
Running bare metal is a fancy way of saying using your own computer. You probably need some kind of high-end GPU. This is where the debate over Macs and PCs is revived. I’m a Mac user, but most of the ML tools I use do not work, or I should say, work incredibly slowly because they can only run on the CPU.
With Llama.cpp we are seeing Apple performance on par with a high-end NVIDIA setup. I’m betting that over time, Apple support for things like MusicGen and ZeroScope will improve.
Getting started with SDXL
Stable Diffusion is the state-of-the-art open source text-to-image model. Being open source has unlocked many variations and custom models. The most recent release of Stable Diffusion is a higher quality version that supports larger images, hence the XL. You’ll find plenty of variations on SD on sites like civitai and HuggingFace. The key concept that is new with SDXL is that there are two models, a base and a refiner. This complicates things a bit, which is why people prefer tools like ComfyUI. You’ll also see improved SDXL models that may not require a refiner, or maybe you don’t need the refiner for what you are doing.
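To make the two-model idea concrete, here is a minimal sketch using Hugging Face’s diffusers library (this assumes you have diffusers and PyTorch installed plus a CUDA GPU; model IDs are the official stabilityai releases): the base model produces a latent, and the refiner polishes that latent into the final image.

```python
# Sketch of the SDXL base + refiner handoff with diffusers (assumed installed).
def generate(prompt: str):
    import torch
    from diffusers import (
        StableDiffusionXLPipeline,
        StableDiffusionXLImg2ImgPipeline,
    )

    # Base model: text -> latent (output_type="latent" skips decoding to pixels).
    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    latent = base(prompt=prompt, output_type="latent").images

    # Refiner: latent -> polished final image.
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
    ).to("cuda")
    return refiner(prompt=prompt, image=latent).images[0]
```

If you are using an improved SDXL checkpoint that doesn’t need a refiner, you would just keep the base pipeline and drop the second step.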
stable-diffusion-webui
The emerging standard tool for using Stable Diffusion is stable-diffusion-webui. You can follow instructions there to install. If you are using a newer Mac check out their specific instructions as well: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon.
From there you will want to download models and save them into the models directory. One of the main reasons to use this UI is the ability to install extensions. Another nice feature is the ability to load an image (see PNG Info) and get the prompt info back.
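The PNG Info feature works because stable-diffusion-webui writes the prompt and settings into a “parameters” text chunk inside the PNG itself. Here is a small sketch of that round trip using Pillow (the prompt string and filename are made-up examples):

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Write a prompt into a PNG "parameters" text chunk, the key
# stable-diffusion-webui uses for generation settings.
meta = PngInfo()
meta.add_text("parameters", "a watercolor fox, Steps: 30, Sampler: Euler a")
Image.new("RGB", (8, 8)).save("example.png", pnginfo=meta)

# Read it back, the same way PNG Info recovers the prompt.
with Image.open("example.png") as im:
    print(im.text["parameters"])
```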
If you are looking for some cloud based options I’ve put together a short list here:
Colab notebook: https://github.com/camenduru/sdxl-colab
If you are looking for an even more low code option I recommend using this HuggingFace space:
SDXL space: https://huggingface.co/spaces/hysts/SD-XL
Do you have a specific Stable Diffusion UI you like? While stable-diffusion-webui might not be the prettiest, a Gradio-based Python UI is a natural fit given how dependent ML is on Python. That isn’t a hard requirement though; I imagine there are plenty of better UIs out there, but the extension framework has made this one really useful as new tools are invented.