In this article, we will learn about LoRA for Stable Diffusion, the benefits it brings, and how to install and train it.
The release of the large text-to-image model Stable Diffusion made waves in the Generative AI space. However, out of the box it could not generate personalised, contextual images of a specific subject. Enter DreamBooth, which enables the generation of customised, contextualised images.
Revisiting DreamBooth
Given a few subject images as input, a pre-trained text-to-image model like Stable Diffusion is fine-tuned so that it binds a unique identifier to the subject. The subject can be a person or any object, such as a particular car model.
This fine-tuning process can work with only 3-5 subject images, making it easier and more accessible. By fine-tuning a Stable Diffusion model with DreamBooth, we can generate different images of the subject instance in different environments, with high preservation of subject details and realistic interaction between the scene and the subject.
What is Low-Rank Adaptation (LoRA)?
Low-Rank Adaptation, or LoRA, is a method to accelerate the fine-tuning of large models while consuming less memory.
LoRA is a Parameter-Efficient Fine-Tuning (PEFT) technique that modifies the pre-trained model’s attention mechanism and significantly reduces the number of trainable parameters.
There are many dense layers in a neural network that perform matrix multiplication. LoRA builds on the hypothesis that the updates to these weights during fine-tuning have a low “intrinsic rank”. So LoRA freezes the pre-trained weights and constrains their update matrix by representing it with a low-rank decomposition:
W0 + ∆W = W0 + BA, where B ∈ R^(d×r), A ∈ R^(r×k), and the rank r ≪ min(d, k)
During training, W0 is frozen and does not receive gradient updates, while A and B contain trainable parameters. Note both W0 and ∆W = BA are multiplied with the same input, and their respective output vectors are summed coordinate-wise. For h = W0x, our modified forward pass yields:
h = W0x + ∆Wx = W0x + BAx
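To make the forward pass concrete, here is a minimal PyTorch sketch of the idea. The class name LoRALinear and the rank/alpha defaults are illustrative choices for this article, not part of any library used later.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer W0 plus a trainable low-rank update BA (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # W0 is frozen
        d, k = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # A ∈ R^(r×k)
        self.B = nn.Parameter(torch.zeros(d, r))         # B ∈ R^(d×r); zero init so ∆W starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0x + BAx: both terms see the same input, outputs summed coordinate-wise
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```

Only A and B receive gradients, so the optimiser state and the saved weight delta stay small.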
Benefits of LoRA
LoRA can reduce memory and storage usage. You can share a single pre-trained model and build many small LoRA modules on top of it for different tasks. It makes training more efficient and lowers the hardware barrier to entry by up to 3 times when using adaptive optimizers.
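To put numbers on this: for a single 768×768 attention projection, a full update ∆W would require 768 × 768 = 589,824 trainable parameters, whereas a rank r = 8 decomposition BA needs only 768 × 8 + 8 × 768 = 12,288, roughly 2% of the full count.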
LoRA is orthogonal to many prior methods and can be combined with many of them, such as prefix-tuning.
The biggest benefit of fine-tuning Stable Diffusion with LoRA is that it trains new subjects and concepts in minutes. The trained outputs of LoRA are also much smaller than DreamBooth outputs (megabytes instead of gigabytes), which makes the trained model easy to store and share.
Installation & Training
We’ll use the kohya-trainer repository to train our Stable Diffusion model in Colab. The AUTOMATIC1111 web UI also provides an easy way to train Stable Diffusion with LoRA through downloadable extensions (git repositories), but Colab has restricted the use of AUTOMATIC1111 under its free tier, so that route is practical only if you are running on a local GPU.
Step 1: Install the dependencies and choose the model version that you want to fine-tune. I have used model_name: Stable-Diffusion-v1-5.
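The install cell boils down to cloning the repository and installing its requirements. A rough equivalent is below; the Linaqruf/kohya-trainer URL is what the popular Colab notebooks use, so verify it matches the notebook you opened.

```python
# Clone the trainer and install its dependencies (Colab cell syntax).
!git clone https://github.com/Linaqruf/kohya-trainer
%cd kohya-trainer
!pip install -r requirements.txt
```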
Step 2: Upload the compressed dataset file on which you want to fine-tune the Stable Diffusion model. You can either upload the dataset directly into the directory or upload it to Google Drive and mount your Drive in Colab. Then run the data directory block and the Unzip dataset block.
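If the dataset lives on Google Drive, the mount-and-unzip step looks roughly like this; the zip file name and paths are placeholders for your own.

```python
from google.colab import drive

drive.mount('/content/drive')  # authorise access when prompted

# Extract the uploaded archive into the folder the trainer reads from.
!unzip -q /content/drive/MyDrive/dog_dataset.zip -d /content/kohya-trainer/train_data
```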
Step 3: Run the BLIP Captioning block, which generates captions for the uploaded image dataset and stores them in the train_data directory.
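As a standalone illustration of what that block does, the sketch below captions every image in a folder with the Hugging Face BLIP checkpoint Salesforce/blip-image-captioning-base and writes one .txt caption file per image, the sidecar format kohya-style trainers read. The folder name is assumed from the notebook.

```python
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)

train_dir = Path("train_data")  # folder containing the uploaded images
for img_path in sorted(train_dir.glob("*.jpg")):
    image = Image.open(img_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(out[0], skip_special_tokens=True)
    img_path.with_suffix(".txt").write_text(caption)  # one caption file per image
    print(img_path.name, "->", caption)
```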
Step 4: Fill in the project name and copy the path to the pre-trained model’s safetensors file from the directory into pretrained_model_name_or_path, as shown in the following image. Then run all the code blocks in the Training Model section to train the model with LoRA.
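Under the hood, these blocks assemble a call to kohya’s train_network.py. The command below is an illustrative reconstruction with placeholder paths and values, not the notebook’s exact defaults.

```python
!accelerate launch train_network.py \
  --pretrained_model_name_or_path="/content/pretrained_model/Stable-Diffusion-v1-5.safetensors" \
  --train_data_dir="/content/kohya-trainer/train_data" \
  --output_dir="/content/output" \
  --network_module=networks.lora \
  --network_dim=16 \
  --learning_rate=1e-4 \
  --max_train_epochs=10 \
  --resolution=512
```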
Step 5: After training, test the fine-tuned model by running the code blocks under the testing section. You can visualise the loss over the training process, test the model with various prompts, and even launch a portable web UI by running the blocks.
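Outside the notebook, you can also test the resulting LoRA file with the diffusers library. In this sketch, the output weight file name and the sks trigger token are assumptions based on typical DreamBooth setups.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach the LoRA weights produced by training (file name is a placeholder).
pipe.load_lora_weights("/content/output", weight_name="my_dog_lora.safetensors")

image = pipe("a photo of sks dog in a park", num_inference_steps=30).images[0]
image.save("lora_test.png")
```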
I urge you to check and change the parameter options based on the data you are using to fine-tune the model. Though the generated image is not of the exact dog in the training set, the model got features like the size, colour, and neckband of the dog right.
I trained for only 70 steps / 10 epochs. Training for more epochs should help the model generate images closer to the exact dog.
Conclusion
DreamBooth helps us teach new concepts to the Stable Diffusion model. In the kohya Colab implementation, the concept we taught the model was a particular dog. In addition to DreamBooth and LoRA, textual inversion is a popular method used to teach new concepts to Stable Diffusion models.
Textual-inversion-trained models are also easy to share. However, fine-tuning with LoRA offers the advantage of being usable for general-purpose fine-tuning. All of these methods improve the personalised image generation process.
Here is an exciting read on how to control the image generation process with ControlNet for Stable Diffusion.