LoRA

On his personal page on the CivitAI website, the user BigHeadTF promotes his recent creation, a small model called The Incredible Hulk (2008). Compared to earlier Hulk movies, the 2008 version shows a tormented Bruce Banner who transforms into a green creature with "detailed musculature, dark green skin, and an almost tragic sense of isolation". The model helps generate characters resembling this iconic version of the Hulk in new images.
To demonstrate the capabilities of his model, BigHeadTF has selected a few pictures of his own. The Hulk is depicted in turn cuddling a teddy bear or crossdressing as Shrek's Princess Fiona. The images play with the contrast between the Hulk's overblown virility and childlike or feminine connotations, and demonstrate the model's ability to expand the hero's universe into other registers and fictional worlds. The Incredible Hulk (2008) doesn't just faithfully reproduce existing images of the Hulk; it also opens new avenues of creation and combination for the green hero.
This blend of pop and remix culture, which thrives on the blurring of boundaries between genres, infuses a large number of creations made with generative AI. However, what distinguishes BigHeadTF is that he shares not only images but also the software component that makes his images distinctive. The model he distributes on his page is called a LoRA. The most famous models, such as Stable Diffusion or Flux, are rather general-purpose. These 'base' or 'foundation' models can be used to generate images in many styles and can handle a huge variety of prompts. But they may show limitations when a user wants a specific output, such as a particular genre of manga or a style that emulates black-and-white film noir, or when an improvement is needed for certain details (specific hand positions, etc.) or to produce legible text. This is where LoRAs come in. A LoRA is a smaller model created with a technique that makes it possible to improve the performance of a base model on a given task.
What is a LoRA?
Initially developed for LLMs, the Low-Rank Adaptation (LoRA) technique is a fine-tuning method that freezes an existing model and inserts a small number of additional weights to adjust the model's behaviour to a particular need. Instead of a full retraining of the model, LoRAs only require the training of the weights that have been inserted into the model's attention layers. LoRAs are therefore quite lightweight and able to leverage the capabilities of larger models. Users equipped with a consumer-grade GPU can train their own LoRAs reasonably fast (on a Mac M3, a LoRA can be produced in 30 minutes). LoRAs are quite popular among amateurs and developers alike. At the time of writing, the AI platform Hugging Face lists 71,312 LoRAs.
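A compact way to see why the technique is lightweight, following the notation of the original LoRA paper: the frozen weight matrix $W_0 \in \mathbb{R}^{d \times k}$ of a given layer is left untouched and the learned update is constrained to a low-rank product,

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k).
$$

Only $A$ and $B$ are trained, so the number of trainable parameters per adapted layer drops from $d \times k$ to $r(d + k)$, which is why a LoRA file typically weighs tens of megabytes where the base model weighs several gigabytes.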
What is the network that sustains this object?

The process of training a LoRA is very similar to training a full model, but at a different scale. Even though it requires dramatically less compute, it still involves the same kind of highly complex technical decisions. In fact, training a LoRA mobilizes the whole operational network of decentralized image generation and offers a privileged view of its mode of production.
Software dependencies
Various layers of software libraries tame this complexity. A highly skilled user can train a LoRA locally with a series of scripts like kohya_ss and pore over the vertiginous list of options. Platforms like Hugging Face distribute software libraries (peft) that abstract away the complexity of integrating the various components, such as LoRAs, into the AI generation pipeline. And for those who don't want to fiddle with code or lack access to a local GPU, LoRA training is offered by websites such as Runway ML, Eden AI, Hugging Face or CivitAI under different pricing schemes.
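As a rough sketch of what this abstraction looks like in code (model identifiers, file names and target modules are examples only, and the exact APIs vary between library versions), the diffusers and peft libraries handle, respectively, the inference side and the training side of a LoRA:

```python
# Sketch of the two sides of the abstraction; model and file names are
# examples only, and the APIs reflect recent diffusers/peft releases.
import torch
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

# Inference side: attach a trained LoRA to a base model in a few lines.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/lora_dir", weight_name="hulk_2008.safetensors")
image = pipe("the incredible hulk cuddling a teddy bear").images[0]

# Training side: peft wraps the frozen UNet and inserts the low-rank
# matrices into its attention layers; only these small weights are trained.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet = get_peft_model(pipe.unet, config)
unet.print_trainable_parameters()  # a small fraction of the base model's weights
```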
LoRA as a contact zone between communities with different expertise
'Making a LoRA is like baking a cake', says a widely read tutorial, 'a lot of preparation, and then letting it bake. If you didn't properly make the preparations, it will probably be inedible.' To guide the wannabe LoRA creator on their journey, a wealth of tutorials and documentation in various forms is available from sources such as subreddits, Discord channels, YouTube videos, forums and the platforms that release the code or offer the training and hosting services. They are diverse in tone and offer varying forms of expertise. A significant portion of this documentation effort consists of code snippets, detailed explanations of parameters and options, bug reports, step-by-step installation instructions and tests of hardware compatibility. It is produced by professionals, hackers, amateurs and newbies, with access to very different infrastructure: users who take unlimited compute for granted alongside others struggling with a local installation. This diversity reflects the position of LoRAs in the AI ecosystem, between expertise and informed amateurism, and between resource-hungry and consumer-grade technology. Whereas foundation model training still remains in the hands of a (happy) few, LoRA training opens up a prospect of democratizing the means of production for those who have time, persistence and a small capital to invest.
- Free software libraries and apps
- Tutorials and documentation
- https://education.civitai.com/lora-training-glossary/
- https://civitai.com/articles/4/make-your-own-loras-easy-and-free
- https://civitai.com/articles/8310/lora-training-using-illustrious-report-by-a-beginner: 'Since Illustrious learned on danbooru, she has memorized many of the Art Styles of the artists who have contributed to danbooru. Therefore, it is better to check if it is already learned Art Style you want to resemble, so that you do not have to waste your time. Note that I wasted several hours trying to create the Art Style of my favorite doujinshi and eroge illustrators.'
Curation as an operational practice
There is more to LoRAs than the technicalities of installing libraries and training. LoRAs are objects of curation. Many tutorials begin with a primer on dataset curation. Indeed, if a user decides to embark on the adventure of creating a LoRA, it is because a model fails to generate convincing images for a given subject or style. The remedy is to create a dataset that provides better samples of a given character, object or genre. Fans, artists and amateurs produce an abundant literature on the questions raised by dataset curation: the identification of sources, the selection of images (criteria of quality, diversity, etc.), the annotation (tagging), and the scale (LoRAs can be trained on datasets containing as few as a single image or collections of thousands). This curatorial practice is very different from the broad scraping behind large-scale models: images are picked and scraped manually, established collections are tapped into, and the work has an archival dimension, marked by the prevalence of manga culture and porn.
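In its most material form, this curation ends up as little more than a folder of images paired with caption files. A minimal sketch, assuming a kohya_ss-style layout in which each image has a .txt caption of the same name (the folder name, trigger word and default tags below are hypothetical):

```python
# Minimal sketch: writing caption side-car files for a curated LoRA dataset.
# The folder name, trigger word and default tags are hypothetical; trainers
# in the kohya_ss family typically expect one .txt caption per image.
from pathlib import Path

DATASET_DIR = Path("dataset/10_hulk2008")  # "<repeats>_<name>" naming convention
TRIGGER = "hulk2008"                       # token that will activate the LoRA

for image_path in sorted(DATASET_DIR.glob("*.png")):
    caption_path = image_path.with_suffix(".txt")
    if caption_path.exists():
        continue  # keep captions that were already written by hand
    # A starting caption: trigger word plus generic tags, to be refined manually.
    caption_path.write_text(f"{TRIGGER}, solo, green skin, detailed musculature")
    print(f"wrote {caption_path.name}")
```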
- Technical communities and Manga fans
How does it evolve through time?
- From the Microsoft lab to platforms and informed amateurs, diversification of offer
- Expansion of the image generation pipeline
How does it create value? Or decrease / affect value?
Adding a LoRA to a generation pipeline is made easy by software. LoRA training, however, forces an engagement with the material plane of the genAI ecosystem, whether through the cost or the availability of GPUs.

- Adds value to the base model. Combined with the LoRA, its capabilities are expanded
- Different forms of value creation -> cultural
- Huge popularity, there are 66,846 LoRAs available on Hugging Face https://huggingface.co/models?sort=trending&search=lora
- The Mass-Produced LoRAs of Civitai https://civitai.com/articles/10068/the-mass-produced-loras-of-civitai
- 🌟 Do you want your own LoRA created by me? 🌟 https://civitai.com/articles/13480/do-you-want-your-own-lora-created-by-me
- Bounties and LoRAs https://civitai.com/articles/16214/new-bounties-and-loras
What is its place/role in techno cultural strategies?
Re-modelling as filling the gaps
Baking a LoRA is a way of dealing with one's dependency on existing models. It can be done to fill a gap in the model: for instance, the model might not be able to generate a given anime character convincingly.
It can also be done to change the representation of a female artist in the model, as in the Louise Bourgeois experiment documented below.
- An image of Louise Bourgeois generated with the Real Vision model
- Louise Bourgeois (Real Vision)
- A screenshot of a search query for Louise Bourgeois
- Selected images from the search results
- Annotations for the dataset in the Draw Things interface
- An image generated by Real Vision with the LoRA
- An image generated by Real Vision with the LoRA
It can also be done because the model's aesthetics are too present in the output. As a motivation for the creation of the Amateur Snapshot Photo LoRA, the user AI_Characters states:
'This LoRa model was designed to generate images that look more "real" and less "artificially AI-generated by FLUX". It achieves this by making the natural and artificial lighting look more real and making the bokeh less strong. It also adds details to every part of the image, including skin, eyes, hair, and foliage. It also aims to reduce somewhat the common "FLUX chin and skin" and other such issues.' [1]
The Flux model is considered by many image makers to be a state-of-the-art tool. But as pointed out above, it tends to imprint a specific aesthetic on the human characters it generates and to produce pictures that are too "clean". Images bear its signature. Amateur Snapshot Photo, like many other LoRAs, tries to mitigate these issues. The creation of the LoRA works both with and against the underlying model. The art of fine-tuning is to select the model that generates results closest to what one strives to achieve and to improve on it, but also to limit the model's propensities. This back-and-forth dialogue requires a highly reflexive understanding of the underlying model's affordances: an intimate knowledge of its visual capabilities and its semantics.
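In practice, part of this dialogue is conducted through a single knob: the weight at which the LoRA is applied against the base model. A hedged sketch with the diffusers library (the LoRA file and adapter names are illustrative, and the loader methods shown here exist in recent library versions):

```python
# Hedged sketch: dialing a LoRA's influence up or down against the base
# model's own aesthetics. The model id is the public FLUX.1-dev checkpoint;
# the LoRA file and adapter names are illustrative.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("path/to/lora_dir",
                       weight_name="amateur_snapshot_photo.safetensors",
                       adapter_name="snapshot")

prompt = "amateur snapshot photo of a man waiting for a bus in the rain"
for scale in (0.4, 0.8, 1.2):
    # Lower weights let Flux's "clean" look dominate; higher weights push
    # the output towards the LoRA's less polished aesthetic.
    pipe.set_adapters(["snapshot"], adapter_weights=[scale])
    pipe(prompt, num_inference_steps=28).images[0].save(f"snapshot_{scale}.png")
```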
Remodelling as rewording

There are different means of annotating images. To select the right one, the annotator must know how the original model has been trained. For photorealistic images, most models have been annotated with a piece of software called BLIP [Footnotes 1]. BLIP produces descriptions in "natural language" such as "a high resolution photograph of a man sitting on a beach in the moonlight". In the case of LoRAs in anime style, understanding the semantic logic of tagging brings the annotator into the booru universe. Boorus (the word 'board' pronounced in Japanese) are image boards designed to host collections of anime images. Boorus are targets of choice for AI scrapers as they contain huge amounts of images and are frantically annotated by their creators. As Knxo aptly notes:
Danbooru: Danbooru style captioning is based in the Booru tagging system and implemented in all NAI derivatives and mixes which accounts for most SD1.5 non photorealistic models. It commonly appears in the following form "1girl, green_eyes, brown_hair, walking, forest, green_dress, eating, burrito, (sauce)". This tagging style is named after the site, as in https://danbooru.donmai.us/. Whenever you have doubt on the meaning of a tag you can navigate to danbooru, search for the tag and open it's wiki.
Take for example the following, search for the tag "road" when we open it's wiki we will see the exact definition as well as derivative tags like street, sidewalk or alley as well as the amount of times the image has been used(13K). In Practice what this means is that the concept is trained to some degree in NAI based models and mixes. The amount of times the tag appears in danbooru actually correlates to the strength of the training(as NAI was directly trained on Danbooru data). So any concept below 500 coincidences are a bit iffy. Keep that in mind when captioning as sometimes it makes more sense to use a generic tag instead of the proper one, for example "road" appears 13k times while "dirt_road" only does so 395 times. In this particular case using dirt_road shouldn't be problematic as "dirt_road" contains "road" anyway and SD is able to see the association.[2]
The LoRA creator's skills include a knowledge of the cultures from which the underlying model has learned: their vocabulary and syntax, and the comparative weight given to the individual concepts the model has learned. The tagging of the LoRA's dataset mirrors and rewords the tagging of the underlying model. This means that the user gradually develops an acute sense of the model's biases (how it weighs some terms more than others, and which terms it excludes or ignores) and learns to exploit them, reverse them or work with them. Even if the object of the annotator's effort might seem superficial (adding yet another LoRA for a character already featured in hundreds of others), it represents a form of specialized conceptual labour.
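The frequency check described in the passage above can also be scripted. A hedged sketch, assuming Danbooru's public JSON endpoint at /tags.json and its post_count field behave as documented at the time of writing:

```python
# Hedged sketch of the tag-frequency check: the endpoint and the
# "post_count" field are assumptions based on Danbooru's public API.
import requests

def tag_count(tag: str) -> int:
    resp = requests.get(
        "https://danbooru.donmai.us/tags.json",
        params={"search[name]": tag, "limit": 1},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json()
    return results[0]["post_count"] if results else 0

# Prefer the generic tag when the specific one is too rare to be well learned.
for tag in ("road", "dirt_road"):
    print(tag, tag_count(tag))
```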
- Curation, classification, defining styles
- Demystify the idea that you can generate anything
- Conceptual labour
- Sharing
- Visibility in communities
- Needs are defined bottom-up
- Competence in visual culture, thinking about images in an abstract way
How does it relate to autonomous infrastructure?
- Regain control over the training, re-appropriation of the model via fine-tuning
- (Partial) Decentralization of model production
- The question of the means of production
- Ambivalence
Images



