Cards-A3

// THESE are the little text snippets for editing A4 cards and A3 posters – to be used copy/paste-style

NB! USE the Card texts for making cards/posters

// Perhaps this page is to be deleted?

Cards A3 – objects

Objects/Maps – Pixel Space/Latent space – Prompt/LoRA – GPU/Currency – Stable Horde/Hugging Face

Objects of Interest and Necessity

Most people’s experiences with generative AI image creation come from platforms like OpenAI’s DALL-E or other services. Nevertheless, there are also communities that, for different reasons, seek some kind of independence and autonomy from the mainstream platforms. The starting point for this catalogue is ‘Stable Diffusion’, a so-called Free and Open Source Software system for AI image creation.

With the notion of an ‘object of interest’, a guided tour of a place, a museum or a collection likely comes to mind. One may easily read this compilation of texts as a catalogue for such a tour in a social and technical system, where we stop and wonder about the different objects that, in one way or another, take part in the generation of images with Stable Diffusion.

Perhaps 'a guided tour' also limits the understanding of what objects of interest are? In science, for instance, an object of interest sometimes refers to what one might call the potentiality of an object. Take, for instance, the famous Kepler space telescope, whose mission was to search the Milky Way for exoplanets (planets outside our own solar system). Among all the observed stars, some are candidates – so-called Kepler Objects of Interest (KOI).

In similar ways, this catalogue is the outcome of an investigative process where we – by trying out different software, reading documentation and research, looking into communities of practice that experiment with AI image creation, and more – have sought to understand the things that make generative AI images with Stable Diffusion possible. We have tried to describe not only the objects, but also their underlying dependencies on relations between communities, models, capital, technical units, and more.

Objects, however, also contain an associative power that can literally evoke memories and make a story come alive. This catalogue is therefore not just a collection of the objects that make generative AI images, but an exploration of an imaginary of AI image creation through the collection and exhibition of objects – and in particular, an imaginary of ‘autonomy’ from mainstream capital platforms.

[IMAGES - CIGARETTE STUBS? + OTHER OBJECTS?]

Maps

There is little knowledge of what AI really looks like. The maps presented here are an attempt to abstract the different objects that one may come across when entering the world of autonomous and decentralised AI image creation. They can serve as a useful guide to what the objects of this world are called and how they connect to each other, to communities, or to underlying infrastructures – perhaps also as a starting point for one's own exploration. A distinction between 'pixel space' and 'latent space' can be helpful: that is, between what you see and what you do not see.

Latent space refers to the invisible space that exists between the capture of images in datasets and the generation of new images. Images are encoded with 'noise', and the machine then learns how to decode them back into images (aka 'image diffusion'). Contrary to common belief, there is not just one dataset used to make image generation work, but multiple models and datasets to 'upscale' images of low resolution, 'refine' the details in the image, and much more. Behind every model and dataset there is a community and an organisation.

Pixel space is where one encounters objects of visual culture. Large-scale datasets are for instance compiled by crawling and scraping repositories of visual culture, such as museum collections. Whereas conventional interfaces for generating images only offer the possibility to 'prompt', interfaces to Stable Diffusion offer advanced parameters, as well as options to train one's own models, aka LoRAs. This demands technical insights into latent space as well as aesthetic/cultural understandings of visual culture (say, of manga, gaming or art).

Both images and LoRAs are organised and shared on dedicated platforms (e.g., Danbooru or CivitAI). The generation of images and use of GPU/hardware can also be distributed to a community of users in a peer-to-peer network (Stable Horde). This points to how models, software, datasets and other objects always also exist suspended between different planes of dependencies – organisational, material, or other.

[Images: 'Our' map(s) - perhaps surround by other maps]

Pixel Space

In pixel space, you find a range of visible objects that a typical user would normally encounter. This includes the interfaces for creating images. In conventional interfaces like DALL-E or Bing Image Creator, users prompt in order to generate images. What is particular to autonomous and decentralised AI image generation is that the interfaces have many more parameters and ways to interact with the models that generate the images. They function more like 'expert' interfaces, as the sketch below illustrates.
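As a rough illustration – a minimal sketch, loosely modelled on the parameters of local Stable Diffusion front-ends such as AUTOMATIC1111's web UI, with all names and values illustrative rather than a definitive API – such an 'expert' request can look something like this:

```python
# Hedged sketch: the parameters a typical 'expert' Stable Diffusion interface exposes,
# loosely modelled on local front-ends such as AUTOMATIC1111's web UI (names illustrative).
request = {
    "prompt": "portrait, oil painting, dramatic lighting",
    "negative_prompt": "blurry, extra fingers",  # what the image should avoid
    "steps": 30,                                 # number of denoising steps
    "cfg_scale": 7.0,                            # how strictly to follow the prompt
    "sampler_name": "Euler a",                   # which sampling algorithm to use
    "seed": 42,                                  # fixed seed, for reproducibility
    "width": 512,
    "height": 512,
}
# A conventional interface like DALL-E, by contrast, exposes little more than "prompt".
```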

In pixel space one finds many objects of visual culture. Apart from the interface itself, this includes both all the images generated by AI and all the images used to train the models behind them. These images are, as described above, used to create datasets, compiled by crawling the internet and scraping images that belong to different visual cultures – ranging, e.g., from museum collections of paintings to criminal records with mug shots.

Many users also have specific aesthetic requirements for the images they want to generate – say, images in a particular manga style or setting. The expert interfaces therefore also contain the possibility to combine different models and even to post-train one's own model, also known as a LoRA (Low-Rank Adaptation). When images are shared on platforms like Danbooru (one of the first and largest image boards for manga and anime), they are typically well categorised – both descriptively ('tight boots', 'open mouth', 'red earrings', etc.) and according to visual cultural style ('genshin impact', 'honkai', 'kancolle', etc.). They can therefore also be used to train more models.
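As a minimal sketch of the idea behind LoRA – assuming PyTorch, with all names illustrative – a low-rank adapter leaves the large pretrained weight untouched and learns only two small matrices on top of it:

```python
# Hedged sketch: the core idea of LoRA (Low-Rank Adaptation) on a single linear layer.
# The large pretrained weight W stays frozen; only the small matrices A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained layer
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original output plus a learned low-rank correction: W x + (alpha/r) B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because only A and B are trained and shared, a LoRA file is small compared to a full model – part of why LoRAs circulate so easily on platforms like CivitAI.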

Creating a useful annotated and categorised dataset – be it for a foundation model or a LoRA – typically involves specialised knowledge of both the technical requirements of model training (latent space) and the aesthetics and cultural values of visual culture itself: for instance, of common visual conventions such as realism, beauty and horror, and also (in the making of LoRAs) of more specialised conventions – say, a visual style that an artist or a cultural community wants to generate.

[IMAGES - SUGGESTIONS - VARIOUS INTERFACES + GENERATED IMAGES FROM CIVIT AI + DATASETS ]

Latent space

Latent space is a highly abstract space consisting of compressed representations of images and texts. A key object is the Variational Autoencoder (VAE), which compresses images into representations on which different kinds of operations can be performed – the results are then decoded back into images. An important operation happening in latent space is the training of an algorithm. In diffusion-based systems, the model is trained by learning to apply noise to an image and then reconstruct the image from complete or random noise (this process is discussed more in-depth in the entry on diffusion).
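As a minimal sketch of this training operation – simplified DDPM-style, assuming PyTorch, with `unet` and `latents` standing in for a real denoising network and VAE-encoded images – the core step looks roughly like this:

```python
# Hedged sketch: the core training step of a diffusion model (simplified DDPM-style).
# `unet` and `latents` stand in for a real denoising network and VAE-encoded images.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention per timestep

def training_step(unet, latents):
    t = torch.randint(0, T, (latents.size(0),))          # random timestep per sample
    noise = torch.randn_like(latents)                    # Gaussian noise
    a = alpha_bar[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * latents + (1 - a).sqrt() * noise  # add noise in closed form
    predicted = unet(noisy, t)                           # the network predicts the noise
    return F.mse_loss(predicted, noise)                  # learning to undo the noising
```

At generation time the process runs in reverse: starting from pure noise, the trained network removes noise step by step, and the VAE decodes the result back into pixel space.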

To continue our mapping, it is important to note that the latent space is nurtured by various sources. In the process of model training, datasets play a crucial role. Many of the datasets used to train models are made by 'scraping' the internet, while others are built on repositories like Instagram, Flickr, or Getty Images. Open Images and ImageNet, both built from web pages, are commonly used as backbones for the visual training of generative AI, but corporate organisations like Meta and Google also offer open source datasets, as do, e.g., research institutions and others.

Contrary to common belief, there is not just one dataset used to make a model work, but multiple models and datasets to, for instance, reconstruct missing facial or other bodily details (such as too many fingers on one hand), 'upscale' images of low resolution, or 'refine' the details in the image. LoRAs trained on users' own curated datasets are also often used in AI imaging with Stable Diffusion. The latent space is therefore an interpretation of a large pool of visual and textual resources external to it.

When it comes to autonomous AI imaging, there is typically an organisation and a community behind each dataset and training. LAION (Large-scale Artificial Intelligence Open Network) is a good example of this, and a very important one. It is a non-profit community organisation that develops and offers free models and datasets. Stable Diffusion was trained on datasets created by LAION, using Common Crawl (another non-profit organisation, which has built a repository of 250 billion web pages) and CLIP (OpenAI's neural network, which learns visual concepts from natural language supervision) to compile an extensive record of links to images with 'alt text' (a descriptive text for non-text content, created by 'web masters' for increased accessibility) – that is, a useful set of annotated images to be used for model training. We begin to see that a model's dependencies have large organisational, social and technical ramifications.
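A minimal sketch of the kind of CLIP-based filtering LAION describes – assuming the Hugging Face `transformers` CLIP model, with the similarity threshold illustrative – might look like this:

```python
# Hedged sketch: CLIP-based filtering of image/alt-text pairs, roughly in the spirit
# of LAION's pipeline. The model id is real; the similarity threshold is illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_pair(image: Image.Image, alt_text: str, threshold: float = 0.28) -> bool:
    """Keep an image/alt-text pair only if CLIP judges text and image to match."""
    inputs = processor(text=[alt_text], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item() >= threshold  # cosine similarity of the two embeddings
```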

Prompt

LoRA

GPU

The Graphics Processing Unit (GPU) is an electronic circuit for processing computer graphics. It was designed to add graphical interfaces to computing devices and expand the possibilities for interaction. While commonly used for videogames or 3D computer graphics, the GPU's particular way of computing has made it an object of interest and necessity for the cryptocurrency rush and, more recently, for the training and use of generative AI. One of the more material elements of our map, it is an often invisible piece of hardware, playing a crucial role in translating between visual and textual human understanding and computer models for prediction (for example, between existing images and the "calculation" of new ones). The GPU is both an object that sits inside a desktop computer and one that populates massive cloud data centres racing towards the latest flagship large language model. In a way, it is a domestic object, used by enthusiasts to play with and modify the landscape of AI, and the holy grail of the big tech AI industry.

Currencies

While a familiar object, currencies can take wild shapes. Traditionally, a currency is a material medium of exchange (commonly metal or paper): coins and bills are still a common form of currency. Digital platforms and services, however, have multiplied the forms these exchange objects can take. Videogames and blockchain technology have helped to explode what can be understood as currency, and have subverted the dependence on larger organisations: if traditionally states, banks, and large organisations designed and managed these objects, their digital counterparts tend to be more untamed and sometimes community-oriented. Autonomous currencies have always existed, as modes of exchange for small communities or explicit counter-objects to hegemonic economies, but their digital versions are easy to set up, distribute, and share with communities around the world. Within our objects of interest, digital currencies make it possible to directly generate images, or to commodify the possibility of new ones (for example, as a bounty for those creating or fine-tuning models). Immaterial versions of currencies also act as the medium of exchange in non-hegemonic networks of image generation, adding value and circulation to a larger infrastructure of imaginaries of autonomy.

Stable Horde

Horde AI or Stable Horde is a distributed cluster of GPUs. The project describes itself as a "volunteer crowd-sourced distributed cluster of image and text generation workers". This translates as a network of individual GPU users who "lend" their devices and the models stored on them. This means that one can generate an image from any device connected to this network through an interface, e.g. a website accessed through a phone. While the visible effects are the same as using ChatGPT, Copilot or any other proprietary service, the images in this network are "community" generated. The request is not sent to a server farm or a company, but to a user who is willing to share their GPU power and stored models. Haidra, the non-profit associated with HordeAI, seeks to make AI free, open-source, and collaborative, effectively circumventing the reliance on big-tech AI players.
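A minimal sketch of how a client can talk to the network – endpoint paths and payload fields follow our reading of the public AI Horde v2 REST API and may change – looks roughly like this:

```python
# Hedged sketch: requesting an image from the AI Horde (Stable Horde) REST API.
# Endpoints and fields follow our reading of the public v2 API and may change.
import time
import requests

API = "https://aihorde.net/api/v2"
HEADERS = {"apikey": "0000000000"}  # anonymous key; registered users earn/spend kudos

# Submit an asynchronous generation request to the volunteer network.
job = requests.post(f"{API}/generate/async", headers=HEADERS,
                    json={"prompt": "a cigarette stub on a museum floor",
                          "params": {"n": 1}}).json()

# Poll until a volunteer worker somewhere has picked up and finished the job.
while not requests.get(f"{API}/generate/check/{job['id']}").json().get("done"):
    time.sleep(5)

result = requests.get(f"{API}/generate/status/{job['id']}").json()
print(result["generations"][0]["img"])  # the generated image (URL or base64)
```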

Projects like Stable Horde/HordeAI offer a glimpse into the possibilities of autonomy in the world of image generation, and offer other ways of volunteering through technical means. In a way, the project inherits some of the ethos of P2P sharing and recursive publics, yet updated for the world of LLMs. The GPU used in this project is (intermittently) part of the HordeAI network, generating and using the kudos currency.

Hugging Face

Hugging Face initially started out in 2016 as a chatbot for teenagers, but is now a (if not the) collaborative hub for AI development – not specifically targeted at AI image creation, but at generative AI more broadly (including speech synthesis, text-to-video, image-to-video, image-to-3D, and much more). It attracts amateur developers who use the platform to experiment with AI models, as well as professionals who typically use the platform as a starting point for entrepreneurship. By making AI models and datasets available, it can also be labelled as an attempt to democratise AI and delink from the key commercial platforms; yet, at the same time, Hugging Face is deeply intertwined with these companies and various commercial interests.

Hugging Face had (as of 2023) an estimated market value of $4.5 billion. It has received large amounts of venture capital from Amazon, Google, Intel, IBM, NVIDIA and other key corporations in AI and, because of the company's expertise in handling AI models at large scale, it also collaborates with both Meta and Amazon Web Services. Yet, at the same time, it remains a platform for amateur developers and entrepreneurs who use it as an infrastructure for experimentation with advanced configurations that the conventional platforms do not offer.

Hugging Face is a key example of how generative AI – also when seeking autonomy – depends on a specialised technical infrastructure. In a constantly evolving field, reliability, security, scalability and adaptability become important parameters, and Hugging Face offers these in the form of a platform.
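A minimal sketch of how this infrastructure is typically used in practice – assuming the `diffusers` library and a publicly hosted Stable Diffusion checkpoint (the model id is illustrative) – looks like this:

```python
# Hedged sketch: pulling a Stable Diffusion checkpoint from the Hugging Face hub
# with the `diffusers` library. The model id is illustrative; any compatible one works.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # a community-hosted checkpoint
    torch_dtype=torch.float16,
).to("cuda")  # the GPU as object of necessity (see the GPU entry)

image = pipe("a map of latent space, ink on paper").images[0]
image.save("map.png")
```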

IMAGES: Business + interface/draw things + mgane + historical/contemporary interfaces.