Cards-A3: Difference between revisions

Revision as of 15:55, 27 August 2025

Cards A3 - objects

Maps

There is little knowledge of what AI really looks like. The maps presented here are an attempt to abstract the different objects that one may come across when entering the world of autonomous and decentralised AI image creation. They can serve as a useful guide to what the objects of this world are called and how they connect to each other, to communities or to underlying infrastructures – perhaps also as a starting point for one's own exploration. A distinction between 'pixel space' and 'latent space' can be helpful: it separates what you see from what you do not see.

Latent space refers to the invisible space that exists between the capture of images in datasets and the generation of new images. Images are encoded with 'noise', and the machine then learns how to decode them back into images (aka 'image diffusion'). Contrary to common belief, there is not just one dataset used to make image generation work, but multiple models and datasets to 'upscale' images of low resolution, 'refine' the details in the image, and much more. Behind every model and dataset there is a community and organisation.
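The idea of encoding an image with 'noise' can be illustrated with a minimal sketch. It assumes the common formulation in which an image is blended with Gaussian noise according to a single weight; the function name `add_noise` and the parameter `alpha_bar` are hypothetical, and the toy works on a short list of pixel values, whereas Stable Diffusion applies noise to compressed latent representations rather than to pixels directly.

```python
import math
import random

def add_noise(pixels, alpha_bar, rng):
    """Blend an image with Gaussian noise (hypothetical helper).

    alpha_bar = 1.0 keeps the image intact; alpha_bar = 0.0 leaves only noise.
    """
    signal_weight = math.sqrt(alpha_bar)
    noise_weight = math.sqrt(1.0 - alpha_bar)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal_weight * p + noise_weight * n for p, n in zip(pixels, noise)]
    return noisy, noise

rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # a toy one-row "image"

# The lower alpha_bar gets, the less of the original image survives.
for alpha_bar in (1.0, 0.5, 0.0):
    noisy, _ = add_noise(image, alpha_bar, rng)
    print(alpha_bar, [round(v, 2) for v in noisy])
```

During training, a model is shown the noisy version and asked to predict the noise that was added; generating a new image then amounts to starting from pure noise and removing the predicted noise step by step.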

Pixel space is where one encounters objects of visual culture. Large-scale datasets are for instance compiled by crawling and scraping repositories of visual culture, such as museum collections. Whereas conventional interfaces for generating images only offer the possibility to 'prompt', interfaces to Stable Diffusion offer advanced parameters, as well as options to train one's own models, aka LoRAs. This demands technical insights into latent space as well as aesthetic/cultural understandings of visual culture (say, of manga, gaming or art).

Both images and LoRAs are organised and shared on dedicated platforms (e.g., Danbooru or CivitAI). The generation of images and use of GPU/hardware can also be distributed to a community of users in a peer-to-peer network (Stable Horde). This points to how models, software, datasets and other objects always also exist suspended between different planes of dependencies - organisational, material, or other.

[Images: 'Our' map(s) - perhaps surround by other maps]

Pixel Space

In pixel space, you find a range of visible objects that a typical user would normally meet. This includes the interfaces for creating images. In conventional interfaces like DALL-E or Bing Image Creator, users prompt in order to generate images. What is particular for autonomous and decentralised AI image generation is that the interfaces have many more parameters and ways to interact with the models that generate the images. They function more like 'expert' interfaces.

In pixel space one finds many objects of visual culture. Apart from the interface itself, this includes both the images generated by AI and the images used to train the models behind them. These images are, as described above, used to create datasets, compiled by crawling the internet and scraping images that all belong to different visual cultures – ranging, e.g., from museum collections of paintings to criminal records with mug shots.

Many users also have specific aesthetic requirements for the images they want to generate, say, images in a particular manga style or setting. The expert interfaces therefore also contain the possibility to combine different models and even to post-train one's own models, also known as LoRAs (Low-Rank Adaptation). When shared on platforms like Danbooru (one of the first and largest image boards for manga and anime), images are typically well categorised – both descriptively ('tight boots', 'open mouth', 'red earrings', etc.) and according to visual cultural style ('genshin impact', 'honkai', 'kancolle', etc.). Therefore they can also be used to train more models.

A useful annotated and categorised dataset – be it for a foundation model or a LoRA – typically involves specialised knowledge of both the technical requirements of model training (latent space) and the aesthetics and cultural values of visual culture itself. This includes knowledge of common visual conventions, such as realism, beauty or horror, and also (in the making of LoRAs) of more specialised conventions, say a visual style that an artist or a cultural community wants to generate.
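What 'Low-Rank Adaptation' means technically can be sketched in a few lines. The toy below assumes the usual formulation in which a large, frozen weight matrix is left untouched and a small pair of matrices is trained instead, whose product is added on top. The names `matmul`, `apply_lora`, `up`, `down` and `scale` are hypothetical, and the 4x4 matrix stands in for layers that in practice have millions of weights.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(base, down, up, scale=1.0):
    """Return base + scale * (up @ down) without modifying base."""
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

# Frozen 4x4 base weights (the identity matrix, for readability).
base = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Rank-1 adapter: a 4x1 and a 1x4 matrix, i.e. 8 trainable numbers instead of 16.
up = [[1.0], [0.0], [0.0], [2.0]]
down = [[0.5, 0.0, 0.0, 0.0]]

adapted = apply_lora(base, down, up)
print(adapted[0])  # first row of the adapted weights
print(adapted[3])  # last row of the adapted weights
```

Because only the two small matrices need to be stored, a LoRA file is a fraction of the size of the model it modifies, which is one reason such files are easy to share on community platforms.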

Latent space

Prompt

LoRA

GPU

The Graphics Processing Unit (GPU) is an electronic circuit for processing computer graphics. It was designed to add graphical interfaces to computing devices and expand the possibilities for interaction. While commonly used for videogames or 3D computer graphics enterprises, the GPU's particular way of computing has made it an object of interest and necessity for the cryptocurrency rush and, more recently, for the training and use of generative AI. One of the more material elements of our map, it is often an invisible piece of hardware, playing a crucial role in translating between visual and textual human understanding and computer models for prediction (for example, between existing images and the "calculation" of new ones). The GPU is both an object that sits next to a desktop computer and one that populates massive cloud data centres racing towards the latest flagship large language model. In a way, it is both a domestic object, used by enthusiasts to play with and modify the landscape of AI, and the holy grail of the big tech AI industry.

Currencies

While a familiar object, currencies can take wild shapes. Traditionally, a currency is a material medium of exchange (commonly, metal or paper): coins and bills are still a common form of currency. Digital platforms and services, however, have multiplied the forms these exchange objects can take. Videogames and blockchain technology have helped to explode what can be understood as currency, and have subverted the dependence on larger organisations: if traditionally states, banks, and large organisations designed and managed these objects, their digital counterparts tend to be more untamed and sometimes community orientated. Autonomous currencies have always existed, as exchange modes for small communities or explicit counter-objects to hegemonic economies, but their digital versions are easy to set up, distribute, and share with communities around the world. Within our objects of interest, digital currencies make it possible to directly generate images, or to commodify the possibility of new ones (for example, as a bounty for those creating or fine-tuning models). Immaterial versions of currencies also act as the exchange network for sharing non-hegemonic networks of image generation, adding value and circulation to a larger infrastructure of imaginaries of autonomy.

Stable Horde

Horde AI or Stable Horde is a distributed cluster of GPUs. The project describes itself as a "volunteer crowd-sourced distributed cluster of image and text generation workers". This translates as a network of individual GPU users who "lend" their devices and stored large language models. This means that one can generate an image from any device connected to this network through an interface, e.g. a website on a phone. While the visible effects are the same as using ChatGPT, Copilot or any other proprietary service, the images in this network are "community" generated. The request is not sent to a server farm or a company, but to a user who is willing to share their GPU power and stored models. Haidra, the non-profit associated with HordeAI, seeks to make AI free, open-source, and collaborative, effectively circumventing the reliance on big-tech AI players.

Projects like Stable Horde/HordeAI offer a glimpse into the possibilities of autonomy in the world of image generation, and offer other ways of volunteering through technical means. In a way, this project inherits some of the ethos of P2P sharing and recursive publics, yet updated for the world of LLMs. The GPU used in this project is (intermittently) part of the HordeAI network, generating and using the kudos currency.
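How requests, volunteer workers and kudos might relate can be sketched as a toy model. This is not the Horde's actual API or accounting: the class `ToyHorde`, its methods, the fixed cost and reward of 10, and the rule that accounts with more kudos are served first are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class ToyHorde:
    """Toy model of volunteers serving queued requests (not the real Horde API)."""
    kudos: dict = field(default_factory=dict)  # account name -> balance
    queue: list = field(default_factory=list)  # heap of (-balance, order, user, prompt)
    counter: int = 0                           # arrival order, used to break ties

    def request(self, user, prompt, cost=10):
        """Queue a prompt; accounts with more kudos are served first (assumed rule)."""
        balance = self.kudos.get(user, 0)
        heapq.heappush(self.queue, (-balance, self.counter, user, prompt))
        self.kudos[user] = balance - cost  # in this toy, balances may go negative
        self.counter += 1

    def work(self, worker, reward=10):
        """A volunteer takes the next job and earns kudos for generating it."""
        if not self.queue:
            return None
        _, _, user, prompt = heapq.heappop(self.queue)
        self.kudos[worker] = self.kudos.get(worker, 0) + reward
        return f"image for {user}: {prompt}"

horde = ToyHorde(kudos={"alice": 50})
horde.request("guest", "a red fox")
horde.request("alice", "a manga city")
print(horde.work("bob"))  # alice holds more kudos, so her request is served first
print(horde.kudos)
```

The sketch shows only the circulation: those who lend their GPU accumulate kudos, and those who request images spend them, with no central server farm doing the generation.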

Hugging Face

Hugging Face initially started out in 2016 as a chatbot for teenagers, but is now a (if not the) collaborative hub for AI development – not specifically targeted at AI image creation, but at generative AI more broadly (including speech synthesis, text-to-video, image-to-video, image-to-3D, and much more). It attracts amateur developers who use the platform to experiment with AI models, as well as professionals who typically use the platform as a starting point for entrepreneurship. By making AI models and datasets available it can also be labelled as an attempt to democratise AI and delink from the key commercial platforms, yet at the same time Hugging Face is deeply intertwined with these companies and various commercial interests.

Hugging Face has (as of 2023) an estimated market value of $4.5 billion. It has received large amounts of venture capital from Amazon, Google, Intel, IBM, NVIDIA and other key corporations in AI and, because of the company's expertise in handling AI models at large scale, also collaborates with both Meta and Amazon Web Services. Yet, at the same time, it also remains a platform for amateur developers and entrepreneurs who use the platform as an infrastructure for experimentation with advanced configurations that the conventional platforms do not offer.

Hugging Face is a key example of how generative AI – also when seeking autonomy – depends on a specialised technical infrastructure. In a constantly evolving field, reliability, security, scalability and adaptability become important parameters, and Hugging Face offers these in the form of a platform.

IMAGES: Business + interface/draw things + mgane + historical/contemporary interfaces.
