Maps


Mapping decentralised AI image generation

If one considers generative AI as an object, there is also a world of ‘para-objects’ – maps and diagrams of AI that surround it and shape its reception and interpretation. They are drawn by both amateurs and professionals who need to represent processes that are otherwise sealed off in technical systems, but more generally they reflect a need for abstraction – a need for conceptual models of how generative AI functions. However, as Alfred Korzybski famously put it, one should not confuse the map with the territory: the map is not how reality is, but a representation of reality.

Following on from this, mapping the objects of interest in autonomous AI image creation is not to be understood as essentialist – a map of what it is. Rather, it is a map of encounters with objects; encounters that can be documented and catalogued, but also positioned in a spatial dimension – representing an experience of what objects are called, how they look, and how they connect to other objects, communities or underlying infrastructures. Perhaps the map can even be used by others to navigate autonomous generative AI and create their own experiences.

A fundamental map for entering the world of autonomous AI image generation is one that separates the territory of ‘pixel space’ from that of ‘latent space’ – the objects you see from those that cannot be seen. In pixel space, you find both the images generated from a prompt (a textual input) and the many images used to compile the textually annotated dataset on which the image generation models are trained.

A diagram of AI image generation separating 'pixel space' from 'latent space' - what you see, and what cannot be seen.
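
To make this prompt-to-pixel encounter concrete, the following is a minimal sketch using the Hugging Face ‘diffusers’ library. The checkpoint named here is one published Stable Diffusion model among many; any comparable checkpoint would illustrate the same passage from textual input to visible image.

```python
# Minimal text-to-image sketch with Hugging Face's diffusers library.
# The checkpoint below is one published Stable Diffusion model among many.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")  # needs a GPU - see the infrastructure plane below

# A prompt (textual input) becomes an image in pixel space.
image = pipe("a map of latent space, ink on paper").images[0]
image.save("map.png")
```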

Latent space is more complicated to explain. It relies on computational models that encode images into a compressed latent representation (using a ‘Variational Autoencoder’, or VAE), progressively add noise to that representation, and learn how to decode it back into images – thereby giving the model the ability to generate new images through diffusion. Latent space therefore depends deeply on both the prompt and the dataset (in pixel space).
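
As a rough sketch of the journey between the two spaces, the VAE of a Stable Diffusion model can be used on its own. The code below assumes the ‘diffusers’ library, a published VAE checkpoint, and a local file input.png; these are illustrative choices, not the only ones.

```python
# Sketch of moving an image between pixel space and latent space with a VAE.
# The checkpoint name and the input file are assumptions for illustration.
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

image = load_image("input.png").resize((512, 512))      # pixel space
pixels = to_tensor(image).unsqueeze(0) * 2 - 1          # scaled to [-1, 1]

with torch.no_grad():
    latents = vae.encode(pixels).latent_dist.sample()   # into latent space
    decoded = vae.decode(latents).sample                # back to pixel space
```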

Apart from pixel space and latent space, there is also a territory of objects that can be seen, but that you, as a typical user, do not see. In Stable Diffusion, for instance, you find LAION, a non-profit organization that uses the software CLIP to assemble textually annotated images scraped from the internet into a free and open-source dataset for training the models of latent space. You would also find communities who contribute to LAION, or who refine the models of latent space using so-called LoRAs, as well as models and datasets that, for instance, repair facial or other bodily details (such as too many fingers on one hand) – often drawing on specialised knowledge both of the properties of the foundational diffusion models and of the visual cultures they feed into (for instance manga or gaming). These communities are also organized on different platforms, such as CivitAI or Hugging Face, where they can exhibit their specialised image creations or share their LoRAs, often with the involvement of different tokens or virtual currencies.

This map reflects the separation of pixel space from latent space, and adds a third layer of objects that are visible, but not seen by typical users. Underneath the three layers one finds a second plane of material infrastructures (such as processing power and electricity), and one can potentially add further planes, such as the regulation or governance of AI.
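
As an illustration of how such community refinements travel, a LoRA file downloaded from a platform like CivitAI or Hugging Face can be attached to a base model. The sketch below again uses the ‘diffusers’ library; the LoRA repository name is a hypothetical placeholder, not a real release.

```python
# Sketch of refining a base diffusion model with a community LoRA.
# The LoRA repository name below is a hypothetical placeholder.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load_lora_weights accepts a Hugging Face repo id or a local .safetensors
# file, e.g. one shared by a specialised community on CivitAI.
pipe.load_lora_weights("some-community/example-style-lora")  # hypothetical

image = pipe("a portrait in the community's house style").images[0]
image.save("lora_portrait.png")
```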

At every level, there is also a dependency on a material infrastructure of computers with heavy processing capability, both to generate images and to develop and refine the diffusion models. This relies on energy consumption and, not least, on GPUs. In Stable Diffusion, people who generate images or develop LoRAs can use their own GPU (built into their computer or specifically acquired), but they can also benefit from a distributed network that allows them to access other people’s GPUs: the so-called Stable Horde. That is, there is a different plane of material infrastructure underneath.
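
A minimal sketch of borrowing a GPU through the horde might look like the following. The endpoint paths and payload fields follow the public Stable Horde / AI Horde API as I understand it, but should be treated as assumptions and checked against the current documentation.

```python
# Sketch of generating an image on other people's GPUs via the Stable Horde
# (now AI Horde). Endpoints and fields are assumptions based on the public
# API; consult the current documentation before relying on them.
import time
import requests

API = "https://stablehorde.net/api/v2"
HEADERS = {"apikey": "0000000000"}  # the documented anonymous key

# Submit an asynchronous request to the distributed pool of volunteer workers.
job = requests.post(f"{API}/generate/async",
                    json={"prompt": "a map of latent space"},
                    headers=HEADERS).json()

# Poll until some GPU in the network has produced the image.
while not requests.get(f"{API}/generate/check/{job['id']}").json().get("done"):
    time.sleep(5)

status = requests.get(f"{API}/generate/status/{job['id']}").json()
print(status["generations"][0]["img"])  # the generated image (URL or base64)
```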

There are also other planes that the map does not currently capture but that are equally important, such as a plane of governance and regulation: for instance, the EU AI Act, laws on copyright infringement, or the different policies for AI innovation building on Stable Diffusion and communities like Hugging Face.

There are many other maps that each, in their own way, build conceptual models of generative AI. In this sense, maps and cartographies do not just shape the reception and interpretation of generative AI; they can also be regarded as objects of interest in themselves and as intrinsic parts of AI’s existence: generative AI depends on cartography to build, shape, and also negotiate and criticise its being in the world. The collection of cartographies and maps is, in this sense, also what makes AI a reality.

There are, for instance, maps such as:

Numbers of newly funded AI startups per country
Taller Estampa, map of generative AI

Other maps… and a map of the process of producing a pony image.

Map of generating a pony image