Maps

== Mapping decentralised AI image generation ==

If one considers generative AI as an object, there is also a world of ‘para objects’ (surrounding AI and shaping its reception and interpretation) in the form of maps or diagrams of AI. They are drawn by both amateurs and professionals who need to represent processes that are otherwise sealed off in technical systems, but more generally reflect a need for abstraction – a need for conceptual models of how generative AI functions. However, as Alfred Korzybski famously put it, one should not confuse the map with the territory: the map is not how reality is, but a representation of reality.

Following on from this, a map of the objects of interest in autonomous AI image creation is not to be understood as a map of what it 'really is'. Rather, it is a map of encounters with objects; encounters that can be documented and catalogued, but also positioned in a spatial dimension – representing an experience of what objects are called, how they look, and how they connect to other objects, communities or underlying infrastructures. Perhaps the map can even be used by others to navigate autonomous generative AI and create their own experiences.

A fundamental map for entering the world of autonomous AI image generation is one that separates the territories of 'pixel space' from 'latent space' – the objects you see from those that cannot be seen. In pixel space, you would find both the images that are generated by a prompt (a textual input) and the many images that make up the textually annotated dataset used for training the image generation models (the foundation models).

A diagram of AI image generation separating 'pixel space' from 'latent space' – what you see, and what cannot be seen.
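
To make the pixel-space side of this concrete, the following is a minimal sketch of generating an image from a textual prompt with the open-source diffusers library; the model identifier, prompt and file name are illustrative assumptions, not part of the map itself:

<syntaxhighlight lang="python">
# A minimal sketch of prompt-to-image generation, using Hugging Face's
# open-source diffusers library. Model id and prompt are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # an example Stable Diffusion checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# The prompt and the resulting image both live in pixel space;
# the sampling itself happens in latent space before being decoded.
image = pipe("a hand-drawn map of an imaginary archipelago").images[0]
image.save("map.png")
</syntaxhighlight>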

Latent space is more complicated to explain. It is, in a sense, a purely computational space that relies on models that can encode images with noise (using a [[Variational Autoencoder]], VAE), and learn how to decode them back into images, thereby giving the model the ability to generate new images using [[Diffusion|image diffusion]]. It therefore also deeply depends on both the [[prompt]] and the [[dataset]] in pixel space – to generate images, and to build or train models for image generation.
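
The role of the VAE as the bridge between the two spaces can be sketched as follows, again with diffusers; the input file name and resolution are assumptions, and the round trip is shown without the scaling a full diffusion pipeline would apply:

<syntaxhighlight lang="python">
# Sketch: round-tripping an image through the VAE of a Stable Diffusion
# checkpoint. File name and resolution are illustrative assumptions.
import numpy as np
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)

img = load_image("input.png").resize((512, 512))
# Convert to a (1, 3, 512, 512) tensor scaled to [-1, 1].
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 127.5 - 1.0
x = x.unsqueeze(0)

with torch.no_grad():
    # Encode: pixel space -> latent space (here a compact 4x64x64 tensor).
    latents = vae.encode(x).latent_dist.sample()
    # Decode: latent space -> pixel space. Diffusion generates new images
    # by denoising tensors like `latents` before this decoding step.
    reconstruction = vae.decode(latents).sample
</syntaxhighlight>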

Apart from pixel space and latent space, there is also a territory of objects that can be seen, but that you as a typical user do not see. For instance, in Stable Diffusion you find [[LAION]], a non-profit organization that uses the software [[Clip]] to scrape the internet for textually annotated images and generate a free and open-source dataset for training models in latent space. You would also find communities who contribute to LAION, who refine the models of latent space using so-called [[LoRA]]<nowiki/>s, or who build models and datasets to, for instance, reconstruct missing facial or other bodily details (such as too many fingers on one hand) – often with specialised knowledge both of the properties of the foundational diffusion models and of the visual culture they feed into (for instance manga or gaming). These communities are also organized on different platforms, such as [[CivitAI]] or [[Hugging Face]], where they can exhibit their specialised image creations or share their LoRAs, often with the involvement of different tokens or virtual [[currencies]].

[[File:Map objects and planes.jpg|none|thumb|640x640px|This map reflects the separation of pixel space from latent space, and adds a third layer of objects that are visible, but not seen by typical users. Underneath the three layers one finds a second plane of material infrastructures (such as processing power and electricity), and one can potentially add more planes, such as the regulation or governance of AI.]]
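
As a sketch of how the community refinements described above are applied in practice, a LoRA checkpoint downloaded from a platform like CivitAI or Hugging Face can be layered onto a foundation model roughly as follows; the folder, file name and blend scale are placeholders, and the exact API varies between diffusers versions:

<syntaxhighlight lang="python">
# Sketch: applying a community LoRA on top of a foundation model.
# The folder and file name stand in for a checkpoint shared on
# CivitAI or Hugging Face.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load low-rank weight updates that nudge the model towards a
# specialised visual culture (e.g. manga) or fix known defects.
pipe.load_lora_weights(
    "path/to/lora-folder", weight_name="community-lora.safetensors"
)

image = pipe(
    "a manga-style portrait, detailed hands",
    cross_attention_kwargs={"scale": 0.8},  # blend strength of the LoRA
).images[0]
</syntaxhighlight>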

At every level, there is also a dependency on a material infrastructure of computers with heavy processing capability, both to generate images and to develop and refine the diffusion models. This relies on energy consumption and, not least, on GPUs. In Stable Diffusion, people who generate images or develop LoRAs can have their own GPU (built into their computer or specifically acquired), but they can also benefit from a distributed network that allows them to access other people's GPUs, the so-called [[Stable Horde]]. That is, there is a different plane of material infrastructure underneath.
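
For illustration, the AI Horde (the project behind the Stable Horde) exposes a public REST API through which a request is queued and then picked up by a volunteer's GPU somewhere on the network; the sketch below assumes the base URL, endpoints and anonymous API key as documented by the project:

<syntaxhighlight lang="python">
# Sketch: requesting an image from the crowdsourced GPU network.
# Base URL, endpoints and the anonymous key "0000000000" follow the
# AI Horde documentation and should be treated as assumptions here.
import time
import requests

API = "https://aihorde.net/api/v2"
HEADERS = {"apikey": "0000000000"}  # anonymous, low-priority access

# Submit an asynchronous generation job to the horde.
job = requests.post(
    f"{API}/generate/async",
    json={"prompt": "a map of latent space"},
    headers=HEADERS,
).json()

# Poll until a volunteer GPU on the network has fulfilled the request.
while True:
    status = requests.get(f"{API}/generate/status/{job['id']}").json()
    if status.get("done"):
        break
    time.sleep(5)

print(status["generations"][0]["img"])  # the generated image
</syntaxhighlight>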

There are also other planes that the map does not currently capture but that are equally important, such as a plane of governance and regulation: the EU AI Act, laws on copyright infringement, or different policies for AI innovation building on Stable Diffusion and communities like Hugging Face.

There are many other maps that each, in their own way, build conceptual models of generative AI. In this sense, maps and cartographies are not just shaping the reception and interpretation of generative AI, but can also be regarded as objects of interest in themselves and intrinsic parts of AI's existence: generative AI depends on cartography to build, shape, and also negotiate and criticise its being in the world. The collection of cartographies and maps is in this sense also what makes AI a reality.

== Mapping other planes of AI ==

There are, for instance, maps of the corporate plane of AI, such as entrepreneur, investor and podcast host Matt Turck's "ultimate annual market map of the data/AI industry" (https://firstmark.com/team/matt-turck/). Since 2012, Matt Turck has documented the ecosystem of AI, not just to identify key corporate actors, but also to follow developing trends in business. One can see how the division of companies dealing with infrastructure, data analytics, applications, data sources, and open source becomes more fine-grained over the years, forking out into, for instance, applications in health, finance and agriculture; or how privacy and security become of increasing concern in the business of infrastructures.


== Maps and hierarchies ==

Number of newly funded AI startups per country


"counter maps"

Taller Estampa, map of generative AI


== Conceptual mapping / maps of maps ==

Atlas of AI

Joler and Pasquinelli


== Other maps… and a map of the process of producing a pony image ==

Map of generating a pony image