Maps
Mapping 'objects of interest and necessity'
If one considers generative AI as an object, there is also a world of 'para-objects' surrounding AI and shaping its reception and interpretation, in the form of maps or diagrams of AI. They are drawn by both amateurs and professionals who need to represent processes that are otherwise sealed off in technical systems, but more generally they reflect a need for abstraction – a need for conceptual models of how generative AI functions. However, as Alfred Korzybski famously put it, one should not confuse the map with the territory: the map is not how reality is, but a representation of reality.
Following on from this, mapping the objects of interest in autonomous AI image creation is not to be understood as a map of what it 'really is'. Rather, it is a map of encounters with objects; encounters that can be documented and catalogued, but also positioned in a spatial dimension – representing a 'guided tour', and an experience of what objects are called, how they look, and how they connect to other objects, communities or underlying infrastructures (see also Objects of interest and necessity). Perhaps the map can even be used by others to navigate autonomous generative AI and create their own experiences. Importantly, though, what is particular about the map of this catalogue of objects of interest and necessity is that it purely maps autonomous and decentralised generative AI. It is therefore not to be considered a plane that necessarily reflects what happens in, for instance, OpenAI's or Google's generative AI systems. The map is, in other words, not a map of what is otherwise concealed in generative AI. In fact, we know very little of how to navigate proprietary systems, and one might speculate whether there even exists a complete map of their relations and dependencies.
Perhaps because of this lack of overview, maps and cartographies are not just shaping the reception and interpretation of generative AI, but can also be regarded as objects of interest and necessity in themselves, and as intrinsic parts of AI's existence: generative AI depends on an abundance of cartography to model, shape, navigate, and also negotiate and criticise its being in the world. There seems to be an inbuilt need to 'map the territory', and the collection of cartographies and maps is therefore also what makes AI a reality – making AI real by externalising its abstraction onto a map, so to speak.
A map of 'objects of interest and necessity' (autonomous AI image generation)
To enter the objects of autonomous AI image generation, a map that separates the territories of 'pixel space' from 'latent space' can be useful as a starting point – that is, a map that separates the objects you see from those that cannot be seen because they exist in a more abstract, computational space.
Pixel space
In pixel space, you find a range of visible objects that a typical user would normally encounter. This includes, of course, the interface for creating images (using a prompt and other parameters), but also the many images that are generated and form part of a cultural practice of generative AI. One would also include the objects (images) of visual culture more generally, as they are used to train the image generation models.
Latent space
Latent space is more complicated to explain. It is, in a sense, a purely computational space that relies on models that can encode images into a compressed representation (using a Variational Autoencoder, VAE), add noise to them, and learn how to decode them back into images. In this process it deeply depends on pixel space – both on the prompt itself and on the dataset, with its many categories and annotations of what an image contains. This technique of 'model training', called image diffusion, is in other words what gives the model the ability to generate new images.
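For the technically curious, an encounter with latent space can be made concrete in a few lines of code. The following is a minimal sketch, assuming the Hugging Face diffusers library and a publicly available Stable Diffusion checkpoint; the model name and parameters are illustrative, not prescriptive:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion pipeline (text encoder, U-Net, VAE).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint; others work similarly
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # diffusion is GPU-hungry; see the material plane below

# The prompt belongs to pixel space; the denoising happens in latent space,
# and the VAE decodes the final latents back into a visible image.
image = pipe("a map of a hidden territory", num_inference_steps=30).images[0]
image.save("map.png")
```

The sketch also makes visible how few of the underlying objects (datasets, annotations, model weights) a user actually touches when generating an image.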

'Back end' space
Apart from pixel space and latent space, there is also a territory of objects that can be seen, but which you (as a user) typically do not see – a kind of 'back end' space to autonomous AI image generation. For instance, to compile a dataset for training a model in latent space, one often 'scrapes' the internet or social media platforms for images (i.e., automatically browsing and building a database of publicly available images). Users generally do not see this and typically only agree to it (if at all) by way of complicated terms of use. Open Images and ImageNet are two examples of datasets that are often used as the backbone of visually training generative AI. In Stable Diffusion, LAION is a key object in this process. It is a non-profit organisation that uses the software CLIP to scrape the internet for textually annotated images and generate a free and open-source dataset. Indeed, many of the objects of interest and necessity in autonomous AI image generation are found in this territory.
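The kind of image–text matching that CLIP performs in this filtering can be sketched as follows; a minimal, hedged example using the transformers library, where the checkpoint name and the file path are assumptions for illustration:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scraped_image.jpg")  # hypothetical image from a web scrape
captions = ["a photograph of a doctor", "a hand-drawn map", "a cat"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-caption similarity scores; a dataset builder
# would keep only the pairs whose similarity exceeds some threshold.
print(outputs.logits_per_image.softmax(dim=1))
```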
One example is the LoRAs that are used to refine the diffusion models of latent space. Contrary to common belief, there is not just one dataset used to make the model work. There are also models and datasets to, for instance, reconstruct missing or malformed facial or other bodily details (such as too many fingers on one hand). Both LoRAs and other ways of refining the foundational models typically demand specialised knowledge both of the properties of image diffusion (latent space) and of the visual culture they feed into (for instance manga or gaming).
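In code, applying such a community-made refinement is deceptively simple – which is part of what makes LoRAs so widely shared. A minimal sketch using the diffusers API, where the LoRA repository name is a hypothetical placeholder for the kind of file shared on platforms like CivitAI:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Low-Rank Adaptation: small weight deltas, trained on a niche visual culture
# (e.g. a manga style), are merged into the base model's attention layers.
pipe.load_lora_weights("some-user/manga-style-lora")  # hypothetical repository

image = pipe("a portrait in manga style").images[0]
image.save("refined.png")
```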
An organisational plane
What is important to realise when navigating this 'back end' territory is therefore that behind every dataset, model or piece of software also lies a specific community. This points to a different, organisational plane of autonomous AI image generation; that is, to how the technical objects depend on people and communities that develop, sustain and maintain them. For instance, there is a community around LAION that is built on principles of access and openness, to ensure alternatives to Big Tech AI corporations. In this sense, it resembles what the anthropologist Chris Kelty has labelled a 'recursive public' – a community that cares for and maintains the means of its own existence. However, communities can also be organised into online platforms, such as CivitAI, which also functions as a marketplace for AI image creations involving tokens or virtual currencies.
These organisations, and the technical objects affiliated with them, may also (at the same time) be subject to value extraction. Hugging Face is a prime example of this – a community hub as well as a $4.5 billion company with investments from Amazon, IBM, Google, Intel, and many more, as well as collaborations with Meta and Amazon Web Services. This indicates that there are dependencies not only on communities, but also on corporate collaboration and venture capital.
A material plane
AI image generation is, not least, always also dependent on material infrastructures. First of all on hardware, and specifically on the GPUs that are needed both to generate images and to develop and refine the diffusion models (e.g. developing LoRAs). GPUs can be found in personal computers, but very often individuals and communities will invest in expensive GPUs with high processing capability. Consequently, people who generate images or develop LoRAs with Stable Diffusion can have their own GPU (built into their computer or specifically acquired), but they can also benefit from a distributed network that allows them to access other people's GPUs: the so-called Stable Horde. This points to how autonomous AI image generation not only depends on, but often also chooses its dependencies – for example, depending on the distributed resources of a community rather than a centralised resource (e.g., a platform in 'the cloud'). At this material plane, there are also other dependencies that one can choose. For instance, energy. Both generating images and training models require massive energy consumption, and the question of how much and from which source (renewable or fossil) may be a key factor in choosing Stable Horde over one's own GPU, or over a corporate platform. The labour and minerals that go into the production of the required hardware can be considered another dependency on this material plane.
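Choosing the distributed dependency is, in practice, a matter of addressing a different piece of infrastructure. The following is a hedged sketch of how one might queue an image request with the Stable Horde (AI Horde) via its public REST API; the endpoint and field names follow the project's documentation as best understood here, and should be checked against the current API at stablehorde.net:

```python
import requests

API_URL = "https://stablehorde.net/api/v2/generate/async"  # assumed endpoint
payload = {
    "prompt": "a map of a hidden territory",
    "params": {"width": 512, "height": 512, "steps": 30},
}
headers = {"apikey": "0000000000"}  # anonymous key; registered users gain priority via 'kudos'

# The request is queued and picked up by a volunteer's GPU somewhere in the
# network; the response contains an id that is polled until the image is ready.
response = requests.post(API_URL, json=payload, headers=headers)
print(response.json())
```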

Mapping the many different planes and dependencies of generative AI
What is particular about the map of this catalogue of objects of interest and necessity is that it purely attempts to map autonomous and decentralised generative AI, serving as a map for a guided tour and experience of autonomous AI. However, both Hugging Face's dependency on venture capital and Stable Diffusion's dependency on hardware and infrastructure point to the fact that there are several planes that are not captured in the above map of this catalogue, but which are equally important. For instance, the EU AI Act or laws on copyright infringement, which Stable Diffusion (like any other AI ecology) will also depend on, point to a plane of governance and regulation. AI, including Stable Diffusion, also connects to and depends on the organisation of human labour and the extraction of resources.
In describing the plane of objects of interest and necessity, we attempt to describe how Stable Diffusion and autonomous AI image generation build on dependencies to these different planes, but an overview of the many planes of AI and how it 'stacks' can of course also be the centre of a map in itself. One example of this is Kate Crawford's Atlas of AI, a book that displays different maps (and also images) that link AI to 'Earth' and the exploitation of energy and minerals, or to 'Labour' and the workers who do micro tasks ('clicking' tasks) or work in Amazon's warehouses. Crawford's book also contains chapters on 'Data', 'Classification', 'Affect', 'State' and 'Power'.
Gertraud Koch has likewise drawn a map of all the different layers that she connects to "technological activity", and which would also pertain to AI. On top of a layer of technology (the 'data models and algorithms') one finds other layers that are interdependent, and which contribute to the political and technological qualities of AI. As such, the map is also meant for navigation – to identify starting points for rethinking its concepts or reimagining alternative futures (in her work, particularly in relation to a potential delinking from a colonial past, and to reimagining a pluriversality of technology).

Within the many planes and stacks of AI one can find numerous maps that build other types of overviews and conceptual models of AI – perhaps pointing to how maps themselves take part in making AI a reality.
The corporate landscape
The entrepreneur, investor and podcast host Matt Turck has made the “ultimate annual market map of the data/AI industry”. Since 2012 he has documented the corporate landscape of AI, not just to identify key corporate actors, but also to track developments and trends in business. Comparing the 2012 version with the most recent map from 2024, one can see how the division of companies dealing with infrastructure, data analytics, applications, data sources, and open source becomes ever more fine-grained over the years, forking out into, for instance, applications in health, finance and agriculture; or how privacy and security become of increased concern in the business of infrastructures – clearly, AI reconfigures and intersects with many other realities. As Turck also notes on his blog, the first map from 2012 has merely 139 logos, whereas the 2024 version has 2,011 logos. This reflects the massive investment in AI entrepreneurship, following first 'big data' and now 'generative AI' (and machine learning) – how AI has become a business reality.

Critical cartography in the mapping of AI
In mapping AI there are also 'counter maps' or forms of 'critical cartography'. Conventional world maps are built on set principles of, for instance, North facing up and Europe at the centre. The map is therefore not just a map for navigation, but also a map of more abstract imaginaries and histories originating in colonial times, when maps took Europe as their point of departure and were an intrinsic part of the conquest of territories. In this sense, a map always also reflects hierarchies of power and control that can be inverted or exposed (for instance by turning the map upside down, letting the South be the point of departure). Counter-mapping technological territories would, following this logic, involve what the French research and design group Bureau d'Études has called "maps of contemporary political, social and economic systems that allow people to inform, reposition and empower themselves." They are maps that reveal underlying structures of social, political or economic dependencies to expose what ought to be of common interest, or the hidden grounds on which a commons rests. Gilles Deleuze and Félix Guattari's notion of 'deterritorialization' can be useful here, as a way to conceptualise the practices that expose and mutate the social, material, financial, political, or other organisation of relations and dependencies. The aim is not only to destroy this 'territory' of relations and dependencies, but ultimately a 'reterritorialization' – a reconfiguration of the relations and dependencies.
Utilising the opportunities of infographics in mapping can be a powerful tool. At the plane of financial dependencies, one can map, as Matt Turck does, the corporate landscape of AI, but one can also draw a different map that reveals how the territory of 'startups' does not compare to a geographical map of land and continents. Strikingly, the United States is double the size of Europe and Asia, whereas whole countries and continents are missing (such as Russia and Africa). This map thereby not only reflects the number of startups, but also how venture capital is dependent on other planes, such as politics and the organisation of capital, or infrastructural gaps. In Africa, for instance, the AI divide is very much also a 'digital divide', as argued by AI researcher Jean-Louis Fendji.

Counter-mapping the organisation of relations and dependencies is also prevalent in the works of the Barcelona-based artist collective Estampa, which exposes how generative AI depends on different planes: venture capital, energy consumption, a supply chain of minerals, human labour, as well as other infrastructures, such as the internet, which is 'scraped' for images or other media using software like CLIP.

Epistemic mapping of AI
Maps of AI often also address how AI functions as what Celia Lury has called an 'epistemic infrastructure'. That is, AI is an apparatus that builds on knowledge and creates knowledge, but also shapes what knowledge is and what we consider to be knowledge. To Lury, the question of 'methods' here becomes central – not as a neutral, 'objective' stance, as one typically regards good methodology in science, but as a cultural and social practice that helps articulate the questions we ask and what we consider to be a problem in the first place. When one, for instance, criticises the social, racial or other biases in generative AI (such as all doctors being white males in generative AI image creation), we are not just dealing with bias in the dataset that can be fixed with 'negative prompts' or other technical means. Rather, AI is fundamentally – in its very construction and infrastructure – based in a Eurocentric history of modernity and knowledge production. For instance, as pointed out by Rachel Adams, AI belongs to a genealogy of intelligence, and one also ought to ask whose intelligence and understanding of knowledge is modelled within the technology – and whose is left out?
There are several attempts to map this territory on the plane of knowledge production, with its many social, material, political or other relations and dependencies. Sharing many of the concerns of Lury and Adams, Vladan Joler and Matteo Pasquinelli's 'Nooscope' is a good example of this. In their understanding, AI belongs to a much longer history of knowledge instruments ('nooscopes', from the Greek skopein, 'to examine, look', and noos, 'knowledge') that would also include optical instruments, but which in AI takes the form of a magnification of patterns and statistical correlations in data. The Nooscope map is an abstraction of how AI functions as an "Instrument of Knowledge Extractivism". It is therefore not a map of 'intelligence' and logical reasoning, but rather of a "regime of visibility and intelligibility" whose aim is the automation of labour, and of how this aim rests (like other capitalist extractions of value in modernity) on a division of labour – between humans and technology, for instance between historical biases in the selection and labelling of data and their formalisation in sensors, databases and metadata. The map also refers to how selection, labelling and other laborious tasks in the training of models are done by "ghost workers", thereby referring to a broader geo-politics and body-politics of AI where human labour is often done by subjects of the Global South (although they might oppose being referred to as 'ghosts').

++++++++++++++++++++++++++++++++++++++++++++++++++++++
[CARD TEXT – possibly needs shortening]
A Map of 'Objects of Interest and Necessity'
There is little knowledge of what AI really looks like. This might explain the abundance of maps that each in their own way 'map' AI. Like any of these, a map of 'Objects of Interest and Necessity' is not to be confused with the actual thing – 'the map is not the territory', as the saying goes. The map presented here is an attempt to abstract the different objects that one may come across when entering the world of autonomous and decentralised AI image creation (and in particular Stable Diffusion). It can serve as a useful guide to experience what the objects of this world are called and how they connect to each other, to communities or to underlying infrastructures – perhaps also as a point of departure for one's own exploration.
There is a fundamental distinction between three territories.
Firstly, there is 'pixel space'. In this space one encounters, for instance, all the images that are generated (by AI 'prompts') in one of the many user 'interfaces' to Stable Diffusion, but also the many other images of visual culture that serve as the basis (the 'datasets') for training AI models. Images are 'scraped' from the internet or different repositories using software (such as 'CLIP') that can automatically capture images and generate categories and descriptions of them.
Secondly, there is 'latent space'. Image latency refers to the invisible space in between the capture of images in datasets and the generation of new images. It is an algorithmic space of computational models where images are, for instance, encoded with 'noise', and the machine then learns how to decode them back into images (aka 'image diffusion').
In autonomous and decentralised AI image generation, there is also a third space of objects that are usually not seen. As an intrinsic part of a visual culture, communities, for instance, create so-called 'LoRAs' – a 're-modelling' of latent space in order to meet specific visual requirements. Both image creations and LoRAs are usually shared on designated platforms (such as 'CivitAI'). There are many other objects in this space, too, such as 'Stable Horde', which is used to create a distributed network where users can use each other's hardware ('GPUs') in exchange for 'virtual currencies', in order to speed up the process of generating images.
The many objects that one would encounter in one or the other of the three 'territories' connect to many different planes. They deeply depend on (and sometimes also reconfigure) material infrastructures, capital and value, the automation and extraction of labour, the organisation of knowledge, politics, regulation, and much more.
Images: 'Our' map, surrounded by other maps

++++
Not sure how to fit this map (a map of the process of producing a pony image) in this entry. Perhaps another entry?
