Diffusion
Revision as of 10:22, 17 July 2025
Reference to Latent space.
What is the network that sustains this object?
Rather than a single object, diffusion is treated here as a network of meanings that binds together a technique from physics (diffusion), an algorithm for image generation, a model (Stable Diffusion), an operative metaphor relevant to cultural analysis and, by extension, a company (Stability AI) and its founder, with roots in hedge fund investment.
In her text Diffused Seeing, Joanna Zylinska aptly captures the multivalence of the term:
... the incorporation of ‘diffusion’ as both a technical and rhetorical device into many generative models is indicative of a wider tendency to build permeability and instability not only into those models’ technical infrastructures but also into our wider data and image ecologies. Technically, ‘diffusion’ is a computational process that involves iteratively removing ‘noise’ from an image, a series of mathematical procedures that leads to the production of another image. Rhetorically, ‘diffusion’ operates as a performative metaphor – one that frames and projects our understanding of generative models, their operations and their outputs. [1]
From physics to AI, the diffusion algorithm
Our first move in this network of meanings is to follow the trajectory of the concept of diffusion from the 19th-century laboratory to the computer lab. While diffusion had been studied since antiquity, Adolf Fick published the first laws of diffusion, based on his experimental work, in 1855. As Stanford AI researchers Russakovsky et al. put it:
"In physics, the diffusion phenomenon describes the movement of particles from an area of higher concentration to a lower concentration area till an equilibrium is reached [1]. It represents a stochastic random walk of molecules"
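This physical picture can be illustrated with a minimal random-walk simulation, a toy sketch of our own (not drawn from the sources cited here): particles start concentrated at one point and disperse stochastically, with no net drift.

```python
import numpy as np

rng = np.random.default_rng(0)

# Start with a high concentration of particles at the origin
# (an "area of higher concentration").
n_particles, n_steps = 10_000, 500
positions = np.zeros(n_particles)

# Each particle performs a stochastic random walk: at every step
# it moves left or right with equal probability.
for _ in range(n_steps):
    positions += rng.choice([-1.0, 1.0], size=n_particles)

# The cloud spreads out: the mean stays near the origin (no net
# movement of matter), while the variance grows linearly with time,
# i.e. concentration disperses from high to low towards equilibrium.
print(round(positions.mean(), 1))           # stays close to 0
print(round(positions.var() / n_steps, 1))  # close to 1 (variance grows ~linearly)
```

The particle count, step count, and seed are arbitrary choices for the sketch; any values show the same qualitative behaviour.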
To understand how this idea has been translated into image generation, it is worth looking at the example given by Sohl-Dickstein and colleagues (2015), who authored the seminal paper on diffusion in image generation. The authors propose the following experiment: take an image and gradually apply noise to it until it becomes pure noise; then train an algorithm to “learn” all the steps that have been applied to the image and ask it to apply them in reverse to recover the image (see #figure 1). By introducing some movement into the image, the algorithm detects tendencies in the noise. It then gradually follows and amplifies these tendencies in order to arrive at a point where an image emerges. When the algorithm is able to recreate the original image from the noisy picture, it is said to be able to de-noise. When the algorithm is trained with billions of examples, it becomes able to generate an image from any arbitrary noisy image. And the most remarkable aspect of this process is that the algorithm is able to generalise from its training data: it can de-noise images that it never “saw” during the training phase.
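The forward half of this experiment can be sketched in a few lines. This is a toy illustration with a DDPM-style noise schedule: in place of a trained network, we "cheat" and reuse the true noise in the reverse step, simply to show that knowing the applied noise is enough to invert the forward process. All sizes and schedule values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 8x8 grayscale "image" standing in for a training example.
x0 = rng.random((8, 8))

# Forward process: gradually mix the image with Gaussian noise over T
# steps. With a variance-preserving schedule, step t keeps a fraction
# sqrt(alpha_bar_t) of the signal and adds sqrt(1 - alpha_bar_t) of noise.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

eps = rng.standard_normal(x0.shape)  # the noise applied to the image
x_T = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1 - alpha_bar[-1]) * eps

# After T steps the picture is almost pure noise: the signal fraction is tiny.
assert np.sqrt(alpha_bar[-1]) < 0.01

# Reverse process: a trained network would *predict* eps from x_T.
# Here we reuse the true eps to invert the mixing and recover the image.
x0_recovered = (x_T - np.sqrt(1 - alpha_bar[-1]) * eps) / np.sqrt(alpha_bar[-1])

print(np.allclose(x0_recovered, x0))  # True: de-noising recovers the image
```

Training replaces the cheat: the network learns to estimate the noise at every step from billions of (image, noisy image) pairs, which is what lets it later de-noise starting points it never saw.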
Another aspect of diffusion in physics that matters for image generation can be seen at the end of the definition of the concept as stated on Wikipedia (emphasis is ours):
diffusion is the movement of a substance from a region of high concentration to a region of low concentration without bulk motion[2]
Diffusion doesn't capture the movement of a bounded entity (a bulk, a whole block of content); it is a mode of spreading that flexibly accommodates structure. "Diffusion is the gradual movement/dispersion of concentration within a body with no net movement of matter"[2]. This characteristic makes it particularly apt at capturing multi-level relations between image parts without having to identify a source that constrains these relations: it gives access to implicit structure. Metaphorically, this can be compared to the process of looking for faces in clouds (or reading signs in tea leaves). We do not immediately see a face in a cumulus, but the faint movement of the mass stimulates our curiosity until we gradually delineate the nascent contours of a shape we can begin to identify. Notice the emphasis on the virtual in the image theory underlying image generation. Noise is the precondition for the generation of any image because it virtually contains all the images that can be actualised through a process of decryption or de-noising. To generate an image is not a process of creation but of actualisation (of an image that already exists virtually).

Stabilizing diffusion
From algorithm to software. No user deals directly with diffusion: it is encapsulated into software, and a whole architecture mediates between the algorithm and its environment (see diagram). Stable Diffusion encapsulates the diffusion algorithm and makes it tractable at scale. It is also experienced as a trained instance: it is hard to disentangle the model from the training data (see prompting and the feel for the model's singularity). The foundational model called Stable Diffusion builds on the diffusion algorithm but supplements it with other techniques and components; see High-Resolution Image Synthesis with Latent Diffusion Models.[3] For instance, an important step in the adoption of the diffusion technique is its translation into the latent space by Rombach et al. By porting diffusion to a compressed matrix of embeddings, the authors managed to reduce the computational cost of training and inference, thereby popularizing the technique and making it accessible to a larger community of developers. It also adds important features to the process of image synthesis:
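The cost reduction from working in latent space can be shown with back-of-the-envelope arithmetic. The figures below assume the commonly cited configuration of a 512×512 RGB image compressed by the autoencoder to a 64×64×4 latent; they are an illustrative assumption, not a measurement from the paper.

```python
# Why diffusing in latent space is cheaper: count the values the
# denoiser has to process at each step.
pixel_elements = 512 * 512 * 3   # raw pixel space (height x width x RGB)
latent_elements = 64 * 64 * 4    # compressed latent (height x width x channels)

ratio = pixel_elements / latent_elements
print(ratio)  # 48.0: each denoising step touches ~48x fewer values
```

Since the denoiser runs for many steps per image, and training runs over millions of images, this per-step saving compounds into the difference between supercomputer-only research and something a larger community can afford to run.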
By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner.
Diffusion can be guided by text prompts and other forms of conditioning inputs such as images, opening it up to multiple forms of manipulation and use, such as inpainting. It stabilizes diffusion in the sense that it allows for different forms of control; the diffusion algorithm in itself doesn't contain any guidance. This is an important step in moving the algorithm outside of the worlds of GitHub and tech tutorials into a domain where image makers can experiment with it. The pure algorithm cannot move alone: it needs an architecture and an interface.
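The cross-attention mechanism mentioned in the quote above can be sketched minimally: image features act as queries over the prompt's token embeddings, so the text steers what each image position attends to. This is a single-head toy version with random toy sizes; the learned projection matrices (W_q, W_k, W_v) of a real model are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention(image_feats, text_feats):
    """Minimal single-head cross-attention: image features query
    text features, so the prompt conditions the image synthesis."""
    d = image_feats.shape[-1]
    q = image_feats                       # queries come from the image side
    k = v = text_feats                    # keys/values come from the prompt
    scores = q @ k.T / np.sqrt(d)         # scaled dot-product similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ v, weights

# 16 latent image positions, 5 prompt tokens, 8-dim embeddings (toy sizes).
img = rng.standard_normal((16, 8))
txt = rng.standard_normal((5, 8))

out, attn = cross_attention(img, txt)
print(out.shape)                            # (16, 8): one output per image position
print(np.allclose(attn.sum(axis=-1), 1.0))  # True: each position's weights sum to 1
```

Inserting such layers into the denoiser is what turns an unguided noise-removal process into one that can be steered by "general conditioning inputs such as text or bounding boxes".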
Once in circulation, it moves both as a product, now accessible to users, and as a metaphor within a set of nested metaphors that include the brain as computer and concepts such as 'hallucinations' or deep 'dreams', responding to a more general cultural condition. As Zylinska notes:
We could perhaps suggest that generative AI produces what could be called an ‘unstable’ or ‘wobbly’ understanding – and a related phenomenon of ‘shaky’ perception. Diffusion, to be discussed in the section that follows, can be seen as an imaging template for this model.[1]

Instability thus becomes the organising concept and technology for the emergence of our picture of the world. Indeed, "it is not just the perception of images but their very constitution that is fundamentally unstable."[1]
A general condition of instability due to the extensive disruptions brought on by the flows of capital. Stable Diffusion is very much part of this condition. The company Stability AI, founded by former hedge fund manager Emad Mostaque, helped finance the transformation of the "technical" product into software available to users, powered by an expensive infrastructure, and also sold as a service. To access large-scale computing power, Mostaque raised $100 million in venture capital (Wiggers 2022). His experience in the financial sector helped convince donors and secure the financial base. The investment was sufficient to give Stability a chance to enter the market. Moving from tech demo to service required grounding the diffusion algorithm in another material environment: Amazon servers, the JUWELS Booster supercomputer, tailor-made data centers around the world. This dispersion of the infrastructure corresponds with the global distribution of the company's legal structure: one leg in the UK and one leg in Delaware, the latter offering a welcoming tax environment for companies. Dense networks of investors and servers supplement the code. Risk investment. All of this brings along a string of controversies and lawsuits, especially for copyright infringement.
Stabilizing diffusion means a huge range of problems happening simultaneously, requiring extremely different skills and competences: identifying faulty GPUs, deciding on batch sizes in training, assessing the impact of different floating-point formats on training stability, securing investment and managing delays in payment, pushing back against legal actions, and aligning prompts and images.

Ironically, this is reminiscent of the framing of stability as a selling point by Mostaque's former employer, the Pictet group. How to achieve stability in a turbulent world? By engaging with turbulence while avoiding its effects, diverting them onto others. See the example of the ad by Emad Mostaque's former employer.
How does it evolve through time?
Evolution of size: number of parameters, size of the model. Multiple branches and versions, remixes.
How does it create value? Or decrease / affect value?
The question of value needs to be addressed at different levels, as we have chosen to treat diffusion as a complex of techniques, algorithms, software, metaphors and finance. How is value expressed in the different aspects of this constellation?
First, as a model concretised in a material form: the weights. The model is at the core of a series of online platforms that monetize access to it: with a subscription fee, users can generate images. Its value stems from the model's ability to generate images in a given style (e.g. Midjourney), with good prompt adherence, etc. It is a familiar value form for models: AI as a service. This can be expressed in the form of revenue, the size of a userbase, etc.
As the model is open source, it can also be shared and used in different ways. For instance, users can run the model locally without paying a fee to Stability AI. It can also be integrated into peer-to-peer systems of image generation such as Stable Horde, or shared installations through non-commercial APIs. In this case, the model gains value with adoption. And as interest grows, users start to build things with it: LoRAs, bespoke models, other forms of conditioning. Through this burgeoning activity, the model's affordances grow. It gains a form of reputation and enters different economies of attention where users gain visibility by tweaking it or generating 'great art'.
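The LoRAs mentioned above work by leaving the base model's weights frozen and learning only a small low-rank update. A minimal sketch of the arithmetic (toy matrix sizes; the rank and scale values are illustrative hyperparameters, not values from any particular adapter):

```python
import numpy as np

rng = np.random.default_rng(0)

# A frozen base weight matrix from the pretrained model (toy size).
W = rng.standard_normal((64, 64))

# LoRA: instead of retraining W, learn a low-rank update B @ A,
# scaled by alpha / r. Only A and B are trainable.
r, alpha = 4, 8
A = rng.standard_normal((r, 64)) * 0.01   # trainable, small random init
B = np.zeros((64, r))                     # trainable, zero-init so the
                                          # update starts as a no-op
delta = (alpha / r) * (B @ A)
W_adapted = W + delta

# The adapter adds only 2 * 64 * r parameters instead of 64 * 64.
print(A.size + B.size)            # 512 trainable parameters (vs 4096 in W)
print(np.allclose(W_adapted, W))  # True at init (B is zero)
```

Because the adapter is so small relative to the full weights, it can be trained on consumer hardware and shared as a lightweight file, which is precisely what lets the ecosystem of bespoke tweaks around the model flourish.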
Gains value by comparison. In parallel, in scientific circles, the model's value is measured by different metrics: computer vision papers, comparative graphs ("state of the art" vs "our method"), the ability to do what others cannot do, or do less well. "Inversion" (the ability to flexibly transform an image attribute without unwanted changes). Multiple modalities. Speed. Authors such as Sohl-Dickstein or Rombach have gained a reputation that can be evaluated through citation indexes.
It decreases the value of the singular image and increases the value of the image ensemble. To learn how to generate images, algorithms such as Stable Diffusion or Imagen need to be fed with examples. These images are given to the algorithm one by one. Through its learning phase, the algorithm treats them as moments of an uninterrupted process of variation, not as singular specimens. At this level, the process of image generation is radically anti-representational. It treats the image as a mere moment (“quelconque”[4]), a variation among many. But the model gains singularity.
In the Stable Diffusion ecosystem, the ability to experiment is one of the highest values[5]. The dynamics instilled by the project are well captured by Patrick Esser, a lead researcher on diffusion algorithms, who defined his ideal contributor as someone who would “not overanalyze too much” and “just experiment” (Jennings 2022). The project’s politics of openness was motivated by the realization that its ambitions exceeded the narrow goal of crafting a good product:
“It’s not that we're running out of ideas, we’re mostly running out of time to follow up on them all. By open sourcing our models, there's so many more people available to explore the space of possibilities.” (Jennings 2022)
What is its place/role in techno cultural strategies?
As a concept that traverses multiple dimensions of culture and technology, diffusion raises questions about strategies operating on different planes. In that sense, it constitutes an interesting lens through which to discuss the democratization of generative AI. As a premise, we adopt the view[6] that the relation between genAI and democracy can be reduced neither to an apocalypse in which artificial intelligence signals the end of democracy, nor to an inevitable movement towards a better optimized future in which a more egalitarian world emerges out of technical progress. Both democracy and genAI are unaccomplished projects, risky works in progress.
Our aesthetic critique maintains that any democratic implication of GAI images—including their use for propaganda, spread of disinformation, perpetuation of discriminatory stereotypes, and challenges to authorship, authenticity, originality, etc.—should be understood through the context of how the modern democratization of art as a means of cultural production situates the aestheticization of politics within democracy itself (Benjamin 2007; Park 2024).
Source: Democratization and generative AI image creation: aesthetics, citizenship, and practices. https://www.researchgate.net/publication/384841612_Democratization_and_generative_AI_image_creation_aesthetics_citizenship_and_practices [accessed 17 July 2025].
It is a game changer when it comes to the democratization of generative AI:
- access to a concrete resource: the model's weights, available without fee
- different forms of knowledge about AI: papers, code, tutorials
- different levels of engagement: as a user of a service, a dataset curator, a LoRA creator, a Stable Horde node manager, etc.
- freedom of use, in the sense that the platform's censorship is either absent or can be bypassed locally
How does it relate to autonomous infrastructure?
It enables the distribution of the infrastructure. Not just as an object but through all its dimensions.
Different forms of dependencies.
References
- ↑ Joanna Zylinska, "Diffused Seeing", Media Theory (2024). https://mediatheoryjournal.org/2024/09/30/joanna-zylinska-diffused-seeing/
- ↑ https://en.wikipedia.org/wiki/Diffusion
- ↑ Rombach et al., "High-Resolution Image Synthesis with Latent Diffusion Models". https://arxiv.org/abs/2112.10752
- ↑ For a discussion of the difference between privileged instants and “instants quelconques”, see Deleuze’s theory of cinema, in particular https://www.webdeleuze.com/textes/295 (find translation)
- ↑ Open Infrastructure article
- ↑ Discussed in detail here: http://dx.doi.org/10.1007/s00146-024-02102-y
Unused references
From "Maps": _Secondly, there is a 'latent space'. Image latency refers to the space in between the capture of images in datasets and the generation of new images. It is an algorithmic space of computational models where images are, for instance, encoded with 'noise', and the machine then learns how to de-code them back into images (aka 'image diffusion')._