Diffusion

Revision as of 19:15, 10 July 2025

What is the network that sustains this object?

Rather than a single object, diffusion is treated here as a network of meanings that binds together a phenomenon from physics (diffusion), an algorithm for image generation, a model (Stable Diffusion), an operative metaphor relevant to cultural analysis and, by extension, a company (Stability AI) and its founder, who has roots in hedge fund investment.

In her text “Diffused Seeing”, Joanna Zylinska aptly captures the multivalence of the term:

... the incorporation of ‘diffusion’ as both a technical and rhetorical device into many generative models is indicative of a wider tendency to build permeability and instability not only into those models’ technical infrastructures but also into our wider data and image ecologies. Technically, ‘diffusion’ is a computational process that involves iteratively removing ‘noise’ from an image, a series of mathematical procedures that leads to the production of another image. Rhetorically, ‘diffusion’ operates as a performative metaphor – one that frames and projects our understanding of generative models, their operations and their outputs. ^[1]

From physics to AI, the diffusion algorithm

Our first move in this network of meanings is to follow the trajectory of the concept of diffusion from the 19th-century laboratory to the computer lab. Although diffusion had been studied since antiquity, it was Adolf Fick who published the first laws of diffusion, based on his experimental work, in 1855. As Stanford AI researchers Russakovsky et al. put it:

"In physics, the diffusion phenomenon describes the movement of particles from an area of higher concentration to a lower concentration area till an equilibrium is reached [1]. It represents a stochastic random walk of molecules"

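The “stochastic random walk” in the quotation above can be simulated directly. The following is a minimal sketch, not taken from Russakovsky et al.: each particle starts at the same point and takes unit steps left or right at random. The function name `random_walk_spread` and all parameter values are illustrative assumptions. The cloud of particles spreads out while its mean position stays near zero, and its mean squared displacement grows in proportion to the number of steps, which is the statistical signature of diffusion.

```python
import random


def random_walk_spread(n_particles: int, n_steps: int, seed: int = 0) -> tuple[float, float]:
    """Simulate unbiased 1D random walks that all start at the same point.

    Returns (mean position, mean squared displacement) after n_steps.
    For +/-1 steps the expected mean is 0 and the expected mean squared
    displacement equals n_steps: the cloud spreads without drifting.
    """
    rng = random.Random(seed)       # fixed seed so runs are reproducible
    positions = [0] * n_particles   # everything starts concentrated at x = 0
    for _ in range(n_steps):
        for i in range(n_particles):
            # each particle moves one unit left or right with equal probability
            positions[i] += rng.choice((-1, 1))
    mean = sum(positions) / n_particles
    msd = sum(x * x for x in positions) / n_particles
    return mean, msd


if __name__ == "__main__":
    for steps in (10, 100, 400):
        mean, msd = random_walk_spread(n_particles=1000, n_steps=steps)
        print(f"steps={steps:4d}  mean={mean:+.2f}  mean squared displacement={msd:.1f}")
```

Doubling the number of steps roughly doubles the mean squared displacement, so the typical spread (its square root) grows only with the square root of time, and nothing pushes the particles in any particular direction.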
To understand how this idea has been translated into image generation, it is worth looking at the example given by Sohl-Dickstein and colleagues (2015), who authored the seminal paper on diffusion in image generation. The authors propose the following experiment: take an image and gradually apply noise to it until it becomes pure noise; then train an algorithm to “learn” all the steps that have been applied to the image and ask it to apply them in reverse to recover the image (see Figure 1). By introducing some movement in the image, the algorithm detects tendencies in the noise. It then gradually follows and amplifies these tendencies until an image emerges. When the algorithm is able to recreate the original image from the noisy picture, it is said to be able to de-noise. Once trained on billions of examples, it becomes able to generate an image from any arbitrary noisy image. The most remarkable aspect of this process is that the algorithm is able to generalise from its training data: it can de-noise images that it never “saw” during training.
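The forward half of this experiment, the gradual noising, can be sketched in a few lines. This minimal Python sketch assumes the Gaussian noising step described by Sohl-Dickstein et al., in which each step slightly shrinks the current point by a factor sqrt(1 - β) and mixes in fresh Gaussian noise of variance β. The spiral, the constant rate β = 0.02, the 300 steps and all function names are illustrative choices, not the paper's settings. The learned reverse process, which requires training a neural network, is deliberately left out.

```python
import math
import random


def make_spiral(n_points: int, rng: random.Random) -> list[tuple[float, float]]:
    """Sample points along a 2D spiral, a toy 'image' like the one in Figure 1."""
    points = []
    for _ in range(n_points):
        t = rng.uniform(0.5, 3.0 * math.pi)  # position along the spiral
        r = t / (3.0 * math.pi)              # radius grows with t, at most 1
        points.append((r * math.cos(t), r * math.sin(t)))
    return points


def forward_diffusion(points, n_steps: int, beta: float, rng: random.Random):
    """Apply n_steps of x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * noise.

    Returns the snapshots [x_0, x_1, ..., x_T] so the gradual loss of
    structure can be inspected step by step.
    """
    keep = math.sqrt(1.0 - beta)  # share of the previous state that survives
    spread = math.sqrt(beta)      # scale of the fresh Gaussian noise mixed in
    trajectory = [points]
    current = points
    for _ in range(n_steps):
        current = [
            (keep * x + spread * rng.gauss(0.0, 1.0), keep * y + spread * rng.gauss(0.0, 1.0))
            for x, y in current
        ]
        trajectory.append(current)
    return trajectory


def mean_and_variance(points):
    """Mean and variance of all coordinates, pooled over x and y."""
    values = [v for p in points for v in p]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


if __name__ == "__main__":
    rng = random.Random(42)
    spiral = make_spiral(1000, rng)
    trajectory = forward_diffusion(spiral, n_steps=300, beta=0.02, rng=rng)
    for step in (0, 10, 50, 300):
        m, v = mean_and_variance(trajectory[step])
        print(f"step {step:3d}: mean={m:+.3f} variance={v:.3f}")
```

After a few hundred steps the pooled variance approaches 1 and the mean approaches 0: the spiral has dissolved into Gaussian noise. A trained model would be asked to run this trajectory backwards, which is the part this sketch does not attempt.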

Another aspect of diffusion in physics that is important for image generation appears at the end of the definition of the concept given by Wikipedia (emphasis ours):

diffusion is the movement of a substance from a region of high concentration to a region of low concentration without bulk motion^[2]

Diffusion does not capture the movement of a bounded entity (a bulk, a whole block of content); it is a mode of spreading that flexibly accommodates structure. “Diffusion is the gradual movement/dispersion of concentration within a body with no net movement of matter”.^[2] This characteristic makes it particularly apt for capturing multi-level relations between image parts without having to identify a source that constrains these relations. It gives access to implicit structure.

Metaphorically, this can be compared to looking for faces in clouds (or reading signs in tea leaves). We do not immediately see a face in a cumulus, but the faint movement of the mass stimulates our curiosity until we gradually delineate the nascent contours of a shape we can begin to identify.

Notice the emphasis on the virtual in the image theory underlying image generation. Noise is the precondition for the generation of any image because it virtually contains all images that can be actualised through a process of decryption or de-noising. To generate an image is not a process of creation but of actualisation (of an image that already exists virtually).
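Spreading “with no net movement of matter” can be made concrete with Fick's second law, ∂c/∂t = D ∂²c/∂x², where c is the concentration and D the diffusion coefficient. The sketch below uses a standard explicit finite-difference scheme, chosen here for illustration and not described in the sources above. It starts with all of a substance in one cell of a closed one-dimensional body and lets it spread. The function names `diffuse` and `centre_of_mass` and the parameter values are assumptions. The total amount is conserved and, in this symmetric set-up, its centre does not drift: concentration disperses without the substance moving as a block.

```python
def diffuse(concentration: list[float], rate: float, n_steps: int) -> list[float]:
    """Explicit finite-difference scheme for Fick's second law in one dimension.

    rate stands for D * dt / dx**2 and must lie in (0, 0.5] for this explicit
    scheme to stay stable. The two ends are closed (zero flux), so nothing
    enters or leaves the body.
    """
    if not 0 < rate <= 0.5:
        raise ValueError("rate must be in (0, 0.5] for a stable explicit scheme")
    c = list(concentration)
    for _ in range(n_steps):
        # Flux between neighbouring cells is proportional to their difference
        # in concentration (Fick's first law), flowing from high to low.
        fluxes = [rate * (c[i] - c[i + 1]) for i in range(len(c) - 1)]
        new_c = c[:]
        for i, flux in enumerate(fluxes):
            new_c[i] -= flux      # cell i gives up the flux (negative means it receives)
            new_c[i + 1] += flux  # cell i + 1 gains exactly the same amount
        c = new_c
    return c


def centre_of_mass(c: list[float]) -> float:
    """Concentration-weighted mean position, in cell units."""
    return sum(i * v for i, v in enumerate(c)) / sum(c)


if __name__ == "__main__":
    cells = [0.0] * 41
    cells[20] = 1.0  # all of the substance starts in the middle cell
    for steps in (0, 50, 200):
        profile = diffuse(cells, rate=0.25, n_steps=steps)
        print(
            f"steps={steps:3d}  total={sum(profile):.6f}  "
            f"peak={max(profile):.4f}  centre={centre_of_mass(profile):.2f}"
        )
```

Because every unit that leaves one cell is added to its neighbour, the update conserves the total by construction; only the shape of the profile changes, flattening as the peak falls.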

Figure 1. The process of adding noise goes from left to right, and the de-noising runs the process backwards to recover the spiral from noise (Sohl-Dickstein et al., 2015).

From algorithm to software

Stable Diffusion is a model that encapsulates the diffusion algorithm and makes it tractable at scale.
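One ingredient of this tractability is that Stable Diffusion is a latent diffusion model: an autoencoder first compresses the image, and the noising and de-noising steps run on that smaller representation instead of on raw pixels. The arithmetic below assumes the commonly documented configuration of Stable Diffusion v1 (512 × 512 RGB images, a spatial downsampling factor of 8, four latent channels); other versions use other sizes, and the helper `element_count` is a hypothetical name used only for illustration.

```python
def element_count(height: int, width: int, channels: int) -> int:
    """Number of values the diffusion process has to handle for one image."""
    return height * width * channels


# Pixel space: a 512 x 512 RGB image, the default resolution of Stable Diffusion v1.
pixels = element_count(512, 512, 3)

# Latent space: the autoencoder shrinks each side by a factor of 8 and keeps 4 channels.
downsampling = 8
latents = element_count(512 // downsampling, 512 // downsampling, 4)

print(pixels, latents, pixels // latents)  # 786432 16384 48
```

Under these assumptions each de-noising step works on about 48 times fewer values than it would in pixel space; the compression is not free, since the autoencoder must be trained to decode latents back into images.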

Stabilizing diffusion

Zylinska frames generative AI in terms of instability:

We could perhaps suggest that generative AI produces what could be called an ‘unstable’ or ‘wobbly’ understanding – and a related phenomenon of ‘shaky’ perception. Diffusion, to be discussed in the section that follows, can be seen as an imaging template for this model.^[1]

Diffusion belongs to a series of nested metaphors that includes the brain as computer and concepts such as 'hallucinations' or 'deep dreams'.

Instability becomes the organising concept and technology for the emergence of our picture of the world. Indeed, it is not just the perception of images but their very constitution that is fundamentally unstable.^[1]

Berman’s analysis pointed to the liquidising of all the certainties of “the traditional world”,^[1] a general condition of instability caused by the extensive disruptions that the flows of capital bring about.

The Pictet Group advertises its services with the rhetoric of stability, as a response to global instability.

Stability in finance: how can one achieve stability in a turbulent world? By engaging with turbulence while avoiding its effects, diverting them onto others. An example is the advertisement by the former employer of Emad Mostaque, the founder of Stability AI.

How does it evolve through time?

Evolution of size

How does it create, decrease, or otherwise affect value?

It creates value through its material form: the weights. It gains value with adoption.

It gains value by comparison: it can do what others cannot, or what they can only do less well.

It decreases the value of the singular image and increases the value of the image ensemble. To learn how to generate images, algorithms such as Stable Diffusion or Imagen need to be fed with examples. These images are given to the algorithm one by one. During its learning phase, the algorithm treats them as moments of an uninterrupted process of variation, not as singular specimens. At this level, the process of image generation is radically anti-representational: it treats the image as a mere moment (“quelconque”^[3]), a variation among many.

What is its place/role in techno-cultural strategies?

Circulation as model

How does it relate to autonomous infrastructure?

References

[1] Joanna Zylinska, “Diffused Seeing”, https://mediatheoryjournal.org/2024/09/30/joanna-zylinska-diffused-seeing/
[2] “Diffusion”, Wikipedia, https://en.wikipedia.org/wiki/Diffusion
[3] For a discussion of the difference between privileged instants and “instants quelconques”, see Deleuze’s theory of cinema, in particular https://www.webdeleuze.com/textes/295 (find translation)

Unused references

From “Maps”: “Secondly, there is a 'latent space'. Image latency refers to the space in between the capture of images in datasets and the generation of new images. It is an algorithmic space of computational models where images are, for instance, encoded with 'noise', and the machine then learns how to de-code them back into images (aka 'image diffusion').”
