Pixel space: Difference between revisions

From CTPwiki

CUA (talk | contribs)


[[File:Pixel space map.jpg|none|thumb|640x640px|alt=A diagram of pixel space in AI imaging|A map of AI image generation separating 'pixel space' from 'latent space', but particularly emphasising the objects of pixel space, operated by the users. Pixel space is the home of both conventional visual culture and a more specialised visual culture. Conventionally, image generation will involve simple 'prompt' interfaces, and models will be built on accessible images, scraped from archives on the Internet, for instance. The specialised visual culture takes advantage of the openness of Stable Diffusion to, for instance, generate specific manga or gaming images with advanced settings and parameters. Often, users build and share their own models, too, so-called 'LoRAs'. (by Christian Ulrik Andersen, Nicolas Malevé, and Pablo Velasco)]]
[[Category:Objects of Interest and Necessity]]

Revision as of 14:19, 27 August 2025

Pixel space

Pixel space holds the range of visible objects that a typical user would normally encounter. This includes the interfaces for creating images. In conventional interfaces like DALL-E or Bing Image Creator, users write prompts in order to generate images. What is particular to autonomous and decentralised AI image generation is that the interfaces offer many more parameters and ways to interact with the models that generate the images. They function more like 'expert' interfaces.

In pixel space one finds many objects of visual culture. Apart from the interface itself, this includes both the images generated by AI and the images used to train the models behind them. The training images are, as described above, gathered into datasets, compiled by crawling the internet and scraping images that belong to different visual cultures – ranging, e.g., from museum collections of paintings to criminal records with mug shots.

Many users also have specific aesthetic requirements for the images they want to generate, say, images in a particular manga style or setting. The expert interfaces therefore also make it possible to combine different models and even to post-train one's own model, known as a LoRA (Low-Rank Adaptation). When images are shared on platforms like Danbooru (one of the first and largest image boards for manga and anime), they are typically well categorised – both descriptively ('tight boots', 'open mouth', 'red earrings', etc.) and according to visual cultural style ('genshin impact', 'honkai', 'kancolle', etc.). These categorised images can therefore in turn be used to train more models.
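How such categorised images can feed back into model training may be easier to see in a small sketch. The Python below is only an illustration under stated assumptions: the `TaggedImage` record, its field names, the helper functions, and the sample file names are all hypothetical and do not reflect Danbooru's actual data format or any particular training tool. It shows how descriptive tags and style tags might be used to pick out a subset of images and turn their tags into captions for a LoRA.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedImage:
    """One image record; field names are hypothetical, not Danbooru's schema."""
    filename: str
    descriptive_tags: set[str] = field(default_factory=set)
    style_tags: set[str] = field(default_factory=set)


def select_for_style(images, style, required=()):
    """Return images that carry the style tag and every required descriptive tag."""
    required = set(required)
    return [
        img for img in images
        if style in img.style_tags and required <= img.descriptive_tags
    ]


def caption_for(img):
    """Join tags into a comma-separated caption: style tags first, each group sorted."""
    return ", ".join(sorted(img.style_tags) + sorted(img.descriptive_tags))


# Invented sample records, reusing the tags quoted in the text.
collection = [
    TaggedImage("a.png", {"tight boots", "open mouth"}, {"genshin impact"}),
    TaggedImage("b.png", {"red earrings"}, {"honkai"}),
    TaggedImage("c.png", {"open mouth", "red earrings"}, {"genshin impact"}),
]

# Select one style, narrowed by a descriptive tag, and build captions for it.
subset = select_for_style(collection, "genshin impact", required=["open mouth"])
captions = {img.filename: caption_for(img) for img in subset}
```

The point of the sketch is that the selection step is a curatorial decision as much as a technical one: which style tag counts, and which descriptive tags are required, decides what the resulting model will learn to reproduce.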

A useful annotated and categorised dataset – be it for a foundation model or a LoRA – typically requires specialised knowledge of both the technical requirements of model training (latent space) and the aesthetics and cultural values of visual culture itself. This includes knowledge of common visual conventions, such as realism, beauty, and horror, and also (in the making of LoRAs) of more specialised conventions – say, a visual style that an artist or a cultural community wants to generate.

A diagram of pixel space in AI imaging
A map of AI image generation separating 'pixel space' from 'latent space', but particularly emphasising the objects of pixel space, operated by the users. Pixel space is the home of both conventional visual culture and a more specialised visual culture. Conventionally, image generation will involve simple 'prompt' interfaces, and models will be built on accessible images, scraped from archives on the Internet, for instance. The specialised visual culture takes advantage of the openness of Stable Diffusion to, for instance, generate specific manga or gaming images with advanced settings and parameters. Often, users build and share their own models, too, so-called 'LoRAs'. (by Christian Ulrik Andersen, Nicolas Malevé, and Pablo Velasco)