Interfaces
What is an interface?
Interfaces to generative AI come in many forms. There is Hugging Face, which offers an interface to the many models of generative AI, usually via a command-line interface. In this sense there are also interfaces between Hugging Face and a computational latent space, interfaces to the racks of servers in generative AI, and also to hardware (such as a GPU, on a more material plane).
What is of particular interest here is, however, the user interface to autonomous AI image generation.
What is the network that sustains the interface to AI image creation?
At the level of 'pixel space' where users typically generate images (see Maps), one is accustomed to mainstream generators, sometimes integrated into other services. For instance, Microsoft Bing is not merely a search engine, but also integrates all the other services offered by Microsoft, including Microsoft Image Creator. The Image Creator is, as expressed in the interface itself, made for 'Happy Creation!' There is, in other words, an expected 'smoothness' in the interface – a simplicity and a low entry threshold for the user.
What is also noticeable in this smoothness is that the user is offered very few configuration parameters: basically just a prompt (and, in video generation, a few more tied to credits).
The interfaces for generating images with Stable Diffusion vary, and there are many options depending on the configuration of one's own computer. ComfyUI, for instance, is designed specifically for Stable Diffusion and employs a node-based workflow (as may also be encountered in other software, such as Max/MSP), making it particularly suitable for reproducing pipelines (see also Hugging Face). It works for both Windows and Ubuntu (Linux) users. ArtBot is another graphical user interface, accessed through the web. Draw Things is suitable for macOS users. Stability AI also offers an API – a more command-line-like interface – that allows developers to integrate image generation capabilities into their applications.
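As a minimal sketch of the latter, a request to Stability AI's REST API might look as follows (the endpoint and field names follow the v2beta documentation at the time of writing and may change; the API key is a placeholder):

```python
# Minimal sketch: generating an image through Stability AI's REST API.
# Endpoint and field names follow the v2beta documentation at the time
# of writing and may change; the API key is a placeholder.
import requests

response = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/core",
    headers={
        "authorization": "Bearer YOUR_STABILITY_API_KEY",  # placeholder
        "accept": "image/*",  # ask for the raw image bytes in return
    },
    files={"none": ""},  # forces multipart/form-data, as the endpoint expects
    data={
        "prompt": "a lighthouse in thick fog, oil painting",
        "output_format": "png",
    },
)

if response.status_code == 200:
    with open("lighthouse.png", "wb") as f:
        f.write(response.content)
else:
    raise RuntimeError(response.text)
```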
In autonomous AI image creation the interfaces are very different from, say, Bing's Image Creator. They offer some of the same features of configuration, and also of integration with other services, but they do so in much more open-ended ways. There are, for instance, endless settings and parameters. To generate an image, there is naturally a prompt, but also the option of a negative prompt. One can also combine models, e.g., use a 'base model' of Stable Diffusion and add LoRAs from CivitAI or Hugging Face. There is also the option of determining how much weight the added models should have in the generation of the image, the size of the image, or a 'seed' that allows for variations (of style, for instance) while maintaining some consistency in the image.
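These parameters map directly onto code. A sketch using Hugging Face's diffusers library – where the model ID is real but the LoRA file and prompts are hypothetical – might read:

```python
# Sketch: combining a base model, a LoRA, a negative prompt, model weight,
# image size, and a seed, using Hugging Face's diffusers library.
import torch
from diffusers import StableDiffusionXLPipeline

# The 'base model' of Stable Diffusion.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# A LoRA downloaded from, e.g., CivitAI or Hugging Face (hypothetical file).
pipe.load_lora_weights("my_style_lora.safetensors")

image = pipe(
    prompt="a harbour at dusk, ukiyo-e style",
    negative_prompt="blurry, low quality",               # what to steer away from
    cross_attention_kwargs={"scale": 0.8},               # weight of the added LoRA
    width=1024, height=1024,                             # size of the image
    generator=torch.Generator("cuda").manual_seed(42),   # fixed seed for consistency
).images[0]
image.save("harbour.png")
```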
How do interfaces create value?
Using a commercial platform, one would also notice the use of 'currencies'. In Microsoft Image Creator there are 'credits' that give users priority access to the GPU. These are called Microsoft Reward points, and are basically earned by either waiting (as a punishment) or by being a good customer – using Microsoft's other products and sharing one's data. One earns points, for instance, for searching through Bing, using the Windows search box, buying Microsoft products, and playing games on Microsoft Xbox. Use is, in other words, related to different planes in the ecology, such as an infrastructure and a business model (see Maps).
Like the commercial platform interfaces, the interfaces for Stable Diffusion also depend on different planes of AI, but they do so in distinct ways. For instance, installing Draw Things allows a user to generate images on one's own GPU, on a material plane. With ArtBot it is also possible to connect to Stable Horde, accessing the processing power of a distributed network – and earning its 'currency' (kudos) if one is willing to let one's own GPU be part of the network.
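A sketch of what this looks like in practice, following the public AI Horde REST API (endpoints and response fields as documented at the time of writing; '0000000000' is the anonymous key, which earns no kudos):

```python
# Sketch: requesting an image from the distributed Stable Horde network.
# Endpoints and fields follow the AI Horde documentation at the time of
# writing; a registered API key would accumulate kudos.
import time
import requests

HORDE = "https://stablehorde.net/api/v2"
headers = {"apikey": "0000000000"}  # anonymous key; replace to build kudos

# Submit an asynchronous generation request to the network of volunteer GPUs.
job = requests.post(
    f"{HORDE}/generate/async",
    headers=headers,
    json={"prompt": "a distributed network of GPUs, woodcut style"},
).json()

# Poll until a worker somewhere on the network has produced the image.
while True:
    status = requests.get(f"{HORDE}/generate/status/{job['id']}").json()
    if status.get("done"):
        print(status["generations"][0]["img"])  # URL (or base64) of the result
        break
    time.sleep(5)
```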
What is its place/role in techno-cultural strategies?
The commercial interfaces for AI image creation are sometimes criticised for their biases or copyright infringements, but many people see them as useful tools. They are used both by creatives to test out ideas and get inspired, and frequently also in teaching and communication, to easily display ideas (at times, just to laugh at how the generated images relate to the prompt - indeed that seems to be the source of a whole new visual genre). [EXAMPLES / IMAGES FROM ACADEMICS]
Through their many settings, the interfaces to Stable Diffusion accommodate a much more fine-grained visual culture. Notably, this is found on CivitAI and also on sites like Danbooru. However, the visual cultural preferences (for, say, a particular style of manga) also lead to further complexity in the user interface. That is, interfaces like Draw Things and ComfyUI are not merely interfaces for generating images, but also for training one's own models (i.e., for building LoRAs). As with the generation of images, there are endless settings that point to how highly skilled the practitioners are within the visual culture of autonomous AI image creation: in building and annotating datasets and creating 'trigger words' (for prompts), in understanding the impact of 'learning rates' on the use of resources, the implications of 'training steps', and much more. That is, the interface to generative AI is typically not solely a conventional user interface, but a combined user and (amateur) developer interface to a computational 'latent space' of models (see Maps).
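To give a sense of the settings involved, a hypothetical training configuration – the parameter names echo common community trainers such as kohya_ss, but are illustrative rather than tied to any specific tool – might read:

```python
# Hypothetical LoRA training configuration, illustrating the parameters
# named above; names and values are illustrative, not a specific tool's API.
lora_training_config = {
    "base_model": "stabilityai/stable-diffusion-xl-base-1.0",
    "dataset_dir": "datasets/my_style",  # annotated images with caption files
    "trigger_word": "mystyle",           # token that activates the LoRA in prompts
    "learning_rate": 1e-4,               # higher trains faster, but risks degrading the model
    "max_train_steps": 2000,             # more steps cost more GPU time and energy
    "network_rank": 16,                  # LoRA capacity vs. file size
    "resolution": 1024,                  # training image resolution
}
```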
How do interfaces relate to autonomous infrastructures?
To conclude, interfaces to autonomous and decentralised AI image generation seem to answer a high demand for parameters and configurations that allow communities to integrate them with other software, and to accommodate particular and highly specialised visual expressions. The interfaces rely on material infrastructures, but in contrast to flagship platforms, they offer the potential to use one's own resources (GPU), or a decentralised network of GPUs. They are, however, rarely completely detached from commercial interests and centralised infrastructures. The integration with, and dependency on, a platform like Hugging Face is a good example of this.
- Also infrastructures and business planes / Hugging Face - GPU / Stable Horde.
- Also LoRA - re-model latent space / computational interaction.
- But advertising: one can easily generate images for inspiration.
- How does it move from person to person, person to software, to platform, what things are attached to it (visual culture)
- Networks of attachments
- How does it relate to / sustain a collective? (human + non-human)
How does it evolve through time?
Evolution of the interface for these objects. Early ChatGPT offered two parameters through the API: prompt and temperature. Today it is an extremely complex object with all kinds of components and parameters. Visually, what is the difference? Richness of the interface in decentralisation (the more options, the better...)
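For contrast, a sketch of that early, minimal interface (legacy openai-python style, shown for historical comparison; the current client differs):

```python
# Sketch of the early, minimal API surface: essentially a prompt and a
# temperature. Legacy openai-python (<1.0) style, for historical contrast.
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Describe an interface.",
    temperature=0.7,  # one of the few exposed knobs
)
print(response["choices"][0]["text"])
```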
How does it create value? Or decrease / affect value?
How does it relate to autonomous infrastructure?
- Interface as an access point: how to get in contact with autonomous AI image creation - access LoRAs and also the wider visual community (using their models). An entry on Draw Things: how it depends on CivitAI as well as Hugging Face.
- The different experiences of interfaces: the platformed/'conversational' vs. the history of one's prompts.
- How one prompts collaboratively with the model, e.g., triggering one's annotations of images, or using negative prompts - or the many settings available.
- Add also APIs and command-based interfaces.
- The differences between ArtBot, Draw Things, and ComfyUI - as a catalogue of interfaces.
- Interfaces in computer semiotics. But also how sign/representation gets coupled with 'signals' in unpredictable ways: you anticipate, but never know the outcome.
- Interfaces to the generation of images (pixel space to latent space), but also an amateur/developer interface to the 'backend' of the models/latent space.