LAION: Difference between revisions

From CTPwiki

== LAION ==
LAION created by a  physics and computer science teacher,  moderator of a Discord channel with a degree in physics?. Works on a voluntary basis to democratize AI.
If our tour has led us into well-funded companies such as [[Hugging Face]] or [[CivitAI]] and their attachments in the heart of venture capital, it also leads us, at the opposite end of the financial spectrum, to significant actors that operate within a minimal economy, such as [[Stable Horde]]. The Large-scale Artificial Intelligence Open Network (LAION) fits in this category. It is a non-profit organization whose ambition is to democratize AI by encouraging "open public education and a more environment-friendly use of resources by reusing existing datasets and models."[1] LAION operates on small donations, some in the form of money but mostly in the form of cloud compute.[2]
[[File:Logo-002.png|left|thumb|123x123px|LAION's logo [3]]]


“We have a bank account with a bit of money coming into it from a few companies that support us. That’s primarily Hugging Face, but also StabilityAI, although we’re mostly supported not by money but by cloud compute. StabilityAI, for example, has a huge cluster with 4000 or now 5600 GPUs, and there we or our members who are approved by the core team can use preemptable GPUs, for example, what is not being used at the moment and is idle.”<ref>https://mlconference.ai/blog/ai-as-a-superpower-laion-and-the-role-of-open-source-in-artificial-intelligence/</ref>
LAION's co-founder, Christoph Schuhmann, is the driving force behind one major object of necessity for the generative AI ecosystem: a series of datasets that outscaled the existing offer. The curatorial method for these datasets was entirely automated. It cleverly leveraged available resources such as Common Crawl and Google Colab to download text-image pairs en masse from the internet. This curatorial method differs radically from the practice of affective involvement discussed in the [[LoRA]] entry, where anime enthusiasts select images by hand from a visual domain they cherish. It also contrasts with the method used in earlier large-scale datasets such as ImageNet, where the annotation work was performed manually and crowdsourced. In the case of LAION-5B, which contains 5.85 billion image-text pairs, Schuhmann and his collaborators used an index of webpages compiled by the non-profit Common Crawl to find HTML documents with <code><img></code> tags and extract their alt text (alt text is a descriptive text that acts as a substitute for visual items on a page and is sometimes included with an image to increase accessibility). The work of annotation is delegated to the then just-released [[Clip|CLIP]] model, tasked with verifying the relation between the downloaded images and the adjacent alt text.[4] The comparison is even more striking with a subsequent dataset, LAION-Aesthetics, a subset of the 5-billion-image dataset that contains images of higher aesthetic quality.[5] This object of high interest for the then-burgeoning field of image generation, which was desperately looking for stylistically rich images to train algorithms, was assembled using an approach that again favoured full automation. This time the selection was handled by a custom-made model trained on CLIP embeddings to evaluate the quality of images by assigning them an aesthetic score.


“Then in the spring of 2021, I sat down and just wrote down a huge spaghetti code in a Google Colab and then asked around on Discord who wanted to help me with it. Someone got in touch, who later turned out to be only 15 at the time. And he wrote a tracker, basically a server that manages lots of colabs, each of which gets a small job, extracts a gigabyte, and then uploads the results. At that time, the first version was still using Google Drive.” (Ibid)
This can be explained by the fact that LAION operates with a minimal budget and could not afford the cost of manual verification and annotation of a dataset of that scale. But in the case of LAION, the automation of curation did not preclude artisanal practice. It displaced it. An interview given by Schuhmann shows the ad hoc and low-tech nature of the bricolage that presided over the creation of an object that helped spark the development of image generation[6]:


“We then did a [blog post about our dataset](<nowiki>https://laion.ai/blog/laion-400-open-dataset/</nowiki>), and after less than an hour, I already had an email from the Hugging Face people wanting to support us. I had then posted on the Discord server that if we had $5,000, we could probably create a billion image-text pairs. Shortly after, someone already agreed to pay that: ‘If it’s so little, I’ll pay it.’ At some point, it turned out that the person had his own startup in text-to-image generation, and later he became the chief engineer of Midjourney.” (Ibid)
“Then in the spring of 2021, I sat down and just wrote down a huge spaghetti code in a Google Colab and then asked around on Discord who wanted to help me with it. Someone got in touch, who later turned out to be only 15 at the time. And he wrote a tracker, basically a server that manages lots of colabs, each of which gets a small job, extracts a gigabyte, and then uploads the results. At that time, the first version was still using Google Drive.”[7]


“I had already heard about Robin Rombach, who was still a student in Heidelberg at the time and had helped develop latent diffusion models at the CompVis Group. Emad Mostaque, the founder of StabilityAI, told me in May 2022 that he would like to support Robin Rombach with compute time, and that’s how I got in touch with Robin.” (Ibid)
“We then did a [https://laion.ai/blog/laion-400-open-dataset/ blog post about our dataset], and after less than an hour, I already had an email from the Hugging Face people wanting to support us. I had then posted on the Discord server that if we had $5,000, we could probably create a billion image-text pairs. Shortly after, someone already agreed to pay that: ‘If it’s so little, I’ll pay it.’ At some point, it turned out that the person had his own startup in text-to-image generation, and later he became the chief engineer of Midjourney.”[8]


For research vs copyright “There is a Data Mining Law, an EU-wide exception to copyright. It allows non-profit institutions, such as universities, but also associations like ours, whose focus is on research and who make their results publicly available, to download and analyse things that are openly available on the internet.
These two fragments are worth quoting at length. In them, Schuhmann traces a line that runs from the management of user account limits on Colab and Google Drive, through the informality of meeting a coder on Discord (who turns out to be a teenager), to the future chief engineer of a major company in the field. These anecdotes indicate how the dataset functions as an attractor for actors and projects of radically different scales and funding.


We are allowed to temporarily store the links, texts, whatever, and when we no longer need them for research, we have to delete them. This law explicitly allows data mining for research, and that is very good.” (Ibid)
In the [[dataset]] entry, we characterized datasets as conduits of visual culture entering a model. Examining the controversies surrounding LAION, we have to underline how these conduits problematically enable the reigning extractivism of the AI industry. Indeed, the curatorial method devised by LAION does not include seeking permission from the images' rights owners, and several court cases have been brought by artists and image agencies against Stability AI and others on the grounds that their use of the images contained in the LAION dataset is infringing.[9] Here the non-profit status of the organization plays an ambiguous role. For Schuhmann, his association benefits from an exception granted by the EU Data Mining directive for scientific research.[10] If this is true for LAION itself, the same can't be said for the parties interested in the object. If the dataset is an object of necessity for Stability AI and Midjourney as much as for Stable Horde or the individual users generating images with their models, the images it contains are also objects of necessity for the artists who produced them. What the example of LAION reveals is that even if its collections of images are sites of convergence for actors and projects of different scales and means, they are at the same time sites of divergence for their authors, who have radically different interests.


==== References ====
 
 
[1] “LAION.” ''LAION'', n.d. Accessed August 22, 2025. https://laion.ai/.
 
[2] Schuhmann, Christoph. “AI as a Superpower: LAION and the Role of Open Source in Artificial Intelligence.” ''MLCon'', June 21, 2023. https://mlconference.ai/blog/ai-as-a-superpower-laion-and-the-role-of-open-source-in-artificial-intelligence/.
 
[3] “LAION.” ''LAION'', n.d. Accessed August 22, 2025. https://laion.ai/.
 
[4] Beaumont, Romain. “LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS.” ''LAION'', March 31, 2022. https://laion.ai/blog/laion-5b/.
 
[5] Schuhmann, Christoph. “LAION-AESTHETICS.” ''LAION'', August 16, 2022. https://laion.ai/blog/laion-aesthetics/.
 
[6] Schuhmann, “AI as a Superpower: LAION and the Role of Open Source in Artificial Intelligence.”
 
[7] Ibid.
 
[8] Ibid.
 
[9] Andersen et al. v. Stability AI Ltd. et al., 3:2023cv00201 (US District Court for the Northern District of California 2023). https://dockets.justia.com/docket/california/candce/3:2023cv00201/407208.
 
[10] Moody, Glyn. “German Court: LAION’s Generative AI Training Dataset Is Legal Thanks To EU Copyright Exceptions.” ''Techdirt'', October 25, 2024. https://www.techdirt.com/2024/10/25/german-court-laions-generative-ai-training-dataset-is-legal-thanks-to-eu-copyright-exceptions/.
 
== Guestbook ==
<eplite id="Objects_of_interest_and_necessity_LAION" height="600px" width="1000px" />
 
[[Category:Objects of Interest and Necessity]]

Latest revision as of 11:19, 18 September 2025


LAION

If our tour has led us into well-funded companies such as Hugging Face or CivitAI and their attachments in the heart of venture capital, it also leads us, at the opposite end of the financial spectrum, to significant actors that operate within a minimal economy, such as Stable Horde. The Large-scale Artificial Intelligence Open Network (LAION) fits in this category. It is a non-profit organization whose ambition is to democratize AI by encouraging "open public education and a more environment-friendly use of resources by reusing existing datasets and models."[1] LAION operates on small donations, some in the form of money but mostly in the form of cloud compute.[2]

LAION's logo [3]

LAION's co-founder, Christoph Schuhmann, is the driving force behind one major object of necessity for the generative AI ecosystem: a series of datasets that outscaled the existing offer. The curatorial method for these datasets was entirely automated. It cleverly leveraged available resources such as Common Crawl and Google Colab to download text-image pairs en masse from the internet. This curatorial method differs radically from the practice of affective involvement discussed in the LoRA entry, where anime enthusiasts select images by hand from a visual domain they cherish. It also contrasts with the method used in earlier large-scale datasets such as ImageNet, where the annotation work was performed manually and crowdsourced. In the case of LAION-5B, which contains 5.85 billion image-text pairs, Schuhmann and his collaborators used an index of webpages compiled by the non-profit Common Crawl to find HTML documents with <img> tags and extract their alt text (alt text is a descriptive text that acts as a substitute for visual items on a page and is sometimes included with an image to increase accessibility). The work of annotation is delegated to the then just-released CLIP model, tasked with verifying the relation between the downloaded images and the adjacent alt text.[4] The comparison is even more striking with a subsequent dataset, LAION-Aesthetics, a subset of the 5-billion-image dataset that contains images of higher aesthetic quality.[5] This object of high interest for the then-burgeoning field of image generation, which was desperately looking for stylistically rich images to train algorithms, was assembled using an approach that again favoured full automation. This time the selection was handled by a custom-made model trained on CLIP embeddings to evaluate the quality of images by assigning them an aesthetic score.
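
The two automated steps described above, harvesting alt text from <img> tags and keeping only the pairs whose image and text appear related, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LAION's actual code: the names ImgAltExtractor, extract_pairs, cosine_similarity and keep_pair are hypothetical, the embeddings are stand-in vectors rather than real CLIP outputs, and the threshold value is a placeholder chosen for the example.

```python
from html.parser import HTMLParser
import math


class ImgAltExtractor(HTMLParser):
    """Collect (src, alt) pairs from <img> tags that carry non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Only image tags are of interest; everything else is skipped.
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        # A missing or valueless alt attribute is treated as empty text.
        alt = (attr.get("alt") or "").strip()
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return the list of (image URL, alt text) candidates found in a page."""
    parser = ImgAltExtractor()
    parser.feed(html)
    return parser.pairs


def cosine_similarity(a, b):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_pair(image_embedding, text_embedding, threshold=0.28):
    """Keep a pair only if its embeddings are similar enough.

    The threshold is an illustrative placeholder; in a real pipeline the
    embeddings would come from a model such as CLIP (assumption).
    """
    return cosine_similarity(image_embedding, text_embedding) >= threshold
```

In this sketch no human looks at an image or reads a caption: a candidate pair is accepted or discarded purely on the basis of a similarity score, which is what makes the curation scale to billions of items on a minimal budget.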

This can be explained by the fact that LAION operates with a minimal budget and could not afford the cost of manual verification and annotation of a dataset of that scale. But in the case of LAION, the automation of curation did not preclude artisanal practice. It displaced it. An interview given by Schuhmann shows the ad hoc and low-tech nature of the bricolage that presided over the creation of an object that helped spark the development of image generation[6]:

“Then in the spring of 2021, I sat down and just wrote down a huge spaghetti code in a Google Colab and then asked around on Discord who wanted to help me with it. Someone got in touch, who later turned out to be only 15 at the time. And he wrote a tracker, basically a server that manages lots of colabs, each of which gets a small job, extracts a gigabyte, and then uploads the results. At that time, the first version was still using Google Drive.”[7]

“We then did a blog post about our dataset (https://laion.ai/blog/laion-400-open-dataset/), and after less than an hour, I already had an email from the Hugging Face people wanting to support us. I had then posted on the Discord server that if we had $5,000, we could probably create a billion image-text pairs. Shortly after, someone already agreed to pay that: ‘If it’s so little, I’ll pay it.’ At some point, it turned out that the person had his own startup in text-to-image generation, and later he became the chief engineer of Midjourney.”[8]

These two fragments are worth quoting at length. In them, Schuhmann traces a line that runs from the management of user account limits on Colab and Google Drive, through the informality of meeting a coder on Discord (who turns out to be a teenager), to the future chief engineer of a major company in the field. These anecdotes indicate how the dataset functions as an attractor for actors and projects of radically different scales and funding.

In the dataset entry, we characterized datasets as conduits of visual culture entering a model. Examining the controversies surrounding LAION, we have to underline how these conduits problematically enable the reigning extractivism of the AI industry. Indeed, the curatorial method devised by LAION does not include seeking permission from the images' rights owners, and several court cases have been brought by artists and image agencies against Stability AI and others on the grounds that their use of the images contained in the LAION dataset is infringing.[9] Here the non-profit status of the organization plays an ambiguous role. For Schuhmann, his association benefits from an exception granted by the EU Data Mining directive for scientific research.[10] If this is true for LAION itself, the same can't be said for the parties interested in the object. If the dataset is an object of necessity for Stability AI and Midjourney as much as for Stable Horde or the individual users generating images with their models, the images it contains are also objects of necessity for the artists who produced them. What the example of LAION reveals is that even if its collections of images are sites of convergence for actors and projects of different scales and means, they are at the same time sites of divergence for their authors, who have radically different interests.


[1] “LAION.” LAION, n.d. Accessed August 22, 2025. https://laion.ai/.

[2] Schuhmann, Christoph. “AI as a Superpower: LAION and the Role of Open Source in Artificial Intelligence.” MLCon, June 21, 2023. https://mlconference.ai/blog/ai-as-a-superpower-laion-and-the-role-of-open-source-in-artificial-intelligence/.

[3] “LAION.” LAION, n.d. Accessed August 22, 2025. https://laion.ai/.

[4] Beaumont, Romain. “LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS.” LAION, March 31, 2022. https://laion.ai/blog/laion-5b/.

[5] Schuhmann, Christoph. “LAION-AESTHETICS.” LAION, August 16, 2022. https://laion.ai/blog/laion-aesthetics/.

[6] Schuhmann, “AI as a Superpower: LAION and the Role of Open Source in Artificial Intelligence.”

[7] Ibid.

[8] Ibid.

[9] Andersen et al. v. Stability AI Ltd. et al., 3:2023cv00201 (US District Court for the Northern District of California 2023). https://dockets.justia.com/docket/california/candce/3:2023cv00201/407208.

[10] Moody, Glyn. “German Court: LAION’s Generative AI Training Dataset Is Legal Thanks To EU Copyright Exceptions.” Techdirt, October 25, 2024. https://www.techdirt.com/2024/10/25/german-court-laions-generative-ai-training-dataset-is-legal-thanks-to-eu-copyright-exceptions/.
