Dataset

From CTPwiki

Revision as of 11:28, 26 August 2025 by CUA (talk | contribs) (Dataset)

In the context of AI image generation, a dataset is a collection of image-text pairs (and sometimes other attributes such as provenance or an aesthetic score) used to train AI models. It is an object of necessity par excellence. Without a dataset, no model could see the light of day. Iconic datasets include the LAION aesthetic dataset, Artemis, ImageNet, and Common Objects in Context (COCO). These collections of images, mostly sourced from the internet, reach dizzying scales. ImageNet became famous for its 14 million images in the first decade of the century.[1] Today LAION-5B consists of 5.85 billion CLIP-filtered image-text pairs.[2]
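To make the notion of an image-text pair concrete, the following is a minimal sketch of how one record and a score-based selection might be represented. The field names, URLs, scores, and threshold are hypothetical illustrations and do not reproduce the schema of LAION or of any other dataset named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageTextPair:
    """One hypothetical dataset record: an image reference plus its caption."""
    image_url: str
    caption: str
    provenance: Optional[str] = None         # e.g. the site the image was found on
    aesthetic_score: Optional[float] = None  # predicted score; the scale is an assumption


def filter_by_aesthetic(pairs, threshold):
    """Keep only records whose aesthetic score is known and at least `threshold`."""
    return [
        p for p in pairs
        if p.aesthetic_score is not None and p.aesthetic_score >= threshold
    ]


# Three made-up records; the last one carries no optional attributes.
dataset = [
    ImageTextPair("https://example.org/cat.jpg", "a cat on a sofa", "example.org", 6.2),
    ImageTextPair("https://example.org/chart.png", "quarterly sales chart", "example.org", 3.1),
    ImageTextPair("https://example.org/dog.jpg", "a dog in the snow"),
]

subset = filter_by_aesthetic(dataset, threshold=5.0)
for pair in subset:
    print(pair.caption, pair.aesthetic_score)
```

Even in this toy form, the threshold is a curatorial decision: it determines which images a model trained on the subset will ever see.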

If large models such as Stable Diffusion require large-scale datasets, various components such as LoRAs, VAEs, refiners, or upscalers can be trained with a much smaller amount of data. In practice, this means that a custom dataset is created for each of these components. As each of these datasets reflects a particular aspect of visual culture, the components trained on them function as conduits for imaginaries and world views. Image generators are not simply produced through mathematics and statistics; they are programmed by images. Programming by images is a specific curatorial practice that involves a wide range of skills: a deep knowledge of the relevant visual domain, the ability to find the best exemplars, practical skills such as scraping, image filtering, cleaning, and cropping, and mastery of the art of coherent classification and annotation. In our tour, we discuss two examples of curatorial practices that differ in scale and purpose: the creation of the LAION dataset and the art of collecting the images that are necessary to "bake the LoRA cake."[3]

One will find that behind each dataset there is always an organisation of some kind, whether of people, corporations, researchers, or others.[4] Many datasets are freely available on platforms like Hugging Face for others to build LoRAs with or to experiment with in other ways.
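The practical side of this curation (filtering, cleaning, and cropping) can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of how LAION or any LoRA author actually works: the record layout, the `min_side` value of 512 pixels, and the choice of a centered square crop are all hypothetical.

```python
import hashlib


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def clean(records, min_side=512):
    """Drop exact duplicates, empty captions, and images that are too small.

    `min_side` is an assumed threshold chosen for illustration only.
    """
    seen = set()
    kept = []
    for rec in records:
        digest = hashlib.sha256(rec["bytes"]).hexdigest()  # identifies identical files
        caption = rec["caption"].strip()
        if digest in seen or not caption:
            continue
        if min(rec["width"], rec["height"]) < min_side:
            continue
        seen.add(digest)
        crop = center_crop_box(rec["width"], rec["height"])
        kept.append({**rec, "caption": caption, "crop": crop})
    return kept


# Made-up records; "bytes" stands in for the raw image file contents.
records = [
    {"bytes": b"img-a", "caption": " portrait, oil painting ", "width": 1024, "height": 768},
    {"bytes": b"img-a", "caption": "duplicate of the first", "width": 1024, "height": 768},
    {"bytes": b"img-b", "caption": "", "width": 800, "height": 800},
    {"bytes": b"img-c", "caption": "thumbnail", "width": 200, "height": 150},
]

curated = clean(records)
for rec in curated:
    print(rec["caption"], rec["crop"])
```

What the sketch leaves out is precisely what the text calls curatorial skill: knowing the visual domain, choosing exemplars, and writing consistent annotations cannot be reduced to such rules.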



[1] Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. “ImageNet: A Large-Scale Hierarchical Image Database.” CVPR 1 (2009): 248–55. https://doi.org/10.1109/CVPR.2009.5206848.

[2] Beaumont, Romain. “LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS.” LAION, March 31, 2022. https://laion.ai/blog/laion-5b/.

[3] knxo. “Making a LoRA Is Like Baking a Cake.” Civitai, July 10, 2024. Accessed August 18, 2025. https://civitai.com/articles/138/making-a-lora-is-like-baking-a-cake.

[4] JinsNotes. “Vision Dataset.” JinsNotes, August 1, 2024. Accessed August 26, 2025. https://jinsnotes.com/2024-08-01-vision-dataset.