Dataset

From CTPwiki

Revision as of 12:02, 20 August 2025 by NicolasMaleve

In the context of AI image generation, a dataset is a collection of image-text pairs (sometimes with other attributes, such as provenance or an aesthetic score) used to train AI models. Iconic datasets include the LAION aesthetic dataset, ArtEmis, ImageNet, and Common Objects in Context (COCO). These collections of images, mostly sourced from the internet, reach dizzying scales. ImageNet became famous for its 14 million images in the first decade of the century. Today, LAION-5B consists of 5.85 billion CLIP-filtered image-text pairs.
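To make the idea of an image-text pair concrete, here is a minimal sketch in Python. The field names (`image_url`, `caption`, `source`, `aesthetic_score`), the helper `filter_by_aesthetic`, and the sample records are all hypothetical and chosen for illustration only; they do not reproduce the actual schema of LAION, COCO, or any other dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageTextPair:
    """One dataset record: an image reference and its text description."""
    image_url: str                           # where the image was found
    caption: str                             # text paired with the image
    source: Optional[str] = None             # provenance, if recorded
    aesthetic_score: Optional[float] = None  # optional quality score


def filter_by_aesthetic(pairs: List[ImageTextPair], min_score: float) -> List[ImageTextPair]:
    """Keep only the pairs that have a score at or above min_score."""
    return [
        p for p in pairs
        if p.aesthetic_score is not None and p.aesthetic_score >= min_score
    ]


# Made-up sample records (example.org is a placeholder domain).
pairs = [
    ImageTextPair("https://example.org/cat.jpg", "a cat on a sofa", "example.org", 6.2),
    ImageTextPair("https://example.org/logo.png", "company logo", "example.org", 3.1),
    ImageTextPair("https://example.org/hill.jpg", "a hill at sunset"),  # no score recorded
]

subset = filter_by_aesthetic(pairs, min_score=5.0)
print([p.caption for p in subset])
```

Filtering on an attribute such as a score is one plausible way a smaller subset could be carved out of a much larger collection; the threshold of 5.0 used here is arbitrary.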

While large models such as Stable Diffusion require large-scale datasets, various components such as LoRAs, VAEs, refiners, or upscalers can be trained with far more modest amounts of data.