Dataset
From CTPwiki
In the context of AI image generation, a dataset is a collection of image-text pairs (sometimes with other attributes such as provenance or an aesthetic score) used to train AI models. Iconic datasets include the LAION aesthetic dataset, ArtEmis, ImageNet, and Common Objects in Context (COCO). These collections of images, mostly sourced from the internet, reach dizzying scales. ImageNet became famous for its 14 million images in the first decade of the century. Today, LAION-5B consists of 5.85 billion CLIP-filtered image-text pairs.
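To make the idea of an image-text pair with attributes concrete, here is a minimal Python sketch. It is not the format of any particular dataset: the `ImageTextPair` class, its field names, the `filter_by_aesthetic` helper, the threshold, and the example entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageTextPair:
    """One hypothetical dataset entry: an image reference, its text, and optional attributes."""
    image_url: str                            # where the image was found (provenance)
    caption: str                              # the text paired with the image
    aesthetic_score: Optional[float] = None   # predicted score, absent for some entries


def filter_by_aesthetic(pairs, min_score):
    """Keep only the pairs whose aesthetic score is known and at least min_score."""
    return [
        p for p in pairs
        if p.aesthetic_score is not None and p.aesthetic_score >= min_score
    ]


# Toy entries, invented for illustration only.
pairs = [
    ImageTextPair("https://example.org/cat.jpg", "a cat on a sofa", 6.2),
    ImageTextPair("https://example.org/chart.png", "quarterly sales chart", 3.1),
    ImageTextPair("https://example.org/dog.jpg", "a dog in the snow"),
]

subset = filter_by_aesthetic(pairs, min_score=5.0)
print([p.caption for p in subset])
```

Filtering on an attribute such as the score is one way a smaller subset can be carved out of a larger collection; the threshold of 5.0 here is arbitrary.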
While large models such as Stable Diffusion require large-scale datasets, various components such as LoRAs, VAEs, refiners, or upscalers can be trained with much more modest amounts of data.