BLOG


Laion coco: 600M synthetic captions from Laion2B-en

by: Christoph Schuhmann, Andreas Köpf, Richard Vencu, Theo Coombes, Romain Beaumont, 9 Sep, 2022


Authors: Christoph Schuhmann, Andreas Köpf, Theo Coombes, Richard Vencu, Benjamin Trom, Romain Beaumont. We present LAION-COCO, the world’s largest dataset of 600M generated high-quality captions for publicly available web images. Laion5B has five billion natural captions. They provide a lot of infor...

Laion translated: 3B captions translated to English from Laion5B

by: Marianna Nezhurina, Romain Beaumont, Richard Vencu and Christoph Schuhmann, 9 Sep, 2022


Authors: Marianna Nezhurina, Romain Beaumont, Richard Vencu, Christoph Schuhmann. The Laion5B dataset was automatically collected from a section of the human web (Common Crawl). Can models generate different and interesting data compared to what humans write? That’s a question we are interested in investigat...

Large scale OpenCLIP: L/14, H/14 and g/14 trained on LAION-2B

by: Romain Beaumont, 9 Sep, 2022


We trained three large CLIP models with OpenCLIP: ViT-L/14, ViT-H/14 and ViT-g/14 (ViT-g/14 was trained for only about a third of the epochs used for the other two). The H/14 model achieves 78.0% zero-shot top-1 accuracy on ImageNet and 73.4% on zero-shot image retrieval at Recall@5 on MS COCO. As of Sep...

LAION-Aesthetics

by: Christoph Schuhmann, 8 Aug, 2022


We present LAION-Aesthetics, several collections of subsets from LAION-5B with high visual quality. To create LAION-Aesthetics we trained several lightweight models that predict the rating people gave when they were asked “How much do you like this image on a scale from 1 to 10?”. LAION-Aesthetics ...

LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS

by: Romain Beaumont, 3 Mar, 2022


We present a dataset of 5.85 billion CLIP-filtered image-text pairs, 14x bigger than LAION-400M, previously the biggest openly accessible image-text dataset in the world. Authors: Christoph Schuhmann, Richard Vencu, Romain Beaumont, Theo Coombes, Cade Gordon, Aarush Katta, Robert Kaczmarczyk, Jenia ...

LAION-400-MILLION OPEN DATASET

by: Christoph Schuhmann, 8 Aug, 2021


We present LAION-400M: 400M English (image, text) pairs. Concept and Content: The LAION-400M dataset is entirely open and freely accessible. WARNING: be aware that this large-scale dataset is non-curated. It was built for research purposes to enable testing model training on larger scale for broad rese...