LAION-5B: A new era of open large-scale multi-modal datasets

We present a dataset of 5.85 billion CLIP-filtered image-text pairs, 14x bigger than LAION-400M, previously the biggest openly accessible image-text dataset in the world.

Authors: Christoph Schuhmann, Richard Vencu, Romain Beaumont, Theo Coombes, Cade Gordon, Aarush Katta, Robert Kaczmarczyk, Jenia Jitsev

Large image-text models like ALIGN, BASIC, Turing Bletchley, FLORENCE & GLIDE have […]

