BLOG

Welcome to our LAION blog! Here you will find commentaries, news, and updates on our current research projects and progress in the field of AI research. These blog posts are not full scientific research papers, but works in progress meant to encourage further research and discussion on our Discord server and within the open scientific community.

BUD-E: Enhancing AI Voice Assistants’ Conversational Quality, Naturalness and Empathy

by: LAION, 08 Feb, 2024


AI voice assistants have revolutionized our interaction with technology, answering queries, performing tasks, and making life easier. However, the stilted, mechanical nature of their responses is a barrier to truly immersive conversational experiences. Unlike human conversation partners, they often ...

LAION-POP: 600,000 high-resolution images with detailed descriptions

by: Christoph Schuhmann, Peter Bevan, 17 Nov, 2023


LAION-POP is a subset of LAION-5B comprising 600,000 high-resolution images, each equipped with detailed descriptions. The selection of images was based on 10,000 different concepts popular on the image generation site Midjourney. LAION-POP Dataset on Hu...

Open Empathic Launch

by: Christoph, Knoriy, Robert, 22 Oct, 2023


We are thrilled to present Open Empathic, a pioneering open-source project initiated by our non-profit organization, LAION. Open Empathic aims to equip open-source AI systems with empathy and emotional intelligence. We hope that methods and tools developed within the framework of this project, toget...

Strategic Game Datasets for Enhancing AI Planning: An Invitation for Collaborative Research

by: Christoph Schuhmann & Qi Sun, 18 Oct, 2023


Recent advancements in artificial intelligence (AI) underscore the progress in reasoning and planning shown by generalist machine learning (ML) models. This progress can be accelerated by datasets that further strengthen these generic capabilities when used for training foundation models of various...

CLARA: Advancing Machines in Understanding Speech Nuances

by: Knoriy, Christoph, Robert, 16 Oct, 2023


Voices carry not only words but also convey emotions, emphasis, and nuance through aspects like tone and accent. However, existing speech technology only partially comprehends these intricate components of human speech. Introducing CLARA (Multilingual Contrastive Learning for Audio Representation Ac...

LeoLM: Igniting German-Language LLM Research

by: Björn Plüster, 28 Sep, 2023


We proudly introduce LeoLM (Linguistically Enhanced Open Language Model), the first comprehensive suite of German-language foundation language models trained in collaboration with HessianAI on their new supercomputer 42! Built on Llama-2 and trained on a large-scale, high-quality German text corpus,...

Introducing OpenLM

by: OpenLM team, 26 Sep, 2023


We release OpenLM, a simple and minimalist PyTorch codebase for training medium-sized language models. OpenLM is designed to maximize GPU utilization and training speed, and is easy to modify for new language model research and applications. We validate OpenLM by training two language m...

Towards a transparent AI Future: The Call for less regulatory hurdles on Open-Source AI in Europe

by: LAION, 21 Sep, 2023


Following our previous open letter to the European Parliament on the significance of open-source AI, LAION, backed by the European Laboratory for Learning and Intelligent Systems (ELLIS) and a long list of highly influential AI researchers, submits this new open letter to the European Parliament: Link ...

LAION Triumphs at the Falling Walls Science Breakthrough of the Year 2023 Awards

by: Christoph, Jenia, Robert, 14 Sep, 2023


We are delighted to announce that LAION has won the Falling Walls Science Breakthrough of the Year 2023 Award in the category Science and Innovation Management for "democratizing AI research by providing open access to advanced AI models, tools, and datasets, fostering public engagement and awareness, ...

Introducing VisIT-Bench, a new benchmark for instruction-following vision-language models inspired by real-world use

by: Yonatan Bitton, 15 Aug, 2023


[Paper] [Code] [Dataset] [Leaderboard] We are thrilled to introduce VisIT-Bench, a benchmark for evaluating instruction-following vision-language models (VLMs). The central goal of VisIT-Bench is to provide a more accurate and meaningful assessment of VLMs, particularly in the context of human-chatb...

Objaverse-XL: An Open Dataset of Over 10 Million 3D Objects

by: Matt Deitke, 11 Jul, 2023


We are thrilled to announce Objaverse-XL, an open dataset of over 10 million 3D objects! Using it, we train Zero123-XL, a foundation model for 3D that displays remarkable generalization abilities. In the landscape of AI, scale has been paramount to recent advances. Over the past decade, we have obs...

video2dataset: A simple tool for large video dataset curation

by: Maciej Kilian, 10 Jul, 2023


[GitHub] Within only two years, large foundation models like CLIP, Stable Diffusion, and Flamingo have fundamentally transformed multimodal deep learning. Because of such models and their impressive capabilities to either create stunning, high-resolution imagery or to solve complex downstream tasks...

OpenFlamingo v2: New Models and Enhanced Training Setup

by: Anas Awadalla* and Irena Gao*, 28 Jun, 2023


[GitHub] [Demo] [Models] About three months ago, we announced OpenFlamingo, an open-source effort to replicate DeepMind's Flamingo models. Today, we are excited to release five trained OpenFlamingo models across the 3B, 4B, and 9B scales. These models are based on Mosaic’s MPT-1B and 7B and Together...

Announcing DataComp: In search of the next generation of multimodal datasets

by: Gabriel Ilharco, 27 Apr, 2023


[Paper] [Code] [Website] About a year ago, we released LAION-5B, a billion-scale open-source image-text dataset. Since then, LAION-5B has become a staple in the open-source machine learning ecosystem, powering open-source models like OpenCLIP, OpenFlamingo, and Stable Diffusion. From the begin...

A new Paella: Simple & Efficient Text-To-Image generation

by: Dominic Rampas and Pablo Pernias, 15 Apr, 2023


Overview. We are releasing a new Paella model that builds on top of our initial paper: https://arxiv.org/abs/2211.07292. Paella is a text-to-image model that works in a quantized latent space and learns similarly to MUSE and diffusion models. Paella is similar to MUSE as it also works on discrete t...

Petition for keeping up the pace of progress on AI research while securing its transparency and safety

by: LAION.ai, 29 Mar, 2023


LINK TO OUR PETITION. Authors: Christoph Schuhmann, Huu Nguyen, Robert Kaczmarczyk, Jenia Jitsev & LAION community. Securing Our Digital Future: Calling for a CERN-like international organization to transparently coordinate and advance large-scale AI research and its safety. In an era of unparall...

Announcing OpenFlamingo: An open-source framework for training vision-language models with in-context learning

by: Anas Awadalla and Irena Gao, 28 Mar, 2023


Overview. We are thrilled to announce the release of OpenFlamingo, an open-source reproduction of DeepMind's Flamingo model. At its core, OpenFlamingo is a framework that enables training and evaluation of large multimodal models (LMMs). Check out our GitHub repository and demo to get started! For t...

The OIG Dataset

by: Huu Nguyen - Ontocord.ai, Sameer Suri, Ken Tsui, Shahules786, Together.xyz team, and Christoph Schuhmann - LAION.ai, 10 Mar, 2023


The Open Instruction Generalist (OIG) dataset is a large open-source instruction dataset that currently contains ~43M instructions. OIG is one of many chatbot datasets that LAION, along with its volunteers, Ontocord, Together and other members of the open-source community, will be releasing and is i...

Training Contrastive Captioners

by: Giovanni Puccetti, Maciej Kilian, Romain Beaumont, 02 Feb, 2023


We introduce a new model type to OpenCLIP: Contrastive Captioners (CoCa) [1]. This model adds an autoregressive (generative) objective on top of the CLIP contrastive one. The architecture is composed of three parts: the first two are similar to those of a CLIP model, and the third is a text dec...

Clip-Retrieval Update: H-14 Index & SLURM Inference

by: LAION, 31 Jan, 2023


Today we release a KNN index for LAION-5B that allows for fast queries of the dataset with the OpenCLIP ViT-H-14 model. This means that users can search through billions of samples quickly and easily, making it a powerful tool for various applications such as image and text retrieval, data fil...

Reaching 80% zero-shot accuracy with OpenCLIP: ViT-G/14 trained on LAION-2B

by: Mitchell Wortsman, 24 Jan, 2023


We have trained a new ViT-G/14 CLIP model with OpenCLIP, which achieves 80.1% zero-shot accuracy on ImageNet and 74.9% zero-shot image retrieval (Recall@5) on MS COCO. As of January 2023, this is the best open-source CLIP model. We believe this is interesting because: CLIP models are useful for zero...

Collaboration between LAION and the Stable Horde

by: Konstantinos Thoukydidis, hlky, 08 Jan, 2023


We are happy to announce that LAION will be assisted by the Stable Horde to provide aesthetic ratings for existing datasets and a completely new dataset of Stable Diffusion generations, which will also be rated by their community. We wrote in the past about LAI...

LAION-COCO: 600M synthetic captions from LAION-2B-en

by: Christoph Schuhmann, Andreas Köpf, Theo Coombes, Richard Vencu, Benjamin Trom, Romain Beaumont, 15 Sep, 2022


We present LAION-COCO, the world's largest dataset of 600M generated high-quality captions for publicly available web images. LAION-5B has five billion natural captions. They provide a lot of infor...

LAION-Translated: 3B captions translated to English from LAION-5B

by: Marianna Nezhurina, Romain Beaumont, Richard Vencu and Christoph Schuhmann, 15 Sep, 2022


The LAION-5B dataset was automatically collected from a section of the human web (Common Crawl). Can models generate different and interesting data compared to what humans write? That's a question we are interested in investigat...

Large-scale OpenCLIP: L/14, H/14 and g/14 trained on LAION-2B

by: Romain Beaumont, 15 Sep, 2022


We trained three large CLIP models with OpenCLIP: ViT-L/14, ViT-H/14 and ViT-g/14 (ViT-g/14 was trained for only about a third of the epochs of the others). The H/14 model achieves 78.0% zero-shot top-1 accuracy on ImageNet and 73.4% zero-shot image retrieval (Recall@5) on MS COCO. As of Sep...

LAION-Aesthetics

by: Christoph Schuhmann, 16 Aug, 2022


We present LAION-Aesthetics, several collections of subsets of LAION-5B with high visual quality. To create LAION-Aesthetics, we trained several lightweight models that predict the rating people gave when asked "How much do you like this image on a scale from 1 to 10?". LAION-Aesthetics ...

LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS

by: Romain Beaumont, 31 Mar, 2022


We present a dataset of 5.85 billion CLIP-filtered image-text pairs, 14x bigger than LAION-400M, previously the biggest openly accessible image-text dataset in the world - see also our NeurIPS 2022 paper. Authors: Christoph Schuhmann, Richard Vencu, Romain Beaumont, Theo Coombes, Cade Gordon, Aarush K...

LAION-400-MILLION OPEN DATASET

by: Christoph Schuhmann, 20 Aug, 2021


We present LAION-400M: 400M English (image, text) pairs - see also our Data-Centric AI NeurIPS Workshop 2021 paper. Concept and Content: The LAION-400M dataset is entirely open and freely accessible. WARNING: be aware that this large-scale dataset is non-curated. It was built for research purposes to e...