NOTES

Welcome to our LAION notes section! Here you will find quick overviews of recent research by our community, including work in progress.

OSSAS (Open Source Summaries at Scale): The Inference.net × LAION × Grass

by: Christoph Schuhmann, Amarjot Singh, Andrii Prolorenzo, Andrej Radonjic, Sean Smith, and Sam Hogan, 11 Nov, 2025

Abstract: We present a comprehensive approach to democratizing access to scientific knowledge through large-scale, structured summarization of academic literature. We retrieved and processed ~100 million research papers from the public internet, leveraging existing datasets from bethgelab, PeS2o, Hu...

Admin Bud-E V1.0 – Privacy-Friendly AI Assistance for Schools, Universities & Companies

by: Christoph Schuhmann, Robert Kaczmarczyk, 09 Oct, 2025

Since its founding, LAION has pursued a clear goal: the democratization of artificial intelligence and accessible, fair education for all. With Bud-E 1.0 and School Bud-E 1.0, we released two browser-based voice assistants at the start of the year that are consistently built around privacy, openness...

ROOK: Reasoning Over Organized Knowledge

by: Jonathan Rahn, Jenia Jitsev & Qi Sun, 20 Jan, 2025

The field of artificial intelligence has long used strategic reasoning tasks as benchmarks for measuring and advancing AI capabilities. Chess, with its intricate rules and vast decision space, stands out as a particularly challenging domain. The game's complexity stems from its branching factor—the ...

LAION-Debate: dataset of competitive debates and discussions

by: LAION, 28 Jun, 2024

We're pleased to announce the world's first large competitive debate dataset: LAION-Debate. LAION-Debate is a large competitive debate dataset providing links to competitive debate championships, discussions, and prominent speakers' talks and conversations posted on YouTube by University of Cambridge...

Call to Build Open Multi-Modal Models for Personal Assistants

by: Christoph Schuhmann, 29 May, 2024

Technologies like the recently introduced GPT-4-OMNI from OpenAI show once again the potential that strong multi-modal models might have to positively transform many aspects of our lives. A particularly impressive example of this is in the field of education. Imagine every person in the world having the...

Safety Review for LAION 5B

by: LAION.ai, 19 Dec, 2023

There have been reports in the press about the results of a research project at Stanford University, according to which the LAION-5B training set contains potentially illegal content in the form of CSAM. We would like to comment on this as follows: LAION is a non-profit organization that provides da...

Conditional Pretraining of Large Language Models

by: Rallio, 16 May, 2023

Introduction: Large language models (LLMs), such as OpenAI's ChatGPT and similar chatbot products from other organizations, have recently gained widespread adoption. These models can extend text or respond to instructions in a natural and helpful manner. Despite the core technologies behind LLMs, nam...

A Call to Protect Open-Source AI in Europe

by: LAION.ai, 28 Apr, 2023

An Open Letter to the European Parliament: Protecting Open-Source AI for a Safe, Secure, and Sovereign Digital Future. LAION, alongside prominent research institutions and developers, has penned an open letter to the European Parliament to express concerns about the draft AI Act's potential impact on...

Training a Binary Classifier to Distinguish Images Generated with Stable Diffusion (v1.4) from Real Ones

by: Christoph Schuhmann, Ilia Zaitsev, 12 Apr, 2023

We present the development and assessment of a binary classifier designed to distinguish between authentic images and images generated using Stable Diffusion (SD) v1.4. We will discuss the dataset employed, describe the model architecture, outline the training process, and present the results obtain...

General-GPT: Breaking the Modality Constraint

by: Shivaen Ramshetty and Christoph Schuhmann, 28 Mar, 2023

Introduction: With the rapid rise of large language models and the applications built on them, most notably ChatGPT, there is a clear promise of more capable and useful AI models and systems. Often, such models are compared to us as humans using the Turing test or their performance o...