NOTES
OSSAS (Open Source Summaries at Scale): The Inference.net × LAION × Grass
by: Christoph Schuhmann, Amarjot Singh, Andrii Prolorenzo, Andrej Radonjic, Sean Smith, and Sam Hogan, 11 Nov, 2025
Abstract We present a comprehensive approach to democratizing access to scientific knowledge through large-scale, structured summarization of academic literature. We retrieved and processed ~100 million research papers from the public internet, leveraging existing datasets from bethgelab, PeS2o, Hu...
Admin Bud-E V1.0: Privacy-Friendly AI Assistance for Schools, Universities & Businesses
by: Christoph Schuhmann, Robert Kaczmarczyk, 09 Oct, 2025
Since its founding, LAION has pursued a clear goal: the democratization of artificial intelligence and accessible, fair education for all. With Bud-E 1.0 and School Bud-E 1.0, we released two browser-based voice assistants at the start of the year that are consistently oriented toward privacy, openness...
ROOK: Reasoning Over Organized Knowledge
by: Jonathan Rahn, Jenia Jitsev & Qi Sun, 20 Jan, 2025
The field of artificial intelligence has long used strategic reasoning tasks as benchmarks for measuring and advancing AI capabilities. Chess, with its intricate rules and vast decision space, stands out as a particularly challenging domain. The game's complexity stems from its branching factor—the ...
LAION-Debate: dataset of competitive debates and discussions
by: LAION, 28 Jun, 2024
We’re pleased to announce the world's first large competitive debate dataset: LAION-Debate. LAION-Debate is a large competitive debate dataset providing links to competitive debate championships, discussions, and prominent speakers' intake and conversations posted on YouTube by University of Cambridge...
Call to Build Open Multi-Modal Models for Personal Assistants
by: Christoph Schuhmann, 29 May, 2024
Technologies like the recently introduced GPT-4-OMNI from OpenAI once again show the potential that strong multi-modal models may have to positively transform many aspects of our lives. A particularly impressive example of this is in the field of education. Imagine every person in the world having the...
Safety Review for LAION 5B
by: LAION.ai, 19 Dec, 2023
There have been reports in the press about the results of a research project at Stanford University, according to which the LAION-5B training set contains potentially illegal content in the form of CSAM. We would like to comment on this as follows: LAION is a non-profit organization that provides da...
Conditional Pretraining of Large Language Models
by: Rallio, 16 May, 2023
Introduction Large language models (LLMs), such as OpenAI's ChatGPT and similar chatbot products from other organizations, have recently gained widespread adoption. These models can extend text or respond to instructions in a natural and helpful manner. Despite the core technologies behind LLMs, nam...
A Call to Protect Open-Source AI in Europe
by: LAION.ai, 28 Apr, 2023
An Open Letter to the European Parliament: Protecting Open-Source AI for a Safe, Secure, and Sovereign Digital Future LAION, alongside prominent research institutions and developers, has penned an open letter to the European Parliament to express concerns about the draft AI Act's potential impact on...
Training a Binary Classifier to Distinguish Images Generated with Stable Diffusion (v1.4) from Real Ones
by: Christoph Schuhmann, Ilia Zaitsev, 12 Apr, 2023
We present the development and assessment of a binary classifier designed to distinguish between authentic images and images generated using Stable Diffusion (SD) v1.4. We will discuss the dataset employed, describe the model architecture, outline the training process, and present the results obtain...
General-GPT: Breaking the Modality Constraint
by: Shivaen Ramshetty and Christoph Schuhmann, 28 Mar, 2023
Introduction With the rapid explosion of large language models and utilization of their encompassing applications, most notably ChatGPT, there is a clear promise of more capable and useful AI models/systems. Often, such models are compared to us as humans using the Turing test or their performance o...