FAQ


Do LAION datasets respect copyright laws?


LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts associated with those images. We downloaded the images and calculated their CLIP embeddings in order to compute similarity scores between images and texts, but we subsequently discarded all the images. Any researcher using the datasets must reconstruct the image data by downloading the subset they are interested in. For this purpose, we suggest the img2dataset tool.
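
The idea of keeping only an index can be sketched as follows. This is a minimal illustration, not LAION's actual pipeline: the record fields, the function names, the toy embedding vectors and the similarity threshold are all assumptions made for the example. It computes a cosine similarity between an image embedding and a text embedding, and keeps only the URL, the ALT text and the score.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(candidates, threshold):
    """Return index entries (URL, ALT text, score) for well-matched pairs.

    `candidates` is a hypothetical list of dicts holding a URL, an ALT text
    and precomputed image/text embeddings. The `threshold` is an assumption
    for illustration. No image data or embeddings are copied into the result.
    """
    index = []
    for item in candidates:
        score = cosine_similarity(item["image_embedding"], item["text_embedding"])
        if score >= threshold:
            index.append(
                {"url": item["url"], "text": item["text"], "similarity": score}
            )
    return index


# Toy data: real CLIP embeddings have hundreds of dimensions.
candidates = [
    {
        "url": "https://example.com/cat.jpg",
        "text": "a cat on a sofa",
        "image_embedding": [1.0, 0.0],
        "text_embedding": [1.0, 0.0],
    },
    {
        "url": "https://example.com/banner.jpg",
        "text": "click here",
        "image_embedding": [1.0, 0.0],
        "text_embedding": [0.0, 1.0],
    },
]

index = build_index(candidates, threshold=0.5)
```

In this sketch the resulting index holds a single entry for the first URL; anyone who wants the image itself still has to download it from that URL.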

Links in your dataset point to my copyrighted data. I would like to have them removed.


LAION is a non-profit research organization that studies how learning algorithms work. The text and data mining (TDM) exemptions for research therefore apply to LAION, namely Art. 3 of the EU TDM exemption and §60d UrhG under German law. LAION is thus permitted to use any copyrighted material as data for conducting research on learning algorithms and on the foundation models that result from training.

Do the datasets contain images that may be disturbing for viewers?


No, but links in the datasets can lead to images that are disturbing or discomforting depending on the filter or search method employed.

I found a dataset containing images while searching on the internet. What about copyright then?


LAION does not release any dataset containing images; such a dataset must have been reconstructed by other people using the provided tools. We neither host such datasets nor link to them on our website. Please refer only to the links we provide for officially released data.

I found my name and my picture in the dataset. I am an EU citizen and I want to protect my personal data as allowed by GDPR. What should I do?


If you found your name only in the ALT text data, and the corresponding picture does NOT contain your image, this is not considered personal data under GDPR terms. Your name in combination with other identifying data is. If the URL or the picture contains your image, you may request a takedown of the dataset entry on the GDPR page, where we provide a takedown form as required by GDPR. Once you submit the form, we will investigate the request and, if it can be verified, remove the entry from all data repositories we control. Such repositories include current data stored on our computers and future releases of the datasets. We cannot act on data that are not under our control, for example past releases that circulate via torrents.

Do your scripts respect robots.txt instructions?


Despite the “Crawling at Home” project name, we are not crawling websites to create the datasets. Common Crawl did the crawling in the past, and they respected the robots.txt instructions. We only analyse their data and then look at the pictures to assess their value with respect to the provided alt text.
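
For readers unfamiliar with robots.txt, the check a well-behaved crawler performs before fetching a URL can be sketched with Python's standard library. This is only an illustration of the mechanism; it is not Common Crawl's or LAION's code, and the helper name, the example rules, the user agent string and the URLs are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content that blocks one directory for all crawlers.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# "ExampleBot" is a placeholder user agent used only for this sketch.
public_ok = is_allowed(
    EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.com/public/photo.jpg"
)
private_ok = is_allowed(
    EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.com/private/photo.jpg"
)
```

With these example rules, the URL under /public/ may be fetched and the one under /private/ may not.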

Please remove my original image, audio, video, or other media sample from your dataset.


LAION datasets do not contain any original media samples such as images, audio, or video. They contain only metadata (e.g. captions or alt texts) and URLs pointing to the original content on the public web. Because no original samples are stored or distributed by LAION, there is nothing to remove from our datasets in this regard. Any organization, not only LAION, is permitted under EU and German law (Art. 3 EU TDM exemption and §60d UrhG) to publish such collections of links to publicly accessible web content for research purposes. If you would like the content removed from the web entirely, you will need to contact the original hosting provider directly (e.g. YouTube, Vimeo). If your request concerns personal data covered by GDPR (e.g. your name or image appearing in a URL or associated metadata), please refer to the GDPR question above for the appropriate takedown process.