High-quality training datasets, made available in one place.
A place to easily discover and discuss open datasets.

We are adding new datasets daily! To contribute, click here.


Open Source Biometric Recog...

A communal biometrics framework supporting the development of open algorithms and reproducible evaluations. OpenBR is a framework for investigating new...

face detection, biometric, age estimation, gender estimation

Netflix Prize

Netflix released an anonymized version of their movie rating dataset; it consists of 100 million ratings, done by 480,000 users who have rated between 1...

ranking, movie

Uber 2B trip data

Uber Movement provides anonymized data from over two billion trips to help urban planning around the world. You need to sign up to download this data.

uber, urban planning, trips

MNIST handwritten digits

MNIST: handwritten digits: The most commonly used sanity check. Dataset of 28x28, centered, B&W handwritten digits. It is an easy task just because some...


Google Audioset

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTu...

google, vehicle, music, speech


The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikip...

language modeling, wiki

Searchable Machine Learning Datasets

It is hard to find relevant datasets for the machine learning problem you are working on. We believe researchers should focus on improving models and innovating in AI. We are here to help with the time-consuming piece: finding datasets.


1000+ Datasets

Launching with 1000+ datasets across multiple fields, with a goal to continuously increase this number.


A community for AI researchers & developers to learn and share their knowledge of datasets and models.

Better Search

We are focussed on making datasets easier to search, through both better underlying metadata and a better UI.


Coming Soon! We are exploring an option to create a dataset marketplace to further simplify the process of acquiring datasets.

If you have any feedback or any features you would like us to build, please send an email to


People posing 20k

It is a new, free ML training dataset intended for unsupervised or semi-supervised generative deep learning algorithms. It contains 20015 clea...

Posing, People, GAN

Total Text Dataset

To facilitate new text detection research, we introduce the Total-Text dataset (ICDAR-17 paper), which is more comprehensive than the existin...

Dark Image Dataset

To facilitate new object detection and image enhancement research, we introduce the Exclusively Dark (ExDark) dataset (CVIU2019). The Exclusi...

UNIMIB2016 Food Database

This database can be used for food recognition and segmentation. The database is composed of 1,027 tray images with multiple foods and containing 73 foo...

food segmentation, Food recognition

RawFooT DB: Raw Food Textur...

The Raw Food Texture database (RawFooT) has been specially designed to investigate the robustness of descriptors and classification methods with respect...

texture, food

Oxford Audiovisual Segmenta...

This dataset consists of RGB-D videos in indoor scenes, and has dense, per-frame segmentation labels for both object and material categories. Moreover, ...

Depth, Objects, Semantic Segmentation, Materials, RGB-D, Scene Understanding, Audio-visual, Audio, Places, Scenes


Sign up to start collaborating, contributing, and participating in the dataset discussions! You can always browse the datasets without signing up.