Publicly Accessible Datasets

A sufficiently large dataset is critical in machine learning, and in deep learning in particular, for training a system that produces useful results.

So, what should a machine learning researcher do when there isn’t enough publicly available data?

Enter the MLCommons Association, a global engineering consortium dedicated to making machine learning better for everyone.

To help advance ML research, MLCommons recently announced the general availability of the People’s Speech Dataset, a 30,000-hour English-language conversational speech dataset, and the Multilingual Spoken Words Corpus, an audio speech dataset with over 340,000 keywords in fifty languages.

In this episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with David Kanter, founder and executive director of MLCommons, and NVIDIA senior AI developer technology engineer Daniel Galvez about the democratization of access to speech technology and how MLCommons is advancing machine learning research and development for all.
