Labeling, transforming, and structuring training data sets for machine learning

The O’Reilly Data Show Podcast: Alex Ratner on how to build and manage training data with Snorkel.

In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project. Ratner recently secured a faculty position at the University of Washington and is currently working on a company that supports and extends the Snorkel project. Snorkel is a framework for building and managing training data. Our survey from earlier this year found that labeled data remains a key bottleneck for organizations building machine learning applications and services.

Ratner was a guest on the podcast a little over two years ago, when Snorkel was a relatively new project. Since then, Snorkel has added features, expanded into computer vision use cases, and gained many users, including Google, Intel, and IBM. Together with his thesis advisor, Professor Chris Ré of Stanford, Ratner and his collaborators have long championed tools aimed squarely at helping teams build and manage training data. With today’s release of Snorkel version 0.9, we are a step closer to a framework that enables the programmatic creation of training data sets.

Snorkel pipeline for data labeling. Source: Alex Ratner, used with permission.
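To make “programmatic creation of training data” concrete, here is a minimal sketch of the data-programming idea in plain Python. It does not use Snorkel’s API. Users write labeling functions, small heuristics that either vote for a label or abstain, and those votes are combined into training labels. The toy spam task, the label constants, and every function name below (`lf_contains_link`, `lf_contains_offer`, `lf_short_reply`, `apply_lfs`, `majority_vote`) are hypothetical. Snorkel itself combines the votes with a learned label model that estimates how accurate each labeling function is; this sketch swaps in a plain majority vote to stay self-contained.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label values for a toy spam task; ABSTAIN means "no opinion".
ABSTAIN, HAM, SPAM = -1, 0, 1


# Hypothetical labeling functions: each encodes one noisy heuristic.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_contains_offer(text: str) -> int:
    lowered = text.lower()
    return SPAM if "free" in lowered or "offer" in lowered else ABSTAIN


def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN


LFS: List[Callable[[str], int]] = [lf_contains_link, lf_contains_offer, lf_short_reply]


def apply_lfs(texts: List[str], lfs: List[Callable[[str], int]]) -> List[List[int]]:
    """Build a label matrix: one row per example, one column per labeling function."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(row: List[int]) -> int:
    """Combine one row of votes (a simple stand-in for a learned label model).

    Returns ABSTAIN when no function voted or when the top labels tie.
    """
    ranked = Counter(v for v in row if v != ABSTAIN).most_common()
    if not ranked:
        return ABSTAIN
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    texts = [
        "Get a FREE offer now http://example.com",
        "thanks, see you",
        "Meeting moved to 3pm on Thursday afternoon",
    ]
    matrix = apply_lfs(texts, LFS)
    print(matrix)  # [[1, 1, -1], [-1, -1, 0], [-1, -1, -1]]
    print([majority_vote(row) for row in matrix])  # [1, 0, -1]
```

Here the first message is labeled SPAM, the second HAM, and the third stays unlabeled because no heuristic fires. Abstentions and conflicting votes like these are what a learned label model is meant to weigh more carefully than a simple majority vote.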

We had a great conversation spanning many topics, including:

  • Why he and his collaborators decided to focus on “data programming” and tools for building and managing training data.
  • A tour through Snorkel, including its target users and key components.
  • What’s in the newly released version of Snorkel (v0.9).
  • Snorkel’s user base has grown considerably since we last spoke, so we walked through some of the project’s common use cases.
  • Data lineage, AutoML, and end-to-end automation of machine learning pipelines.
  • HoloClean and other projects focused on data quality and data programming.
  • The need for tools that can ease the transition from raw data to derived data (e.g., entities), insights, and even knowledge.
