Machine Learning Data Deficit

Today we begin with an interview with D. Sculley, a director on the Google Brain team. Many listeners will recognize D. from his work on the paper Hidden Technical Debt in Machine Learning Systems, and from its much-cited diagram showing ML code as a small box inside a far larger system. D. recently extended the concept of technical debt to data, as data debt, which we discuss briefly in the interview.

We talk about his take on data-centric AI (DCAI), where debt fits into the conversation about data quality, and what a shift toward data-centrism looks like in a world of increasingly large models, such as GPT-3 and the recent PaLM models. We also look at common sources of data debt, what the community has done and can do to address these issues, the utility of causal inference graphs in this work, and much more!
