The machine learning community, particularly in the fields of computer vision and language processing, has a data culture problem. That’s according to a survey of research into the community’s dataset collection and use practices published earlier this month.

What’s needed is a shift away from reliance on the large, poorly curated datasets used to train machine learning models. Instead, the study recommends a culture that cares for the people who are represented in datasets and respects their privacy and property rights. But in today’s machine learning environment, survey authors said, “anything goes.”