Is It Possible to Remove Datasets from AI?

MIT researchers have developed a method for training a machine-learning system that, instead of requiring a dataset of real images, uses a generative model to produce highly realistic synthetic data, which can then be used to train another model for downstream vision tasks.

Their findings suggest that a contrastive representation learning model trained solely on synthetic data can produce visual representations that are comparable, if not superior, to those learned from real data.

A generative model, moreover, requires far less memory to store or transfer than a dataset does. Synthetic data can also sidestep some of the privacy and usage-rights concerns that limit how real data may be shared. And a generative model can be edited to remove certain attributes, such as race or gender, which could help address biases in traditional datasets.

Creating synthetic data

Once trained on real data, a generative model can produce synthetic data that is virtually indistinguishable from the original. Training involves feeding the generative model millions of images of objects from a particular class (such as cars or cats), after which it learns to generate similar objects of its own.

According to Jahanian, once a generative model is trained, researchers can simply flip a switch and it will generate a constant stream of unique, realistic images modeled on those in its training dataset.
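In practice, "flipping the switch" amounts to sampling random latent vectors and decoding them with the generator. The sketch below illustrates that idea, assuming a pretrained GAN-style generator; the helper load_pretrained_generator, the latent dimension, and the function names are illustrative rather than taken from the researchers' code.

```python
import torch

# Hypothetical pretrained GAN-style generator: maps a latent vector z to an image.
# "load_pretrained_generator" is an assumed helper, not the researchers' actual API.
generator = load_pretrained_generator()
generator.eval()

latent_dim = 128  # assumed size of the generator's latent space

@torch.no_grad()
def sample_synthetic_images(batch_size):
    """Draw random latent codes and decode them into a batch of synthetic images."""
    z = torch.randn(batch_size, latent_dim)  # sample fresh latent codes
    return generator(z)                      # images of shape (batch_size, 3, H, W)

# A practically endless stream of unique, realistic-looking training images.
batch = sample_synthetic_images(64)
```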

However, generative models are even more useful because they learn how to transform the underlying data on which they are trained, he says. If the model has been trained on photos of cars, it can "imagine" how a car would look in new scenarios it has not seen before, and then generate images that show the car in different positions, colors, or sizes.
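One common way to realize this "imagining" is to nudge the latent code and decode the result: nearby codes typically render the same object with a different pose, color, or scale. The sketch below uses simple Gaussian jitter, which is an assumption for illustration and not necessarily the transformation the researchers use.

```python
import torch

def latent_views(z, strength=0.5, n_views=2):
    """Return several perturbed copies of a latent code.

    Decoding each perturbed code with the generator yields images of
    (roughly) the same object rendered with varied pose, color, or size.
    Gaussian jitter is an illustrative choice of latent transformation.
    """
    return [z + strength * torch.randn_like(z) for _ in range(n_views)]
```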

In contrastive learning, a machine-learning model is shown a large number of unlabeled images and learns whether pairs of them are similar or different. Crucially, the technique requires different views of the same image.
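A generic version of this objective is the InfoNCE (NT-Xent) loss used in SimCLR-style methods: embeddings of two views of the same image are pulled together, while embeddings of different images are pushed apart. The sketch below is a standard formulation, not necessarily the exact loss used in the MIT work.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(view_a, view_b, temperature=0.1):
    """Contrastive (InfoNCE) loss over a batch of paired embeddings.

    view_a[i] and view_b[i] embed two views of the same image (a positive pair);
    all other combinations within the batch serve as negatives.
    """
    a = F.normalize(view_a, dim=1)
    b = F.normalize(view_b, dim=1)
    logits = a @ b.t() / temperature                    # pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)  # index of each positive pair
    # Symmetrize: classify the correct partner in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```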

The researchers connected a pretrained generative model to a contrastive learning model so that the two could work together automatically. According to Jahanian, the contrastive learner can tell the generative model to produce different views of an object and then learn to recognize that object from those views.

It was like putting two puzzle pieces together, he explains: because the generative model can supply different views of the same object, it helps the contrastive method learn better representations.
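Putting the pieces together gives a training loop in which the frozen generator supplies paired views of each "imagined" object and the contrastive encoder learns from them. The loop below reuses the sketches above; build_encoder, batch_size, num_steps, and the other names are assumptions for illustration.

```python
import torch

encoder = build_encoder()   # hypothetical helper returning a torch.nn.Module encoder
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
batch_size, num_steps = 256, 10_000   # assumed hyperparameters

for step in range(num_steps):
    z = torch.randn(batch_size, latent_dim)      # fresh latent codes each step
    z_a, z_b = latent_views(z, n_views=2)        # two nearby codes per object
    with torch.no_grad():                        # the generator stays frozen
        imgs_a, imgs_b = generator(z_a), generator(z_b)
    loss = info_nce_loss(encoder(imgs_a), encoder(imgs_b))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```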

Even better than the original

The researchers compared their technique with several image classification models trained on real-world data and found that it performed as well as, and in some cases better than, those models.

A generative model has the advantage of being able, in theory, to generate an unlimited number of samples, so the researchers also examined how the number of samples affects performance. They found that, in some cases, generating more unique samples yielded additional gains.
