
Understanding Details About ‘Fake’ Data Companies

As an investor in automation companies for the last 20 years, I’ve seen plenty of great ideas for robots or AI software end up on the dust heap of history, launched just slightly ahead of their time. But today, we’re at a tipping point in the race to automate our world. AI software is easier to code than ever before, thanks to plug-and-play APIs and open-source tools, and sensors, cameras and other hardware are now cheap and powerful enough to create truly useful, affordable, highly intelligent machines.

In this column, I’ll delve into some of the automation technologies set to transform our lives, discussing which sectors are most promising for investors and entrepreneurs looking to build the next billion-dollar automation companies. The first technology I’ll examine is synthetic data. What is it, and why should investors and entrepreneurs take note?

Synthetic data is defined as any production data not obtained by direct measurement. In other words, it’s “fake” data fabricated by computers. The seeds of this technology were planted in the 1990s, but in the last 18 months or so, advances in AI and computer processing speeds have brought it into the mainstream. It matters for investors because it’s an extremely promising technology that could form the foundation for several multibillion-dollar companies. Entrepreneurs might build an entire company around synthetic data or, alternatively, use it to power product development for other types of businesses.

Synthetic data has two main advantages over real data: Its scope is limitless, and it does not contain private, personal or confidential information. Real data must be collected from events that have occurred, but synthetic data can reflect any potential scenario. It can thus be used to train computers to respond to billions of hypothetical situations. This could look like training a self-driving car to recognize any number of potential environments, from a rainy night on a rural backroad to a bustling city street complete with a sofa blocking the roadway and a child darting into traffic. If engineers trained the car in the real world, they could spend thousands of hours driving around and never capture every scenario that could occur. Synthetic data is also used to train financial software to anticipate security breaches before they happen, healthcare systems to spot trends without compromising patient confidentiality and logistics software to optimize inventory.
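The idea of generating scenarios that real-world collection would rarely capture can be sketched in a few lines. This is a hypothetical illustration, not how any of the companies mentioned actually build their products; the scenario categories and field names are invented for the example.

```python
import random

random.seed(42)

# Invented parameter spaces for a toy driving-scenario generator.
WEATHER = ["clear", "rain", "fog", "snow"]
SETTING = ["rural backroad", "highway", "city street"]
HAZARDS = ["none", "debris in lane", "child darting into traffic", "stalled vehicle"]

def synth_scenario():
    """Sample one synthetic scenario from the parameter distributions."""
    return {
        "weather": random.choice(WEATHER),
        "setting": random.choice(SETTING),
        "hazard": random.choice(HAZARDS),
        "time_of_day_h": round(random.uniform(0, 24), 1),
    }

dataset = [synth_scenario() for _ in range(10_000)]

# Rare combinations (e.g., fog plus an active hazard) appear by construction,
# whereas hours of real driving might never encounter them.
rare = [s for s in dataset if s["weather"] == "fog" and s["hazard"] != "none"]
print(len(dataset), len(rare))
```

The point of the sketch is the contrast the article draws: coverage comes from sampling a parameter space directly rather than waiting for events to occur on the road.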

Startups that build synthetic data tools to help companies extract more value out of their disparate data have outsized potential. Synthetic data tools could allow companies to pull insights from their data without running afoul of privacy or compliance regulations and help them analyze edge-case data that may not yet exist in real datasets to understand what-if scenarios.
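One simple way to see how synthetic data can preserve statistical value without exposing real individuals is to sample each column independently from the real data's marginal distribution. This is a minimal sketch with invented field names; commercial products use far more sophisticated generative models that also preserve correlations between columns.

```python
import random
from collections import Counter

random.seed(0)

# A toy "real" dataset; in practice this would be sensitive customer data.
real = [
    {"age_band": "30-39", "region": "north", "churned": False},
    {"age_band": "20-29", "region": "south", "churned": True},
    {"age_band": "30-39", "region": "south", "churned": False},
    {"age_band": "40-49", "region": "north", "churned": True},
]

def fit_marginals(rows):
    """Count value frequencies per column (the marginal distributions)."""
    return {col: Counter(r[col] for r in rows) for col in rows[0]}

def sample_row(marginals):
    """Draw each field independently from its marginal distribution."""
    return {
        col: random.choices(list(cnt), weights=list(cnt.values()))[0]
        for col, cnt in marginals.items()
    }

marginals = fit_marginals(real)
synthetic = [sample_row(marginals) for _ in range(1_000)]
```

No synthetic row corresponds to a real person, yet column-level statistics (age-band mix, regional split, churn rate) are approximately preserved, which is the property that lets analysts work on the data without touching the originals.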

At Calibrate, we’ve invested in two synthetic data companies, Diveplane and Parallel Domain, and other promising startups in the space include Gretel.ai, Hazy and Datagen. We’re actively on the lookout for startups using synthetic data to improve the software development process, speeding up the testing phase by weeks or months, as well as in areas we may not have even dreamed up yet.

The amount of data being created daily is staggering. International Data Corporation estimates (paywall) 74 zettabytes of data will be created worldwide in 2021, up from 18 zettabytes in 2016. (One zettabyte is equal to a billion terabytes, or a trillion gigabytes.) Companies and consumers will only continue to create more and more data, but only a small fraction of it will be accessible or analyzable. Synthetic data fills the gaps where real data doesn’t exist yet, or can’t be used due to privacy concerns. It will help companies glean insights from their existing data, make more accurate predictions and train their autonomous systems to become smarter faster, making it a promising investment for years to come.

For investors looking to jump into the space, a few thoughts: First, understanding the customer use case, and how and why synthetic data serves as a superior alternative, is key; this defines the problem to be solved and helps size up the market potential. Second, learning the different technologies used to create synthetic data, and how they meet customer needs, will help you discern among competing approaches. Third, asking how much manual, unautomated work goes into generating the data will frame the potential margin structure of a vendor. Fourth, investors should look carefully at the ethical considerations around the technology and its potential uses.

Tech investing is always risky, and timing plays a key role in the eventual success or failure of companies: too early, and companies run out of money waiting for the market to show up; too late, and the opportunity to be No. 1 is gone. While every investor needs to do their own work, this market is poised for a very compelling decade ahead.

