A technology that substantially alters civilization appears, on average, only once every ten years or so. One was the Internet. The next is artificial intelligence (A.I.). A.I. has the power to transform industries like healthcare and finance and to improve lives, but it is only as good as the data it is trained on.
The vast expansion of text, images, video, and audio accessible on the public web has accelerated the development of A.I. models by offering a never-ending source of data. This is one reason analysts expect the A.I. sector, currently valued at $137 billion, to grow by more than 37% annually over the next ten years.
To democratize access to A.I. research, for instance, Meta recently released LLaMA, “a collection of foundation language models.” According to the Facebook parent company, the models were trained on trillions of tokens, demonstrating that cutting-edge models can be trained using only publicly accessible datasets.
While praising the value of publicly accessible data for A.I., Meta is also bringing legal action to block access to publicly available online data that it concedes it does not own.
A.I. will be prevented from reaching its full potential if Big Tech is allowed to create a walled garden around data that is already in the public domain (i.e., data that isn’t protected by a login).
Looking ahead, the amount of data and information produced, obtained, duplicated, and used globally is expected to exceed 120 zettabytes this year, nearly triple the 2019 figure.
The potential for artificial intelligence to grow in a way that benefits society would be severely constrained if publicly accessible online data were taken from the general public and held only by the most powerful enterprises. It would not be in humanity’s best interests for only a few corporations to be advancing cutting-edge A.I.
Publicly accessible data is crucial not only for the development of new artificial intelligence tools but also for ongoing corporate operations. According to a poll of 150 IT, technology, and data analytics specialists from U.S. retail, technology, and nonprofit organizations, businesses and nonprofits alike depend on publicly available web data to carry out their goals efficiently and effectively, with 94% using it every day. Nearly four out of five respondents said they couldn’t function properly without access to information from the public web.
Generally speaking, developers must have access to the datasets they require in order to train A.I. in an ethical manner. Because it provides a huge amount of diverse and up-to-date information, public web data can be used to train machine learning models, increase their accuracy, and help ensure that A.I. is in line with humanity’s objectives.