
Google wants AI systems to mine publishers’ work by default

August 9, 2023

Publishers should be able to opt out of having their works mined by generative AI systems, according to Google, but the company has not yet said how such a system would operate.

In its submission to the Australian government’s review of the legal framework surrounding AI, Google argued that copyright law should be changed to permit generative AI systems to scour the internet.

The company urged Australian policymakers to support copyright systems that allow appropriate and fair use of copyrighted content, so that AI models can be trained in Australia on a wide variety of data, while also supporting workable opt-outs for entities that would prefer their data not be used to train AI systems.

Google has previously argued to the Australian government for a fair use exception for AI systems, but this is the first time the company has raised the idea of a publisher opt-out.

Asked how such a system would operate, a spokesperson pointed to a recent Google blog post in which the company said it wanted a discussion about creating a community-developed web standard comparable to the robots.txt system, which lets publishers choose which parts of their websites are crawled by search engines.
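For readers unfamiliar with robots.txt, the sketch below shows how the existing mechanism works, using Python's standard `urllib.robotparser` module. It is only an illustration of today's crawler rules, not of the standard Google has in mind, which has not been specified. The crawler names `ExampleAIBot` and `ExampleSearchBot`, the `example.com` URLs, and the rules themselves are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked from the whole site,
# while every other crawler is blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before fetching each URL.
ai_allowed = parser.can_fetch("ExampleAIBot", "https://example.com/news/story.html")
search_allowed = parser.can_fetch("ExampleSearchBot", "https://example.com/news/story.html")
private_allowed = parser.can_fetch("ExampleSearchBot", "https://example.com/private/draft.html")

print(ai_allowed, search_allowed, private_allowed)
```

Note that robots.txt is advisory: it only has an effect when a crawler chooses to check it and honor the answer, which is part of why the burden-of-compliance question raised later in the article matters.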

Google’s comments come as news organizations, including News Corp, have reportedly begun discussions with AI firms about payment for the scraping of their articles.

Copyright will be one of the major issues facing generative AI systems in the coming years, according to Dr. Kayleen Manwaring, a senior lecturer at UNSW Law and Justice.

The fundamental rule, she said, is that useful results require millions of data points, so copying will always occur, and that copying is presumptively a breach of many people’s copyright.

Manwaring noted that laws governing what AI systems may consume vary by country, but said an opt-out system would completely alter the way copyright operates.

An opt-out arrangement is not how copyright works today, she said: anyone who wants to reproduce copyright-protected material must first obtain the owner’s permission. In her view, Google is advocating a complete overhaul of the way exceptions are handled.

Toby Murray, an associate professor at the University of Melbourne’s computing and information systems department, said Google’s proposal would put the burden on content producers to say whether AI systems could consume their content or not, but he pointed out that producers could already mark how their works can be used under existing licensing schemes like Creative Commons.

He suggested that they may well be attempting to establish standards early on that stipulate other businesses do not need to pay for this content.

Manwaring said that if the issue wasn’t fixed, copyright could collapse, which would likely be detrimental to smaller content producers.

She believes it will be a significant issue going forward, especially when more powerful entities have their copyright violated. However, if many people’s concerns are well founded and AI training sets are drawing heavily on content from the internet, less powerful entities are very likely already having their copyright violated left, right, and center.

During Senate estimates in May, Liberal senator Sarah Henderson asked the communications department whether the government was exploring a mechanism similar to the news media bargaining code to force AI companies to pay for scraping sites.

In response, the department cited the government’s consultation on artificial intelligence regulations in July, said the government was considering a Treasury review of the news media bargaining code, and stated that it was looking at future policy settings for the news media as part of its news media support programme.

The deadline for submissions to the AI consultation was last week. Although hundreds of submissions are believed to have been made, none has yet been published online.
