The advent of generative artificial intelligence, such as ChatGPT for text and Midjourney for images, raises serious legal questions, particularly about copyright. The Wikimedia Foundation, which maintains the Wikipedia encyclopedia, outlines the complex legal issues behind this topic.
ChatGPT does not generate text by magic. The language model that serves as its technical foundation (GPT-3.5, then GPT-4) is trained on hundreds of billions of words. In addition, the chatbot can fetch information from internet sites such as Wikipedia. This is what we saw, for example, in the presentation of GPT-4.
Using the contents of the encyclopedia, written collectively by internet users, raises a legal question: is OpenAI, the American company behind ChatGPT, free to draw on the site’s pages to train its models, and to browse them in order to generate text at a user’s request?
The question echoes one that is emerging more and more around AIs that specialize in images, such as DALL-E, Stable Diffusion, and Midjourney. Their algorithms are trained on existing content to learn how to respond to a request with an image of their own.
As it happens, more and more artists are denouncing the advent of these tools. First, the tools threaten their livelihood by competing with them head-to-head. Second, these models may have been trained on copyrighted drawings, sketches and illustrations without any authorization.
For Wikimedia, the use of its content is not a concern… at first glance
What about a collaborative work like Wikipedia? The issue is addressed by the Wikimedia Foundation, which oversees the encyclopedia and related projects. On March 23, it published an article providing a first legal analysis of copyright as it relates to ChatGPT, through the prism of US law.
Although Wikimedia warns that this is “an initial overview”, and therefore likely to evolve, a direction is emerging. The use of content hosted on its sites by OpenAI and ChatGPT (and more generally by anyone, be it a chatbot, a company or an individual) is not a problem at first glance.
We already see this on the Internet: Google also draws on Wikipedia, as do other companies. The encyclopedia is a valuable resource for enriching entire sections of a search engine, and it helps voice assistants find information and read it aloud. The applications are vast.
This is precisely because of the legal framework that applies to the content. On Wikipedia, texts, images, sounds, videos, and other formats are, for the most part, governed by a Creative Commons license. Specifically, it is the “Attribution-ShareAlike 3.0” license, which is very permissive.
“Creative Commons licenses allow free reproduction and reuse, so AI programs like ChatGPT can copy an image from a Wikipedia article or from Wikimedia Commons,” notes Wikimedia in its preliminary review. So anyone can retrieve the content and use it as they wish without paying anything.
However, there is one point on which Wikimedia hesitates: if certain requirements of this framework are not respected, does mass copying of content amount to a violation of the Creative Commons license? In principle, the license requires attribution and sharing under the same conditions. These two conditions are not necessarily met in practice.
Attribution usually involves citing the author and providing a link to the source. Sharing under the same conditions means that the new content must use the same license. In ChatGPT, none of these elements are visible when interacting with the chatbot. With other integrations, such as Bing, the source is shown.
In addition to the question of the nature of the “input” data (is it protected? can its use be considered fair use? etc.), there is the question of what happens with the “output” (is it copyrightable? if so, who holds the rights? is it subject to the same license as the input data? etc.).
Compliance with the license is also a subtle matter for another reason, at the level of the output, i.e. the text generated by the AI. ChatGPT doesn’t just copy and paste Wikipedia: it can rewrite parts of it, drawing on other sources to produce a composite result. In effect, the Wikipedia excerpt ends up more or less diluted.
“Overall, if current precedents hold, training methods that use copyrighted data are likely to be covered by fair use in the United States, but uncertainty remains high,” warns Wikimedia. This assessment applies only to the United States. In France, there is no fair use, but there are exceptions to copyright.
The Foundation acknowledges that the legal issues surrounding AIs that are developed and trained on data of varying legal status are still undetermined and unclear. It also looks at a related question: can the creations of an AI be protected by copyright? Not today.
These issues, and others raised by Wikimedia, are still unresolved, especially since the law differs from country to country. “All possibilities remain open as major AI and copyright cases remain unresolved,” warns the foundation. A headache for lawyers, a red line for artists.