For more than 20 years, Kit Lofstedt has written fan fiction exploring alternate universes for “Star Wars” heroes and “Buffy the Vampire Slayer” villains, sharing her stories free online.
But in May, Ms. Lofstedt stopped posting her creations after she learned that a data company had copied her stories and fed them into the artificial intelligence technology underlying ChatGPT, the viral chatbot. Frustrated, she hid her writing behind a locked account.
Ms. Lofstedt also helped organize an act of rebellion against AI systems last month. Along with dozens of other fan fiction writers, she published a flood of irreverent stories online to overwhelm and confuse the data-collection services that feed writers’ work into AI technology.
“Each one of us has to do everything we can to show them that our creativity is not there for machines to harvest at will,” said Ms. Lofstedt, a 42-year-old voice actress from South Yorkshire in Britain.
Fan fiction writers are just one group now rebelling against AI systems as the technology fever sweeps through Silicon Valley and the world. In recent months, social media companies such as Reddit and Twitter, news organizations including The New York Times and NBC News, the author Paul Tremblay and the actress Sarah Silverman have all taken a stand against AI systems scooping up their data without permission.
Their protests have taken different forms. Writers and artists are locking their files to protect their work or boycotting websites that publish AI-generated content, while companies like Reddit are moving to charge for access to their data. At least 10 lawsuits have been filed against AI companies this year, accusing them of training their systems on artists’ creative work without consent. Last week, Ms. Silverman and the writers Christopher Golden and Richard Kadrey sued OpenAI, the maker of ChatGPT, and others over the use of their work to train AI.
At the heart of the rebellions is a newfound understanding that online information, including stories, artworks, news articles, message board posts and photos, may have significant untapped value.
The new wave of AI, known as “generative AI” for the text, images and other content it produces, is built atop complex systems such as large language models, which are capable of generating humanlike prose. These models are trained on hoards of all kinds of data so they can answer people’s questions, mimic writing styles or churn out comedy and poetry.
This has triggered a search by tech companies for even more data to feed their AI systems. Google, Meta and OpenAI used information from essentially the entire Internet, including large databases of fan fiction, repositories of news articles, and collections of books, much of which was available for free online. In tech industry parlance, this was known as “scraping” the Internet.
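The “scraping” step described above can be sketched in a few lines of Python: a program strips the markup from a web page so that only the text remains to be fed into a training corpus. The sample HTML and class name below are invented for illustration and are not drawn from any company’s actual pipeline.

```python
from html.parser import HTMLParser

# A minimal sketch of web scraping: extract the human-readable text
# from a page's HTML so it can be added to a training data set.
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

# Hypothetical page content; a real scraper would download this.
sample_page = "<html><body><h1>A Fan Story</h1><p>Once upon a time...</p></body></html>"
parser = TextExtractor()
parser.feed(sample_page)
print(" ".join(parser.chunks))  # → A Fan Story Once upon a time...
```

Real scraping operations do this at the scale of billions of pages, but the principle is the same: markup out, text in.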
OpenAI’s GPT-3, an AI system released in 2020, was trained on 500 billion “tokens,” each representing a fragment of a word, found mostly online. Some AI models are trained on more than a trillion tokens.
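A “token” is a small chunk of text, often a piece of a word rather than a whole one. Production systems like GPT-3 use learned byte-pair-encoding vocabularies; the toy vocabulary and greedy longest-match tokenizer below are an invented sketch meant only to show how a single word can split into several tokens.

```python
# Toy subword tokenizer. The vocabulary here is made up for
# illustration; real models learn theirs from massive text corpora.
VOCAB = {"fan", "fic", "tion", "fiction", "writer", "s", "un", "like", "ly"}

def tokenize(word, vocab=VOCAB):
    tokens, i = [], 0
    while i < len(word):
        # Greedily take the longest vocabulary entry matching at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

print(tokenize("fanfiction"))  # → ['fan', 'fiction']
print(tokenize("writers"))     # → ['writer', 's']
```

Counting tokens this way is how figures like “500 billion tokens” are measured: every fragment the tokenizer emits counts as one.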
The practice of internet scraping has a long history, and it was largely disclosed by the companies and nonprofit organizations that did it. But it was not well understood or seen as especially problematic by the companies that owned the data. That changed after ChatGPT debuted in November and the public learned more about the underlying AI models that power such chatbots.
“What’s happening here is a fundamental restructuring of the value of data,” said Brandon Duderstedt, the founder and chief executive of the AI company Nomic. “Previously, the idea was that you could get value out of the data by making it open to everyone and running ads. Now, the idea is to lock down your data, because you can get a lot of value out of it when you use it as an input to your AI.”
The data protests may have little effect in the long run. Deep-pocketed tech giants like Google and Microsoft already sit on mountains of proprietary information and have the resources to license more. But as the era of easily scraped content draws to a close, smaller AI upstarts and nonprofits that had hoped to compete with the big companies may not be able to obtain enough content to train their systems.
OpenAI said in a statement that ChatGPT was trained on “licensed content, publicly available content and content created by human AI trainers”. It added, “We respect the rights of creators and authors, and look forward to continuing to work with them to protect their interests.”
Google said in a statement that it was involved in conversations on how publishers can manage their content in the future. “We believe that a vibrant content ecosystem benefits everyone,” the company said. Microsoft did not respond to a request for comment.
The data rebellion erupted last year after ChatGPT became a worldwide phenomenon. In November, a group of programmers filed a proposed class action lawsuit against Microsoft and OpenAI, claiming that the companies had infringed their copyrights by using their code to train an AI-powered programming assistant.
In January, Getty Images, which provides stock photos and videos, filed a lawsuit against Stability AI, an AI company that creates images from text descriptions, claiming that the start-up had used copyrighted photographs to train its systems.
Then in June, Clarkson, a Los Angeles law firm, filed a 151-page proposed class action lawsuit against OpenAI and Microsoft, detailing how OpenAI had collected data from minors and arguing that web scraping violated copyright law and constituted “theft.” On Tuesday, the firm filed a similar lawsuit against Google.
“The data revolt we’re seeing across the country is society’s way of pushing back against the idea that Big Tech is entitled to take any and all information from any source and make it its own,” said Ryan Clarkson, the firm’s founder.
Eric Goldman, a professor at Santa Clara University School of Law, said the lawsuits’ arguments were expansive and unlikely to be accepted by the courts. But the wave of litigation is just beginning, he said, with a “second and third wave” coming that will define the future of AI.
Big companies are also pushing back against AI scrapers. In April, Reddit said it wanted to charge for access to its application programming interface, or API, the method through which third parties can download and analyze the social network’s vast database of person-to-person conversations.
Reddit chief executive Steve Huffman said at the time that his company “doesn’t need to give away all that value for free to some of the biggest companies in the world.”
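In practice, charging for API access usually means requiring an authenticated credential with every request, so the provider can meter and bill usage. The endpoint URL and token below are placeholders rather than real Reddit values; this is only a sketch of the general pattern.

```python
import urllib.request

# Hypothetical metered API endpoint and access token, for illustration.
API_URL = "https://api.example.com/v1/comments"
API_TOKEN = "YOUR-PAID-ACCESS-TOKEN"

# Each request carries a credential the provider can tie to a paying
# account. Without a valid token, a metered API refuses the request;
# this is the mechanism that turns a site's data from free to paid.
request = urllib.request.Request(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
print(request.get_header("Authorization"))  # → Bearer YOUR-PAID-ACCESS-TOKEN
```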
That same month, Stack Overflow, a question-and-answer site for computer programmers, said it, too, would ask AI companies to pay for its data. The site has nearly 60 million questions and answers. The move was earlier reported by Wired.
News organizations are also opposing AI systems. In an internal memo in June about the use of generative AI, The Times said that AI companies “must respect our intellectual property.” A Times spokeswoman declined to elaborate.
For individual artists and writers, fighting back against AI systems means rethinking where they publish.
Nicholas Cooley, 35, a painter in Vancouver, British Columbia, worried that his distinctive artistic style could be replicated by AI systems and suspected that the technology had already scraped his work. He plans to keep posting his creations on Instagram, Twitter and other social media sites to attract clients, but he has stopped publishing on sites like ArtStation, which hosts AI-generated content alongside human-made work.
“It feels like blatant theft from me and other artists,” Mr. Cooley said. “It puts a pit of existential dread in my stomach.”
At Archive of Our Own, a fan fiction database with more than 11 million stories, writers have increasingly pressured the site to ban data-scraping and AI-generated stories.
In May, when some Twitter accounts shared examples of ChatGPT mimicking the style of popular fan fiction posted on Archive of Our Own, dozens of writers were up in arms. They locked their stories and wrote subversive content to mislead the AI scrapers. They also pressed the leaders of Archive of Our Own to stop allowing AI-generated content.
Betsy Rosenblatt, who provides legal advice to Archive of Our Own and is a professor at the University of Tulsa College of Law, said the site had a policy of “maximum inclusivity” and did not want to be in the position of having to determine which stories were written with AI.
For the fan fiction writer Ms. Lofstedt, the battle against AI hit home while she was writing a story about “Horizon Zero Dawn,” a video game in which humans battle AI-powered robots in a post-apocalyptic world. In the game, she said, some of the robots were good and some were bad.
But in the real world, she said, “they are being driven to do bad things, because of arrogance and corporate greed.”