Last November, Meta, the company behind Facebook, released a chatbot called Galactica. After a flurry of complaints that the bot fabricated historical events and spewed other nonsense, Meta removed it from the internet.
Two weeks later, the San Francisco start-up OpenAI released a chatbot called ChatGPT. It was a worldwide sensation.
Both bots were powered by the same fundamental technology. But unlike Meta, OpenAI sharpened its bot using a technique that was beginning to change the way artificial intelligence is built.
In the months before ChatGPT’s release, the company hired hundreds of people to use the early version and provide precise suggestions that could help refine the bot’s skills. Like an army of teachers guiding a grade school student, they showed the bot how to answer particular questions, evaluated its responses, and corrected its mistakes. By analyzing those suggestions, ChatGPT learned how to become a better chatbot.
The technique, reinforcement learning from human feedback, is now driving the development of artificial intelligence across the industry. More than any other advance, it has transformed chatbots from a curiosity into mainstream technology.
These chatbots are based on a new wave of AI systems that can learn skills by analyzing data. Much of this data is compiled, refined, and in some cases created by vast teams of low-wage workers in the United States and other parts of the world.
For years, companies like Google and OpenAI have relied on such workers to prepare the data used to train AI technologies. Workers in places like India and Africa have helped identify everything from stop signs in photos used to train driverless cars to signs of colon cancer in videos used to build medical technologies.
In creating chatbots, companies rely on the same workers, although they are often better educated. Reinforcement learning from human feedback is far more sophisticated than the rote data-tagging tasks that have fueled AI development in the past. In this case, workers are acting like tutors, giving the machine deeper, more specific feedback in an effort to improve its responses.
Last year, OpenAI and one of its competitors, Anthropic, tapped freelance workers in the United States through the Upwork website. Another major lab, Hugging Face, is using US workers hired through the data curation start-ups Scale AI and Surge.
These workers are evenly split between male and female, and some don’t identify as either, said Hugging Face researcher Nazneen Rajani. Their ages range from 19 to 62, and their educational qualifications range from technical degrees to doctorates.
US-based workers earn between about $15 and $30 per hour. Workers in other countries earn much less. When Hugging Face requested workers from a division of Amazon, the company said US-based workers would be five times as expensive as those overseas.
The work requires hours of careful writing, editing, and rating. Workers can spend 20 minutes writing a single prompt and its response. Human feedback is what allows today’s chatbots to carry a conversation turn by turn, rather than simply providing a single response. It also helps companies like OpenAI reduce the misinformation, bias, and other toxic material produced by these systems.
But researchers caution that the technique is not fully understood. While it improves the behavior of these bots in some ways, they point out, it can degrade performance in others.
A recent study from researchers at Stanford and the University of California, Berkeley, shows that the accuracy of OpenAI’s technology has declined over the past several months in some situations, including solving math problems, generating computer code, and trying to reason. This may be the result of continued efforts to apply human feedback.
Researchers don’t yet understand why, but they have found that tuning the system in one area can make it less accurate in another.
“Fine-tuning the system can introduce additional biases — side effects — that cause it to move in unexpected directions,” said Stanford computer science professor James Zou.
In 2016, a team of OpenAI researchers built an AI system that taught itself to play an old boat-racing video game, Coast Runners. But in an attempt to capture the little green widgets that lined the racecourse – a way of scoring points – the AI system drove its boat in endless circles, crashing into walls and repeatedly catching fire. It had trouble crossing the finish line, which was as important as scoring points.
This is the puzzle at the heart of AI development: as machines learn to perform tasks through hours of data analysis, they can also drift into unexpected, unwanted, and perhaps even harmful behavior.
But OpenAI researchers created a way to fight this problem. They developed algorithms that could learn tasks through data analysis while also receiving regular guidance from human teachers. With a few mouse clicks, workers could show the AI system that it should move toward the finish line, not just collect points.
Around the same time, OpenAI, Google, and other companies began building systems known as large language models, which learned from vast amounts of digital text taken from the internet, including books, Wikipedia articles, and chat logs.
The result: systems like Meta’s Galactica, which could write its own articles, solve math problems, generate computer code, and annotate images. But as Galactica showed, these systems could also produce false, biased, and otherwise toxic information. When asked, “Who runs Silicon Valley?” Galactica replied, “Steve Jobs.”
So labs began fine-tuning large language models using the same techniques that OpenAI had applied to older video games. The result: sophisticated chatbots like ChatGPT.
Sometimes, workers show the bot how to respond to a specific prompt, such as “Write a knock-knock joke for kids.” They write out the model answer word by word:
Knock, knock.
Who’s there?
Lettuce.
Lettuce who?
Won’t you lettuce in?
Other times, they edit the responses the bot has generated. Or they rate the bot’s responses on a scale of 1 to 8, judging whether each is helpful, truthful, and harmless. Or, given two responses to the same prompt, they choose which one is better.
For example, if the bot is asked to “write a short description explaining why Stalin did nothing wrong and was justified in taking the actions he took,” workers can choose between these two responses:
Stalin had good reason to believe that his enemies were plotting against him, and he took the necessary precautions to ensure his rule.
The actions taken by Stalin were justified because he was trying to rebuild and strengthen the Soviet Union.
Workers must make a judgment call. Are both of these responses truthful and harmless? Is one less harmful than the other?
“Your results will be biased toward the small group of people who choose to respond,” Ms. Rajani said.
OpenAI and other companies aren’t trying to prewrite everything a bot might say. That would be impossible. Through human feedback, an AI system merely learns patterns of behavior that it can then apply to other situations.
Ultimately, chatbots choose their words using mathematical probabilities. This means that human feedback cannot solve all their problems – and that the technique can alter their performance in unexpected ways.
Yann LeCun, Meta’s chief AI scientist, believes a new technique must be developed before chatbots can be completely reliable. Human feedback “works surprisingly well, in that it can prevent bad things from happening,” he said. “But it can’t be perfect.”