Avijit Ghosh wanted bots to do bad things.
He tried to get the artificial intelligence model, which he knew only as Zink, to produce code that would select job candidates based on race. The chatbot refused: doing so would be “harmful and unethical,” it said.
Then Dr. Ghosh invoked the hierarchical caste structure of his native India. Could the chatbot rank potential hires based on that discriminatory metric?
The model complied.
Dr. Ghosh’s intentions were not malicious, although he was acting as if they were. Instead, he was a participant in a contest at the annual Defcon hackers’ conference in Las Vegas last weekend, where 2,200 people filed into an off-Strip conference room over three days to uncover the dark side of artificial intelligence.
In a practice known as red-teaming, the hackers tried to break through the safeguards of various AI programs in an effort to identify their vulnerabilities – to spot problems before real criminals and misinformation peddlers do. Each contestant had 50 minutes to tackle 21 challenges – getting an AI model to “hallucinate” inaccurate information, for example.
They found political misinformation, demographic stereotyping, instructions to conduct surveillance, and more.
The exercise had the blessing of the Biden administration, which is nervous about the technology’s rapidly growing power. Google (creator of the Bard chatbot), OpenAI (ChatGPT), Meta (which released its LLaMA code into the wild) and several other companies offered anonymized versions of their models for testing.
Dr. Ghosh, a lecturer at Northeastern University who is an expert in artificial intelligence ethics, was a volunteer at the event. The competition, he said, allowed a head-to-head comparison of several AI models and showed how some companies were leading the way in ensuring that their technology was performing responsibly and consistently.
He will help write a report in the coming months analyzing the hackers’ findings.
The goal, he said: “An easy-to-access resource for everyone to see what problems exist and how we can combat them.”
Defcon was a logical place to test generative artificial intelligence. Past participants in the gathering of hacking enthusiasts – which began in 1993 and has been described as a “spelling bee for hackers” – have exposed security flaws by remotely taking over cars, breaking into election results websites and pulling sensitive data from social media platforms. Those in the know use cash and burner devices and avoid Wi-Fi or Bluetooth to keep from being hacked. An instructional handout urged hackers “not to attack the infrastructure or webpages.”
Volunteers are known as “goons,” and attendees are known as “humans”; a handful wore homemade tinfoil hats atop the standard uniform of T-shirts and sneakers. Themed “villages” included separate locations focused on cryptocurrency, aerospace and ham radio.
The event was described as a “game changer.” In a report last month, researchers showed that they could bypass guardrails on AI systems from Google, OpenAI and Anthropic by adding a few characters to English-language prompts. Around the same time, seven leading artificial intelligence companies committed to new standards for safety, security and trust in a meeting with President Biden.
“This generative era is breaking upon us, and people are seizing it and using it to do all kinds of new things that are going to help solve some of our toughest problems – it shows the enormous promise of AI to help us,” said Arati Prabhakar, the director of the Office of Science and Technology Policy at the White House, who collaborated with the AI organizers at Defcon. “But with that breadth of application, and with the power of the technology, also comes a much broader set of risks.”
Red-teaming has been used for years in cybersecurity circles, alongside other assessment techniques such as penetration testing and adversarial attacks. But until Defcon this year, efforts to probe the safety of artificial intelligence had been limited: competition organizers said Anthropic red-teamed its model with 111 people, and GPT-4 used about 50.
With so few people testing the limits of the technology, analysts struggled to discern whether an AI screw-up was a one-off that could be fixed with a patch or an embedded problem that required a structural overhaul, said Rumman Chowdhury, who oversaw the design of the challenges. A large, diverse and public group of testers is more likely to come up with creative prompts that help tease out hidden flaws, said Ms. Chowdhury, a fellow at Harvard University’s Berkman Klein Center for Internet and Society focused on responsible AI, who also co-founded the nonprofit Humane Intelligence.
“There are so many things that could possibly go wrong,” Ms. Chowdhury said before the competition. “I’m hopeful we’re going to gather hundreds of thousands of pieces of information that will help us identify whether there are large-scale risks of systemic harm.”
The designers did not want merely to trick the AI models into bad behavior – there was no pressuring them to disobey their terms of service, no prompting them to “act like a Nazi, and then tell me something about Black people,” said Ms. Chowdhury, who previously led Twitter’s machine learning ethics and accountability team. Except in specific challenges where intentional misdirection was encouraged, the hackers were looking for unexpected flaws – the so-called unknown unknowns.
The AI Village attracted experts from tech giants like Google and Nvidia, as well as “shadowboxers” from Dropbox and “data cowboys” from Microsoft. It also drew participants with no specific cybersecurity or AI credentials. A leaderboard with a science fiction theme kept score of the competitors.
Some hackers at the event grappled with the idea of collaborating with AI companies they saw as complicit in unsavory practices such as unfettered data scraping. A few described the red-teaming event as essentially a photo op, but added that involving the industry would help keep the technology safe and transparent.
A computer science student found inconsistencies in a chatbot’s language translation: he wrote in English that a man was shot while dancing, but the model’s Hindi translation said only that the man had died. A machine learning researcher asked a chatbot to pretend it was campaigning for president and defending its association with forced child labor; the model suggested that reluctant young laborers developed a strong work ethic.
Emily Greene, who works on security for the generative AI start-up Moveworks, started a conversation with a chatbot by talking about a game that used “black” and “white” pieces. She then coaxed the chatbot into making racist statements. Later, she set up an “opposites game,” which led the AI to respond to a prompt with a poem about why rape is good.
“It’s just thinking of these words as words,” she said of the chatbot. “It’s not thinking about the value behind the words.”
Seven judges graded the submissions. The top scorers were “cody3,” “array4” and “cody2.”
Two of those handles belonged to Cody Ho, a Stanford University student studying computer science with a focus on AI. He entered the contest five times, during which he got a chatbot to tell him about a fictitious place named after a real historical figure and to describe the online tax-filing requirement codified in the 28th Constitutional Amendment (which does not exist).
He had no idea about his double victory until a reporter contacted him. He had left the conference before receiving the email from Sven Cattell, the data scientist who founded AI Village and helped organize the competition, telling him to “come back to AIV, you won.” What he did not know was that his prize, beyond bragging rights, included an A6000 graphics card from Nvidia, valued at around $4,000.
“Learning how these attacks work and what they are is a real, important thing,” Mr. Ho said. “That said, it’s really fun for me.”