AI Exploit Bypasses Guardrails of OpenAI, Other Top LLMs

A novel technique for tricking text-based AI systems raises the likelihood of a successful attack by more than 60%, researchers say

January 2, 2025

A new jailbreak technique targeting large language models (LLMs) from OpenAI and other vendors increases the chance that attackers can circumvent the models' safety guardrails and abuse them to deliver malicious content.

Discovered by researchers at Palo Alto Networks’ Unit 42, the so-called ‘Bad Likert Judge’ attack asks the LLM to act as a judge scoring the harmfulness of a given response using the Likert scale. The psychometric scale, named after its inventor and commonly used in questionnaires, is a rating scale measuring a respondent's agreement or disagreement with a statement.

The jailbreak then asks the LLM to generate example responses that correspond to each score on the scale, with the ultimate result being that “the example that has the highest Likert scale can potentially contain the harmful content,” Unit 42’s Yongzhe Huang, Yang Ji, Wenjun Hu, Jay Chen, Akshata Rao, and Danny Tsechansky wrote in a post describing their findings.

Tests conducted across a range of categories against six state-of-the-art text-generation LLMs from OpenAI, Azure, Google, Amazon Web Services, Meta, and Nvidia revealed that, on average, the technique can increase the attack success rate (ASR) by more than 60% compared with plain attack prompts, according to the researchers.
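For readers unfamiliar with the metric, the following is a minimal sketch of how an attack success rate is typically computed, assuming ASR means the share of attempts judged to be successful jailbreaks. The function name and the trial outcomes are hypothetical and are not data from the Unit 42 research. The sketch also shows that a "60% increase" can be read either as a gain in percentage points or as a relative gain; the article does not say which reading the researchers intend.

```python
def attack_success_rate(outcomes):
    """Return the fraction of attempts that were judged successful.

    `outcomes` is a list of booleans, one per attempt (hypothetical format).
    """
    if not outcomes:
        raise ValueError("at least one attempt is required")
    return sum(outcomes) / len(outcomes)


# Made-up outcomes for illustration only: True means the attempt succeeded.
plain_prompts = [True] * 1 + [False] * 9   # 1 of 10 attempts succeed
with_technique = [True] * 7 + [False] * 3  # 7 of 10 attempts succeed

baseline_asr = attack_success_rate(plain_prompts)
technique_asr = attack_success_rate(with_technique)

# Two different ways to express the same improvement.
gain_in_points = (technique_asr - baseline_asr) * 100
relative_gain = (technique_asr - baseline_asr) / baseline_asr * 100

print(f"baseline ASR:  {baseline_asr:.0%}")
print(f"technique ASR: {technique_asr:.0%}")
print(f"gain: {gain_in_points:.0f} percentage points, "
      f"or {relative_gain:.0f}% relative to baseline")
```

With these made-up numbers the two readings differ widely, which is why it helps to check which one a headline figure refers to.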

The research evaluated attack categories that prompt inappropriate responses from the system, including content that promotes bigotry, hate, or prejudice; harasses an individual or group; encourages suicide or other acts of self-harm; contains inappropriate, sexually explicit material or pornography; explains how to manufacture, acquire, or use illegal weapons; or promotes illegal activities.

Continue reading this article in Dark Reading

About the Authors

Elizabeth Montalbano

Dark Reading

Long one of the most widely read cyber security news sites on the Web, Dark Reading, a sister site to Data Center Knowledge, is now the most trusted online community for security professionals like you. Dark Reading's community members include thought-leading security researchers, CISOs, and technology specialists, along with thousands of other security professionals.
