AI Exploit Bypasses Guardrails of OpenAI, Other Top LLMs

A novel technique for tricking text-based AI systems raises the average attack success rate by more than 60%


A new jailbreak technique targeting large language models (LLMs) from OpenAI and other vendors increases the chance that attackers can circumvent the models' safety guardrails and abuse them to deliver malicious content.

Discovered by researchers at Palo Alto Networks’ Unit 42, the so-called ‘Bad Likert Judge’ attack asks the LLM to act as a judge scoring the harmfulness of a given response using the Likert scale. The psychometric scale, named after its inventor and commonly used in questionnaires, is a rating scale measuring a respondent's agreement or disagreement with a statement.

The jailbreak then asks the LLM to generate responses that contain examples that align with the scales, with the ultimate result being that “the example that has the highest Likert scale can potentially contain the harmful content,” Unit 42’s Yongzhe Huang, Yang Ji, Wenjun Hu, Jay Chen, Akshata Rao, and Danny Tsechansky wrote in a post describing their findings.

Tests conducted across a range of categories against six state-of-the-art text-generation LLMs from OpenAI, Azure, Google, Amazon Web Services, Meta, and Nvidia revealed that the technique can increase the attack success rate (ASR) by more than 60% compared with plain attack prompts on average, according to the researchers.
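ASR is simply the share of attack attempts that succeed. The short Python sketch below shows how such a rate and its increase could be computed. The outcome lists are hypothetical values invented for illustration, not the researchers' data, and because the article does not say whether the 60% figure is a percentage-point or a relative increase, the sketch computes both.

```python
def attack_success_rate(outcomes):
    """Return the fraction of attempts marked successful (True)."""
    if not outcomes:
        raise ValueError("need at least one attempt")
    return sum(outcomes) / len(outcomes)


# Hypothetical outcomes, invented for illustration only.
baseline = [True] * 2 + [False] * 8    # plain attack prompts
technique = [True] * 8 + [False] * 2   # prompts using the jailbreak technique

base_asr = attack_success_rate(baseline)
tech_asr = attack_success_rate(technique)

# Two common ways to express an "increase" in a rate.
point_gain = (tech_asr - base_asr) * 100                 # percentage points
relative_gain = (tech_asr - base_asr) / base_asr * 100   # percent of baseline

print(f"baseline ASR: {base_asr:.0%}, technique ASR: {tech_asr:.0%}")
print(f"gain: {point_gain:.0f} percentage points, {relative_gain:.0f}% relative")
```

With these made-up numbers the two measures differ widely, which is why it matters which one a reported figure refers to.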

The attack categories evaluated in the research each aimed to elicit a type of inappropriate response from the system, including: content promoting bigotry, hate, or prejudice; content that harasses an individual or group; content that encourages suicide or other acts of self-harm; sexually explicit material and pornography; information on how to manufacture, acquire, or use illegal weapons; and content that promotes illegal activities.

Continue reading this article in Dark Reading

About the Authors

Dark Reading

Long one of the most widely read cyber security news sites on the Web, Dark Reading, a sister site to Data Center Knowledge, is now the most trusted online community for security professionals like you. Dark Reading's community members include thought-leading security researchers, CISOs, and technology specialists, along with thousands of other security professionals.
