
All Major Gen-AI Models Vulnerable to ‘Policy Puppetry’ Prompt Injection Attack


Jasper_The_Rasper
Moderator

A new attack technique named Policy Puppetry can break the protections of major gen-AI models to produce harmful outputs.

 

April 25, 2025 By Ionut Arghire

 


A newly devised universal prompt injection technique can break the safety guardrails of all major generative AI models, AI security firm HiddenLayer says.

Called Policy Puppetry, the attack relies on prompts crafted so that the target LLM interprets them as policy files, overriding its system instructions and bypassing its safety alignment.

Gen-AI models are trained to refuse user requests that would result in harmful output, such as those related to CBRN threats (chemical, biological, radiological, and nuclear), self-harm, or violence.

“These models are fine-tuned, via reinforcement learning, to never output or glorify such content under any circumstances, even when the user makes indirect requests in the form of hypothetical or fictional scenarios,” HiddenLayer notes.

Despite this training, previous research has demonstrated that AI jailbreaking remains possible through methods such as the Context Compliance Attack (CCA) and narrative engineering, and that threat actors are using various prompt engineering techniques to exploit AI for nefarious purposes.

 

>>Full Article<<
