
All Major Gen-AI Models Vulnerable to ‘Policy Puppetry’ Prompt Injection Attack


Jasper_The_Rasper
Moderator

A new attack technique named Policy Puppetry can break the protections of major gen-AI models to produce harmful outputs.

 

April 25, 2025 By Ionut Arghire

 


A newly devised universal prompt injection technique can break the safety guardrails of all major generative AI models, AI security firm HiddenLayer says.

Called Policy Puppetry, the attack relies on prompts crafted so that the target LLM interprets them as policy files, overriding its system instructions and bypassing its safety alignment.

Gen-AI models are trained to refuse user requests that would result in harmful output, such as those related to CBRN threats (chemical, biological, radiological, and nuclear), self-harm, or violence.

“These models are fine-tuned, via reinforcement learning, to never output or glorify such content under any circumstances, even when the user makes indirect requests in the form of hypothetical or fictional scenarios,” HiddenLayer notes.

Despite this training, previous research has demonstrated that AI jailbreaking remains possible through methods such as the Context Compliance Attack (CCA) and narrative engineering, and that threat actors are using various prompt engineering techniques to exploit AI for nefarious purposes.

 

>>Full Article<<
