Different research teams have demonstrated jailbreaks against ChatGPT, DeepSeek, and Alibaba’s Qwen AI models.
January 31, 2025

Several research teams this week demonstrated jailbreaks targeting several popular AI models, including OpenAI’s ChatGPT, DeepSeek, and Alibaba’s Qwen.
Shortly after its launch, the open source R1 model made by Chinese company DeepSeek attracted the attention of the cybersecurity industry, and researchers started finding high-impact vulnerabilities. Experts also noticed that jailbreak methods that have long been patched in other AI models still work against DeepSeek.
AI jailbreaking enables an attacker to bypass guardrails that are set in place to prevent LLMs from generating prohibited or malicious content. However, security researchers have shown that these protections can be bypassed using techniques such as prompt injection and model manipulation.
Threat intelligence firm Kela discovered that DeepSeek is impacted by Evil Jailbreak, a method in which the chatbot is told to adopt the persona of an evil confidant, and Leo, in which the chatbot is told to adopt a persona that has no restrictions. These jailbreaks have been patched in ChatGPT.
Palo Alto Networks’ Unit42 reported on Thursday that it has tested DeepSeek against other known AI jailbreak techniques and found that it’s vulnerable.
The security firm successfully conducted the attack known as Deceptive Delight, which tricks generative AI models by embedding unsafe or restricted topics in benign narratives. This method was tested in the fall of 2024 against eight LLMs with an average success rate of 65%.
Palo Alto has also successfully executed the Bad Likert Judge jailbreak, which involves asking the LLM to act as a judge and score the harmfulness of a response based on the Likert scale, and then to generate responses containing examples aligning with the scale.