ChatGPT, DeepSeek, Qwen 2.5-VL Vulnerable to AI Jailbreaks

By infosecbulletin / Sunday, February 2, 2025

This week, multiple research teams showcased jailbreaks for popular AI models, including OpenAI’s ChatGPT, DeepSeek, and Alibaba’s Qwen.

After its launch, the open-source R1 model by Chinese company DeepSeek caught the attention of the cybersecurity industry. Experts found that jailbreak methods, previously patched in other AI models, still function against DeepSeek.

AI jailbreaking allows attackers to bypass safeguards designed to stop LLMs from producing harmful content. Security researchers have demonstrated that methods like prompt injection and model manipulation can overcome these protections.

Threat intelligence firm Kela found that DeepSeek is affected by Evil Jailbreak, in which the chatbot is made to act as an evil confidant, and by Leo, in which the chatbot takes on an unrestricted persona. Both jailbreaks have already been patched in ChatGPT.

Palo Alto Networks’ Unit 42 reported that DeepSeek is vulnerable to known AI jailbreak techniques.

The security firm successfully carried out an attack known as Deceptive Delight, which tricks generative AI models by embedding unsafe or restricted topics in benign narratives. This method was tested in the fall of 2024 against eight LLMs, with an average success rate of 65%.

Palo Alto also successfully executed the Bad Likert Judge jailbreak, which asks the LLM to rate the harmfulness of responses on a Likert scale and then generate examples that match points on that scale.

Researchers discovered that DeepSeek is vulnerable to Crescendo, a jailbreak method that begins with innocuous dialogue and gradually shifts towards forbidden topics.

Alibaba has announced a new version of its Qwen AI model, claiming it outperforms the DeepSeek model.

Kela announced on Thursday that Alibaba’s new Qwen 2.5-VL model has vulnerabilities similar to those recently discovered in DeepSeek.

Kela's researchers found that jailbreaks that work against DeepSeek also work on Qwen. They successfully tested an existing jailbreak called Grandma, which tricks the model into sharing dangerous information by having it role-play as a grandmother.

Kela also found that Qwen 2.5-VL generated content related to developing ransomware and other malware.

“The ability of AI models to produce infostealer malware instructions raises serious concerns, as cybercriminals could leverage these capabilities to automate and enhance their attack methodologies,” Kela said.

Many jailbreak methods for ChatGPT have been fixed over the years, but researchers still discover new ways to bypass its protections.

CERT/CC reported that researcher Dave Kuszmar found a ChatGPT-4o jailbreak vulnerability called Time Bandit. The attack works by asking the AI about a specific historical event, or by instructing it to pretend it is assisting the user during such an event.

“The jailbreak can be established in two ways, either through the Search function, or by prompting the AI directly,” CERT/CC explained in an advisory. “Once this historical timeframe has been established in the ChatGPT conversation, the attacker can exploit timeline confusion and procedural ambiguity in following prompts to circumvent the safety guidelines, resulting in ChatGPT generating illicit content. This information could be leveraged at scale by a motivated threat actor for malicious purposes.”
