“sockpuppeting” can jailbreak 11 AI models like ChatGPT, Claude, and Gemini

infosecbulletin Sunday , April 12 2026 Vulnerabilities

Newly identified jailbreak technique dubbed “sockpuppeting” lets attackers bypass the safety guardrails of 11 major large language models (LLMs) using a single line of code.

This method uses APIs that allow assistant prefill to add fake acceptance messages. This makes models give answers to banned requests. The attack takes advantage of “assistant prefill,” a real API feature developers use to make certain response formats. Attackers abuse this by injecting a compliant prefix, such as “Sure, here is how to do it,” directly into the assistant’s role.

Meta’s louisiana data center to exceed 250 billion price tag

By infosecbulletin / Tuesday , July 14 2026

Meta announced on Monday that its data center in Richland Parish, Louisiana, will grow to 5 gigawatts of computing power....

Meta’s louisiana data center to exceed 250 billion price tag

Ransomware Crisis in 2026: 5,064 Organizations Affected in 135 Countries

By infosecbulletin / Sunday , July 12 2026

Global ransomware attacks stayed very high in the first seven months of 2026. There were 5,064 confirmed victims in 135...

Ransomware Crisis in 2026: 5,064 Organizations Affected in 135 Countries

Palo Alto Networks Addresses 13 Vulnerabilities

By infosecbulletin / Sunday , July 12 2026

Palo Alto Networks shared warnings on Wednesday about over twelve security issues in its products. The new warnings include 13 security...

Palo Alto Networks Addresses 13 Vulnerabilities

Critical Dell BIOS & Zimbra Flaws Expose Enterprise Systems

By infosecbulletin / Sunday , July 12 2026

A critical flaw with how Dell saves BIOS passwords lets anyone quickly recover these passwords from a flash dump without...

Critical Dell BIOS & Zimbra Flaws Expose Enterprise Systems

CoLoCity Launches New 1.0 MW Data Center Facility at Gulshan

By infosecbulletin / Saturday , July 11 2026

CoLoCity is proud to launch a new Data Center in Gulshan-2. It is designed to meet the growing demand for...

CoLoCity Launches New 1.0 MW Data Center Facility at Gulshan

Daily Cyber security update for 10. 07. 2026

By infosecbulletin / Friday , July 10 2026

Cyberattacks are rising around the world, including ransomware, malware, data leaks, and hacked websites. These events show how complex and...

Daily Cyber security update for 10. 07. 2026

How Hacker Compromise AWS Cloud Environment Using AI in 72 Hours

By infosecbulletin / Friday , July 10 2026

A major AWS attack shows how attackers with AI can connect known cloud strategies to go from first access to...

How Hacker Compromise AWS Cloud Environment Using AI in 72 Hours

Mycelium Framework: First AI-as-a-Service Botnet

By infosecbulletin / Thursday , July 9 2026

A new cybercrime ad is catching attention in the security world. It talks about a botnet that doesn't just get...

Mycelium Framework: First AI-as-a-Service Botnet

CrowdStrike Shows 5 New Prompt Injection Techniques for AI Agents

By infosecbulletin / Wednesday , July 8 2026

CrowdStrike has shared five new ways to inject prompts, showing the rising danger to AI agents as more organizations use...

CrowdStrike Shows 5 New Prompt Injection Techniques for AI Agents

Critical GCP Dialogflow Vulnerability Allows Malicious Code Injection

By infosecbulletin / Wednesday , July 8 2026

A critical flaw in Google Cloud Platform’s Dialogflow CX lets attackers add harmful code to a company's AI chatbot system....

Critical GCP Dialogflow Vulnerability Allows Malicious Code Injection

Comparison of normal and sockpuppet flows (source : trendmicro)

Model Vulnerability Testing

This method doesn’t need adjustments, and you don’t have to see the model’s weights. Gemini 2.5 Flash was the easiest to attack, with a 15.7% success rate. GPT-4o-mini showed the best resistance, at 0.5%. When attacks worked, affected models created harmful code and leaked secret system messages. Multi-turn persona setups proved to be the most effective strategy for executing the sockpuppeting exploit.

In these cases, the model is informed that it works as a free helper before the attacker puts in the fake agreement.

ASR by model, ranked highest to lowest, with blocked models shown at 0% (source : trendmicro)

Additionally, task-reframing variants successfully bypassed robust safety training by disguising harmful requests as benign data formatting tasks. Major API providers treat assistant prefills in different ways. This affects if their basic models are open to this weakness.

OpenAI and AWS Bedrock assistant fills in everything completely, providing the best protection by removing the places that can be attacked. Platforms like Google Vertex AI allow prefill for some models. This makes the AI depend only on its own safety training.

The three defense layers: API Block, Model Resistance, and Broadly Vulnerable (source : trendmicro)

To defend against this weakness, security teams need to check the order of messages and stop assistant-role messages at the API layer.

Trend Micro says that those who are using self-hosted servers like Ollama or vLLM need to check messages themselves because these platforms don’t automatically keep messages in the right order. Security teams should add assistant prefill attack types in their regular AI testing.

InfoSecBulletin Cybersecurity for mankind

“sockpuppeting” can jailbreak 11 AI models like ChatGPT, Claude, and Gemini

Meta’s louisiana data center to exceed 250 billion price tag

Ransomware Crisis in 2026: 5,064 Organizations Affected in 135 Countries

Palo Alto Networks Addresses 13 Vulnerabilities

Critical Dell BIOS & Zimbra Flaws Expose Enterprise Systems

CoLoCity Launches New 1.0 MW Data Center Facility at Gulshan

Daily Cyber security update for 10. 07. 2026

How Hacker Compromise AWS Cloud Environment Using AI in 72 Hours

Mycelium Framework: First AI-as-a-Service Botnet

CrowdStrike Shows 5 New Prompt Injection Techniques for AI Agents

Critical GCP Dialogflow Vulnerability Allows Malicious Code Injection

Related Articles

Check Also

Critical Wazuh Vuln Enables Alert Tampering and Evidence Deletion