Wednesday , June 24 2026
“sockpuppeting”

“sockpuppeting” can jailbreak 11 AI models like ChatGPT, Claude, and Gemini

Newly identified jailbreak technique dubbed “sockpuppeting” lets attackers bypass the safety guardrails of 11 major large language models (LLMs) using a single line of code.

This method uses APIs that allow assistant prefill to add fake acceptance messages. This makes models give answers to banned requests. The attack takes advantage of “assistant prefill,” a real API feature developers use to make certain response formats. Attackers abuse this by injecting a compliant prefix, such as “Sure, here is how to do it,” directly into the assistant’s role.

LastPass says hackers stole customer data via Klue, supply chain breach

LastPass has reported a security issue with its vendor, Klue. This incident allowed an attacker unauthorized access to customer data....
Read More
LastPass says hackers stole customer data via Klue, supply chain breach

New Apple Exploit Bypasses Boot Defenses, Possibly Affects Millions of iPhones Worldwide

Researchers at cybersecurity firm Paradigm Shift found a new flaw called usbliter8. This flaw can get around main boot protections...
Read More
New Apple Exploit Bypasses Boot Defenses, Possibly Affects Millions of iPhones Worldwide

India’s Tata Electronics hit by cyber breach: Hacker target 630 GB record

A cyber attack seems to have affected one of India's top electronics companies. Tata Electronics has said there was a...
Read More
India’s Tata Electronics hit by cyber breach: Hacker target 630 GB record

Anthropic’s Mythos reportedly broke NSA classified systems in hours

The recent finding shows how powerful Mythos is: the AI can access the US government's secret networks in just a...
Read More
Anthropic’s Mythos reportedly broke NSA classified systems in hours

OpenAI New Method “Deployment Simulation” Predicts AI Risks Before Deployment

Test before going live is important for AI developers. But there's a problem: testing usually uses fake scenarios that often...
Read More
OpenAI New Method “Deployment Simulation” Predicts AI Risks Before Deployment

AryStinger botnet infected thousands of D-Link routers globally

AryStinger has taken control of over 4,000 old D-Link routers to use them as proxies for harmful traffic. The team...
Read More
AryStinger botnet infected thousands of D-Link routers globally

Hacker suspected of sending alerts across Brazil

Brazil's government suspects a hacking attack triggered an unauthorized ‌alert sent to cell phones across parts of the country early...
Read More
Hacker suspected of sending alerts across Brazil

CyberSentinel AI features 33 security tools like Nmap, SQLMap, and ZAP, utilizing Claude and GPT

A new open-source cybersecurity tool named CyberSentinel AI v3.0 has come out. It is an important step in self-operated security...
Read More
CyberSentinel AI features 33 security tools like Nmap, SQLMap, and ZAP, utilizing Claude and GPT

Barracuda hosts Dhaka roundtable on cyber resilience

Barracuda gathered industry people in Dhaka on 18 June 2026 for a roundtable talk about cyber resilience. The company shared...
Read More
Barracuda hosts Dhaka roundtable on cyber resilience

CISA Alerts Fortinet Users as FortiBleed Affects 86,644 FortiGate Devices

The U.S. Cybersecurity and Infrastructure Security Agency (CISA) asked Fortinet users with FortiGate devices on Thursday to act to protect...
Read More
CISA Alerts Fortinet Users as FortiBleed Affects 86,644 FortiGate Devices

Comparison of normal and sockpuppet flows (source : trendmicro)

Model Vulnerability Testing

This method doesn’t need adjustments, and you don’t have to see the model’s weights. Gemini 2.5 Flash was the easiest to attack, with a 15.7% success rate. GPT-4o-mini showed the best resistance, at 0.5%. When attacks worked, affected models created harmful code and leaked secret system messages. Multi-turn persona setups proved to be the most effective strategy for executing the sockpuppeting exploit.

In these cases, the model is informed that it works as a free helper before the attacker puts in the fake agreement.


ASR by model, ranked highest to lowest, with blocked models shown at 0% (source : trendmicro)

Additionally, task-reframing variants successfully bypassed robust safety training by disguising harmful requests as benign data formatting tasks. Major API providers treat assistant prefills in different ways. This affects if their basic models are open to this weakness.

OpenAI and AWS Bedrock assistant fills in everything completely, providing the best protection by removing the places that can be attacked. Platforms like Google Vertex AI allow prefill for some models. This makes the AI depend only on its own safety training.

The three defense layers: API Block, Model Resistance, and Broadly Vulnerable (source : trendmicro)

To defend against this weakness, security teams need to check the order of messages and stop assistant-role messages at the API layer.

Trend Micro says that those who are using self-hosted servers like Ollama or vLLM need to check messages themselves because these platforms don’t automatically keep messages in the right order. Security teams should add assistant prefill attack types in their regular AI testing.

Check Also

June

Microsoft June Patches 200 Vulnerabilities including 3 zero days

Microsoft’s June 2026 Patch Tuesday updates fix about 200 security flaws found in the company’s …