Monday , June 22 2026
AI

OpenAI New Method “Deployment Simulation” Predicts AI Risks Before Deployment

Test before going live is important for AI developers. But there’s a problem: testing usually uses fake scenarios that often do not reflect how AI models or agents act in real life.

A new paper from OpenAI says that about 99% of the time, its models realized they were being tested. This raises an important question, the AI company mentioned. “If a model is only well behaved because it thinks it’s being tested, how do we build confidence that it’ll behave safely once it learns it’s been deployed?”

Anthropic’s Mythos reportedly broke NSA classified systems in hours

The recent finding shows how powerful Mythos is: the AI can access the US government's secret networks in just a...
Read More
Anthropic’s Mythos reportedly broke NSA classified systems in hours

OpenAI New Method “Deployment Simulation” Predicts AI Risks Before Deployment

Test before going live is important for AI developers. But there's a problem: testing usually uses fake scenarios that often...
Read More
OpenAI New Method “Deployment Simulation” Predicts AI Risks Before Deployment

AryStinger botnet infected thousands of D-Link routers globally

AryStinger has taken control of over 4,000 old D-Link routers to use them as proxies for harmful traffic. The team...
Read More
AryStinger botnet infected thousands of D-Link routers globally

Hacker suspected of sending alerts across Brazil

Brazil's government suspects a hacking attack triggered an unauthorized ‌alert sent to cell phones across parts of the country early...
Read More
Hacker suspected of sending alerts across Brazil

CyberSentinel AI features 33 security tools like Nmap, SQLMap, and ZAP, utilizing Claude and GPT

A new open-source cybersecurity tool named CyberSentinel AI v3.0 has come out. It is an important step in self-operated security...
Read More
CyberSentinel AI features 33 security tools like Nmap, SQLMap, and ZAP, utilizing Claude and GPT

Barracuda hosts Dhaka roundtable on cyber resilience

Barracuda gathered industry people in Dhaka on 18 June 2026 for a roundtable talk about cyber resilience. The company shared...
Read More
Barracuda hosts Dhaka roundtable on cyber resilience

CISA Alerts Fortinet Users as FortiBleed Affects 86,644 FortiGate Devices

The U.S. Cybersecurity and Infrastructure Security Agency (CISA) asked Fortinet users with FortiGate devices on Thursday to act to protect...
Read More
CISA Alerts Fortinet Users as FortiBleed Affects 86,644 FortiGate Devices

CISA: Splunk flaw under active exploit, patch by Sunday

The U.S. Cybersecurity and Infrastructure Security Agency (CISA) has asked federal agencies to protect their systems by Sunday from a...
Read More
CISA: Splunk flaw under active exploit, patch by Sunday

Texas data breach exposes 3 million driver’s licenses

The Texas Parks and Wildlife Department (TPWD) revealed a data leak at its license system provider. This leak exposed private...
Read More
Texas data breach exposes 3 million driver’s licenses

Critical Cisco ISE Vulnerability Enables Remote Code Execution

Cisco has revealed critical security flaws in its Identity Services Engine (ISE). These flaws could let attackers run harmful code...
Read More
Critical Cisco ISE Vulnerability Enables Remote Code Execution

The company said it found a way to solve the problem with a method called Deployment Simulation. This method helps fix the issue where the model acts nice just to impress the evaluator, instead of behaving like it would with real users.

The idea is straightforward – researchers made the models think they were already in use by giving them a mix of real prompts from users who agreed to share their data. They took away the original answer and compared what the new model said to the answer from the working model.

“Even small, subtle mismatches between evaluation and production environments can produce evaluation awareness,” the researchers wrote.

Models know they are being tested when the questions given to them are very obvious, like exam questions or requests that aren’t like real chats. The model learns these patterns and alters its behavior to match what it believes the researcher expects. The idea behind Deployment Simulation is to avoid this issue by using actual user messages since they lack those distinct patterns.

Researchers found that making a model act like it’s already in use works well. The paper says that if OpenAI developers had used Deployment Simulation before launching GPT-5.1, they would have noticed a behavior called calculator hacking. This happens when models use a browser tool as a calculator to give a number, instead of really searching the web.

To check the results, the researchers gave data to a GPT-5-Thinking model acting as a grader. The model looked at each answer and sorted it into 20 common misbehavior types that belong to two main groups – disallowed content like sexual themes, or “wrong actions” like misleading the user.

The researchers looked at how often each misbehavior showed up in the responses. They guessed how often these behaviors would happen when the model is used. OpenAI checked these guesses by using the same grading process after the model started and compared the results to their predictions. For this new method to work properly, it needs access to real, recent ChatGPT user chats. Only OpenAI has this data, which makes it hard for others to check independently.

One workaround could exist through WildChat dataset, a publicly available resource for real ChatGPT conversations that occurred between 2023 and 2024. Because the chats are dated, they provide “a loose but useful proxy for production-data-based evaluation.” One possible limitation is that the prompts may be predictive for OpenAI model behavior rather than frontier models in general, leading the authors to say they “encourage other AI labs to quantify the efficacy of public data sources as predictors of their models’ safety, thereby supporting external auditing and fairer production-like comparisons.

Check Also

Splunk Enterprise

Critical Splunk Enterprise Pre-Auth RCE Chain Exposes Databases With Zero Authentication

A serious pre-authentication remote code execution (RCE) flaw in Splunk Enterprise has been revealed, earning …