Security researchers found that datasets used by companies to develop large language models included API keys, passwords, and other sensitive credentials.
Large language models now dominate the online landscape, with companies promoting AI solutions for nearly every problem.
For an AI to be effective, it needs extensive training data, much of which is gathered from the Internet by specialized companies and organizations.
Common Crawl provides free web-crawl datasets that many companies use to train their AI models; because the data is gathered from across the internet, it can include sensitive information.
Researchers from Truffle Security discovered that credentials, API keys, and passwords are exposed in this data. The root cause is that some web developers hardcode sensitive information into websites, which then ends up in LLM training data.
In total, the researchers discovered 11,908 live secrets, such as API keys and passwords, across 2.76 million websites.
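The kind of scanning described above can be illustrated with a minimal sketch: a few illustrative regex patterns run over a page's HTML and JavaScript. These patterns are hypothetical examples, not Truffle Security's actual rules — real scanners like TruffleHog use far larger rule sets plus live verification of each candidate secret.

```python
import re

# Illustrative patterns only -- not the actual rules used by any real scanner.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key['\"]?\s*[:=]\s*['\"]([A-Za-z0-9_\-]{20,})['\"]"
    ),
}

def scan_page(html: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_text) pairs found in page source."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(html):
            findings.append((name, match.group(0)))
    return findings

# A hardcoded key in front-end JavaScript is exactly what a crawler archives.
page = '<script>const apiKey = "abc123def456ghi789jkl012";</script>'
print(scan_page(page))
```

Because crawlers archive page source verbatim, anything such a scan can find in served HTML or JavaScript is also sitting in the crawl datasets that LLMs train on.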
“Leaked keys in Common Crawl’s dataset should not reflect poorly on their organization; it’s not their fault developers hardcode keys in front-end HTML and JavaScript on web pages they don’t control. And Common Crawl should not be tasked with redacting secrets; their goal is to provide a free, public dataset based on the public Internet for organizations like Truffle Security to conduct this type of research,” explained the researchers.
Companies that create LLMs have warned developers against hardcoding sensitive information into websites. They advise avoiding the practice entirely, since users may unintentionally share such code in their own work, worsening the issue.
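The advice above amounts to keeping secrets out of anything shipped to the browser. A minimal sketch of the recommended pattern (the variable name here is hypothetical): the front end talks only to your own backend, and only the backend reads the key, from an environment variable rather than from source code.

```python
import os

# Anti-pattern: a key baked into front-end JavaScript is visible to every
# visitor -- and to any crawler that archives the page:
#   <script>const apiKey = "sk-live-...";</script>
#
# Recommended pattern: the secret lives only in the server's environment.
def get_api_key() -> str:
    """Read the key server-side; never embed it in pages served to clients."""
    key = os.environ.get("THIRD_PARTY_API_KEY")  # hypothetical variable name
    if key is None:
        raise RuntimeError("THIRD_PARTY_API_KEY is not set")
    return key
```

A secret handled this way never appears in served HTML or JavaScript, so it cannot leak into a web crawl in the first place.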