Security researchers found that datasets used by companies to develop large language models included API keys, passwords, and other sensitive credentials.
Large language models are dominating the online landscape, with companies promoting AI products as solutions to almost every kind of problem.
For an AI model to be effective, it needs extensive training data, much of which is gathered from the Internet by specialized companies and organizations.
One such source is Common Crawl, which provides free, public datasets of pages crawled from the Internet that companies use to train their models. Because pages are captured as they were published, these datasets may include sensitive data.
Researchers from Truffle Security found credentials, API keys, and passwords exposed in this data. The root cause is that some web developers hardcode sensitive information into the HTML and JavaScript of their websites; when those pages are crawled, the secrets end up in LLM training data.
In total, the researchers found 11,908 live secrets, such as API keys and passwords, across 2.76 million websites.
“Leaked keys in Common Crawl’s dataset should not reflect poorly on their organization; it’s not their fault developers hardcode keys in front-end HTML and JavaScript on web pages they don’t control. And Common Crawl should not be tasked with redacting secrets; their goal is to provide a free, public dataset based on the public Internet for organizations like Truffle Security to conduct this type of research,” explained the researchers.
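To illustrate why front-end secrets are so easy to harvest, the sketch below scans page source for one well-known credential format: AWS access key IDs, which begin with "AKIA" followed by 16 uppercase letters or digits. It is a simplified, hypothetical detector written for this article, not how Truffle Security ran its study. Real scanners such as Truffle Security's open-source TruffleHog use many detectors and also check whether a match is still a working credential; this sketch does neither. The sample key is AWS's published documentation placeholder, not a real credential.

```python
import re

# Hypothetical, simplified detector for illustration only: matches the
# AWS access key ID format ("AKIA" + 16 uppercase letters or digits).
# It does not verify whether a match is a live, working credential.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_candidate_keys(page_text: str) -> list[str]:
    """Return AWS-style access key IDs found in a page's HTML or JavaScript."""
    return AWS_KEY_ID.findall(page_text)


# A page with a key hardcoded in front-end JavaScript: anyone who loads the
# page, including a web crawler, receives the key along with the markup.
sample_page = """
<script>
  const awsKeyId = "AKIAIOSFODNN7EXAMPLE";
</script>
"""

print(find_candidate_keys(sample_page))
```

Because a crawler stores the same markup a browser receives, any pattern that a few lines of code can find in one page can also be found across an entire crawl archive.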
Companies that develop LLMs have warned against hardcoding sensitive information in websites. They advise developers to avoid the practice because users may unknowingly copy such code into their own work, spreading the problem further.
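The article does not prescribe a specific alternative, but one common practice is to keep secrets on the server and read them from the environment at runtime, so they never appear in pages sent to browsers or crawlers. The minimal sketch below assumes a hypothetical variable name, `EXAMPLE_SERVICE_API_KEY`, chosen only for illustration.

```python
import os

# Hypothetical environment variable name, used for illustration only.
API_KEY_ENV_VAR = "EXAMPLE_SERVICE_API_KEY"


def load_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Read a secret from the server's environment instead of hardcoding it.

    The key stays on the server, so the HTML and JavaScript delivered to
    browsers (and to web crawlers) never contain it.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fail loudly rather than falling back to a hardcoded default.
        raise RuntimeError(f"{env_var} is not set; refusing to continue without it")
    return key
```

In this arrangement, the browser calls the site's own server, and only the server uses the key to call the third-party API.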