Security researchers found that datasets used by companies to develop large language models included API keys, passwords, and other sensitive credentials.
Large language models dominate the online landscape, with companies promoting AI products that they claim can solve every problem.
For an AI to be effective, it needs extensive training data, much of which is gathered from the Internet by specialized companies and organizations.
Common Crawl gathers information from the Internet and provides datasets that companies use to train their AI. Because the data comes from public web pages, it may include sensitive data.
Researchers from Truffle Security discovered that credentials, API keys, and passwords are being exposed. The main issue is that some web developers hardcode sensitive information into websites, which then ends up in LLM training data.
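To make the problem concrete, here is a minimal sketch of how hardcoded secrets in page source can be spotted with pattern matching. It is not the researchers' tooling. The two patterns, the `find_candidate_secrets` helper, and the sample page with its made-up key are all assumptions chosen for illustration. A match is only a candidate: confirming that a secret is live would require a separate check against the issuing service, which this sketch does not attempt.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive detector set).
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A name such as apiKey, secret, token, or password assigned a quoted
    # string of at least 16 characters.
    "generic_api_key": re.compile(
        r"""\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*["'][^"']{16,}["']""",
        re.IGNORECASE,
    ),
}


def find_candidate_secrets(page_source: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in the page source."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(page_source):
            findings.append((name, match.group(0)))
    return findings


# Hypothetical front-end snippet containing a made-up, non-functional key.
SAMPLE_PAGE = """
<script>
  const apiKey = "EXAMPLEKEY1234567890abcdef";
  fetch("https://api.example.com/v1/data?key=" + apiKey);
</script>
"""

if __name__ == "__main__":
    for name, text in find_candidate_secrets(SAMPLE_PAGE):
        print(name, "->", text)
```

Anything that reaches the browser this way is public, so a crawler that archives the page archives the key along with it.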
The researchers found 11,908 live secrets, meaning API keys and passwords that still authenticated successfully, across 2.76 million web pages.
“Leaked keys in Common Crawl’s dataset should not reflect poorly on their organization; it’s not their fault developers hardcode keys in front-end HTML and JavaScript on web pages they don’t control. And Common Crawl should not be tasked with redacting secrets; their goal is to provide a free, public dataset based on the public Internet for organizations like Truffle Security to conduct this type of research,” explained the researchers.
Companies that build LLMs have warned against hardcoding sensitive information in website code. They advise against the practice because users may unintentionally reuse such code in their own work, compounding the problem.
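One common alternative to hardcoding is to keep the secret out of the source entirely and read it at runtime on the server. The sketch below assumes a hypothetical environment variable named `EXAMPLE_API_KEY` and a hypothetical `load_api_key` helper. It illustrates the general idea and is not a recommendation taken from the researchers or from any LLM vendor.

```python
import os


def load_api_key(env_var: str = "EXAMPLE_API_KEY") -> str:
    """Read the key from the environment and fail loudly if it is missing.

    EXAMPLE_API_KEY is a hypothetical variable name used for illustration.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable")
    return key
```

Because the key never appears in HTML or JavaScript served to visitors, it does not end up in a public crawl of the site. Code that runs in the browser should call a server-side endpoint that holds the key instead of embedding it.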