A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google’s Gemini CLI Action and GitHub’s Copilot Agent (Microsoft). No external infrastructure required.Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it “Comment and Control.” GitHub Actions does not expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require f [...]
Across the frontier labs, the highest prompt injection figures published this spring are Anthropic’s. Point a red-teamer at its newest model in a browser, and the attacker hijacked it 31.5% of the t [...]
Run a prompt injection attack against Claude Opus 4.6 in a constrained coding environment, and it fails every time, 0% success rate across 200 attempts, no safeguards needed. Move that same attack to [...]
Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to pa [...]
For the first time on a major AI platform release, security shipped at launch — not bolted on 18 months later. At Nvidia GTC this week, five security vendors announced protection for Nvidia's a [...]
It's refreshing when a leading AI company states the obvious. In a detailed post on hardening ChatGPT Atlas against prompt injection, OpenAI acknowledged what security practitioners have known fo [...]
Microsoft assigned CVE-2026-21520, a CVSS 7.5 indirect prompt injection vulnerability, to Copilot Studio. Capsule Security discovered the flaw, coordinated disclosure with Microsoft, and the patch was [...]
Unrelenting, persistent attacks on frontier models make them fail, with the patterns of failure varying by model and developer. Red teaming shows that it’s not the sophisticated, complex attacks tha [...]