Run a prompt injection attack against Claude Opus 4.6 in a constrained coding environment, and it fails every time: a 0% success rate across 200 attempts, with no safeguards needed. Move the same attack to a GUI-based system with extended thinking enabled, and the picture changes fast. A single attempt gets through 17.8% of the time without safeguards. By the 200th attempt, the breach rate hits 78.6% without safeguards and 57.1% with them.

The latest model's 212-page system card, released February 5, breaks out attack success rates by surface, by attempt count, and by safeguard configuration.

Why surface-level differences determine enterprise risk

For years, prompt injection was a known risk that no one quantified. Security teams treated it as theoretical. AI developers treated it as a research [...]
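To see why attempt count matters so much, it helps to look at how a cumulative breach rate behaves. A minimal sketch, assuming each attempt succeeds independently with a fixed per-attempt probability (an assumption for illustration only, not a claim from the system card):

```python
def cumulative_breach_rate(p: float, attempts: int) -> float:
    """P(at least one success in n independent attempts) = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** attempts

# With the GUI surface's 17.8% single-attempt rate, independence would
# predict a near-certain breach well before attempt 200:
print(cumulative_breach_rate(0.178, 1))    # 0.178
print(cumulative_breach_rate(0.178, 200))  # effectively 1.0
```

That the reported 200-attempt rate is 78.6%, not ~100%, suggests repeated attempts against a given target are correlated rather than independent draws; the value of the system card's breakdown is that it reports the measured rates by attempt count rather than leaving teams to extrapolate from a single-shot number.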