AI researcher Sam Paech has created a new test, Spiral-Bench, that shows how some AI models can trap users in "escalatory delusion loops." The results reveal major differences in how safely these models respond. The article Spiral-Bench shows which AI models most strongly reinforce users' delusional thinking appeared first on THE DECODER. [...]
For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude [...]
Even as concern and skepticism grow over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has e [...]
Is AI leaving the era of "turn-based" chat? Right now, all of us who use AI models regularly for work or in our personal lives know that the basic interaction mode across text, imagery, audio [...]
OpenAI has been hit with a wrongful death lawsuit after a man killed his mother and took his own life back in August, according to a report by The Verge. The suit names CEO Sam Altman and accuses Chat [...]
The developers of Terminal-Bench, a benchmark suite for evaluating the performance of autonomous AI agents on real-world terminal-based tasks, have released version 2.0 alongside Harbor, a new framewo [...]
The AI race lately has felt a bit like a game of tennis: first, Anthropic releases a new, pricey state-of-the-art proprietary model for general users (Claude Opus 4.7), then, a week or so later, its r [...]
Watch out, DeepSeek and Qwen! There's a new king of open source large language models (LLMs), especially when it comes to something enterprises are increasingly valuing: agentic tool use — that [...]
Baidu Inc., China's largest search engine company, released a new artificial intelligence model on Monday that its developers claim outperforms competitors from Google and OpenAI on several visio [...]