Destination
Spiral-Bench shows which AI models most strongly reinforce users' delusional thinking

AI researcher Sam Paech has created a new test, Spiral-Bench, that shows how some AI models can trap users in "escalatory delusion loops." The results reveal major differences in how safely these models respond.<br /> The article Spiral-Bench shows which AI models most strongly reinforce users' delusional thinking appeared first on THE DECODER. [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude [...]

Match Score: 127.12

venturebeat
Moonshot's Kimi K2 Thinking emerges as leading open source AI, outperforming GPT-5, Claude Sonnet 4.5 on key benchmarks

Even as concern and skepticism grows over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has e [...]

Match Score: 121.47

Destination
Ooni’s first departure from pizza ovens is a $799 spiral mixer

Ooni, the Scottish company known for its innovative outdoor pizza ovens, is expanding into a new product category — without sacrificing the brand’s pizza theme. The Halo Pro is a $799 mixer that t [...]

Match Score: 116.60

venturebeat
Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'

Is AI leaving the era of "turn-based" chat?Right now, all of us who use AI models regularly for work or in our personal lives know that the basic interaction mode across text, imagery, audio [...]

Match Score: 97.22

Destination
Lawsuit accuses ChatGPT of reinforcing delusions that led to a woman's death

OpenAI has been hit with a wrongful death lawsuit after a man killed his mother and took his own life back in August, according to a report by The Verge. The suit names CEO Sam Altman and accuses Chat [...]

Match Score: 89.53

venturebeat
Terminal-Bench 2.0 launches alongside Harbor, a new framework for testing agents in containers

The developers of Terminal-Bench, a benchmark suite for evaluating the performance of autonomous AI agents on real-world terminal-based tasks, have released version 2.0 alongside Harbor, a new framewo [...]

Match Score: 79.07

venturebeat
American AI startup Poolside launches free, high-performing open model Laguna XS.2 for local agentic coding

The AI race lately has felt a bit like a game of tennis: first, Anthropic releases a new, pricey state-of-the-art proprietary model for general users (Claude Opus 4.7), then, a week or so later, its r [...]

Match Score: 69.14

venturebeat
MiniMax-M2 is the new king of open source LLMs (especially for agentic tool calling)

Watch out, DeepSeek and Qwen! There's a new king of open source large language models (LLMs), especially when it comes to something enterprises are increasingly valuing: agentic tool use — that [...]

Match Score: 68.75

venturebeat
Baidu just dropped an open-source multimodal AI that it claims beats GPT-5 and Gemini

Baidu Inc., China's largest search engine company, released a new artificial intelligence model on Monday that its developers claim outperforms competitors from Google and OpenAI on several visio [...]

Match Score: 67.91