thenextweb
A Chinese startup just dethroned Nvidia on the benchmark Nvidia helped build

Two days. That is how long Nvidia’s latest robotics model sat at the top of the RoboArena leaderboard before a startup from Hangzhou knocked it off. On Wednesday, Spirit AI announced that its foundation model for embodied intelligence, Spirit v1.6, had scored 1,924 on the benchmark, edging out Nvidia’s Cosmos3-Nano-Policy at 1,881. A second Nvidia project, DreamZero, […]<br /> This story continues at The Next Web [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
Nvidia launches enterprise AI agent platform with Adobe, Salesforce, SAP among 17 adopters at GTC 2026

Jensen Huang walked onto the GTC stage Monday wearing his trademark leather jacket and carrying, as it turned out, the blueprints for a new kind of monopoly.The Nvidia CEO unveiled the Agent Toolkit, [...]

Match Score: 80.89

venturebeat
Nvidia introduces Vera Rubin, a seven-chip AI platform with OpenAI, Anthropic and Meta on board

Nvidia on Monday took the wraps off Vera Rubin, a sweeping new computing platform built from seven chips now in full production — and backed by an extraordinary lineup of customers that includes Ant [...]

Match Score: 65.04

venturebeat
DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude [...]

Match Score: 56.27

venturebeat
Is Anthropic 'nerfing' Claude? Users increasingly report performance degradation as leaders push back

A growing number of developers and AI power users are taking to social media to accuse Anthropic of degrading the performance of Claude Opus 4.6 and Claude Code — intentionally or as an outcome of c [...]

Match Score: 51.35

Destination
NVIDIA GeForce RTX 5090 review: Pure AI excess for $2,000

A $2,000 video card for consumers shouldn't exist. The GeForce RTX 5090, like the $1,599 RTX 4090 before it, is more a flex by NVIDIA than anything truly meaningful for most gamers. NVIDIA CEO Je [...]

Match Score: 48.63

venturebeat
The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call for enterprise AI

There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks — from coding to instruction fol [...]

Match Score: 47.94

venturebeat
Why Weibo’s tiny VibeThinker-3B has the AI world arguing over benchmarks again

On Sunday, a team of nine researchers at Sina Weibo — the Chinese social media giant better known for its microblogging platform than for cutting-edge artificial intelligence — quietly posted a 14 [...]

Match Score: 47.78

venturebeat
Nvidia's DGX Station is a desktop supercomputer that runs trillion-parameter AI models without the cloud

Nvidia on Monday unveiled a deskside supercomputer powerful enough to run AI models with up to one trillion parameters — roughly the scale of GPT-4 — without touching the cloud. The machine, calle [...]

Match Score: 42.76

venturebeat
Frontier models are failing one in three production attempts — and getting harder to audit

AI agents are now embedded in real enterprise workflows, and they're still failing roughly one in three attempts on structured benchmarks. That gap between capability and reliability is the defin [...]

Match Score: 41.65