Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI

In the ARC-AGI-2 benchmark, which is designed to measure a language model's general reasoning skills, GPT-5 (High) scored 9.9 percent at a cost of $0.73 per task, according to ARC Prize. The article Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI appeared first on THE DECODER. [...]

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by Musk glazing

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]

Match Score: 344.74

venturebeat
Musk's xAI launches Grok 4.1 with lower hallucination rate on the web and apps — no API access (for now)

In what appeared to be a bid to soak up some of Google's limelight prior to the launch of its new Gemini 3 flagship AI model — now recorded as the most powerful LLM in the world by multiple ind [...]

Match Score: 209.38

venturebeat
Microsoft built Phi-4-reasoning-vision-15B to know when to think — and when thinking is a waste of time

Microsoft on Tuesday released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that the company says matches or exceeds the performance of systems many times its size — while co [...]

Match Score: 196.21

venturebeat
OpenAI launches GPT-5.4 with native computer use mode, financial plugins for Microsoft Excel, Google Sheets

The AI updates aren't slowing down. Literally two days after OpenAI launched a new underlying AI model for ChatGPT called GPT-5.3 Instant, the company has unveiled another, even more massive upgr [...]

Match Score: 185.85

venturebeat
Musk's xAI launches Grok Business and Enterprise with compelling vault amid ongoing deepfake controversy

xAI has launched Grok Business and Grok Enterprise, positioning its flagship AI assistant as a secure, team-ready platform for organizational use. These new tiers offer scalable access to Grok’s mos [...]

Match Score: 156.07

venturebeat
Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger — on specific problems

The trend of AI researchers developing new, small open source generative models that outperform far larger, proprietary peers continued this week with yet another staggering advancement. Alexia Jolicoe [...]

Match Score: 145.02

venturebeat
OpenAI's GPT-5.2 is here: what enterprises need to know

The rumors were true, and the "Code Red" is over: OpenAI today announced the release of its new frontier large language model (LLM) family: GPT-5.2. It comes at a pivotal moment for the AI pi [...]

Match Score: 131.92

venturebeat
Moonshot's Kimi K2 Thinking emerges as leading open source AI, outperforming GPT-5, Claude Sonnet 4.5 on key benchmarks

Even as concern and skepticism grow over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has e [...]

Match Score: 130.66

venturebeat
OpenAI is ending API access to fan-favorite GPT-4o model in February 2026

OpenAI has sent out emails notifying API customers that its chatgpt-4o-latest model will be retired from the developer platform in mid-February 2026. Access to the model is scheduled to end on Februa [...]

Match Score: 128.74