Peektastic.com

Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI

In the ARC-AGI-2 benchmark, which is designed to measure a language model's general reasoning skills, GPT-5 (High) scored 9.9 percent at a cost of $0.73 per task, according to ARC Prize.<br /> The article Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI appeared first on THE DECODER. [...]

Discover Copy

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by Musk glazing

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]

More Copy

Match Score: 324.70

venturebeat

xAI launches Grok 4.3 at an aggressively low price and a new, fast, powerful voice cloning suite

While Elon Musk faces off against his former colleague and OpenAI co-founder Sam Altman in court, Musk's rival firm xAI, founded to take on OpenAI, isn't slowing down on launching competitiv [...]

More Copy

Match Score: 212.42

venturebeat

Musk's xAI launches Grok 4.1 with lower hallucination rate on the web and apps — no API access (for now)

In what appeared to be a bid to soak up some of Google's limelight prior to the launch of its new Gemini 3 flagship AI model — now recorded as the most powerful LLM in the world by multiple ind [...]

More Copy

Match Score: 197.49

venturebeat

Microsoft built Phi-4-reasoning-vision-15B to know when to think — and when thinking is a waste of time

Microsoft on Tuesday released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that the company says matches or exceeds the performance of systems many times its size — while co [...]

More Copy

Match Score: 180.33

venturebeat

OpenAI launches GPT-5.4 with native computer use mode, financial plugins for Microsoft Excel, Google Sheets

The AI updates aren't slowing down. Literally two days after OpenAI launched a new underlying AI model for ChatGPT called GPT-5.3 Instant, the company has unveiled another, even more massive upgr [...]

More Copy

Match Score: 173.99

venturebeat

AI IQ is here: a new site scores frontier AI models on the human IQ scale. The results are already dividing tech.

For decades, the IQ test has been one of the most familiar — and most contested — yardsticks for human intelligence. Now, a startup project called AI IQ is applying the same metaphor to artificial [...]

More Copy

Match Score: 151.39

venturebeat

Musk's xAI launches Grok Business and Enterprise with compelling vault amid ongoing deepfake controversy

xAI has launched Grok Business and Grok Enterprise, positioning its flagship AI assistant as a secure, team-ready platform for organizational use. These new tiers offer scalable access to Grok’s mos [...]

More Copy

Match Score: 147.98

venturebeat

Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger — on specific problems

The trend of AI researchers developing new, small open source generative models that outperform far larger, proprietary peers continued this week with yet another staggering advancement.Alexia Jolicoe [...]

More Copy

Match Score: 135.89

venturebeat

DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5

The whale has resurfaced. DeepSeek, the Chinese AI startup offshoot of High-Flyer Capital Management quantitative analysis firm, became a near-overnight sensation globally in January 2025 with the rel [...]

More Copy

Match Score: 132.37