Destination
Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high

A new benchmark from Artificial Analysis reveals alarming weaknesses in the factual reliability of large language models. Out of 40 models tested, only four achieved a positive score - with Google's Gemini 3 Pro clearly in the lead.<br /> The article Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high appeared first on THE DECODER. [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
Artificial Analysis overhauls its AI Intelligence Index, replacing popular benchmarks with 'real-world' tests

The arms race to build smarter AI models has a measurement problem: the tests used to rank them are becoming obsolete almost as quickly as the models improve. On Monday, Artificial Analysis, an indepe [...]

Match Score: 121.16

venturebeat
Gemini 3 Flash arrives with reduced costs and latency — a powerful combo for enterprises

Enterprises can now harness the power of a large language model that's near that of the state-of-the-art Google’s Gemini 3 Pro, but at a fraction of the cost and with increased speed, thanks to [...]

Match Score: 113.18

venturebeat
Google unveils Gemini 3 claiming the lead in math, science, multimodal and agentic AI benchmarks

After more than a month of rumors and feverish speculation — including Polymarket wagering on the release date — Google today unveiled Gemini 3, its newest proprietary frontier model family and th [...]

Match Score: 107.82

venturebeat
Frontier models are failing one in three production attempts — and getting harder to audit

AI agents are now embedded in real enterprise workflows, and they're still failing roughly one in three attempts on structured benchmarks. That gap between capability and reliability is the defin [...]

Match Score: 98.63

venturebeat
Google’s new Deep Research and Deep Research Max agents can search the web and your private data

Google on Monday unveiled the most significant upgrade to its autonomous research agent capabilities since the product's debut, launching two new agents — Deep Research and Deep Research Max †[...]

Match Score: 95.54

venturebeat
Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by Musk glazing

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]

Match Score: 87.12

venturebeat
Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro

Google's newest AI model is here: Gemini 3.1 Flash-Lite, and the biggest improvements this time around come in cost and speed, especially for enterprises and developers seeking to leverage powerf [...]

Match Score: 78.69

venturebeat
Musk's xAI launches Grok 4.1 with lower hallucination rate on the web and apps — no API access (for now)

In what appeared to be a bid to soak up some of Google's limelight prior to the launch of its new Gemini 3 flagship AI model — now recorded as the most powerful LLM in the world by multiple ind [...]

Match Score: 76.65

venturebeat
Google Gemini 3.1 Pro first impressions: a 'Deep Think Mini' with adjustable reasoning on demand

For the past three months, Google's Gemini 3 Pro has held its ground as one of the most capable frontier models available. But in the fast-moving world of AI, three months is a lifetime — and c [...]

Match Score: 74.39