Destination

2025-11-23

Gemini 3 Pro and GPT-5 still fail at complex physics tasks designed for real scientific research


A new physics benchmark called "CritPt" puts leading AI models to the test at the level of early-stage PhD research. The results show that even top systems like Gemini 3 Pro and GPT-5 still fall far short of acting as autonomous scientists.


The article Gemini 3 Pro and GPT-5 still fail at complex physics tasks designed for real scientific research appeared first on Discover Copy

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-11-18

Google unveils Gemini 3 claiming the lead in math, science, multimodal and agentic AI benchmarks

After more than a month of rumors and feverish speculation — including Polymarket wagering on the release date — Google today unveiled Gemini 3, its newest proprietary frontier model family and th [...]

Match Score: 147.35

venturebeat

2025-11-21

OpenAI is ending API access to fan-favorite GPT-4o model in February 2026

OpenAI has sent out emails notifying API customers that its chatgpt-4o-latest model will be retired from the developer platform in mid-February 2026,. Access to the model is scheduled to end on Februa [...]

Match Score: 143.34

venturebeat

2025-11-13

Upwork study shows AI agents excel with human partners but fail independently

Artificial intelligence agents powered by the world's most advanced language models routinely fail to complete even straightforward professional tasks on their own, according to groundbreaking re [...]

Match Score: 129.26

Destination

2025-10-07

Google's Michel Devoret is one of the 2025 winners of the Nobel Prize in Physics

The Royal Swedish Academy of Sciences has awarded Google's Chief Scientist of Quantum Hardware the Nobel Prize in Physics alongside former Google employee John Martinis, and University of Califor [...]

Match Score: 90.24

venturebeat

2025-11-12

OpenAI reboots ChatGPT experience with GPT-5.1 after mixed reviews of GPT-5

ChatGPT is about to become faster and more conversational as OpenAI upgrades its flagship model GPT-5 to GPT-5.1.OpenAI announced two updates to the GPT-5 series: GPT-5.1 Instant and GPT-5.1 Thinking. [...]

Match Score: 88.87

venturebeat

2025-11-19

OpenAI debuts GPT‑5.1-Codex-Max coding model and it already completed a 24-hour task internally

OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software en [...]

Match Score: 85.71

venturebeat

2025-11-25

What enterprises should know about The White House's new AI 'Manhattan Project' the Genesis Mission

President Donald Trump’s new “Genesis Mission” unveiled Monday is billed as a generational leap in how the United States does science akin to the Manhattan Project that created the atomic bomb d [...]

Match Score: 84.04

venturebeat

2025-11-20

Google's upgraded Nano Banana Pro AI image model hailed as 'absolutely bonkers' for enterprises and users

Infographics rendered without a single spelling error. Complex diagrams one-shotted from paragraph prompts. Logos restored from fragments. And visual outputs so sharp with so much text density and acc [...]

Match Score: 83.52

venturebeat

2025-11-03

Meet Denario, the AI ‘research assistant’ that is already getting its own papers published

An international team of researchers has released an artificial intelligence system capable of autonomously conducting scientific research across multiple disciplines — generating papers from initia [...]

Match Score: 80.64