For years, the ARC benchmark was considered a nearly insurmountable obstacle for AI systems, a true test of fluid intelligence rather than simple memorization. But new results show that even this barrier is crumbling under the relentless optimization machinery of modern AI labs.<br /> The article The ARC benchmark's fall marks another casualty of relentless AI optimization appeared first on THE DECODER. [...]
On Sunday, a team of nine researchers at Sina Weibo — the Chinese social media giant better known for its microblogging platform than for cutting-edge artificial intelligence — quietly posted a 14 [...]
A growing number of developers and AI power users are taking to social media to accuse Anthropic of degrading the performance of Claude Opus 4.6 and Claude Code — intentionally or as an outcome of c [...]
For decades, the IQ test has been one of the most familiar — and most contested — yardsticks for human intelligence. Now, a startup project called AI IQ is applying the same metaphor to artificial [...]
For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude [...]
There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks — from coding to instruction fol [...]
The Browser Company has stopped active development of the popular Arc web browser, according to a blog post from CEO Josh Miller. There will still be updates to fix security issues and the like, but t [...]