2025-04-20

One of the most compelling results in recent o3 benchmarks comes from its performance on long-context tasks.
The article OpenAI's o3 achieves near-perfect performance on long context benchmark appeared first on THE DECODER.
[...]2025-10-16
A new framework from Stanford University and SambaNova addresses a critical challenge in building robust AI agents: context engineering. Called Agentic Context Engineering (ACE), the framework automat [...]
2025-11-20
Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]
2025-10-27
Watch out, DeepSeek and Qwen! There's a new king of open source large language models (LLMs), especially when it comes to something enterprises are increasingly valuing: agentic tool use — that [...]
2025-10-21
DeepSeek, the Chinese artificial intelligence research company that has repeatedly challenged assumptions about AI development costs, has released a new model that fundamentally reimagines how large l [...]
2025-11-04
When the transformer architecture was introduced in 2017 in the now seminal Google paper "Attention Is All You Need," it became an instant cornerstone of modern artificial intelligence. Ever [...]
2025-11-12
Baidu Inc., China's largest search engine company, released a new artificial intelligence model on Monday that its developers claim outperforms competitors from Google and OpenAI on several visio [...]
2025-10-29
Enterprise AI agents today face a fundamental timing problem: They can't easily act on critical business events because they aren't always aware of them in real-time.The challenge is infrast [...]
2025-11-06
Even as concern and skepticism grows over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has e [...]
2025-11-07
The developers of Terminal-Bench, a benchmark suite for evaluating the performance of autonomous AI agents on real-world terminal-based tasks, have released version 2.0 alongside Harbor, a new framewo [...]