venturebeat
Has this stealth startup finally cracked the code on enterprise AI agent reliability? Meet AUI's Apollo-1

For more than a decade, conversational AI has promised human-like assistants that can do more than chat. Yet even as large language models (LLMs) like ChatGPT, Gemini, and Claude learn to reason, explain, and code, one critical category of interaction remains largely unsolved — reliably completing tasks for people outside of chat. Even the best AI models score only in the 30th percentile on Terminal-Bench Hard, a third-party benchmark designed to evaluate the performance of AI agents on completing a variety of browser-based tasks, far below the reliability demanded by most enterprises and users. And task-specific benchmarks like TAU-Bench airline, which measures the reliability of AI agents on finding and booking flights on behalf of a user, also don't have much higher pass rates, w [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
The beginning of the end of the transformer era? Neuro-symbolic AI startup AUI announces new funding at $750M valuation

The buzzed-about but still stealthy New York City startup Augmented Intelligence Inc (AUI), which seeks to go beyond the popular "transformer" architecture used by most of today's LLMs [...]

Match Score: 290.76

venturebeat
Most enterprises can't stop stage-three AI agent threats, VentureBeat survey finds

A rogue AI agent at Meta passed every identity check and still exposed sensitive data to unauthorized employees in March. Two weeks later, Mercor, a $10 billion AI startup, confirmed a supply-chain br [...]

Match Score: 174.87

Destination
Noble Audio FoKus Apollo review: The high price of pristine audio

I don’t review a lot of $650 headphones. That’s because most audio companies sell their top-of-the-line gear around $300-$400. Noble Audio isn’t like most companies. The FoKus Rex5 earbuds, for [...]

Match Score: 161.43

venturebeat
Claude Code 2.1.0 arrives with smoother workflows and smarter agents

Anthropic has released Claude Code v2.1.0, a notable update to its "vibe coding" development environment for autonomously building software, spinning up AI agents, and completing a wide rang [...]

Match Score: 152.96

venturebeat
Claude’s next enterprise battle is not models: it’s the agent control plane

New VB Pulse data shows Microsoft and OpenAI leading enterprise agent orchestration, but Anthropic’s first measurable foothold points to a larger fight over who controls the infrastructure where AI [...]

Match Score: 145.55

venturebeat
Perplexity takes its ‘Computer’ AI agent into the enterprise, taking aim at Microsoft and Salesforce

Perplexity, the AI-powered search company valued at $20 billion, announced on Wednesday at its inaugural Ask 2026 developer conference that its multi-model AI agent, Computer, is now available to ente [...]

Match Score: 121.10

venturebeat
Microsoft takes Agent 365 out of preview as shadow AI becomes an enterprise threat

Microsoft last week took Agent 365, its management platform for AI agents, out of preview and into general availability — a move that signals the software giant believes the governance challenge aro [...]

Match Score: 111.92

venturebeat
Microsoft launches MXC, an OS-level sandbox for AI agents, with OpenAI and Nvidia already on board

For the past two years, the technology industry has raced to make AI agents more capable — teaching them to write code, navigate software interfaces, manage files, and orchestrate multi-step workflo [...]

Match Score: 111.91

venturebeat
Testing autonomous agents (Or: how I learned to stop worrying and embrace chaos)

Look, we've spent the last 18 months building production AI systems, and we'll tell you what keeps us up at night — and it's not whether the model can answer questions. That's ta [...]

Match Score: 109.90