Context windows are becoming a computational bottleneck. The longer an agent runs, the more tokens accumulate from retrieved documents, reasoning traces and conversation history, and the more memory and compute that growing context demands. Most existing solutions either degrade model accuracy, require the full context to load before compression begins, or produce memory savings that don't translate into real speedups in standard serving infrastructure.A research team from NYU, Columbia, Princeton, University of Maryland, Harvard and Lawrence Livermore National Laboratory published a paper this week that proposes a novel fix. The researchers introduce the concept of Latent Context Language Models, or LCLMs, a family of encoder-decoder compression models that compress input context b [...]
DeepSeek, the Chinese artificial intelligence research company that has repeatedly challenged assumptions about AI development costs, has released a new model that fundamentally reimagines how large l [...]
Nvidia researchers have introduced a new technique that dramatically reduces how much memory large language models need to track conversation history — by as much as 20x — without modifying the mo [...]
Google on Monday unveiled the most significant upgrade to its autonomous research agent capabilities since the product's debut, launching two new agents — Deep Research and Deep Research Max [...]
Enterprise AI applications that handle large documents or long-horizon tasks face a severe memory bottleneck. As the context grows longer, so does the KV cache, the area where the model’s working me [...]
A new framework from Stanford University and SambaNova addresses a critical challenge in building robust AI agents: context engineering. Called Agentic Context Engineering (ACE), the framework automat [...]
For all their superhuman power, today’s AI models suffer from a surprisingly human flaw: They forget. Give an AI assistant a sprawling conversation, a multi-step reasoning task or a project spanning [...]
RAG isn't always fast enough or intelligent enough for modern agentic AI workflows. As teams move from short-lived chatbots to long-running, tool-heavy agents embedded in production systems, thos [...]
A little-known Miami-based startup called Subquadratic emerged from stealth on Tuesday with a sweeping claim: that it has built the first large language model to fully escape the mathematical constrai [...]
One employee at Vercel adopted an AI tool. One employee at that AI vendor got hit with an infostealer. That combination created a walk-in path to Vercel’s production environments through an OAuth gr [...]