venturebeat
Researchers baked 3x inference speedups directly into LLM weights — without speculative decoding

As agentic AI workflows multiply the cost and latency of long reasoning chains, a team from the University of Maryland, Lawrence Livermore National Labs, Columbia University and Together AI has found a way to bake 3x throughput gains directly into a model's weights.

Unlike speculative decoding, which requires a separate drafting model, this approach requires no additional infrastructure, just a single special token added to the model's existing architecture.

The limits of next-token prediction

Next-token prediction, which generates text one token per forward pass, creates a throughput ceiling that becomes painfully expensive when models need to produce thousands of tokens. This bottleneck is especially problematic in reasoning models, which frequently generate thousands of “cha [...]

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
Baseten takes on hyperscalers with new AI training platform that lets you own your model weights

Baseten, the AI infrastructure company recently valued at $2.15 billion, is making its most significant product pivot yet: a full-scale push into model training that could reshape how enterprises wean [...]

Match Score: 189.03

venturebeat
Moving past speculation: How deterministic CPUs deliver predictable AI performance

For more than three decades, modern CPUs have relied on speculative execution to keep pipelines full. When it emerged in the 1990s, speculation was hailed as a breakthrough — just as pipelining and [...]

Match Score: 169.23

venturebeat
Cerebras stock nearly doubles on day one as AI chipmaker hits $100 billion — what it means for AI infrastructure

Cerebras Systems, the Silicon Valley chipmaker that built the world's largest commercial AI processor, erupted onto the Nasdaq on Wednesday, opening at $350 per share — nearly double its $185 I [...]

Match Score: 146.75

venturebeat
Together AI's ATLAS adaptive speculator delivers 400% inference speedup by learning from workloads in real-time

Enterprises expanding AI deployments are hitting an invisible performance wall. The culprit? Static speculators that can't keep up with shifting workloads. Speculators are smaller AI models that w [...]

Match Score: 124.25

venturebeat
5% GPU utilization: The $401 billion AI infrastructure problem enterprises can't keep ignoring

For the last 24 months, one narrative justified every over-provisioned data center and bloated IT budget: the GPU scramble. Silicon was the new oil, and H100s traded like contraband. Reserve capacity [...]

Match Score: 114.63

venturebeat
AI inference costs dropped up to 10x on Nvidia's Blackwell — but hardware is only half the equation

Lowering the cost of inference is typically a combination of hardware and software. A new analysis released Thursday by Nvidia details how four leading inference providers are reporting 4x to 10x redu [...]

Match Score: 96.15

venturebeat
MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost

Big news in enterprise AI broke over the weekend as Chinese AI startup MiniMax released its highly anticipated M3 large language model on Sunday evening Eastern time, pairing frontier-tier coding and [...]

Match Score: 88.04

venturebeat
The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin. The [...]

Match Score: 84.09

venturebeat
MIT's MeMo lets teams swap in a better LLM without retraining — and performance jumps 26%

Enabling LLMs to acquire new knowledge after training remains a major hurdle for enterprise AI: current solutions are either too expensive, too slow, or constrained by context window limits. MeMo, a [...]

Match Score: 83.35