Researchers at Mila have proposed a new technique that makes large language models (LLMs) vastly more efficient when performing complex reasoning. Called Markovian Thinking, the approach allows LLMs to engage in lengthy reasoning without incurring the prohibitive computational costs that currently limit such tasks.The team’s implementation, an environment named Delethink, structures the reasoning chain into fixed-size chunks, breaking the scaling problem that plagues very long LLM responses. Initial estimates show that for a 1.5B parameter model, this method can cut the costs of training by more than two-thirds compared to standard approaches.The quadratic curse of long-chain reasoningFor an LLM to solve a complex problem, it often needs to generate a long series of intermediate “think [...]
Microsoft on Tuesday released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that the company says matches or exceeds the performance of systems many times its size — while co [...]
Even as concern and skepticism grows over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has e [...]
Chinese AI and tech firms continue to impress with their development of cutting-edge, state-of-the-art AI language models.Today, the one drawing eyeballs is Alibaba Cloud's Qwen Team of AI resear [...]
Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason. The method, called reinforcement learning pre-training (RLP), integrates [...]
Deploying AI agents for repository-scale tasks like bug detection, patch verification, and code review requires overcoming significant technical hurdles. One major bottleneck: the need to set up dynam [...]
Nvidia researchers have introduced a new technique that dramatically reduces how much memory large language models need to track conversation history — by as much as 20x — without modifying the mo [...]
As agentic AI workflows multiply the cost and latency of long reasoning chains, a team from the University of Maryland, Lawrence Livermore National Labs, Columbia University and TogetherAI has found a [...]
AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The Phi-4 fine-tuning methodology [...]