Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason. The method, called reinforcement learning pre-training (RLP), integrates RL into the initial training phase rather than saving it for the end.This approach encourages the model to “think for itself before predicting what comes next, thus teaching an independent thinking behavior earlier in the pretraining,” the researchers state in their paper. By learning to reason on plain text without needing external verifiers, models trained with RLP show significant improvements in learning complex reasoning tasks downstream, hinting at a future of more capable and adaptable AI for real-world tasks.The typical LLM training cycleTypically, large language models are first [...]
Microsoft on Tuesday released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that the company says matches or exceeds the performance of systems many times its size — while co [...]
Researchers at MiroMind AI and several Chinese universities have released OpenMMReasoner, a new training framework that improves the capabilities of language models in multimodal reasoning.The framewo [...]
Jensen Huang walked onto the GTC stage Monday wearing his trademark leather jacket and carrying, as it turned out, the blueprints for a new kind of monopoly.The Nvidia CEO unveiled the Agent Toolkit, [...]
Nvidia on Monday took the wraps off Vera Rubin, a sweeping new computing platform built from seven chips now in full production — and backed by an extraordinary lineup of customers that includes Ant [...]
AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The Phi-4 fine-tuning methodology [...]