Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche — and make it practical at scale — the Allen Institute for AI (Ai2) introduced Bolmo, a new family of models built by "byteifying" its Olmo 3 models and reusing their backbone and capabilities. The company launched two versions, Bolmo 7B and Bolmo 1B, and describes Bolmo as "the first fully open byte-level language model." Ai2 said the two models performed competitively with — and in some cases surpassed — other byte-level and character-based models.

Byte-level language models operate directly on raw UTF-8 bytes, eliminating the need for a predefined vocabulary or tokenizer [...]
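To make the byte-level idea concrete, here is a minimal sketch (illustrative only, not Bolmo's actual code; the helper names and the BOS token ID are assumptions) of byte-level input encoding: every UTF-8 string maps to IDs drawn from the same fixed range of 256 byte values, so there is no tokenizer or predefined vocabulary to fail on noisy, misspelled, or low-resource text.

```python
# Minimal sketch of byte-level input encoding (illustrative only, not Bolmo's code).
# A byte-level LM's "vocabulary" is just the 256 possible byte values plus a few
# special tokens, so any UTF-8 text maps to valid model inputs with no
# out-of-vocabulary failures, even for rare scripts or noisy spellings.

def bytes_to_ids(text: str, bos_id: int = 256) -> list[int]:
    """Encode text as a sequence of byte IDs (0-255), prefixed with a BOS token."""
    return [bos_id] + list(text.encode("utf-8"))

def ids_to_text(ids: list[int]) -> str:
    """Decode byte IDs back to text, dropping any special (>255) tokens."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

if __name__ == "__main__":
    for sample in ["hello", "héllo", "こんにちは", "l0w-res0urce n0ise"]:
        ids = bytes_to_ids(sample)
        assert ids_to_text(ids) == sample   # lossless round trip for any string
        print(f"{sample!r} -> {len(ids) - 1} byte tokens")
```

The trade-off is that byte sequences run several times longer than subword sequences for the same text, which is the main cost a byte-level architecture has to absorb.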
Baseten, the AI infrastructure company recently valued at $2.15 billion, is making its most significant product pivot yet: a full-scale push into model training that could reshape how enterprises wean [...]
When the transformer architecture was introduced in 2017 in the now-seminal Google paper "Attention Is All You Need," it became an instant cornerstone of modern artificial intelligence. Ever [...]
The prevailing assumption in AI development has been straightforward: larger models trained on more data produce better results. Nvidia's latest release directly challenges that size assumption [...]
Mistral AI on Monday launched Forge, an enterprise model training platform that allows organizations to build, customize, and continuously improve AI models using their own proprietary data — a move [...]
When Liquid AI, a startup founded by MIT computer scientists back in 2023, introduced its Liquid Foundation Models series 2 (LFM2) in July 2025, the pitch was straightforward: deliver the fastest on-d [...]
Microsoft on Tuesday released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that the company says matches or exceeds the performance of systems many times its size — while co [...]
For much of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou. Chinese research labs including Alibaba's Qwen, [...]
A new study from researchers at Stanford University and Nvidia proposes a way for AI models to keep learning after deployment — without increasing inference costs. For enterprise agents that have to [...]
Researchers at Meta, the University of Chicago, and UC Berkeley have developed a new framework that addresses the high costs, infrastructure complexity, and unreliable feedback associated with using r [...]