On Sunday, a team of nine researchers at Sina Weibo — the Chinese social media giant better known for its microblogging platform than for cutting-edge artificial intelligence — quietly posted a 14-page technical report to arXiv that sent shockwaves through the AI research community. Their claim: a language model with just 3 billion parameters can match or exceed the reasoning performance of flagship systems from Google DeepMind, OpenAI, Anthropic, and DeepSeek that are hundreds of times larger.The model, called VibeThinker-3B, scored 94.3 on AIME 2026 — the American Invitational Mathematics Examination, one of the most demanding standardized math competitions in the world. That figure places it alongside DeepSeek V3.2, a model with 671 billion parameters, and ahead of Gemini 3 Pro, G [...]
Another day in late 2025, another impressive result from a Chinese company in open source artificial intelligence.Chinese social networking company Weibo's AI division recently released its open [...]
For decades, the IQ test has been one of the most familiar — and most contested — yardsticks for human intelligence. Now, a startup project called AI IQ is applying the same metaphor to artificial [...]
Meta has been one of the most interesting companies of the generative AI era — initially gaining a loyal and huge following of users for the release of its mostly open source Llama family of large l [...]
Patronus AI, the artificial intelligence evaluation startup backed by $20 million from investors including Lightspeed Venture Partners and Datadog, unveiled a new training architecture Tuesday that it [...]
There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others.AI agents excel at solving abstract ma [...]
A little-known Miami-based startup called Subquadratic emerged from stealth on Tuesday with a sweeping claim: that it has built the first large language model to fully escape the mathematical constrai [...]
Major social media platforms in China have started rolling out labels for AI-generated content to comply with a law that took effect on Monday. Users of the likes of WeChat, Douyin, Weibo and RedNote [...]
IBM today announced the release of Granite 4.0, the newest generation of its homemade family of open source large language models (LLMs) designed to balance high performance with lower memory and cost [...]