2025-10-17
AI models are only as good as the data they're trained on. That data generally needs to be labeled, curated and organized before models can learn from it in an effective way.
One of the big missing links in the AI ecosystem has been the availability of a large, high-quality open-source multimodal dataset. That changes today with the debut of the EMM-1 dataset, which comprises 1 billion data pairs and 100 million data groups across five modalities: text, image, video, audio and 3D point clouds. Multimodal datasets combine different types of data that AI systems can process together, mirroring how humans perceive the world using multiple senses simultaneously. These datasets enable AI systems to make richer inferences by understanding relationships across data types, rather than proces [...]
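To make the pair/group distinction concrete, here is a minimal sketch of how a data group spanning several modalities can be expanded into the cross-modal pairs a model might train on. This is not the actual EMM-1 schema; the record shape and names such as ModalityItem and DataGroup are hypothetical and purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModalityItem:
    """One piece of data in a single modality (hypothetical schema)."""
    modality: str                  # e.g. "text", "image", "video", "audio", "point_cloud"
    uri: str                       # where the raw asset lives
    caption: Optional[str] = None  # optional description of the asset

@dataclass
class DataGroup:
    """A set of aligned items that all describe the same underlying scene."""
    group_id: str
    items: list[ModalityItem] = field(default_factory=list)

    def pairs(self):
        """Enumerate cross-modal pairs, the unit a contrastive model trains on."""
        for i, a in enumerate(self.items):
            for b in self.items[i + 1:]:
                if a.modality != b.modality:
                    yield (a, b)

# Example: one scene represented in three modalities yields three training pairs.
group = DataGroup(
    group_id="scene-0001",
    items=[
        ModalityItem("text", "s3://bucket/scene-0001/caption.txt"),
        ModalityItem("image", "s3://bucket/scene-0001/frame.jpg"),
        ModalityItem("audio", "s3://bucket/scene-0001/clip.wav"),
    ],
)
print(len(list(group.pairs())))  # 3
```

This is also why a dataset can report far more pairs than groups: a single group of n aligned items expands into up to n·(n−1)/2 cross-modal pairs.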
2025-10-02
IBM today announced the release of Granite 4.0, the newest generation of its homegrown family of open source large language models (LLMs) designed to balance high performance with lower memory and cost [...]
2025-04-17
Wikipedia has been struggling with the impact that AI crawlers — bots that are scraping text and multimedia from the encyclopedia to train generative artificial intelligence models — have been hav [...]
2025-10-01
In the race to deploy generative AI for coding, the fastest tools are not winning enterprise deals. A new VentureBeat analysis, combining a comprehensive survey of 86 engineering teams with our own ha [...]
2025-10-07
A research team from MIT, IBM, and the University of Washington has released TOUCAN, the largest open dataset to date for training AI agents. The dataset contains 1.5 million real tool interactions, a [...]
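As a rough illustration of what a tool-interaction record for agent training can look like, here is a minimal sketch. This is not TOUCAN's published format; the field names and the get_forecast tool are hypothetical.

```python
import json

# Hypothetical record: one tool interaction from an agent trajectory.
# Illustrative shape only, not TOUCAN's actual schema.
interaction = {
    "task": "What is the weather in Boston tomorrow?",
    "tool_call": {
        "name": "get_forecast",
        "arguments": {"city": "Boston", "days": 1},
    },
    "tool_result": {"high_f": 61, "low_f": 48, "conditions": "partly cloudy"},
    "assistant_response": "Tomorrow in Boston: partly cloudy, high 61°F, low 48°F.",
}

# Datasets of this kind are commonly serialized one JSON object per line (JSONL)
# so millions of interactions can be streamed during training.
print(json.dumps(interaction))
```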
2025-10-07
For many enterprises, there continue to be barriers to fully adopting and benefiting from agentic AI. IBM is betting the blocker isn't building AI agents but governing them in production. At its Te [...]
2025-10-02
A new study by Shanghai Jiao Tong University and SII Generative AI Research Lab (GAIR) shows that training large language models (LLMs) for complex, autonomous tasks does not require massive datasets. [...]
2025-10-09
Having to open a separate chat window to prompt an agent is a point of friction for many enterprises. And AI companies are seeing an opportunity to bring more and more AI services into one [...]
2025-10-01
Thinking Machines, the AI startup founded earlier this year by former OpenAI CTO Mira Murati, has launched its first product: Tinker, a Python-based API designed to make large language model (LLM) fin [...]
2025-10-09
Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason. The method, called reinforcement learning pre-training (RLP), integrates [...]