venturebeat

2025-10-17

World's largest open-source multimodal dataset delivers 17x training efficiency, unlocking enterprise AI that connects documents, audio and video

AI models are only as good as the data they're trained on. That data generally needs to be labeled, curated and organized before models can learn from it in an effective way.

One of the big missing links in the AI ecosystem has been the availability of a large high-quality open-source multimodal dataset. That changes today with the debut of the EMM-1 dataset which is comprised of 1 billion data pairs and 100M data groups across 5 modalities: text, image, video, audio and 3d point clouds .Multimodal datasets combine different types of data that AI systems can process together. This mirrors how humans perceive the world using multiple senses simultaneously. These datasets enable AI systems to make richer inferences by understanding relationships across data types, rather than proces [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-10-02

'Western Qwen': IBM wows with Granite 4 LLM launch and hybrid Mamba/Transformer architecture

IBM today announced the release of Granite 4.0, the newest generation of its homemade family of open source large language models (LLMs) designed to balance high performance with lower memory and cost [...]

Match Score: 139.42

Destination

2025-04-17

Wikipedia offers AI developers a training dataset to maybe get scraper bots off its back

Wikipedia has been struggling with the impact that AI crawlers — bots that are scraping text and multimedia from the encyclopedia to train generative artificial intelligence models — have been hav [...]

Match Score: 107.83

venturebeat

2025-10-01

GitHub leads the enterprise, Claude leads the pack—Cursor’s speed can’t close

In the race to deploy generative AI for coding, the fastest tools are not winning enterprise deals. A new VentureBeat analysis, combining a comprehensive survey of 86 engineering teams with our own ha [...]

Match Score: 103.77

Destination

2025-10-07

TOUCAN is the largest open training dataset for AI agents

A research team from MIT, IBM, and the University of Washington has released TOUCAN, the largest open dataset to date for training AI agents. The dataset contains 1.5 million real tool interactions, a [...]

Match Score: 93.69

venturebeat

2025-10-07

IBM claims 45% productivity gains with Project Bob, its multi-model IDE that orchestrates LLMs with full repository context

For many enterprises, there continue to be barriers to fully adopting and benefiting from agentic AI.IBM is betting the blocker isn't building AI agents but governing them in production.At its Te [...]

Match Score: 82.14

venturebeat

2025-10-02

New AI training method creates powerful software agents with just 78 examples

A new study by Shanghai Jiao Tong University and SII Generative AI Research Lab (GAIR) shows that training large language models (LLMs) for complex, autonomous tasks does not require massive datasets. [...]

Match Score: 81.48

venturebeat

2025-10-09

The next AI battleground: Google’s Gemini Enterprise and AWS’s Quick Suite bring full-stack, in-context AI to the workplace

The friction of having to open a separate chat window to prompt an agent could be a hassle for many enterprises. And AI companies are seeing an opportunity to bring more and more AI services into one [...]

Match Score: 78.50

venturebeat

2025-10-01

Thinking Machines' first official product is here: meet Tinker, an API for distributed LLM fine-tuning

Thinking Machines, the AI startup founded earlier this year by former OpenAI CTO Mira Murati, has launched its first product: Tinker, a Python-based API designed to make large language model (LLM) fin [...]

Match Score: 77.45

venturebeat

2025-10-09

Nvidia researchers boost LLMs reasoning skills by getting them to 'think' during pre-training

Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason. The method, called reinforcement learning pre-training (RLP), integrates [...]

Match Score: 77.27