AI companies claim their tools couldn't exist without training on copyrighted material. It turns out they could; it's just really hard. To prove it, AI researchers trained a new model that's less powerful but much more ethical. That's because the LLM's dataset uses only public domain and openly licensed material.

The paper (via The Washington Post) was a collaboration between 14 institutions. The authors represent universities like MIT, Carnegie Mellon and the University of Toronto. Nonprofits like the Vector Institute and the Allen Institute for AI also contributed.

The group built an 8 TB ethically sourced dataset. Among the data was a set of 130,000 books in the Library of Congress. After inputting the material, they trained a s [...]
OpenAI is calling on the Trump administration to give AI companies an exemption to train their models on copyrighted material. In a blog post spotted by The Verge, the company this week published its [...]
Disney is going after another generative AI tool, accusing ByteDance and its recently released Seedance 2.0 of using its copyrighted material without permission. As first reported by Axios, the Wal [...]
OpenAI claims that Chinese startups are persistently trying to copy the technology of American AI companies. In line with that claim, OpenAI says it and partner Microsoft have been banning accounts suspecte [...]
Three YouTube channels have banded together and filed a class action lawsuit against Apple, as first spotted by MacRumors. According to the lawsuit, the creators behind h3h3 Productions, MrShortGameGo [...]
The UK's House of Lords just voted to add an amendment to a data bill that mandates that tech companies disclose which copyright-protected works were used to train AI models, as reported by The G [...]
Federal Judge Vince Chhabria has ruled in favor of Meta in the case brought by 13 book authors, including Sarah Silverman, who sued the company for training its large language model on their published work without [...]