AI Training Data Reveals YouTube Videos Used, Including Those of Popular Creators
A new report by The Atlantic details the scope of YouTube videos used to train artificial intelligence models, sparking renewed debate over copyright and creator rights. The datasets, used by tech companies such as Nvidia and Meta, contain hundreds of thousands of videos from individual creators, including artist Jon Peters, alongside large contributions from news and educational channels such as the BBC (more than 33,000 videos) and TED (nearly 50,000).
The report highlights a potential conflict with YouTube's terms of service, which prohibit mass downloading of videos. YouTube offers creators a toggle to opt out of having their content used for AI training, but The Atlantic's report questions how effective it is.
"Hundreds of thousands of others, if not more, are from individual creators, such as Peters," Reisner wrote in The Atlantic.
Tech companies maintain that their use of YouTube-based training data is compliant. However, legal challenges such as David Millette's dismissed lawsuit against Nvidia show how difficult it is for creators to defend their work. Creators can take steps to reduce the risk, such as adding watermarks or captions to their videos, which can make them less desirable as AI training data.