New Delhi, July 17: Tech companies including Apple, Nvidia and Anthropic have reportedly been accused of using YouTube video transcripts from popular creators like MKBHD and MrBeast to train their AI models without obtaining permission. As per reports, this practice has raised ethical concerns about the transparency and legality of the methods these AI companies use to gather training data. According to multiple reports, the use of publicly available content for AI training without the explicit consent of creators is becoming increasingly problematic.
According to a Wired report, Apple, Nvidia and Anthropic have allegedly used transcripts of videos by popular YouTube creators such as MKBHD, MrBeast, Jacksepticeye and PewDiePie to train their AI models. The companies used the transcripts to enhance their AI models without the creators' knowledge or approval, as per a Times Now report. The use of such data without consent has prompted discussions on the need for clearer regulations and guidelines in the AI industry to protect content creators' rights.
Marques Brownlee Highlights Apple’s Use of YouTube Videos for AI
Apple has sourced data for their AI from several companies
One of them scraped tons of data/transcripts from YouTube videos, including mine
Apple technically avoids "fault" here because they're not the ones scraping
But this is going to be an evolving problem for a long time https://t.co/U93riaeSlY
— Marques Brownlee (@MKBHD) July 16, 2024
Marques Brownlee, known as MKBHD, runs a YouTube channel. On July 16, he shared a post on X, saying, “Apple has sourced data for their AI from several companies. One of them scraped tons of data/transcripts from YouTube videos, including mine.”
According to reports, a dataset called YouTube Subtitles includes transcripts from educational and online learning channels such as Khan Academy, MIT and Harvard. Reports suggest that companies like Apple, Nvidia, Anthropic and Salesforce used subtitles from 173,536 YouTube videos, taken from over 48,000 channels. The companies have reportedly been using a dataset compiled by EleutherAI.
According to the report, EleutherAI published a research paper that mentioned a dataset called the Pile. The Pile is a non-profit release that includes a collection of various materials, drawn not just from YouTube but also from the European Parliament, English Wikipedia and a collection of emails from Enron Corporation employees that were released during a federal investigation into the company.
(The above story first appeared on LatestLY on Jul 17, 2024 01:18 PM IST.)