Media Copyright, AI Training Data, and the Challenge of Content Control in the Age of Generative AI


1. Introduction

The rapid advancement of generative artificial intelligence (AI) models has sparked a contentious debate regarding the use of copyrighted media content as training data. Media organisations, including news publishers, music companies, image agencies, and film studios, increasingly assert their rights over intellectual property (IP) used by AI firms to develop and operate their models. These organisations seek licensing fees, pursue litigation, and deploy technical measures such as blocking AI web crawlers to prevent unauthorised scraping of their content. This article examines the evolving landscape of media copyright disputes in the context of AI, highlighting the complex interplay between cooperation and conflict, the motivations behind blocking AI data collection, the legal and economic implications, and potential outcomes for the future of AI development and media rights.

2. Background: AI Training and Use of Media Content

Generative AI models rely heavily on vast datasets compiled from publicly accessible digital content, including news articles, images, music, and other media (Bender et al., 2021). This data is typically collected through automated web crawling. However, the inclusion of copyrighted works without explicit permission raises significant legal and ethical questions (Diakopoulos, 2019). Media organisations argue that AI firms profit from their creative investments without fair compensation or control over how their content is utilised.
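
To make the collection step concrete, the sketch below shows, in simplified form, how an automated crawler of the kind described above might fetch a publicly accessible page while checking the site's robots.txt file. It is illustrative only: the crawler name is hypothetical, real training-data pipelines operate at vastly larger scale, and not all crawlers honour robots.txt, which is precisely the concern raised by media organisations. The example assumes Python with the third-party requests library installed.

    # Illustrative crawler sketch; "ExampleResearchCrawler" is a hypothetical name.
    import urllib.robotparser
    from urllib.parse import urlparse
    import requests

    USER_AGENT = "ExampleResearchCrawler"

    def fetch_if_allowed(url: str) -> str | None:
        """Fetch a page only if the site's robots.txt permits this user agent."""
        parts = urlparse(url)
        robots = urllib.robotparser.RobotFileParser()
        robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        robots.read()  # download and parse the site's robots.txt

        if not robots.can_fetch(USER_AGENT, url):
            return None  # the publisher has disallowed this path for this agent

        response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
        response.raise_for_status()
        return response.text  # raw HTML, later cleaned and folded into a corpus

    if __name__ == "__main__":
        page = fetch_if_allowed("https://example.com/some-article")
        print("blocked by robots.txt" if page is None else f"fetched {len(page)} characters")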

3. Collaboration and Conflict: Media’s Dual Approach to AI Data Use

Media companies have adopted a dual strategy toward AI firms’ use of their content, balancing opportunities for cooperation with protective measures:

  • Cooperation through Licensing and Partnerships: Some media organisations recognise AI’s potential to serve as a platform for wider content distribution and new revenue streams. The New York Times, for example, sued OpenAI and Microsoft in 2023, alleging unauthorised use of its articles to train AI models, yet in 2025 it negotiated a licensing agreement with Amazon that allows its content to be used across Amazon’s platforms (NYT, 2025). This approach reflects an attempt to monetise AI’s use of media assets on fair terms while retaining influence over how that content is deployed.
  • Conflict through Litigation and Technical Blocking: Other media entities have taken a defensive stance, actively blocking AI web crawlers and pursuing legal action to prevent unauthorised scraping and use. Getty Images, which represents approximately 600,000 photographers, has sued Stability AI over the unlicensed use of its copyrighted photographs as training data; the lawsuit points to AI-generated images that reproduce Getty watermarks as evidence of direct misuse of protected content (Reuters, 2025). Similarly, Disney and Universal sued the AI image generator Midjourney, accusing it of training on millions of copyrighted images and producing derivative works featuring iconic characters such as Shrek and Spider-Man without consent, and describing the service as a “bottomless pit of plagiarism” (Washington Post, 2025).

This duality exemplifies the tensions inherent in AI’s reliance on vast “organic” datasets harvested from the internet, which frequently encompass copyrighted content, and media owners’ efforts to protect and monetise their intellectual property.

4. Legal, Economic, and Policy Implications

The conflict embodies a broader tension between accelerating technological innovation and established intellectual property laws. Media companies contend that uncompensated AI use undermines the creative economy, threatening jobs, investment, and the sustainability of content creation (The Verge, 2025). On the other hand, AI proponents argue that broad access to data is essential for rapid development, competition, and innovation in the AI industry.

Legislative responses vary internationally. In the United Kingdom, government proposals to adopt an “opt-out” model for AI data usage—where rights holders must actively block access to their content—have faced opposition from artists like Elton John and institutions such as the BBC and Sky. They advocate instead for an “opt-in” system that mandates licensing agreements before AI companies can use copyrighted material, emphasising the importance of protecting the UK’s £125 billion creative sector (The Guardian, 2025; Rolling Stone, 2025).

Moreover, the “opt-out” system places the burden of monitoring and objecting on rights holders, an arrangement widely criticised as impractical and unfair. Calls have arisen within the UK House of Lords for mandatory transparency from AI firms regarding their training data sources, although the government has been reluctant to impose such requirements, citing concerns that they would stifle AI innovation (The Guardian, 2025).

5. Technical Measures and AI Crawler Blocking

In response to unauthorised content scraping, some media outlets have implemented technical measures to block AI web crawlers. While detailed disclosures are limited, these restrictions aim to prevent automated data collection by AI firms, safeguarding proprietary content (PhoneArena, 2025). Getty Images’ legal actions, while primarily focused on copyright infringement, also underscore a broader industry effort to control data access and usage.
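
In practice, such blocking typically combines declarative signals, most commonly robots.txt directives naming specific AI crawlers, with server-side filtering of requests by User-Agent header. The sketch below illustrates the server-side half under stated assumptions: the crawler tokens listed (GPTBot, CCBot, ClaudeBot, PerplexityBot) are examples of publicly documented AI crawlers, and any real deployment would maintain its own, regularly updated list.

    # Illustrative user-agent filtering; the blocklist below is an example, not a
    # definitive or complete inventory of AI crawlers.
    BLOCKED_AI_CRAWLERS = {
        "GPTBot",         # OpenAI's web crawler
        "CCBot",          # Common Crawl
        "ClaudeBot",      # Anthropic
        "PerplexityBot",  # Perplexity
    }

    def is_ai_crawler(user_agent: str) -> bool:
        """Return True if the User-Agent string matches a blocked AI crawler token."""
        ua = (user_agent or "").lower()
        return any(token.lower() in ua for token in BLOCKED_AI_CRAWLERS)

    def handle_request(headers: dict) -> tuple[int, str]:
        """Toy request handler: refuse identified AI crawlers, serve everyone else."""
        if is_ai_crawler(headers.get("User-Agent", "")):
            return 403, "Automated AI data collection is not permitted."
        return 200, "<html>...article content...</html>"

The same policy is often expressed declaratively, for instance with a robots.txt entry such as “User-agent: GPTBot” followed by “Disallow: /”. Compliance with robots.txt is voluntary, however, which is why publishers increasingly pair it with enforcement at the server level.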

This trend reflects media owners’ increasing awareness of AI’s data dependence and their desire to exert control over how and whether their content contributes to AI training.

6. Broader Industry and Cultural Context

Globally, creators from various sectors—including music stars like Dua Lipa and Elton John—have voiced concerns over AI’s unlicensed use of creative works, demanding stronger legal protections (The Guardian, 2025). Social media platforms have become venues for public discourse, with creators and advocates pushing back against perceived technological overreach and calling for more rigorous regulations to protect livelihoods and rights (X Platform, 2025).

These debates highlight fundamental questions about the balance between fostering technological progress and respecting creators’ rights, which will significantly shape the future landscape of AI and creative industries.

7. Conclusion

The evolving dynamics between media organisations and AI firms reveal a complex interplay of cooperation and conflict over the use of copyrighted content in AI training. Licensing agreements illustrate a pathway for mutually beneficial collaboration, while litigation and technical blocking underscore media’s protective instincts.

Resolving these tensions demands nuanced legal frameworks that balance innovation with intellectual property rights, greater transparency from AI developers regarding data use, and collaborative dialogue among stakeholders. The outcomes will decisively influence both the sustainability of creative sectors and the responsible development of AI technologies.


References

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623.

Diakopoulos, N. (2019). Automating the News: How Algorithms Are Rewriting the Media. Harvard University Press.

NPR. (2025). AI Copyright Lawsuits and Licensing Deals. https://www.npr.org/2025/06/12/nx-s1-5431684/ai-disney-universal-midjourney-copyright-infringement-lawsuit

NYT. (2025). The New York Times’ Legal Action and Licensing with Amazon. https://www.nytimes.com/2025/05/29/business/media/new-york-times-amazon-ai-licensing.html

PhoneArena. (2025). The UK and Copyright Challenges with AI. https://www.phonearena.com/news/the-uk-is-about-to-choose-between-ai-and-copyright-and-the-stakes-are-massive_id170830

Reuters. (2025). Getty Images Lawsuit Against Stability AI. https://www.reuters.com/sustainability/boards-policy-regulation/gettys-landmark-uk-lawsuit-copyright-ai-set-begin-2025-06-09/

Rolling Stone. (2025). UK Artists’ Opposition to AI Opt-Out Plans. https://www.rollingstone.com/music/music-news/elton-john-uk-government-ai-copyright-plans-1235342381/

The Guardian. (2025). Media and AI Copyright Disputes. https://www.theguardian.com/technology/2025/jun/09/stability-ai-getty-lawsuit-copyright

The Verge. (2025). Nick Clegg and AI Industry Concerns. https://www.theverge.com/news/674366/nick-clegg-uk-ai-artists-policy-letter

Time. (2025). The Fair Use Debate Around AI Training. https://time.com/7293362/disney-universal-midjourney-lawsuit-ai/

Washington Post. (2025). Disney and Universal Lawsuit Against Midjourney AI. https://www.washingtonpost.com/technology/2025/06/11/disney-universal-sue-midjourney-ai-copyright/

X Platform. (2025). Social Media Creators’ Responses to AI Copyright Issues. https://x.com/JonLamArt/status/1746984942909755859