AI and the Corporate Capture of Knowledge
More than a decade after Aaron Swartz’s death, the United States finds itself grappling with a fundamental contradiction. Swartz passionately believed that knowledge, especially when funded by public resources, should be universally accessible. This conviction led him to download thousands of academic articles from the JSTOR archive, intending to make them freely available. The resulting legal battle, culminating in a felony charge and the threat of decades in prison, tragically ended with his suicide in 2013.
The unresolved ethical and legal questions ignited by Swartz’s case have taken on renewed urgency in the age of artificial intelligence. The current debates surrounding AI, copyright, and the control of facts echo the concerns that drove Swartz’s actions. At the heart of the matter lies a crucial question: who owns knowledge, and who benefits from its use?
The Historical Context: Access to Publicly Funded Research
During Swartz’s time, a significant portion of academic research was financed by taxpayer dollars, conducted at public institutions, and intended to advance collective understanding. Yet access to this research was – and often remains – restricted behind expensive paywalls erected by private publishers. Individuals who contributed to funding this research were effectively barred from reading it without incurring additional costs. Swartz rightly identified this as an intentional system, not an accident of the market, designed to prioritize profit over public benefit.
This system continues to present considerable challenges. The rising costs of academic journals hinder research progress, especially for independent scholars and institutions with limited funding. A 2023 study by the Association of Research Libraries found that the average cost of institutional subscriptions to scholarly journals increased by over 5%, substantially outpacing library budget growth. This situation perpetuates an inequitable system where access to knowledge is determined by financial resources.
AI and the New Era of Information Appropriation
Today’s AI landscape represents a far more ambitious and encompassing form of information extraction. Tech companies are “ingesting” vast quantities of copyrighted material – books, journalistic articles, academic papers, artwork, music, and even personal writings – at an unprecedented scale. This “scraping” of data is often conducted without explicit consent, fair compensation, or even transparency. The collected data is then used to train large language models (LLMs), which form the foundation of many new AI applications.
The resulting AI systems are then sold back to the public, effectively monetizing knowledge that was, in many cases, originally created and disseminated with public support. Crucially, the government’s response to this large-scale data extraction has been markedly different from the aggressive prosecution faced by Swartz. While copyright lawsuits are emerging, they proceed slowly, enforcement is uncertain, and policymakers often express caution due to the perceived economic and strategic importance of AI growth. Copyright infringement is increasingly framed as a necessary byproduct of “innovation.”
The Anthropic Settlement: A Case Study in Corporate Immunity
Recent legal developments illuminate this disparity. In 2025, Anthropic, an AI company, reached a settlement with publishers over the unauthorized use of copyrighted books in its AI training data. The settlement, reportedly valuing infringement at roughly $3,000 per book across an estimated 500,000 works, totaled over $1.5 billion. However, scholars estimate that Anthropic avoided over $1 trillion in potential liability. This suggests that for large, well-funded AI firms, settling copyright claims is increasingly viewed as a predictable cost of doing business, rather than a deterrent to unauthorized data scraping.
This situation underscores a troubling double standard. While Swartz was prosecuted for making knowledge *more* accessible, AI companies are largely permitted to profit from massively exploiting knowledge, with minimal legal repercussions. It raises the question: what standard of legality now applies to those who extract and commercialize information?
The Democratic Implications of Concentrated Knowledge Control
The stakes extend far beyond copyright law. The central issue is who controls the infrastructure of knowledge in the future and what that control means for democratic participation, accountability, and public trust. AI systems trained on vast datasets of publicly funded research are rapidly becoming the primary means by which people access information on critical issues like science, law, medicine, and public policy.
As search, analysis, and interpretation are increasingly mediated by proprietary AI models, control over the training data and the underlying infrastructure translates into control over what questions are asked, what answers are surfaced, and whose expertise is considered authoritative. If public knowledge is absorbed into opaque, proprietary systems that the public cannot inspect or challenge, access to information is no longer governed by democratic norms, but by the priorities of the companies that control those systems.
The Illusion of Democratization
AI is often touted as a democratizing force, promising to make information more accessible to all. However, like the early internet, its trajectory is increasingly pointing towards consolidation. Control over data, the algorithms themselves, and the necessary computational infrastructure is concentrated in the hands of a small number of powerful tech companies. These companies will dictate who has access to knowledge, under what conditions, and at what price.
This consolidation mirrors historical patterns of technological disruption. While new technologies may initially offer opportunities for decentralization, they frequently become dominated by powerful actors who leverage their resources to shape the landscape to their advantage.
The Legacy of Aaron Swartz and the Future of Knowledge
Swartz’s struggle was not merely about access to information, but about the fundamental question of whether knowledge should be governed by openness or by corporate capture, and for whose benefit. He understood that access to knowledge is a prerequisite for a functioning democracy. A society cannot meaningfully engage in debate, formulate policy, or pursue justice if information is concealed behind paywalls or manipulated by proprietary algorithms.
Allowing AI companies to profit from widespread appropriation of data while claiming legal immunity threatens to create a future where access to knowledge is dictated by corporate power, rather than democratic values. This is not simply a legal or economic challenge; it’s a moral one.
How we treat knowledge – who has the right to access it, who has the right to profit from it, and who is held accountable for its misuse – has become a defining test of our commitment to democratic principles. It is imperative that we honestly assess the choices we are making and the future they will create. A future where knowledge is a public good, accessible to all, is a future worth fighting for.
This essay was written with J. B. Branch, and originally appeared in the San Francisco Chronicle.