| Key Takeaway | Significance |
|---|---|
| The New York Times implemented a hard block on Wayback Machine crawling in late 2025, and the Guardian and USA Today Co. followed with their own restrictions. | Three major publishers have made a deliberate choice to remove their journalism from the public historical record. |
| More than 120 journalists have signed an open letter supporting the Internet Archive, with prominent names such as Cory Doctorow and Rachel Maddow publicly backing it. | Significant professional and public pushback against the decisions. |
| Publishers are reportedly motivated, at least in part, by concerns about AI training data rather than editorial or legal concerns. | The decisions appear to reflect commercial positioning in the AI licensing market rather than archiving policy. |
| Removing content from public archives creates accountability gaps in the journalistic record. | Stories that were deleted or corrected without notice become harder to identify and document. |
| Publishers have legitimate rights to their content but face reputational consequences for how they exercise them. | Legal rights and public interest responsibilities are not always aligned. |
| The episode highlights the fragility of digital journalism’s institutional memory compared to print. | A key vulnerability in the transition to digital-first publishing that most organisations have not addressed. |
| Platforms like Publishrs can help publishers implement content policies that balance commercial interests with editorial responsibilities. | Technology can support more considered approaches than binary block-or-allow decisions. |
When the New York Times quietly implemented a hard block on the Wayback Machine’s crawlers in late 2025, it set off a chain of events that has become one of the more revealing episodes in recent media history. Within months, the Guardian and USA Today Co. had followed with their own restrictions. Then more than 120 journalists, including some of the most recognisable names in American media, signed an open letter championing the Internet Archive and its preservation work.
The immediate context is the debate over AI training data and licensing. Publishers restricting Wayback Machine access are, at least in part, trying to assert control over their content as a negotiating asset in a market where AI companies need large text corpora. But the consequences extend well beyond any licensing deal, and the questions raised have relevance for every news publisher thinking about their relationship to the historical record.
What the Internet Archive Actually Does for Journalism
The Wayback Machine has been quietly indispensable to journalism for more than two decades. Reporters use it to document what websites looked like at specific points in time, to verify claims about what organisations previously stated publicly, and to retrieve reporting that has been deleted or substantially changed without notice.
The accountability function is not trivial
According to Nieman Lab, which broke the original story about the Times’ blocking decision, journalists across multiple disciplines rely on the Wayback Machine for accountability reporting. Rachel Maddow wrote in a testimonial supporting the open letter that she uses the archive daily and cannot imagine doing her work without it. For investigative journalists tracking how government and corporate statements change over time, the ability to access archived versions of web pages is not a convenience. It is a fundamental research tool.
The MTV News archive, preserved by the Wayback Machine, was cited by founding editor Michael Alex as an irreplaceable record of original reporting on music and popular culture. When news organisations delete content, whether because it is outdated, legally problematic, or commercially inconvenient, the Wayback Machine has historically been the last line of preservation. Blocking it removes that safety net entirely.
Print had institutional memory that digital has struggled to replicate
Physical newspaper archives in libraries and record offices have been a cornerstone of historical research for well over a century. The transition to digital-first publishing was supposed to make the journalistic record more accessible, not less. In practice, link rot, content deletion and now the active blocking of preservation systems have combined to create significant gaps in the digital record that have no parallel in print publishing history. Reuters Institute research on digital journalism's relationship to institutional memory has consistently found this to be a structural vulnerability of the digital era.
Why Publishers Are Making These Decisions Now
The timing of the blocking decisions is not coincidental. The rapid development of large language models and the growing market for licensing deals between AI companies and news publishers have created a commercial context in which content ownership and exclusivity have new financial value.
AI licensing as the underlying driver
Publishers including the AP, Reuters, and several major newspaper groups have reached licensing agreements with AI companies for access to their archives as training data. The financial terms of these deals vary, but the structural logic is consistent: if your content has value as AI training data, restricting its free availability strengthens your negotiating position. A publisher that has blocked all public crawling can credibly argue that any AI company wanting access to its archive must pay for it.
This is a commercially rational position. But it conflates two distinct questions: the right to license content commercially, and the right to remove it from the public record entirely. Publishers have always had the former right. The latter is more contested, and the journalist response to the Times' decision suggests that the profession does not see the two as equivalent. Press Gazette has covered the licensing debate in detail, noting the tension it surfaces between commercial interests and editorial obligations.
The reputational calculus
The open letter signed by more than 120 journalists represents a significant reputational signal. The signatories are not fringe voices. They include reporters and editors with decades of credibility at major institutions, who are publicly stating that the archiving decisions of the organisations involved are contrary to the public interest. For publishers who depend on public trust as their primary commercial asset, this is not a cost-free decision.
What This Means for Your Archiving Policy
Most publishers have not thought systematically about their archiving policy beyond basic decisions about what content to retain or delete. The Wayback Machine episode creates an opportunity to develop a more considered position.
The options are not binary
Blocking the Wayback Machine outright and giving it unrestricted access are not the only options. Publishers can implement selective blocking, allowing archiving of news content while restricting certain commercial content. They can set specific rules about what types of content can be crawled and under what conditions. They can engage with the Internet Archive directly to shape how their content is preserved. Publishrs provides the content management infrastructure to implement nuanced policies of this kind, rather than relying on blanket technical blocks.
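To make selective blocking more concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. It rests on several assumptions:

- The policy is expressed through robots.txt.
- The Internet Archive's crawler identifies itself with the token `archive.org_bot`. Check the Internet Archive's own documentation for the token it currently uses.
- Commercial material lives under the hypothetical paths `/commercial/` and `/partner-content/`.

Under those assumptions, the sketch checks which URLs the rules would leave open to archiving.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt expressing a selective archiving policy.
# Assumptions: "archive.org_bot" is the archive crawler's user-agent token,
# and /commercial/ and /partner-content/ are illustrative path prefixes.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /commercial/
Disallow: /partner-content/
Allow: /

User-agent: *
Allow: /
"""

ARCHIVE_AGENT = "archive.org_bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_archivable(parser: RobotFileParser, url: str, agent: str = ARCHIVE_AGENT) -> bool:
    """Return True if the parsed rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in (
        "https://example.com/news/2025/11/story.html",
        "https://example.com/commercial/offer.html",
        "https://example.com/partner-content/deal.html",
    ):
        verdict = "archivable" if is_archivable(rules, url) else "blocked"
        print(f"{url}: {verdict}")
```

Python's parser applies the first matching rule within a group, so the `Disallow` lines sit before the catch-all `Allow: /`. Crawlers that follow the longest-match rule in RFC 9309 reach the same result for these paths. Other crawlers fall through to the `*` group and are allowed everywhere.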
Transparency is a minimum standard
Whatever archiving policy a publisher adopts, transparency about that policy is a basic expectation. Publishers that implement blocks without explanation invite the kind of hostile interpretation that has characterised coverage of the Times' decision. A clear, publicly stated rationale, even if contested, is significantly less damaging than the appearance of secretive action. According to WAN-IFRA's Trends in Newsrooms report, audience trust correlates strongly with perceived transparency in editorial decision-making, even when readers disagree with specific decisions.
Why did the New York Times block the Wayback Machine?
The Times has not given a detailed public explanation. Reporting suggests the decision was motivated at least in part by commercial considerations around AI training data licensing, rather than editorial or legal concerns.
What is the Internet Archive’s role in journalism?
The Internet Archive’s Wayback Machine preserves web pages over time, allowing journalists, researchers, and the public to access historical versions of websites. It is widely used for accountability reporting and preservation of deleted content.
Do publishers have the legal right to block web archiving?
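Generally yes. Publishers can use robots.txt files to ask crawlers, including the crawler that feeds the Wayback Machine, not to crawl their content. robots.txt is a voluntary convention rather than an access control, and the Internet Archive decides for itself how it treats these instructions. A harder block means refusing the crawler's requests at the server. The legal rights are clearer than the reputational and public interest questions.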
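A robots.txt file only asks a crawler to stay away; it does not enforce anything. The sketch below uses only Python's standard-library WSGI utilities to show what refusing requests at the server might look like. It is illustrative and does not describe how any publisher named here implemented its block. The blocked token `archive.org_bot` is an assumption. Because user-agent strings are self-reported, a check like this stops only crawlers that identify themselves honestly.

```python
from wsgiref.util import setup_testing_defaults

# Assumed user-agent token(s) for the crawler being refused; illustrative only.
BLOCKED_AGENT_TOKENS = ("archive.org_bot",)


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive check for any blocked token in the User-Agent header."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def app(environ, start_response):
    """Minimal WSGI app: refuse blocked crawlers with 403, serve everyone else."""
    if is_blocked(environ.get("HTTP_USER_AGENT", "")):
        start_response("403 Forbidden", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Archiving of this site is not permitted.\n"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Article body</p>\n"]


def status_for(user_agent: str) -> str:
    """Call the app once with a minimal test environ and return its status line."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    b"".join(app(environ, start_response))
    return statuses[0]


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; archive.org_bot)"))
    print(status_for("Mozilla/5.0 (X11; Linux x86_64)"))
```

A blanket check like this is the binary approach this piece argues against. Combining the user-agent test with path-prefix rules would give the selective variant discussed earlier.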
How does this relate to AI licensing?
Publishers restricting web archiving are, in part, trying to control access to their content as a commercial asset in AI training data licensing negotiations. Restricting free access strengthens the commercial argument for paid licensing agreements.
What should publishers include in an archiving policy?
A clear policy should address what content is available for archiving, under what conditions, what the rationale is, and how it will be communicated publicly. Binary block decisions without explanation are the most reputationally costly approach.
How can publishers manage archiving decisions practically?
Technology platforms including Publishrs support selective content policies that allow publishers to implement nuanced archiving rules rather than relying on blanket blocks.
The archiving debate will not be resolved quickly, but every publisher needs a considered position on it. If you need content management infrastructure that supports sophisticated editorial policies, Publishrs can help you get there.