The New York Times Blocks OpenAI's Web Crawler
The New York Times has blocked OpenAI's web crawler, preventing the company from using the publication's content to train its AI models. A review of the NYT's robots.txt file shows that GPTBot, the crawler OpenAI introduced earlier this month, is disallowed by the newspaper.
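A robots.txt rule of this kind can be checked with Python's standard `urllib.robotparser`. The snippet below is a sketch: the two-line rule is a hypothetical excerpt modeled on the block described above, not the full nytimes.com/robots.txt, which contains many more entries.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt modeled on the GPTBot rule described in the article;
# the real nytimes.com/robots.txt contains many additional entries.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is barred from every path; agents with no matching rule are allowed.
print(parser.can_fetch("GPTBot", "https://www.nytimes.com/section/world"))    # False
print(parser.can_fetch("Googlebot", "https://www.nytimes.com/section/world"))  # True
```

Because the file lists no wildcard (`User-agent: *`) entry in this excerpt, crawlers other than GPTBot fall through to the default-allow behavior.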
Records from the Internet Archive's Wayback Machine suggest that the NYT implemented this block as early as August 17th.
This move follows the New York Times' recent update to its terms of service, which explicitly prohibits the use of its content for training AI models. Charlie Stadtlander, a spokesperson for The New York Times, declined to comment further on the matter. OpenAI had not responded to a request for comment at the time of writing.
Furthermore, The New York Times is reportedly contemplating legal action against OpenAI for potential intellectual property violations, as NPR reported last week. If the Times proceeds, it would join comedian Sarah Silverman and two other authors, who filed a lawsuit against OpenAI in July.
That lawsuit concerns the use of Books3, a dataset used to train ChatGPT that reportedly contains numerous copyrighted works. Additionally, programmer and lawyer Matthew Butterick has alleged that OpenAI's data-scraping practices amount to software piracy.