Dutch AI Model GPT-NL Fuels Up with National News Archives
Collaboration Aims to Create Transparent, Copyright-Respecting AI Alternative
A significant stride has been made towards developing a Dutch artificial intelligence language model, with major news organizations partnering with research institute TNO. This initiative, tentatively named GPT-NL, seeks to create a homegrown competitor to global AI giants, trained on a vast trove of Dutch journalistic content.
Groundbreaking Data Agreement Paves Way for GPT-NL
An agreement has been finalized, granting TNO access to the archives of nearly all Dutch news companies. This collaboration, announced this Thursday, is a pivotal step for GPT-NL, with the AI model projected to be operational by the end of this year or early next. TNO is working alongside co-founders SURF and NFI to bring the project to fruition.
The deal allows TNO to use articles from major media groups such as DPG and Mediahuis, including publications such as *NRC*, as well as content from the trade association NDP Nieuwsmedia and the news agency ANP. This influx effectively doubles the data available for developing GPT-NL.
Addressing Copyright Concerns in AI Development
The project is backed by a €13.5 million investment from the Ministry of Economic Affairs, with the explicit goal of fostering “transparent, honest, and testable use of artificial intelligence.” GPT-NL is envisioned as the Dutch answer to AI models developed by international tech firms, which have faced criticism for utilizing copyrighted material without explicit permission or compensation.
“The biggest theft ever of copyright protected material.”
—Christian Van Thillo, CEO of DPG
Van Thillo has previously decried the unauthorized use of journalistic work by AI companies. Internationally, media outlets have taken varying stances: The New York Times has initiated legal action against OpenAI, while others, such as the Financial Times and The Guardian, have opted to license their data to AI firms.
A United Front Against Big Tech Dominance
In the Netherlands, news publishers have opted for a collective approach rather than individual deals with AI companies. This strategy aims to level the playing field, enabling smaller publishers to participate and pooling a larger dataset that is more attractive to AI developers. While the news companies are not currently receiving direct payment for their data contributions, they will share in any future revenues generated by GPT-NL.
Herman Wolswinkel, director of NDP Nieuwsmedia, highlighted the significance of this unified front:
“Big Tech companies regularly claim that publishers stand in the way of AI innovation because it would be impossible to reach agreements with all rights holders. That argument no longer holds in the Netherlands: there is now a single point of contact where AI developers, including the large technology companies, can go for a large, high-quality dataset.”
—Herman Wolswinkel, Director of NDP Nieuwsmedia
Articles from the past six months are excluded from the agreement. This measure is intended to prevent AI developers from creating tools that could directly compete with news organizations, particularly as a significant portion of the Dutch population, especially young people, already uses chatbots for information. Wolswinkel emphasized the importance of maintaining the media’s role as a reliable news source.
Building an Ethical AI with European Values
GPT-NL is set to distinguish itself from models like ChatGPT and Gemini through its commitment to copyright respect and data transparency. Unlike AI models trained on broadly scraped internet data, GPT-NL will incorporate content from authoritative sources such as parliamentary records, legal databases, and cultural institutions, in addition to the news archives. Personal data will be filtered to ensure compliance with European privacy regulations like the GDPR.
The AI model will not be freely available, initially targeting businesses, institutions, and governments for applications like customer service chatbots. Selmar Smit, founder of GPT-NL and manager of Science and Technology at TNO, likened GPT-NL’s development to providing further education to a student, acknowledging that the €13.5 million budget is modest compared to the billions available to large tech companies.
Smit sees strong potential for GPT-NL, particularly in serving the government and Dutch multinationals that must comply with the GDPR (known in the Netherlands as the AVG). He believes GPT-NL offers a secure alternative that could displace foreign AI models for organizations prioritizing data privacy and ethical standards. The project’s inception stemmed from a desire to reduce reliance on opaque, large American tech firms and to build an AI aligned with European values.
Navigating Domain Confusion
TNO has issued a note of caution about a similarly named website, gpt-nl.com, which it states is unaffiliated with the official project. The TNO GPT-NL website warns users about this impostor domain, which appears to have been registered by a “domain farmer” seeking to profit from the project’s name. That name is not yet protected.