Lessons for synthetic data from care.data’s past

Synthetic Data Must Learn From NHS Data Fumbles

Confidentiality, Consent, and Transparency Key to Public Trust

Future use of synthetic health data hinges on addressing critical concerns that previously derailed NHS data initiatives, like the ill-fated care.data programme. Lessons learned highlight the absolute necessity of robust patient confidentiality, clear consent protocols, and unwavering transparency to build public confidence.

Confidentiality: A Constant Battle

Concerns over patient re-identification were central to opposition against care.data, a point championed by groups like medConfidential. Even with pseudonymisation, the risk of linking data back to individuals persists, meaning such data remains subject to the UK General Data Protection Regulation.

Professional bodies, including the British Medical Association (BMA) and the Royal College of General Practitioners (RCGP), echoed these anxieties, fearing a detrimental impact on the patient-doctor relationship and ultimately, patient care. NHS England’s inability to sufficiently allay public and professional fears regarding re-identification risk ultimately contributed to care.data’s downfall.

Risk Stratification for Synthetic Data

To navigate these challenges, a tiered approach to privacy metrics for synthetic datasets is proposed. Categorising data by risk level—low, medium, and high—will enable policymakers to implement proportionate safeguards and reassure stakeholders.

Such risk stratification should involve national consensus from cross-functional teams, merging technical expertise with sector-specific knowledge. Low-fidelity synthetic data, posing minimal re-identification risk, could face less rigorous access protocols, facilitating broader data sharing. For instance, a pilot programme offers publicly downloadable low-fidelity synthetic data derived from Hospital Episode Statistics aggregate data.

Medium-risk datasets might necessitate enhanced security measures, such as Trusted Research Environments (TREs). Organisations unable to support TREs would be guided towards generating low-fidelity synthetic data. High-fidelity synthetic data, however, would require adherence to the same stringent access procedures as real-world data, potentially leading organisations to opt for direct real-world data acquisition instead.
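In outline, the tiered model described above amounts to a policy table mapping fidelity to safeguards. The sketch below is purely illustrative: the tier names, `AccessPolicy` fields, and rules are assumptions for exposition, not an NHS specification.

```python
from dataclasses import dataclass
from enum import Enum


class Fidelity(Enum):
    LOW = "low"        # e.g. derived from aggregate data; minimal re-identification risk
    MEDIUM = "medium"  # enhanced security measures warranted
    HIGH = "high"      # near-real-data utility; treated like real-world data


@dataclass(frozen=True)
class AccessPolicy:
    open_download: bool          # publicly downloadable without approval
    requires_tre: bool           # analysis confined to a Trusted Research Environment
    real_data_governance: bool   # same access procedures as real patient records


# Illustrative mapping of the three tiers discussed in the text.
TIER_POLICIES = {
    Fidelity.LOW: AccessPolicy(
        open_download=True, requires_tre=False, real_data_governance=False
    ),
    Fidelity.MEDIUM: AccessPolicy(
        open_download=False, requires_tre=True, real_data_governance=False
    ),
    Fidelity.HIGH: AccessPolicy(
        open_download=False, requires_tre=True, real_data_governance=True
    ),
}


def policy_for(fidelity: Fidelity) -> AccessPolicy:
    """Return the access safeguards attached to a given fidelity tier."""
    return TIER_POLICIES[fidelity]
```

For example, `policy_for(Fidelity.HIGH)` would report that real-data governance applies, which is why an organisation might opt for direct real-world data acquisition instead.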

Consent: Rebuilding the Social Contract

A significant failure of care.data was its inadequate approach to patient consent, widely perceived as a breach of patient autonomy. Its consent communications, such as posters in GP practices and mailed leaflets, proved problematic.

These methods relied on assumptions of readership and excluded individuals with literacy or language barriers. Unaddressed leaflets were often dismissed as junk mail, and households opting out of unsolicited mail were entirely missed. Furthermore, feedback indicated a lack of explicit mention of care.data by name, insufficient detail on risks, including re-identification, and unclear opt-out mechanisms.

Public acceptance is paramount for the success of synthetic data initiatives. While legal compliance is necessary, it does not guarantee social legitimacy. As noted by Carter et al., data-sharing projects depend on a ‘social contract’ built on trust and transparency.

Patients must grasp the nature of synthetic data, its associated risks and benefits, and their rights as data subjects. Without this clear communication, synthetic data projects risk facing public backlash, mirroring care.data’s fate due to a fractured social contract. Consequently, meaningful engagement with Patient and Public Involvement and Engagement groups must be a priority for policymakers.

Transparency: Who Holds the Data?

A key critique leading to care.data’s demise was the lack of clarity regarding data access. Amendments to the Care Act 2014 later prohibited data release to certain commercial entities, like marketing and insurance firms.

While research indicates public concern lessens when data access conditions include clear public benefit, care.data’s clarifications arrived too late. More recent NHS data-sharing plans, such as those involving a federated learning platform, have encountered difficulties, largely stemming from controversy over a contract awarded to the US tech company Palantir.

To prevent similar transparency failures, synthetic data initiatives must clearly inform patients about intended users before implementation. External organisations seeking access to medium and high-fidelity synthetic data should undergo a vetting process to ensure their rationale serves the public good.

Patients should retain the right to opt out of synthetic data generation from their personal data and decide whether commercial entities can access it. This approach respects patient autonomy and choice, addressing concerns about commercial data usage.

In the wake of controversies surrounding NHS England’s £480 million contract with Palantir for its federated learning platform, synthetic data projects must be transparent about data creation. A designated NHS public body should ideally own and manage access to synthetic data, fostering public reassurance and avoiding the pitfalls seen in other privacy-enhancing technology (PET) initiatives.

When outsourcing is unavoidable, potential conflicts of interest must be thoroughly assessed and disclosed to ensure partner trustworthiness in handling sensitive information. The OpenSAFELY federated platform, a publicly funded collaborative effort, has garnered support from the BMA, RCGP, and medConfidential, demonstrating that trust in the same technology can vary significantly depending on the platform’s management.

In conclusion, synthetic data holds significant promise for addressing AI development challenges related to data availability and imbalance. For UK-based synthetic data initiatives to succeed, they must internalise the lessons from past endeavours like care.data by prioritising patient confidentiality, informed consent, and organisational transparency, all with the ultimate goal of enhancing patient care.
