Erosion of Anonymity: Mitigating the Risk of Re-identification of De-identified Health Data

Categories: FDA, Health Information Technology, Innovations in Health Care Delivery, Privacy and Security Law

One well-recognized way to protect patient privacy is to de-identify health data. However, the growth of publicly available personal data, increased data linking and aggregation, big data analytics, and greater computing power are challenging traditional de-identification models. While traditional de-identification techniques may mitigate privacy risk, the possibility remains that such data may be combined with other information to reveal an individual's identity.

Last month, a JAMA article demonstrated that an artificial intelligence algorithm could re-identify data that had been stripped of identifiable demographic and health information. In the study, the algorithm identified individuals by matching daily patterns in physical mobility data with corresponding demographic data. The study showed that re-identification risks can arise when a de-identified dataset is paired with a complementary resource.
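To make the linkage risk concrete, here is a minimal Python sketch of a far simpler technique than the AI approach described in the study: an exact join on quasi-identifiers, meaning attributes such as a three-digit ZIP code, birth year, and sex that are not direct identifiers but can narrow down whom a record describes. The `link_records` function, the column names, and the sample rows are hypothetical and exist only for illustration.

```python
# Hypothetical quasi-identifier columns present in both datasets.
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def link_records(deidentified, auxiliary, keys=QUASI_IDENTIFIERS):
    """Match de-identified rows to named auxiliary rows on shared quasi-identifiers.

    Returns (deidentified_row, name) pairs only where the key combination
    matches exactly one auxiliary record, i.e. a unique re-identification.
    """
    # Index the auxiliary (identified) dataset by its quasi-identifier values.
    index = {}
    for row in auxiliary:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match links the record to a person
            matches.append((row, candidates[0]))
    return matches


# Hypothetical sample data: a "de-identified" clinical extract and a public roster.
deidentified = [
    {"zip3": "021", "birth_year": 1970, "sex": "F", "diagnosis": "E11.9"},
    {"zip3": "021", "birth_year": 1985, "sex": "M", "diagnosis": "I10"},
]
auxiliary = [
    {"name": "A. Smith", "zip3": "021", "birth_year": 1970, "sex": "F"},
    {"name": "B. Jones", "zip3": "021", "birth_year": 1985, "sex": "M"},
    {"name": "C. Brown", "zip3": "021", "birth_year": 1985, "sex": "M"},
]

print([name for _, name in link_records(deidentified, auxiliary)])  # ['A. Smith']
```

Only the first record is re-identified, because its combination of values matches exactly one named person; the second matches two people and remains ambiguous. Each additional attribute available in an auxiliary source tends to shrink those ambiguous groups, which is the mechanism behind the concern that more available data increases re-identification risk.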

In light of this seeming erosion of anonymity, entities creating, using, and sharing de-identified data should ensure that they (1) employ compliant and defensible de-identification techniques and data governance principles and (2) implement data sharing and use agreements to govern how recipients use and safeguard such de-identified data.

De-identification Techniques and Data Governance

The HIPAA Privacy Rule (45 C.F.R. §164.502(d)) permits a covered entity or its business associate to create information that is not individually identifiable by following the de-identification standard and implementation specifications (45 C.F.R. §164.514(a)-(b)).

In 2012, the Office for Civil Rights (OCR) issued guidance on the de-identification standard. Specifically, OCR provided granular and contextual technical assistance regarding the two methods for meeting it: (i) obtaining a formal determination by a qualified expert (the “Expert Determination” method); or (ii) removing specified individual identifiers, provided the covered entity has no actual knowledge that the remaining information could be used alone or in combination with other information to identify the individual (the “Safe Harbor” method).
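As a rough illustration of the Safe Harbor approach, the Python sketch below removes direct identifiers and generalizes a few quasi-identifiers in a single record. It covers only a handful of the 18 identifier types listed in 45 C.F.R. §164.514(b)(2), uses hypothetical field names and placeholder ZIP prefixes, and does nothing to address the separate “actual knowledge” condition, so it should not be read as a compliant implementation.

```python
from datetime import date

# Hypothetical field names standing in for a SUBSET of the 18 Safe Harbor
# identifier types; a real implementation must address every listed type.
DIRECT_IDENTIFIERS = {"name", "ssn", "mrn", "email", "phone", "street_address"}

# Placeholder values for illustration: three-digit ZIP prefixes whose combined
# population is 20,000 or fewer. A real list must come from current Census data.
RESTRICTED_ZIP3 = {"036", "059", "102"}


def safe_harbor_sketch(record: dict) -> dict:
    """Return a generalized copy of `record`; the input is left unchanged."""
    # Drop direct identifiers entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Dates directly related to the individual: keep only the year.
    if isinstance(out.get("admit_date"), date):
        out["admit_year"] = out.pop("admit_date").year

    # ZIP code: keep the first three digits unless the prefix is restricted.
    if "zip" in out:
        zip3 = str(out.pop("zip"))[:3]
        out["zip3"] = "000" if zip3 in RESTRICTED_ZIP3 else zip3

    # Ages over 89 are aggregated into a single "90 or older" category.
    if isinstance(out.get("age"), int) and out["age"] > 89:
        out["age"] = "90+"
    return out


record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "zip": "03601",
    "admit_date": date(2018, 5, 4),
    "age": 93,
    "diagnosis": "E11.9",
}
print(safe_harbor_sketch(record))
```

Note that the diagnosis survives untouched: Safe Harbor removes identifiers, not clinical content, which is precisely why the remaining data can still be valuable and still be linkable.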

As publicly available datasets expand and technology advances, it becomes more difficult to ensure that the Safe Harbor method sufficiently mitigates re-identification risk, because more data and computing power arguably increase the risk that de-identified information could be used, alone or in combination with other information, to identify an individual who is a subject of the information.

Given these apparent practical limitations of the “Safe Harbor” method, many organizations are taking a more risk-based approach to de-identification by using the “Expert Determination” method. This method explicitly recognizes that the risk of re-identification may never be eliminated entirely. Under this method, data is deemed de-identified if, after applying various deletion or obfuscation techniques, a qualified expert determines that the “risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information . . . .”
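The Expert Determination method does not prescribe a particular statistical technique. One widely used heuristic, offered here only as an illustrative sketch, is to group records into equivalence classes that share the same quasi-identifier values (the idea behind k-anonymity) and to treat 1/k as a record's re-identification risk, where k is the size of its class. The function name, columns, and sample rows below are hypothetical, and what counts as a “very small” risk remains a judgment for the expert rather than a number taken from HIPAA.

```python
from collections import Counter


def reidentification_risk(rows, quasi_identifiers):
    """Summarize 1/k re-identification risk for a list of dict records.

    k is the number of records sharing a record's quasi-identifier values.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    # Size of each equivalence class (records with identical quasi-identifiers).
    class_sizes = Counter(key(r) for r in rows)
    # Each record's risk is 1 divided by the size of its class.
    per_record = [1 / class_sizes[key(r)] for r in rows]
    return {
        "k_min": min(class_sizes.values()),       # the "k" in k-anonymity
        "max_risk": max(per_record),              # worst-case record
        "avg_risk": sum(per_record) / len(rows),  # equals classes / rows
    }


# Hypothetical sample: three identical records and one unique record.
rows = [
    {"zip3": "021", "birth_year": 1970, "sex": "F"},
    {"zip3": "021", "birth_year": 1970, "sex": "F"},
    {"zip3": "021", "birth_year": 1970, "sex": "F"},
    {"zip3": "021", "birth_year": 1985, "sex": "M"},
]

print(reidentification_risk(rows, ("zip3", "birth_year", "sex")))
print(reidentification_risk(rows, ("zip3",)))  # as if birth year and sex were suppressed
```

With all three quasi-identifiers, the unique record has a risk of 1.0; suppressing birth year and sex merges all four records into one class and lowers the maximum risk to 0.25. That trade between analytic utility and residual risk is the kind of balance an expert determination weighs.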

In light of the residual risks associated with de-identified data generally, it is important that organizations continue to apply good data governance principles when using and disclosing such data. These best practices should include data minimization, storage limitation, and data security. Organizations should also proceed with caution when linking data sets together in a manner that could compromise the integrity of the techniques used to originally de-identify the data.

Data Sharing and Use Agreements

Regardless of the de-identification approach, the lingering risk of re-identification can be further managed through contracts with third parties who receive such data. Though not required by the Privacy Rule, an entity providing de-identified data to another party should enter into a data sharing and use agreement with the recipient. Such agreements may include obligations to secure the data, prohibit re-identification of the data, place limitations on linking data sets, and contractually bind the recipient to pass on similar requirements to any downstream party with whom the data is subsequently shared. Further, such agreements may include provisions prohibiting recipients from attempting to contact individuals whose data is included in the set and may also include audit rights to verify compliance.

Some risk of re-identification may be the tradeoff for realizing the vast benefits that sharing anonymized health data provides; however, entities creating, using, and sharing de-identified data should do so responsibly and defensibly.

Alaap B. Shah