Activate data privacy with privacy-enhancing technologies
What are PETs and how can they enable privacy within your organization?
The importance of data privacy and security has increased tremendously for individuals, companies and regulatory agencies. Businesses are having to rethink how they approach privacy given growing awareness among consumers and escalating focus from regulators. The cost of privacy compliance can be significant, and privacy issues are very difficult to address retroactively; a proactive and pragmatic approach to privacy is the only way forward. Companies are realizing the need to incorporate privacy by design.
Privacy Enhancing Technologies (PETs) are a broad range of technologies that, in conjunction with changes to policies and business frameworks, make it possible for companies to be data-driven without compromising the privacy of their customers and employees. A recent Pew Research Center survey found that 79% of American adults are concerned about how service providers and applications use their data, while 52% may go as far as refusing products or services that threaten their privacy. PETs can potentially reshape the data economy and foster relationships of trust between users, corporations and regulatory agencies.
How do Privacy Enhancing Technologies work?
PETs help address privacy and security challenges in numerous ways, enabling anonymity, pseudonymity, unlinkability and unobservability for data subjects. While encryption and data obfuscation are among the most commonly used, PETs include the following technologies:
Homomorphic Encryption
Typically, encrypted data needs to be decrypted prior to processing. Decryption exposes the data to the very threats that encryption was meant to safeguard against in the first place. Homomorphic encryption could be the ultimate answer to this vulnerability, which is inherent in other approaches to data protection.
Homomorphic encryption allows computation to be performed directly on encrypted data, without decrypting it first. Data is encrypted with a public key, and only an authorized party holding the matching private key can decrypt the result. With homomorphic encryption, data processing can be performed by employees (or a third party) on encrypted data without ever exposing the plaintext.
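The core idea can be sketched with textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. This toy uses tiny fixed primes for illustration only; production systems rely on vetted schemes and libraries (e.g. Paillier or CKKS implementations), not hand-rolled crypto.

```python
# A minimal sketch of the homomorphic property using textbook RSA.
# Toy key sizes only -- never use this in practice.

def rsa_keygen():
    p, q = 61, 53                      # tiny fixed primes (illustration)
    n = p * q                          # public modulus
    e = 17                             # public exponent
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)
    return (e, n), (d, n)

def encrypt(pub, m):
    e, n = pub
    return pow(m, e, n)

def decrypt(priv, c):
    d, n = priv
    return pow(c, d, n)

pub, priv = rsa_keygen()
c1, c2 = encrypt(pub, 7), encrypt(pub, 6)

# Multiply the ciphertexts: no decryption has happened yet.
c_product = (c1 * c2) % pub[1]

# Decrypting the combined ciphertext recovers 7 * 6 = 42.
assert decrypt(priv, c_product) == 42
```

An untrusted worker could perform the multiplication step without ever holding the private key; only the key holder learns the result.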
Differential Privacy
Differential privacy approaches try to preserve the privacy of an individual within a group when data about the group is shared. This is achieved by injecting a small, random perturbation, or noise, into the aggregate dataset, calibrated so that it does not alter the overall “characteristics” of the dataset while ensuring that the attributes of any single individual in the group cannot be learned. The Laplace mechanism is a commonly used mathematical technique for injecting this random noise. The guarantee is that an analysis of the aggregate dataset cannot determine whether any particular individual’s data was or was not included in it, thereby protecting each individual’s privacy. These techniques have been used widely in areas including census data, recommendation systems, location-based services and social networks.
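The Laplace mechanism can be sketched in a few lines. For a counting query, the sensitivity (the maximum change from adding or removing one person) is 1, so noise is drawn from a Laplace distribution with scale 1/ε. The dataset and ε value below are illustrative assumptions.

```python
# A minimal sketch of the Laplace mechanism for a counting query.
import math
import random

def laplace_noise(scale):
    # Sample Laplace(0, scale) via the inverse-CDF transform.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    # Noise scale = sensitivity / epsilon; a count has sensitivity 1.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(scale=1.0 / epsilon)

ages = [23, 35, 45, 52, 61, 29, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
# The analyst sees a noisy count near the true value of 3, but cannot
# tell whether any single person's record was included.
```

Smaller ε means more noise and stronger privacy; the choice of ε is a policy decision, not a purely technical one.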
Federated Learning
Large datasets used for training machine learning algorithms may contain sensitive information that an organization or individual may not be willing to share. Federated learning is a more privacy-friendly approach that enables the training of machine learning algorithms on all available data in such a way that the integrity and privacy of the data are protected.
Instead of accumulating all the training data at one centralized point, federated learning trains machine learning algorithms on decentralized edge devices (such as mobile phones) or servers, using the data available in each node or device. This way, model training is subdivided into training performed locally, on-device, or within the organization itself. Data from an edge node or device is never transferred to a centralized server; only the results of the trained algorithm (its parameter updates) are shared in anonymized form with a central server, which combines them into a ‘global’ model. This approach provides additional safeguards against potential intruders accessing data or infringing on privacy.
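The scheme above can be sketched as federated averaging over a simple linear model. The clients, data and learning rate here are illustrative assumptions; real deployments use frameworks such as TensorFlow Federated and add secure aggregation on top.

```python
# A minimal sketch of federated averaging: each "client" trains
# locally on its own data; only parameters -- never raw records --
# reach the coordinating server.

def local_step(w, b, data, lr=0.05):
    # One gradient-descent step on y = w*x + b with squared error.
    gw = gb = 0.0
    for x, y in data:
        err = (w * x + b) - y
        gw += 2 * err * x / len(data)
        gb += 2 * err / len(data)
    return w - lr * gw, b - lr * gb

def federated_round(global_w, global_b, clients):
    # Each client refines the global model on its private data.
    updates = [local_step(global_w, global_b, data) for data in clients]
    # The server only ever sees the averaged parameters.
    new_w = sum(u[0] for u in updates) / len(updates)
    new_b = sum(u[1] for u in updates) / len(updates)
    return new_w, new_b

# Three clients, each holding private samples of y = 2x.
clients = [[(1, 2), (2, 4)], [(3, 6)], [(4, 8), (5, 10)]]
w, b = 0.0, 0.0
for _ in range(200):
    w, b = federated_round(w, b, clients)
# w converges toward 2.0 without pooling any client's raw data.
```

Note that parameter updates can still leak information in adversarial settings, which is why federated learning is often combined with differential privacy or secure aggregation.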
Zero-knowledge Proofs (ZKPs)
First introduced by MIT researchers in 1985, ZKPs employ cryptographic algorithms to verify the veracity of information without exposing the underlying data. ZKPs bypass the requirement of sharing personal data to prove one’s identity, paving the way for identity authentication systems that carry no risk of a data privacy breach. ZKPs can work effectively in fraud prevention systems that require users to validate credentials, without sharing the sensitive information behind them, when buying a product or service. ZKPs can be used to protect data privacy in a diverse range of scenarios, including preserving user anonymity in blockchain systems, online voting, and demonstrating that an income or age falls within an admissible range.
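The flavor of a ZKP can be sketched with a Schnorr-style identification protocol: the prover demonstrates knowledge of a secret exponent x with y = g^x mod p, without ever revealing x. The parameters below are deliberately tiny toy values; real protocols use large, carefully chosen groups.

```python
# A minimal sketch of a Schnorr-style proof of knowledge.
# Toy parameters only -- illustration, not a secure implementation.
import random

p = 467            # small prime modulus
g = 2              # group element
x = 153            # prover's secret
y = pow(g, x, p)   # public value; prover claims to know x

def prove():
    # 1. Prover commits to a random nonce r.
    r = random.randrange(1, p - 1)
    t = pow(g, r, p)
    # 2. Verifier issues a random challenge c.
    c = random.randrange(1, p - 1)
    # 3. Prover responds; without r, s reveals nothing about x.
    s = (r + c * x) % (p - 1)
    return t, c, s

def verify(t, c, s):
    # g^s == t * y^c (mod p) holds exactly when the prover knew x,
    # since g^(r + c*x) = g^r * (g^x)^c.
    return pow(g, s, p) == (t * pow(y, c, p)) % p

assert verify(*prove())
```

The verifier is convinced the prover knows x, yet the transcript (t, c, s) can be simulated without x, which is what makes the proof "zero-knowledge".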
Data Masking Techniques
Some data masking techniques also act as privacy enhancing technologies. Data masking techniques aim to create a fake yet realistic version of sensitive data, producing a version of the original data that cannot be deciphered or reverse engineered. Data can be altered using several methods, such as character shuffling, encryption, and character or word substitution.
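Two of the methods named above, character shuffling and substitution, can be sketched as follows. The masked output keeps a realistic shape (length and format) while no longer revealing the original value; the sample SSN is fabricated.

```python
# A minimal sketch of character shuffling and substitution masking.
import random
import string

def shuffle_mask(value, seed=None):
    # Shuffle the characters: unreadable, but the length and
    # character mix are preserved (useful for test datasets).
    chars = list(value)
    random.Random(seed).shuffle(chars)
    return "".join(chars)

def substitute_mask(value):
    # Replace each letter/digit with a random one of the same class,
    # preserving the format (e.g. ddd-dd-dddd stays ddd-dd-dddd).
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(random.choice(string.digits))
        elif ch.isalpha():
            out.append(random.choice(string.ascii_letters))
        else:
            out.append(ch)  # keep separators as-is
    return "".join(out)

ssn = "123-45-6789"
masked = substitute_mask(ssn)
# masked has the same ddd-dd-dddd shape but different digits.
```

Because these transformations are not meant to be reversed, masked copies are well suited for development and testing environments where production data must not appear.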
Pseudonymization: There is an explicit mention of pseudonymization in the European Union’s GDPR [Article 4(5)], alongside data protection techniques such as data masking, encryption and hashing. Pseudonymization is a data management procedure that replaces personally identifiable information (PII) with one or more artificial identifiers. These identifiers can later be mapped back to re-identify the record.
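A minimal sketch of pseudonymization: PII is replaced with an artificial identifier, and a separately stored lookup table allows authorized re-identification later. The class and field names are illustrative; in practice the mapping would live in a separate, access-controlled store, never alongside the pseudonymized data.

```python
# A minimal sketch of pseudonymization with a re-identification table.
import uuid

class Pseudonymizer:
    def __init__(self):
        # In production this mapping is kept in a separate,
        # access-controlled store, apart from the data it protects.
        self._lookup = {}

    def pseudonymize(self, pii):
        token = str(uuid.uuid4())   # artificial identifier
        self._lookup[token] = pii
        return token

    def reidentify(self, token):
        # Only authorized processes with access to the lookup
        # table can recover the original value.
        return self._lookup[token]

p = Pseudonymizer()
record = {"name": p.pseudonymize("Alice Smith"), "age_band": "30-39"}
# The record now carries an opaque token instead of the name.
assert p.reidentify(record["name"]) == "Alice Smith"
```

A deterministic variant (e.g. a keyed HMAC of the PII) is sometimes preferred when the same person must map to the same pseudonym across datasets; the trade-off is that determinism enables linkage.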