Data Minimization: Are you doing it right?

Sep 28, 2021

Updated: May 9, 2022

A few years ago, businesses routinely collected more information than they needed, in the hope that it would prove useful in the future. The phrase “data is the new oil” was taken quite literally: organizations accumulated as much data as they could and kept it for as long as possible.

As storage costs kept falling, many organizations made no active effort to delete data they no longer needed, most likely on the assumption that it might soon prove useful. As a result, large quantities of unneeded data have accumulated across organizations.

More recently, organizations and regulators have recognized the risks of collecting and storing too much information. Data minimization gained prominence when the General Data Protection Regulation (GDPR) made it one of its fundamental principles.

What is data minimization?

Data minimization is the practice of limiting the collection of personal information to what is needed for the intended purpose.

Article 5(1)(c) of the GDPR states that “personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’).”
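As a rough illustration of limiting collection to a declared purpose, here is a minimal Python sketch. The purposes, field names, and the `PURPOSE_FIELDS` mapping are hypothetical; in practice the mapping would come from the organization's own records of why it processes each kind of data. Any field not needed for the declared purpose is dropped before the record is stored.

```python
# Hypothetical mapping from processing purpose to the fields that purpose needs.
PURPOSE_FIELDS: dict[str, frozenset[str]] = {
    "order_fulfilment": frozenset({"name", "shipping_address", "email"}),
    "newsletter": frozenset({"email"}),
}


def minimize(record: dict[str, str], purpose: str) -> dict[str, str]:
    """Keep only the fields needed for `purpose`; drop everything else."""
    try:
        allowed = PURPOSE_FIELDS[purpose]
    except KeyError:
        # Refuse to collect anything for a purpose that was never declared.
        raise ValueError(f"no declared purpose: {purpose!r}") from None
    return {field: value for field, value in record.items() if field in allowed}


signup_form = {
    "email": "ana@example.com",
    "name": "Ana",
    "date_of_birth": "1990-01-01",
    "phone": "555-0100",
}
print(minimize(signup_form, "newsletter"))  # {'email': 'ana@example.com'}
```

An allowlist is used rather than a blocklist so that any new field added to a form is excluded by default until someone justifies collecting it.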

Although the California Consumer Privacy Act (CCPA) does not mandate data minimization, the California Privacy Rights Act (CPRA) does require it. Embracing data minimization saves organizations time and money and reduces their risk. Excessive data not only makes an organization more vulnerable to privacy breaches but also makes that data harder to store and manage. Organizations that fail to comply with the laws governing data collection and storage can also land in regulatory trouble.

In 2019, the Lithuanian State Data Protection Inspectorate (VDAI) fined the electronic payment company UAB Mister Tango for collecting excessive data and storing it longer than necessary. It was the first administrative fine imposed in Lithuania for a GDPR violation, underscoring that data minimization requirements must be met diligently.

Data minimization is one of the most important tools in the privacy toolbox, and embracing it demonstrates an organization’s commitment to privacy. The value of data also diminishes over time, so deleting obsolete data regularly helps ensure that analytics run on current, accurate data.

Managing ever-growing volumes of data in data centers is also costly, and today’s regulatory landscape can make it challenging to comply with local laws and regulations on data storage. Following a data minimization and retention schedule helps organizations comply with data protection laws and manage their data better.

Before deciding what to delete and what to retain, an organization needs to understand what data it holds and how many copies of that data exist. Data classification organizes data by tagging it with its type, its associated risks, and the value it would have if lost or stolen. With this understanding, organizations can apply the appropriate safeguards and comply with the relevant regulations.
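The tagging idea can be sketched in a few lines of Python. This is a simplified illustration, not a production classifier: the keyword rules, tag fields, and system names are hypothetical, and real classification tools use far richer detection than column-name matching. The sketch tags each column with a data type and a risk level, then counts how many systems hold a copy of each kind of personal data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Risk(Enum):
    """Hypothetical risk level: how much harm a loss or theft would cause."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Tag:
    data_type: str   # what kind of data this is
    risk: Risk       # associated risk, standing in for value if lost or stolen
    personal: bool   # whether it contains personal information


# Hypothetical keyword rules, checked in order; the first match wins.
RULES: list[tuple[str, Tag]] = [
    ("card", Tag("payment_card", Risk.HIGH, True)),
    ("email", Tag("customer_contact", Risk.MEDIUM, True)),
    ("phone", Tag("customer_contact", Risk.MEDIUM, True)),
    ("sku", Tag("product", Risk.LOW, False)),
]


def classify(column: str) -> Optional[Tag]:
    """Tag a column by the first matching keyword; None means 'review manually'."""
    name = column.lower()
    for keyword, tag in RULES:
        if keyword in name:
            return tag
    return None


def copies_by_type(inventory: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each personal data type to the sorted list of systems holding a copy."""
    copies: dict[str, set[str]] = {}
    for system, column in inventory:
        tag = classify(column)
        if tag is not None and tag.personal:
            copies.setdefault(tag.data_type, set()).add(system)
    return {data_type: sorted(systems) for data_type, systems in copies.items()}


inventory = [
    ("crm", "Email"),
    ("billing", "card_number"),
    ("billing", "customer_email"),
    ("warehouse", "sku_id"),
    ("backup_2017", "email_addr"),
]
print(copies_by_type(inventory))
# {'customer_contact': ['backup_2017', 'billing', 'crm'], 'payment_card': ['billing']}
```

Even this toy inventory surfaces a forgotten copy of customer contact data in an old backup, which is exactly the kind of finding that should feed into deletion decisions.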

The company’s data retention policy and schedule should specify how long each type of data should be kept, based on the company’s legal and operational requirements, and how obsolete data should be disposed of. Mapping the data you hold against the retention schedule will reveal gaps and help you plan how to implement data minimization across the enterprise.
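Mapping data against a retention schedule amounts to a gap analysis. The Python sketch below assumes a hypothetical schedule and record inventory; the data types and retention periods are illustrative, not legal guidance. It flags two kinds of gaps: records kept past their retention period, and records whose data type has no entry in the schedule at all.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    record_id: str
    data_type: str
    created: date


# Hypothetical retention schedule; years are approximated as 365 days.
RETENTION_SCHEDULE: dict[str, timedelta] = {
    "customer_contact": timedelta(days=365 * 2),
    "invoice": timedelta(days=365 * 7),
}


def gap_report(records: list[Record], today: date) -> dict[str, list[str]]:
    """Split records into those overdue for disposal and those with no rule."""
    overdue: list[str] = []      # past their retention period
    unscheduled: list[str] = []  # no retention rule exists for this data type
    for rec in records:
        period = RETENTION_SCHEDULE.get(rec.data_type)
        if period is None:
            unscheduled.append(rec.record_id)
        elif rec.created + period < today:
            overdue.append(rec.record_id)
    return {"overdue": overdue, "unscheduled": unscheduled}


records = [
    Record("c-1", "customer_contact", date(2018, 3, 1)),
    Record("c-2", "customer_contact", date(2021, 6, 1)),
    Record("i-1", "invoice", date(2016, 1, 15)),
    Record("x-1", "chat_transcript", date(2019, 9, 9)),
]
print(gap_report(records, today=date(2022, 5, 9)))
# {'overdue': ['c-1'], 'unscheduled': ['x-1']}
```

Passing `today` explicitly keeps the report reproducible. Unscheduled records matter as much as overdue ones: without a rule, nobody knows when they may be deleted.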

Challenges with Data Minimization

A growing business with an expanding client base needs to collect and store new customer information on a regular basis. This requires a systematic process that lets the organization sort the data, analyze it, and apply the appropriate deletion and retention techniques.

Even a strong data minimization and retention policy can’t deliver the desired results if the organization doesn’t know where its data exists. Data that has been stored across multiple systems and servers for many years is difficult to track, especially when it spans a wide range of data types.

Another approach, in use for over a decade, is the big bucket approach to data retention. Unlike a traditional retention schedule, which assigns a retention period to each record series, the big bucket method sorts records into broad categories based on the organization’s business functions, operations, use cases, and so on.

However, this approach is not fully efficient in today’s context, because it applies the same retention requirements and the same retention period to an entire category. Since the category’s retention period is set by the longest requirement of any record within it, all the data in the category must be kept that long, which to some extent defeats the purpose of data minimization.
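The over-retention described above is simple arithmetic, sketched here in Python. The bucket and its per-series retention requirements are hypothetical numbers chosen for illustration, not actual legal requirements. The bucket keeps everything for the longest requirement, so each series is over-retained by the difference between that maximum and its own requirement.

```python
# Hypothetical record series in a single "finance" bucket, with each series'
# own retention requirement in years (illustrative numbers only).
FINANCE_BUCKET: dict[str, int] = {
    "expense_receipts": 3,
    "vendor_contacts": 2,
    "tax_filings": 7,
}


def bucket_retention(series: dict[str, int]) -> int:
    """The big bucket keeps every series for the longest requirement."""
    return max(series.values())


def over_retention(series: dict[str, int]) -> dict[str, int]:
    """Extra years each series is kept beyond its own requirement."""
    period = bucket_retention(series)
    return {name: period - years for name, years in series.items()}


print(bucket_retention(FINANCE_BUCKET))  # 7
print(over_retention(FINANCE_BUCKET))
# {'expense_receipts': 4, 'vendor_contacts': 5, 'tax_filings': 0}
```

In this example, vendor contact details, which are personal data, would be kept five years longer than needed simply because they share a bucket with tax filings.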

With today’s more diverse data, it is difficult to sort large volumes of it into buckets, because data within a single type can differ considerably. The big bucket approach also obscures the identity of individual record series. Although it can work for documents saved on network drives, it is impractical for databases, which would have to be modified to apply the required retention rules.

A robust data mapping system is critical to understanding what data is created, how it is classified, where it is stored, and how it is used across the organization. For data minimization to be effective, you need to understand how the business uses the data. This is not possible without a comprehensive DataMap.
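To show what a data map can answer, here is a minimal Python sketch. It is a hypothetical model for illustration only, not Meru’s DataMap product or its schema: each entry records where a copy of the data lives, what it is, how it is classified, and which business purposes use it. Two queries follow: where each data type exists, and which copies have no recorded purpose and are therefore candidates for minimization.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataMapEntry:
    system: str          # where the data is stored
    data_type: str       # what data it is
    classification: str  # how it is classified
    purposes: set[str] = field(default_factory=set)  # how the business uses it


def locations(data_map: list[DataMapEntry]) -> dict[str, list[str]]:
    """Every system holding each data type, in data-map order."""
    where: dict[str, list[str]] = defaultdict(list)
    for entry in data_map:
        where[entry.data_type].append(entry.system)
    return dict(where)


def unused(data_map: list[DataMapEntry]) -> list[tuple[str, str]]:
    """Copies with no recorded business purpose: candidates for deletion."""
    return [(e.system, e.data_type) for e in data_map if not e.purposes]


# Hypothetical systems and purposes.
data_map = [
    DataMapEntry("crm", "customer_contact", "confidential", {"support"}),
    DataMapEntry("marketing_db", "customer_contact", "confidential", {"newsletter"}),
    DataMapEntry("legacy_backup", "customer_contact", "confidential"),
    DataMapEntry("billing", "payment_card", "restricted", {"payments"}),
]
print(locations(data_map))
# {'customer_contact': ['crm', 'marketing_db', 'legacy_backup'], 'payment_card': ['billing']}
print(unused(data_map))
# [('legacy_backup', 'customer_contact')]
```

Recording purpose alongside location is the key design choice: a copy that no business process uses is the clearest signal that it can be deleted.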
