Building an AI Model: Data and Risks
The UK government recently announced a national AI strategy that aims to maximize the potential of AI in the country over a ten-year period. The move underscores how consequential AI is expected to be for society. We are at a juncture where it is important not just to innovate, but also to improve existing technologies to make them more dependable and widely accepted.
However, risks in AI need to be addressed to realize its potential benefits. Several companies have identified and acknowledged defects in their AI models in the last few years, ranging from data bias to broader societal bias, resulting in inaccurate predictions and unreliable outcomes. A 2021 survey revealed that 34% of Chief Compliance Officers (CCOs) considered protection/fairness one of the top regulatory and compliance priorities for an organization. Building trust through transparency, fairness, and equitable treatment is now a primary concern, and fairness has become an integral part of every ethics and compliance program. Growing public awareness, coupled with a focus on ESG objectives and increased scrutiny by regulators, has added to the need for fairness and transparency. In a previous blog post, we discussed the principles that can help develop a framework around responsible AI. While it is not an easy exercise, ensuring that the right practices are incorporated at every stage of development will help in identifying the flaws in an AI model.
Significance of Data in AI Development
Any AI model is essentially dependent on the kind of data being fed to it. AI learns what it is taught: it can pick up prejudices (gender, racial, or societal bias) from its training data and amplify them further. This can cause substantial harm, especially when making life-altering decisions in hiring, healthcare, credit approvals, criminal justice, etc.
For instance, a criminal justice model used by Broward County, Florida categorized African American individuals as “high risk” at nearly twice the rate of Caucasian individuals. Such revelations emphasize the need to understand the sources of data bias so that appropriate action can be taken.
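One common way to surface this kind of disparity is to compare the rate at which each group receives a given label. The sketch below is purely illustrative: the group labels and numbers are invented for the example, not drawn from the actual case.

```python
# Hypothetical sketch: comparing "high risk" rates across two groups to
# surface potential bias in a classifier's outputs. Data is invented.
def high_risk_rate(labels, groups, target_group):
    """Fraction of individuals in target_group labeled 'high risk'."""
    total = sum(1 for g in groups if g == target_group)
    flagged = sum(1 for lbl, g in zip(labels, groups)
                  if g == target_group and lbl == "high risk")
    return flagged / total if total else 0.0

labels = ["high risk", "low risk", "high risk", "low risk", "high risk", "low risk"]
groups = ["A", "A", "A", "B", "B", "B"]

rate_a = high_risk_rate(labels, groups, "A")  # 2 of 3 flagged
rate_b = high_risk_rate(labels, groups, "B")  # 1 of 3 flagged

# Disparate impact ratio: values far from 1.0 warrant investigation.
ratio = rate_b / rate_a
```

A ratio well below (or above) 1.0 does not prove unlawful bias by itself, but it flags where deeper analysis of the training data is warranted.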
While it is important to include accurate data sets for training the model, it is equally important to protect sensitive information hidden in the data. Training an AI model on sensitive data can raise concerns around privacy and data misuse. With data dispersed across various locations in most organizations, it is essential to know what data is being fed into the AI model. Depending on the AI framework a particular organization employs, the choice of data used in the model rests with the model's overall custodian or owner.
NIST proposes a list of nine factors that contribute to a person’s potential trust in an AI system. These factors are different from the technical requirements of trustworthy AI recommended by NIST.
Whether an organization follows NIST or a different framework, a risk-based approach is necessary to understand the impact of the decisions made by these AI systems, the type of data used to develop the model, and other factors of the AI system. This approach should assess whether an AI or machine learning model should be used for certain types of decisions and if additional controls or thresholds need to be in place.
Customers are increasingly looking for personalization, and AI can provide this experience in real time. AI makes decisions and recommendations based on various attributes, reducing decision-making time and improving customer experience. Yet these same decisions have the potential to significantly harm individuals. Hence, organizations must take an adaptive approach to governance and ensure their approach is grounded in the principles of transparency and fairness.
How can Meru help?
1. Identify: The first step in mitigating AI risks is identifying them. This can be accomplished with an extensive DataMap that maps the different types of data and their locations. The DataMap also documents processing activities, making it possible to understand all the places AI/ML is used and how it is being used (for example, for routine and repetitive tasks or for automated decision making). Since the DataMap identifies the flow of information in and out of systems, it is also possible to identify the type of data used by the model and its source. This also helps in understanding the risks around that data.
The DataMap not only identifies the systems or processes that use AI, the type of processing performed, and the data and people impacted by these decisions, but can also provide metrics around the nine factors that contribute to trust in AI systems as needed.
Once the identification step is complete, it is possible to employ the required control measures such as encryption, using synthetic data or data augmentation, suppression, managing access controls, or other preventive actions.
Establishing clear data ownership can help everyone be mindful of the risks associated with using AI/ML and proactively take steps to monitor and mitigate those risks. A strong governance strategy can help identify the areas and applications that need further evaluation and control. An end-to-end understanding of the data sets helps recognize discriminatory data and indicates where additional data is needed to compensate for under-represented classes.
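The identify-then-control steps above can be sketched in miniature. This is not Meru's actual DataMap implementation; the systems, field names, and redaction approach are assumptions for illustration.

```python
# Hypothetical sketch: a minimal data map identifies where sensitive data
# feeds an AI model, and a suppression control redacts those fields before
# training. All system and field names are illustrative.
data_map = [
    {"system": "credit_model", "field": "income", "sensitive": False,
     "ai_use": "automated decision making"},
    {"system": "credit_model", "field": "ssn", "sensitive": True,
     "ai_use": "automated decision making"},
    {"system": "crm", "field": "email", "sensitive": True, "ai_use": "none"},
]

# Identify: fields that are sensitive AND consumed by an AI model.
to_suppress = {e["field"] for e in data_map
               if e["sensitive"] and e["ai_use"] != "none"}

# Control: redact those fields from a training record.
def suppress(record, fields):
    """Return a copy of the record with the given fields redacted."""
    return {k: ("<REDACTED>" if k in fields else v) for k, v in record.items()}

record = {"income": 52000, "ssn": "123-45-6789"}
clean = suppress(record, to_suppress)
```

In practice the control applied would vary by field (encryption, synthetic data, access restrictions), but the pattern of letting the data map drive the control is the same.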
2. PIA: Our Privacy Impact Assessment (PIA) provides an analysis of how personally identifiable information is collected, used, shared, and maintained within the company. PIA is an excellent instrument to assess how privacy protections have been incorporated at the system level and across the organization throughout the information life cycle. It allows organizations to raise ethical and privacy issues, balance interests, and if necessary, propose mitigating controls or measures.
The PIA captures potential risks or vulnerabilities in the AI model and the likelihood of their occurring. It can be used to identify privacy and bias-related issues within the data sets. The Privacy Risk Register helps evaluate overall privacy maturity so the organization can measure and manage privacy and overall data ethics.
3. Risk Alignment: Meru’s DataMap can help identify and understand the type of data used to train the AI model. Heat maps significantly assist in conducting risk assessments across all AI systems by using potential risks and impacts to prioritize systems for decision making. This helps focus attention on the risks that most need it from all stakeholders.
Firms need to consider potential liability from misuse of a system or product, but it is unrealistic to expect companies to anticipate and prevent every possible unintended consequence. We recommend taking a risk-based approach: working closely with the various business teams to identify concerns, understand root causes, and deploy effective, streamlined responses that enhance both compliance and business controls.
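One way to picture the heat-map idea: score each AI system on likelihood and impact, then bucket the scores into color bands. The scoring scheme and thresholds below are assumptions for illustration, not Meru's actual methodology.

```python
# Hypothetical heat-map banding: each AI system's risk score (likelihood x
# impact, both on a 1-5 scale) maps to a color band for prioritization.
def band(likelihood, impact):
    score = likelihood * impact
    if score >= 15:
        return "red"      # needs immediate stakeholder attention
    if score >= 8:
        return "amber"    # schedule for review
    return "green"        # monitor

systems = {
    "credit approval model": (4, 5),   # automated, life-altering decisions
    "support chatbot": (2, 2),         # routine, low-impact tasks
    "resume screening model": (3, 4),  # potential bias in hiring
}
heat_map = {name: band(l, i) for name, (l, i) in systems.items()}
```

The bands make it easy to agree, across business and compliance teams, on which systems get reviewed first.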
4. Security and Privacy: The biggest challenge for AI is that there can be many hidden decision processing layers that can make auditability and traceability of AI-related risks challenging. Some of the existing risks around privacy and security can be harder to identify in an effective and timely manner or manifest themselves in unfamiliar ways.
AI also poses some challenges from a privacy perspective, as privacy regulations expect a company to explain to customers how their personal data is used. Customers should also have the ability to exclude themselves from automated decision-making.
As risks are identified and controls are put in place to manage them, it is necessary to design a methodology for assessing the effectiveness of these measures, including relevant metrics and tolerance thresholds. The DataMap can prove valuable in monitoring performance and providing much-needed metrics. In addition, organizations that utilize AI models should have a robust data minimization process in place, ensuring that the data used in the models does not include any outdated or unwanted information.
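At its simplest, data minimization can be a retention filter applied before data reaches the model. The two-year window and field names below are assumptions for illustration; actual retention periods depend on the organization's policies and applicable regulations.

```python
# Hypothetical data-minimization sketch: records older than an assumed
# retention window are dropped before the data set reaches the AI model.
from datetime import date, timedelta

RETENTION = timedelta(days=365 * 2)  # assumed two-year retention window

def minimize(records, today):
    """Keep only records collected within the retention window."""
    return [r for r in records if today - r["collected"] <= RETENTION]

records = [
    {"id": 1, "collected": date(2020, 1, 15)},   # outdated -> dropped
    {"id": 2, "collected": date(2023, 6, 1)},    # recent -> kept
]
kept = minimize(records, today=date(2023, 12, 1))
```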
5. DSAR: When models are trained on personal data, individuals have the right to make Data Subject Access Requests (DSARs) to access, correct, or delete their personal data being processed by the organization. This includes requests to be excluded from automated decision-making under the California Privacy Rights Act (CPRA).
Meru can quickly create and automate workflows within the organization for processing these DSARs. The workflows can also assign specific tasks that need to be handled manually to individuals and third parties. Responses to certain questions can trigger additional workflows or alert process owners of potential issues ahead of time. Meru provides automated responses to DSARs to improve efficiency and enable timely responses to all requests.
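The routing logic behind such workflows can be sketched as a mapping from request type to an ordered task list, with each task marked automated or assigned to a human owner. The request types, task names, and owners below are illustrative assumptions, not Meru's actual workflow engine.

```python
# Hypothetical sketch of DSAR routing: each request type maps to a workflow
# of tasks, some automated and some assigned to a human owner.
WORKFLOWS = {
    "access":  [("locate records", "automated"), ("compile report", "automated")],
    "delete":  [("locate records", "automated"), ("confirm erasure", "privacy team")],
    "opt_out": [("flag profile", "automated"), ("verify model exclusion", "ml team")],
}

def route(request_type):
    """Return the ordered task list for a DSAR, or raise for unknown types."""
    if request_type not in WORKFLOWS:
        raise ValueError(f"unsupported DSAR type: {request_type}")
    return WORKFLOWS[request_type]

tasks = route("opt_out")

# Surface the tasks that must be handled manually and by whom.
manual = [task for task, owner in tasks if owner != "automated"]
```

Separating automated from manual steps is what lets most requests complete without intervention while still alerting process owners when a human decision is required.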
The suite of tools available with Meru can help provide transparency and awareness to proactively manage and address these risks.