Building an AI Model: Data and Risks

The UK government recently announced a national AI strategy that aims to maximize the potential of AI in the country over a ten-year period. This step highlights the promise of AI and how crucial it could be for human development. We are at a juncture where it is important not just to innovate, but also to improve existing technologies so that they are more acceptable and dependable.

However, the risks in AI need to be addressed if its potential benefits are to be realized. In the last few years, several companies have identified and acknowledged defects in their AI models. These range from data bias to broader societal bias, and they result in inaccurate predictions and unreliable outcomes. A 2021 survey revealed that 34% of Chief Compliance Officers (CCOs) considered protection/fairness one of the top regulatory and compliance priorities for their organizations. Building trust through transparency, fairness, and equitable treatment is now a primary concern, and fairness has become an integral part of ethics and compliance programs. Growing public awareness, coupled with a focus on ESG objectives and increased scrutiny by regulators, has added to the need for fairness and transparency. In a previous blog post, we discussed the principles that can help develop a framework around responsible AI. While it is not an easy exercise, incorporating the right practices at every stage of development helps identify flaws in an AI model.

Significance of Data in AI Development

Any AI model is fundamentally dependent on the data it is trained on. AI learns what it is taught, so it can pick up prejudices (gender, racial, or societal bias) and amplify them. This can cause substantial harm, especially when models inform life-altering decisions in areas such as hiring, healthcare, credit approval, and criminal justice.

For instance, a criminal justice model used by Broward County, Florida categorized African American individuals as "high risk" at nearly twice the rate of Caucasian individuals. Such revelations emphasize the need to understand the sources of data bias so that appropriate action can be taken.
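One simple way to surface the kind of disparity described above is to compare how often a model assigns a "high risk" label to each group. The sketch below is a minimal illustration, assuming predictions are available as hypothetical (group, flagged) pairs; the group names and data are invented for the example. A fuller audit would also compare error rates, such as false positive rates, per group.

```python
from collections import defaultdict


def high_risk_rate_by_group(records):
    """Return the share of records flagged as high risk for each group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, is_high_risk in records:
        totals[group] += 1
        if is_high_risk:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def rate_ratio(rates, group, reference):
    """How many times more often `group` is flagged than `reference`."""
    return rates[group] / rates[reference]


# Hypothetical toy predictions: (group, was the record flagged as high risk?)
predictions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = high_risk_rate_by_group(predictions)
print(rates)                        # {'A': 0.75, 'B': 0.25}
print(rate_ratio(rates, "A", "B"))  # 3.0
```

A ratio well above 1.0 does not prove the model is unfair on its own, but it is a signal that the training data and labels deserve a closer look.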

While it is important to use accurate data sets for training the model, it is equally important to protect sensitive information hidden in the data. Training an AI model on sensitive data can raise concerns about privacy and data misuse. Because data in most organizations is dispersed across many locations, it is essential to know exactly what data is being fed into the AI model. Depending on the AI framework an organization employs, responsibility for choosing the data used in the model rests with the model's overall custodian or owner.
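Knowing what goes into a model can start with a basic screen of the training records before they are used. The sketch below assumes records arrive as Python dictionaries and flags columns whose names appear on a hypothetical list of sensitive fields, or whose text values look like email addresses. The field list and pattern are illustrative assumptions, and a check like this is only a rough first pass, not a replacement for a proper data inventory or privacy review.

```python
import re

# Hypothetical list of column names treated as sensitive; adapt to your own data policy.
SENSITIVE_NAMES = {"name", "email", "phone", "ssn", "date_of_birth", "address"}
# Rough pattern for email-like strings embedded in free-text fields.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def find_sensitive_columns(rows):
    """Return column names that look sensitive by name or by content."""
    flagged = set()
    for row in rows:
        for column, value in row.items():
            if column.lower() in SENSITIVE_NAMES:
                flagged.add(column)
            elif isinstance(value, str) and EMAIL_PATTERN.search(value):
                flagged.add(column)
    return sorted(flagged)


def drop_columns(rows, columns):
    """Return copies of the rows without the given columns."""
    excluded = set(columns)
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


# Hypothetical training records.
training_rows = [
    {"Email": "a@example.com", "age": 34, "notes": "contact jo@example.org", "income": 52000},
    {"Email": "b@example.com", "age": 41, "notes": "none", "income": 61000},
]

sensitive = find_sensitive_columns(training_rows)
print(sensitive)                                  # ['Email', 'notes']
print(drop_columns(training_rows, sensitive)[0])  # {'age': 34, 'income': 52000}
```

Checking content as well as column names matters because sensitive details often hide in free-text fields such as notes or comments.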