Artificial Intelligence (AI): Why Metrics Are Not Fair Enough
Classic software testing cannot be easily transferred to Artificial Intelligence (AI). Model governance and internal audits are necessary to ensure fairness.
The use of artificial intelligence (AI) brings responsibility with it. Transparency, explainability, and fairness are essential principles that must be guaranteed, as must the high performance of the AI system. To meet these requirements, it makes sense to look at fields with a tradition of verifiable processes. Although these processes do not function flawlessly, security standards cannot be implemented without them. This is most evident in safety-critical and regulated industries such as medicine, aerospace, and finance.
Just as those fields need processes to meet the relevant requirements, a company that uses AI systems needs regulated processes through which it controls access to machine learning (ML) models, implements guidelines and legal requirements, monitors the interactions with the models and their results, and records the basis on which a model was created. Collectively, these processes are referred to as model governance. Model governance processes should be implemented from the beginning in every phase of the ML life cycle (design, development, and operations). The author has commented in more detail elsewhere on the specific technical integration of model governance into the ML life cycle.
Audits As Standardized Review Processes In The Model Governance Framework
An essential part of model governance is auditing, which checks whether AI systems comply with company policies, industry standards, or regulations. There are internal and external audits. The Gender Shades study, discussed by the author in the article "Ethics and artificial intelligence: a new way of dealing with AI systems", is an example of an external audit process: it tested facial recognition systems from large providers with respect to their accuracy across gender and ethnicity and found that the models' precision differed depending on race and gender.
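The core measurement behind such an external audit can be sketched in a few lines: the auditor sees only the model's predictions and the ground truth, and computes accuracy separately per subgroup. The audit records below are invented for illustration; they are not data from the Gender Shades study.

```python
# Sketch of a disaggregated accuracy check in the spirit of an external
# audit: only predictions and ground truth are available, not the
# training data or model internals. All records here are hypothetical.

def accuracy_by_group(records):
    """Compute accuracy separately for each subgroup."""
    totals, correct = {}, {}
    for group, prediction, truth in records:
        totals[group] = totals.get(group, 0) + 1
        if prediction == truth:
            correct[group] = correct.get(group, 0) + 1
    return {g: correct.get(g, 0) / totals[g] for g in totals}

# Hypothetical audit records: (subgroup, predicted label, true label)
records = [
    ("group_a", "female", "female"), ("group_a", "male", "male"),
    ("group_a", "female", "female"), ("group_a", "male", "male"),
    ("group_b", "male", "female"), ("group_b", "female", "female"),
    ("group_b", "male", "male"), ("group_b", "male", "female"),
]

print(accuracy_by_group(records))  # group_a: 1.0, group_b: 0.5
```

A single aggregate accuracy over all eight records would be 0.75 and would hide the gap between the two groups, which is exactly why audits disaggregate their metrics.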
However, this view is limited, since external test processes only have access to model results, not to the underlying training data or model versions. These are valuable sources that companies must include in an internal audit process. Such processes are designed to enable critical reflection on a model's potential impact. First, however, the basics of AI systems must be clarified.
Peculiarities Of AI Systems
To test AI software, it is essential to understand how machine learning works: machine learning is a set of methods that computers use to make and improve predictions or behaviors based on data. To build these predictive models, an ML model needs to find a function that produces an output (label) for a given input. For this, the model requires training data that contains the appropriate label for each input. This kind of learning is called "supervised learning." In the training process, the model uses mathematical optimization methods to find a function that maps the unknown relationship between input and output as well as possible.
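This idea of "finding a function by optimization" can be shown with a minimal sketch: fitting f(x) = w · x to labeled examples by gradient descent on the squared error. The data, learning rate, and step count are illustrative choices, not from the article.

```python
# Minimal sketch of supervised learning: find the weight w so that
# f(x) = w * x matches the labels, by gradient descent on the mean
# squared error. Data and hyperparameters are invented for illustration.

def train(examples, steps=200, lr=0.01):
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
        w -= lr * grad
    return w

# Labeled training data (input, label) following y = 3x
examples = [(1, 3), (2, 6), (3, 9), (4, 12)]
w = train(examples)
print(round(w, 2))  # close to 3.0
```

The optimizer never sees the rule "multiply by 3" written down anywhere; it recovers that relationship purely from the labeled examples, which is the essence of supervised learning.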
An example of classification would be a sentiment analysis intended to determine whether tweets express positive or negative moods (sentiments). In this case, the input would be a single tweet, and the associated label would be the coded sentiment for that tweet (−1 for negative, 1 for positive sentiment). In the training process, the algorithm uses this annotated training data to learn how the input data relates to the label. After training, the algorithm can then independently assign new tweets to a class.
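A toy version of this tweet classifier can be written in a few lines: the "model" learns a score per word from the annotated tweets (labels −1 and 1) and sums those scores to classify unseen text. The tweets below are invented, and the scoring scheme is a deliberately simplified stand-in for a real sentiment model.

```python
# Toy sentiment classifier: learn a per-word score from labeled tweets
# (-1 negative, 1 positive), then classify new text by summing scores.
# Training data and the scoring scheme are illustrative simplifications.

from collections import defaultdict

def train(tweets):
    scores = defaultdict(int)
    for text, label in tweets:
        for word in text.lower().split():
            scores[word] += label  # word pushed toward its tweet's label
    return scores

def predict(scores, text):
    total = sum(scores[word] for word in text.lower().split())
    return 1 if total >= 0 else -1

training_data = [
    ("great service love it", 1),
    ("awful experience hate it", -1),
    ("love the new update", 1),
    ("hate the slow app", -1),
]

model = train(training_data)
print(predict(model, "love the service"))  # 1
print(predict(model, "awful slow app"))    # -1
```

Note that nobody wrote a rule saying "love is positive"; the association emerges from the labels in the training data, so a skewed training set would silently produce skewed predictions.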
More Complex Components In The Machine Learning Area
Thus, an ML model learns its decision logic in the training process rather than having it explicitly defined in code as a sequence of typical if-then rules, as is usual in software development. This fundamental difference between traditional and AI software means that methods of classic software testing cannot be directly transferred to Artificial Intelligence (AI) systems.
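For contrast, the classic-software counterpart to the sentiment example looks like this: the decision logic is authored by hand as if-then rules and can be read, reviewed, and unit-tested line by line. The word lists are invented for illustration.

```python
# Classic software: decision logic written out explicitly as if-then
# rules rather than learned from data. Word lists are illustrative.

POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "awful", "bad"}

def rule_based_sentiment(text):
    words = set(text.lower().split())
    # Explicit, human-authored rules: inspectable and testable one by one
    if words & POSITIVE and not (words & NEGATIVE):
        return 1
    if words & NEGATIVE and not (words & POSITIVE):
        return -1
    return 0  # the rules make no decision

print(rule_based_sentiment("love this great app"))  # 1
print(rule_based_sentiment("awful bad service"))    # -1
```

Each branch here can be covered by a conventional test case, whereas a trained model's behavior on unseen inputs can only be probed statistically; that is the gap classic testing methods fail to bridge.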