The three ways to assess AI

The number of regulatory frameworks that involve assessments of AI systems has exploded in a short time, yet there is little consensus on why, when, how and for whom those assessments should be done in the first place.

Big Four firm EY, in a recent report citing the Organization for Economic Cooperation and Development, said there are currently more than 1,000 AI policy initiatives, spanning legislation, regulations, voluntary initiatives and agreements, across 70 countries. Despite all being concerned with AI, they often differ in their overall policy objectives, the subject matter being assessed, the purpose behind the assessments, the methodology for conducting them, who is expected to perform the assessment, and even the terminology used to describe them.

"AI has been advancing faster than many of us could have imagined, and it now faces an inflection point, presenting incredible opportunities as well as complexities and risks," said Marie-Laure Delarue, EY's global vice chair of assurance. "It is hard to overstate the importance of ensuring safe and effective adoption of AI. Rigorous assessments are an important tool to help build confidence in the technology, and confidence is the key to unlocking AI's full potential as a driver of growth and prosperity."

On this last point, the importance of building confidence is why EY chose to use the single term "assessment" to describe the veritable galaxy of different ways of inspecting AI systems.

The EY offices in London. Jack Taylor/Getty

Overall, though, after examining the different regulatory frameworks, from international to national to state to local, the firm identified three main types of assessments:

  • Governance assessments: To determine whether appropriate internal corporate governance policies, processes and personnel are in place to manage an AI system, including in connection with that system's risks, suitability and reliability. 
  • Conformity assessments: To determine whether an organization's AI system complies with relevant laws, regulations, standards or other policy requirements. 
  • Performance assessments: To measure how well an AI system performs its core functions against qualities such as accuracy, non-discrimination and reliability. These assessments often use quantitative metrics to evaluate specific aspects of the AI system.

EY conceded that the categorization can sometimes be slippery: an assessment that evaluates governance over an AI system, for example, may also be a conformity assessment, such as an assessment of an organization's AI Management System against the ISO/IEC 42001 standard.

The report said that confusion over the myriad AI assessment frameworks can make it difficult to ensure consistent quality and accountability. Even when the objectives align, the specific requirements may still vary across jurisdictions: for example, various U.S. cities and states have policies that include assessments for bias in the AI systems used in hiring and employment, all with different requirements. New York City's Local Law 144, for instance, has different requirements for measuring bias than the state laws requiring bias assessments in Colorado and Illinois.

There is also little uniformity over things that build confidence in conclusions, such as the extent of evidence required or the requirements for the providers of the assessments. EY noted, though, that assessments conducted by third parties may be viewed as more credible than those conducted by internal teams, especially if the third-party providers adhere to standards of professional responsibility, ethics and public reporting that internal teams might not be obligated to follow.

Another complicating factor is that mandatory AI assessments that evaluate compliance with a regulation will often be very different from voluntary assessments against a governance standard, such as the AI Risk Management Framework of the U.S. National Institute of Standards and Technology (NIST).

The report also noted that there can be very little specificity when it comes to terminology, particularly around broad terms like "fairness," "trustworthiness" and "transparency," which can create ambiguity unless they are defined further. Without this specificity, the usefulness of certain assessments may be more easily called into question.

There's also the fact that AI systems themselves are often complex, are integrated into larger environments and involve multiple stakeholders, making it difficult to even identify the appropriate subject matter for an assessment. Further, the variation in a model's results over time ("model drift") can make assessment outcomes outdated and misleading, and the variability of AI systems can complicate reproducibility. Lastly, the rapid advancement of AI technology may outpace the development of technical standards for evaluating performance.

The EY report said professionals need to understand just what, specifically, they're assessing and do so with a well-specified business or policy objective. The purpose should then inform the selection of appropriate methodologies and reference standards. AI assessment frameworks should also have a clear and sufficiently defined scope, including the type of assessment (e.g., governance, conformity or performance), the subject matter, and guidance regarding when the assessment should occur. 

When determining who should actually perform the assessment, EY said organizations should keep in mind the competency and qualifications, the objectivity and the professional accountability requirements of potential providers. This is not only a matter of holding the provider accountable: EY said providers that follow such standards and guidelines build confidence and help stakeholders understand how assessments are performed.

The report said business leaders should consider having someone assess their AI systems regardless of whether there is a regulatory requirement to do so, as assessments can help identify and manage evolving risks, as well as determine whether the systems work as intended. Further, market dynamics, investor demand or internal governance considerations may make a voluntary AI assessment advisable to build confidence in a business's AI systems. Moreover, if some AI systems are subject to regulatory obligations, business leaders may choose to use assessments to help measure and monitor compliance.

"As businesses navigate the complexities of AI deployment, they are asking fundamental questions about the meaning and impact of their AI initiatives," said Delarue. "This reflects a growing demand for trust services that align with EY's existing capabilities in assessments, readiness evaluations and compliance."  

The report calls to mind another, published last month by the AICPA and Chartered Professional Accountants Canada, which said the rapid rise of AI throughout the global economy opens up new opportunities for accounting professionals to provide independent assurance of these systems to help build trust and confidence in their functions.

It made a point similar to the EY report's in saying that many are colloquially using terms like "AI audit" or "AI assurance" to refer to different types of engagements and assessments. The report noted that some of the services described as assurance services are performed by entities, such as technology consultancies and internal audit teams, that may not follow the same professional standards that govern assurance engagements performed by CPAs.

The report clarified that it's referring to an engagement in which an assurance practitioner designs and performs procedures to obtain sufficient appropriate evidence, based on the practitioner's consideration of risk and materiality, in order to express an opinion or conclusion about the subject matter in the form of an assurance report. The two organizations see great opportunity in this area, though not without challenges.
