Errors in Deloitte report underscore need for care with AI

In an object lesson on the importance of reviewing AI outputs, Big Four firm Deloitte will partially refund the Australian government for an advisory report containing inaccuracies introduced by one of its AI models.

The report in question, initially released over the summer, pertained to an Australian study of a targeted compliance framework meant to prevent people from abusing government benefits. A statement from the Australian government earlier this month said, "There have been media reports indicating concerns about citation accuracies which were contained in these reports," and added, "Deloitte conducted this independent assurance review and has confirmed some footnotes and references were incorrect."

The changes were made after an expert in welfare law noted several errors in the report. It contained numerous references to studies that did not actually exist, cited made-up publications, falsely quoted a judge, and faked a reference to a court decision. The revised report, which Deloitte published after excising the inaccuracies, discloses that it was at least partially developed using a generative AI large language model. 


The Australian government said that despite the errors, the main substance of the review was retained and there were no changes to the actual recommendations.

Governance concerns

The incident underscores the need for strong AI governance to mitigate the risks of this new technology. That ranges from determining how AI fits into existing governance and compliance structures to developing policies that pertain specifically to AI.

Yet, while organizations are generally aware of the need for AI governance, actual execution has tended to lag behind. A survey by governance, risk and compliance solutions provider AuditBoard found that over 80% of respondents said their organizations are either very or extremely concerned about AI risks, but only 25% said they have fully implemented an AI governance program.

Meanwhile, though 92% of respondents said they are confident in their visibility into third-party AI use, just 67% of organizations report conducting formal, AI-specific risk assessments for third-party models or vendors. That leaves roughly one in three firms relying on external AI systems without a clear understanding of the risks they may pose.

Further highlighting the issue, organizations seem to struggle with actually controlling AI use among employees. A survey from Top 100 Firm EisnerAmper found only 22% of respondents said their organizations monitor AI use in the first place, and only 11% block ChatGPT and other public models. The survey also found that only 36.2% have an AI policy, only 34.2% say their company emphasizes transparency when discussing AI, and only 34% say their company has an AI strategy. In addition, a significant portion of professionals don't tell their supervisors they're using AI: 22.4% say they get permission before using AI, nearly as many (21.7%) use it without asking, and 22.2% say they might or might not.

Another issue highlighted by this most recent incident is that while people know they should not blindly trust AI outputs due to the possibility of error, most do anyway. The EisnerAmper survey found that about 81% of respondents were very or somewhat confident in the accuracy of their AI outputs, yet when asked how often they find errors, 28.4% said "not very often" and 3.4% said they never find them. Only 10.3% were supremely confident in their ability to spot errors.

Other studies are similarly grim. A McKinsey survey found that just 27% of respondents whose organizations use generative AI say that employees review all content created before it is used. A similar share say 20% or less of gen-AI-produced content is checked before use. And another study from trend analytics company ExplodingTopics found the problem was even more severe: Only 8% of people regularly bother to verify AI information, and 42.1% of web users have experienced inaccurate or misleading content in AI overviews. 

We can see this playing out in the rise of what a Harvard Business Review article dubbed low-effort "AI workslop," defined as "AI-generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task." Examples include reports that look polished and read well but make no sense, computer code missing vital context, and slide decks that look fine until you realize half the information is outright wrong. It is not difficult to imagine that those sending such content did not take the time to verify it before passing it on to another worker.

Of 1,150 U.S.-based full-time employees across industries polled, 40% report having received such content in the past month. Typically, those who receive it have to then spend time verifying information, correcting errors, and otherwise doing work that the person who sent it should have already done. Employees said such low-effort, low-quality content makes up about 15.4% of all content they receive at work. The researchers noted that people spend an average of one hour and 56 minutes dealing with each instance of "workslop." Based on participants' estimates of time spent, as well as on their self-reported salary, the researchers found that these incidents carry an invisible tax of $186 per month per person. 

What all these studies indicate is that while having a human in the loop is vital for organizations using AI, it is even more important that those humans actually scrutinize the AI's outputs.
