When it goes right, AI can seem like magic—all the busywork done for you, data shaped into clear insights, problems solved before you even notice. But when it goes wrong it can be a nightmare of wasted time and expensive mistakes.
Unfortunately, there are a lot of ways it can go wrong. Yet while it can be tempting to blame the software, most AI errors come down not to the bot but to the user.
One of the biggest issues, mentioned again and again, is that people simply do not double-check the outputs of their AI models. Users are repeatedly warned that AI models can make mistakes, yet most do not take the time to confirm what the AI is telling them.
David Wood, an accounting professor at Brigham Young University, has seen this pattern firsthand.
"I think the No. 1 error in terms of frequency is some type of small hallucination, and the user copies and pastes and doesn't review. They don't review the output. That is the biggest one. They go too fast without considering what it actually says. That's 95% of errors I've seen," he said.
Jeff Seibert, founder and CEO of accounting solutions provider Digits, made a similar point. He noted that Digits' goal as a company is to automate the preparation of the books, not to review or verify them; that is up to the user. He stressed the importance of making sure humans double-check what the AI is telling them.
"That is the biggest risk: a business owner just blindly trusting the AI bookkeeping model. It may look correct, but might not follow the guidelines because it will be biased by what has been previously done in their books. … Our guidance is that every business should still work with a firm or have someone qualified who can review and sign off on the finances," said Seibert in an interview.
Gina Montgomery, director of AI, automation and analytics at Top 25 firm Armanino, said the damage that comes from ignoring the human review element may not just be monetary but legal and reputational as well, as it could lead to firms submitting flawed deliverables just because the AI sounded confident. We've already seen large organizations turn in work with AI-generated errors and suffer for it, she noted.
"That's the exact kind of instance where you talk about human review: fabricated data, hallucinated citations, misapplied logic can reach regulators or investors before [the firm detects it] if you don't have that layer built in," she said.
This type of risk grows as oversight weakens. She said firms should not apply a "set it and forget it" mindset when it comes to AI, particularly where it concerns automation. It's important to clearly define what an AI is and is not allowed to do, but too many organizations ignore this control.
"The most common missing control is a clear definition of what the AI is allowed to do, installing an automation tool and assuming the guardrails are just built in, but not documenting approval thresholds or decision boundaries," she said.
An AI system without these controls operates without context and becomes more likely to make errors.
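None of the sources shared their actual control frameworks, but the "approval thresholds or decision boundaries" Montgomery describes can be made concrete in code rather than left as an assumption. The sketch below is purely illustrative, with hypothetical action names, limits and contacts, showing one way a firm might document what an automation bot is and is not allowed to do:

```python
# Illustrative only: documented decision boundaries for a hypothetical
# accounts-payable bot, instead of assuming guardrails are "built in."
from dataclasses import dataclass


@dataclass
class AIGuardrails:
    allowed_actions: set[str]      # what the bot may do at all
    auto_approve_limit: float      # dollar threshold for unattended action
    escalation_contact: str        # named human owner for exceptions


GUARDRAILS = AIGuardrails(
    allowed_actions={"categorize_expense", "draft_journal_entry"},
    auto_approve_limit=500.00,
    escalation_contact="controller@example.com",
)


def is_permitted(action: str, amount: float) -> bool:
    """Allow only actions that are explicitly listed and under the threshold."""
    return (action in GUARDRAILS.allowed_actions
            and amount <= GUARDRAILS.auto_approve_limit)

# Anything that fails this check is routed to the escalation contact
# instead of being executed automatically.
```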
But even if controls are built into the process, there is also the matter of getting people to follow them, as the phenomenon of "shadow AI," in which employees use tools outside sanctioned channels without telling anyone, makes clear.
This means many of the errors come down to the individual.
"It's not Walmart putting this into their AP process and it goes awry," said Wood, the BYU professor. "If there are mistakes, most will be from shadow AI or shadow IT. Someone is using it and not letting other people know. It's not part of the regularly designed process."
Ellen Choi, the founder and CEO of accounting AI consultancy Edgefield Group, noted that many large accounting firms already have strong control frameworks in place. Like Wood, she has not seen much in the way of catastrophic AI failures. However, she did point out there seems to be a consistent thread of people not necessarily knowing who is and is not using AI within the firm.
"I'm not sure what it is, but people don't seem to want to talk about AI use with each other within the firm. … You get these meetings where I ask about AI use, and one person waxes poetic about how they love it and everyone else looks surprised. It happens all the time. It seems pretty obvious that when firms don't know what their people are doing, this can have unintended consequences of not using AI in a way that is compliant according to the guardrails the company may have set up," she said.
For instance, she spoke of one firm where a junior associate put a client document into ChatGPT despite being explicitly warned against doing exactly that. The only reason the person was caught was that the firm happened to have IT controls that could detect the mistake; otherwise, no one would have known. While the firm was able to work with OpenAI to get the documents removed from its training data, Choi said this is very rare and, most commonly, "it's like ink and water. Once something has been added to the training data, it's very, very difficult to isolate and remove it."
"That person did get fired. Did it have actual business consequences beyond the risk exposure, the bottom-line financial impact? No. But I think this kind of stuff is definitely something firms should be vigilant about," she added.
Beyond expense and embarrassment, failing to implement proper controls over AI and to keep a human in the loop can further degrade the usefulness of a firm's AI model. This is because many models actively learn from what humans do, picking up the processes and procedures particular to that practice. Much like any human worker, if they're given bad information, they'll produce bad results, and if they're not corrected by a supervisor, they'll assume that's what they're meant to be doing.
"It learns the best practices of each accounting firm," said Seibert from Digits. "Obviously, if you have a rogue accountant in your firm doing bad accounting, yes, the model could pick up some of those practices." But he stressed there is also a global model that can correct the local firm model if it acquires bad habits.
Montgomery, from Armanino, talked about how letting mistakes into the training data can have cascading effects, which underscores the need for leaders to have independent verification layers before anything reaches the general ledger.
"AI errors that reach the ledger could directly affect reported earnings or compliance status. The most frequent examples might be misclassifications, incorrect accruals or unauthorized payments. These are not code failures but governance failures. When AI is trained on inconsistent data or outdated coding logic, it can replicate past mistakes at scale. A misclassification error that a human might make one month could be repeated thousands of times automatically, which could certainly cause some big issues," she said.
To illustrate her point, she recalled a medical organization client that would have made a very expensive error if not for last-minute human intervention. The AI was attached to the procurement system, where it was responsible for ordering supplies as needed, and it was given enough autonomy to do most of the work itself. In this case, the AI had to order more gloves. Unfortunately, it confused 20 boxes with 20 cases after misinterpreting a unit field.
"That discrepancy was caught during manual review, so those validation points are not bottlenecks. A lot of people think about it that way, thinking, 'oh no, I've got to add a human to that.' But it's more like the brakes that make the automation safe," said Montgomery.
Controls also need to account for the types of AI being used: if someone applies the wrong kind of model to a task, it won't matter how good the data is or how strong the guardrails are; the model won't be able to do the job well. Using the right tool for the right job matters in AI as much as anywhere else. Yet Seibert, from Digits, has seen too many people expecting large language models to do math when that is not their strong suit. He noted that Digits deliberately does not use LLMs to do any of the actual accounting work, relying instead on deterministic models and calculators to do the math, the results of which can then be communicated via an LLM.
"When you ask it a question we don't want it to make up an answer. You can, in Digits, ask a question about finances. It can be: 'How much did we spend on marketing this year versus last year?' Hand that to ChatGPT and it will literally make up an answer, and the math may look right but may be subtly wrong since it's not really doing math. … We have to go to extreme lengths to prevent our models from doing math. This is a common failure place. A lot of companies are not treating it that seriously. There will be subtle issues in the predictions because the models are hallucinating," he said.
Sometimes the right model does not remain the right model. Especially when a firm licenses an AI model rather than building its own, the vendor that controls it may release a new version or patch the current one, and even small changes can lead to wildly different behavior.
"You see it occasionally, where you've designed some kind of process with the API where the old model is not as good as the new one but they didn't update to the newer model. Or the model changes and they don't go back and test it," said Seibert. He added that it is not necessarily true that the most recent version is the best for a particular purpose. "People always expect 5 to be better than 4, but that's not how generative AI works. It might work better at 95% [of things], but that 5% catches you."
Montgomery also talked about this risk, adding that often vendors won't even provide notice that they're changing the behavior of their AI model.
"Contracts with vendors should have a certain level of transparency and notice on what they're doing, and you need a right to audit the AI's behavior based on the way it is stated it is going to work. … Firms have to assess not only their own AI solutions or models, but also the controls of every provider that they depend on," she said.
This speaks to the fact that even if a firm's own controls are strong, there is still the matter of third parties. This is especially the case with AI agents, which can act semi-autonomously. The issue with agents interacting with agents is that, by definition, a human is not involved. If every AI agent is using accurate information in the proper context, this is not such a big deal. But if an agent is acting on bad information, this can create a cascade that spreads through an entire system.
"As AI digital workers interact, there's a growing risk of error propagation: One AI system can accept and reinforce another one's incorrect output, and that is not good. … Without human validation, misinformation can spread faster than any individual could ever have done, so organizations have to design interaction protocols where AIs do not self-validate," said Montgomery. "That's where I think we are. Self-validation between AIs is not a good idea. Any system-to-system exchange should include human confirmation or independent verification logic. Something like that has to happen. Human oversight being the last line of defense ensures accountability even when machines are collaborating."
With all this in mind, Montgomery said firm leaders need to design their AI control structure with the same rigor as they would any internal control.
"Every model should have a named owner, a review schedule, an audit trail," she said. "Accountability had to be specific, not collective. … I think the human relationship defines accountability for both the human side and the machine side. It's easy to write really beautiful code. It's just that the AI is going to execute exactly as it's designed, so if you don't have that validation in place, you're in trouble."