PwC recently published a set of predictions about how AI will reshape the tax function.
It is a future worth working toward. And it's likely closer than most organizations realize.
However, there is a step that most conversations about AI for tax skip over. It sits upstream of every forecast, every risk model and every automated evidence-generation workflow. Miss it, and your AI strategy will stall. The AI will not have failed; it will simply have lacked the data it needed.
That step is document processing.
The data is there. It's just locked in documents
Tax functions do not suffer from an information shortage. Instead, they suffer from an inability to effectively use the information they already have.
Financial statements arrive as scanned PDFs. Intercompany invoices come from counterparties in dozens of jurisdictions, each with its own format. Transfer pricing documentation requires pulling data from ERPs that do not talk to each other. Prior-period tax records live in filing systems that were never designed for machine-readable access.
PwC puts it plainly: tax-relevant data is often messy and scattered across ERPs and other sources, a bottleneck that AI can potentially alleviate. What PwC is describing, though, is not primarily an AI problem. It is a document problem, and document problems require document solutions.
What happens when you skip this step
Organizations that deploy tax AI without addressing the document layer quickly encounter the same problem. The AI performs well on the clean, structured inputs it was trained on. Then it encounters real documents — the third-generation scan of a faxed invoice, a handwritten form from an international affiliate, the Excel export with merged cells and non-standard headers — and the results become dramatically unreliable.
More importantly, the AI fails silently. It returns an output with no indication that its confidence is low. That flawed output enters the workflow. It gets reviewed by a tax professional who may not think to question it. And the error propagates.
Tax is always a regulated environment, so that propagation is not just an accuracy problem. It is a compliance problem.
Architecture that actually works
AI alone is not sufficient for mission-critical document data extraction. What's required is a governed system that uses AI as a component — one capable layer within a broader architecture that includes validation, confidence scoring, exception routing and a complete audit trail.
In practice, it should look like this:
- A document arrives — in any format, from any source.
- A probabilistic engine extracts the data, assigning a confidence score to every field.
- A rules layer, the Deterministic Governor, validates each extraction against client-defined business rules.
- High-confidence data flows automatically to downstream systems.
- Low-confidence extractions are routed to a human reviewer — with context, not just a flag.
- Every decision is logged, creating an audit trail that is already built when compliance requires it.
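The flow above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product. The probabilistic extractor is stubbed out, so fields arrive already extracted with a confidence score. The 0.90 threshold is an arbitrary placeholder. The names `GovernedPipeline`, `ExtractedField`, `Decision` and the sample rule `total_matches_net_plus_tax` are hypothetical. The rule-checking step stands in for what the list calls the Deterministic Governor.

```python
# Minimal sketch of a governed extraction pipeline. All names are hypothetical;
# the probabilistic extractor is stubbed out, so fields arrive pre-extracted.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation
from typing import Callable, Dict, List, Optional

# Placeholder threshold (an assumption); real systems would tune it per field.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class ExtractedField:
    """One field as returned by the (stubbed) probabilistic extraction engine."""
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extractor


# A business rule inspects all fields and returns an error message, or None if it passes.
Rule = Callable[[Dict[str, ExtractedField]], Optional[str]]


@dataclass
class Decision:
    document_id: str
    route: str  # "auto" (flows downstream) or "review" (sent to a human)
    reasons: List[str]  # context handed to the reviewer, not just a flag
    fields: Dict[str, ExtractedField]


@dataclass
class GovernedPipeline:
    rules: List[Rule]
    threshold: float = CONFIDENCE_THRESHOLD
    audit_log: List[dict] = field(default_factory=list)

    def process(self, document_id: str, fields: List[ExtractedField]) -> Decision:
        by_name = {f.name: f for f in fields}
        reasons: List[str] = []

        # Confidence check: flag every field the extractor was unsure about.
        for f in fields:
            if f.confidence < self.threshold:
                reasons.append(f"low confidence on '{f.name}' ({f.confidence:.2f})")

        # Deterministic validation: run each client-defined rule.
        for rule in self.rules:
            error = rule(by_name)
            if error is not None:
                reasons.append(error)

        # Routing: any reason at all sends the document to human review.
        route = "review" if reasons else "auto"

        # Audit trail: every decision is logged, whichever route it takes.
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "document_id": document_id,
            "route": route,
            "reasons": list(reasons),
            "confidences": {f.name: f.confidence for f in fields},
        })
        return Decision(document_id, route, reasons, by_name)


def total_matches_net_plus_tax(fields: Dict[str, ExtractedField]) -> Optional[str]:
    """Example client rule (hypothetical): invoice total must equal net + tax."""
    try:
        net = Decimal(fields["net"].value)
        tax = Decimal(fields["tax"].value)
        total = Decimal(fields["total"].value)
    except (KeyError, InvalidOperation):
        return "net, tax or total missing or not numeric"
    if net + tax != total:
        return f"total {total} != net {net} + tax {tax}"
    return None


if __name__ == "__main__":
    pipeline = GovernedPipeline(rules=[total_matches_net_plus_tax])
    clean = pipeline.process("INV-001", [
        ExtractedField("net", "100.00", 0.98),
        ExtractedField("tax", "20.00", 0.97),
        ExtractedField("total", "120.00", 0.99),
    ])
    messy = pipeline.process("INV-002", [
        ExtractedField("net", "100.00", 0.95),
        ExtractedField("tax", "20.00", 0.62),
        ExtractedField("total", "125.00", 0.93),
    ])
    print(clean.route)  # auto
    print(messy.route, messy.reasons)
```

In this sketch, a single low-confidence field or failed rule sends the whole document to review. The reasons travel with it, so the reviewer sees why it was flagged. A real system might instead route only the affected fields. The audit entry is written on both routes, so the trail already exists before anyone asks for it.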
This is the foundation for what PwC describes as table stakes for the next generation of tax functions: predictive controls, dynamic risk sensing and automated evidence generation. Those capabilities are only possible when the underlying data is trustworthy.
What this means for tax teams
PwC's predictions describe a shift in what tax professionals do: less time on routine data work and more time on strategic analysis. They also anticipate new roles, such as tax data lead and model governance lead, and AI-as-a-service models that give tax functions access to tested tools with built-in governance.
Solutions for the document layer already exist. To suit tax functions, they need to be turnkey, managed and designed for regulated environments, while integrating fully with existing systems.
Such solutions can deliver measurable ROI, particularly for tax functions that process upwards of 100,000 documents annually.
To reach the ambitious tax future PwC describes, stakeholders must first deliberately address the underlying document challenge.