
Key Insights:

  • Mid-market firms mostly fail inspections at the review step. Manual review spreads attention evenly instead of aiming it at the riskiest work.
  • When AI drafts the work, the reviewer stops redoing it and starts judging it: where AI was used, whether its conclusions hold, and whether the team trusts it too readily.
  • The workpaper now has to explain itself. A file showing how AI was used and checked holds up under regulators; one that doesn't, won't.

Before any audit report leaves the firm, someone other than the engagement team has to sign off that the work holds up. That second-look step, quality review, is where a firm catches the gaps that peer reviewers and PCAOB inspectors would otherwise catch for it. It's also where partners spend their fall renaming workpapers, chasing missing tickmarks, and reconciling what the team filed against what the methodology actually requires. Two forces are reshaping that job at once: regulators are raising the bar on what a credible quality system looks like, and AI is changing what reviewers actually look at. This article covers what the latest inspection record reveals about mid-market firms, and how the reviewer's role changes once agents take on the initial execution.

Why Manual Quality Review Breaks Down

A reviewer working the old way re-performs the engagement by hand: re-reading workpapers, re-checking tickmarks, re-tracing conclusions back to evidence. It is slow, it is uneven, and it scales badly. The reviewer who has six engagements closing the same week cannot give each one the same depth, so scrutiny ends up spread thin instead of aimed where the risk actually sits. That's how a deficiency slips through: not because no one looked, but because the looking was rationed.

The inspection record bears this out. The eight annually inspected non-Big-Four firms were flagged on 52% of the engagements the PCAOB reviewed in 2024, and a deficiency caught in inspection is usually one the firm's own review step should have caught first. Fraud consideration and ICFR testing keep topping the list of recurring findings, which are exactly the judgment-heavy areas a rushed review handles worst. More hours would help at the margin, but the deeper problem is the method: hand re-performance does not tell the reviewer where to push hardest.

A Weak Review Step Is Now a Firm-Level Risk

Every audit report has to clear an independent reviewer before it goes out. What's new is that the PCAOB no longer treats a weak review as a one-off. When the same review-step problems recur across a firm's work, regulators read them as evidence the firm's quality system itself isn't working, and recent enforcement actions follow that logic: the board has sanctioned firms for review failures that ran across many engagements. A sign-off treated as a formality used to be one engagement's problem. Now it can put the whole firm on the hook.

The new quality management standards put this in writing. The AICPA's quality management standards are in force now, and the PCAOB's QC 1000 lands at the end of 2026; both grade the firm's whole quality system, not just individual files. That shift shows up in two things regulators now look for beyond the workpapers.

The first is root cause analysis: looking back at a deficiency and asking whether the methodology was wrong or the team just didn't follow it. In practice, most deficiencies trace back to the latter: the methodology was sound, and the team didn't follow it. The second thing regulators look for is culture. The PCAOB's Culture Spotlight found that firms whose partners largely grew up inside the firm had the lowest deficiency rates, and firms with the least internal continuity had the highest. The PCAOB isn't grading work-life balance or staff tenure directly, but it is now asking whether a firm's culture supports quality, which is a harder thing to document than a workpaper.

Where Technology Fits in the Quality Review Picture

Agentic AI changes how work gets documented, reviewed, and challenged. Instead of the reviewer re-performing the engagement by hand, agents do the first pass, and the reviewer's time goes to judging the output. It only works if the tools make the work legible: if the output carries its own inputs, steps, and sources into a file a human can challenge directly. That's the line between AI that survives review and AI that becomes the over-reliance risk regulators are warning about. It's also the line Fieldguide is built on.

The model is simple: an Agent Workforce in which Field Agents execute and practitioners review, on every engagement. Agent executed, human reviewed. In practice, that breaks down into four things.

  • Agents handle the execution. Field Agents take on the parts of an engagement that eat the most time: drafting, validating client-submitted evidence, running testing, and preparing work for review.
  • Humans stay in charge of judgment. Every agent output flows into a review step. The reviewer decides what holds and what goes back.
  • One platform, one record. Scoping, evidence, testing, and review live in one place, which is what makes the documentation and governance regulators are asking about actually possible to produce.
  • Built for audit-grade governance. Fieldguide holds ISO 42001 and AIUC-1 certifications for its AI management system, built on the same human-oversight and documentation discipline the PCAOB and AICPA are asking firms to demonstrate.

Put together, that is a system where the work is fast but never opaque, and where the reviewer is set up to do the one thing the technology can't: judge it.

Redefining the Reviewer's Job in an AI World

This is the part of the operating model that lands on a real person. When agents do the first pass, the reviewer stops being the second pair of hands on the work and becomes the first real judgment applied to it.

From re-performer to risk triage lead

The reviewer's expertise stops getting spent on mechanical checking and goes where it's worth the most: the judgment calls. The question shifts from "did the numbers tie" to "is this the right conclusion, and does the evidence behind it hold."

The work concentrates where the risk is. Instead of spreading the same scrutiny evenly across the file, the reviewer goes deep on the high-stakes calls: the revenue recognition judgment, the going-concern question, the estimate with the widest range of reasonable outcomes. 

On those, they pressure-test the reasoning the way a senior always has, but now with the agent's full work in front of them: the inputs it used, the steps it took, the source behind each result. A conclusion that reconciles to its support gets confirmed. One that doesn't gets sent back. This is the same skepticism the PCAOB has long expected when auditors weigh evidence, only now the evidence shows its own work, which makes the judgment easier to reach, not harder.
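One way to picture "concentrating where the risk is" is as a proportional split of reviewer hours by risk score. The sketch below is illustrative only: the `ReviewArea` type, the 1-to-5 risk scale, the area names, and the `allocate_review_hours` helper are all hypothetical, not part of any firm methodology or of Fieldguide's product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewArea:
    name: str
    risk_score: int  # hypothetical scale: 1 (low risk) to 5 (high risk)


def allocate_review_hours(areas, total_hours):
    """Split total_hours across areas in proportion to each area's risk_score."""
    total_risk = sum(area.risk_score for area in areas)
    if total_risk <= 0:
        raise ValueError("at least one area needs a positive risk score")
    return {
        area.name: total_hours * area.risk_score / total_risk
        for area in areas
    }


# Hypothetical engagement: two judgment-heavy areas and one routine area.
areas = [
    ReviewArea("revenue recognition", 5),
    ReviewArea("going concern", 4),
    ReviewArea("cash reconciliation", 1),
]
hours = allocate_review_hours(areas, total_hours=20)

# Print the areas from most to least reviewer time.
for name, allocated in sorted(hours.items(), key=lambda item: -item[1]):
    print(f"{name}: {allocated:.1f}h")
```

With these made-up scores, half of the 20 hours go to revenue recognition and only two go to the routine reconciliation, which is the opposite of spreading scrutiny evenly. A real triage would rest on the engagement's risk assessment, not a single number.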

Documenting AI use as part of quality review

Good documentation is what makes that judgment possible in the first place. A workpaper that shows how a conclusion was reached (what went into it, what came out, and where the reviewer confirmed or overrode it) is a file the reviewer can actually interrogate instead of taking on faith. The same record is what makes the work defensible later, to an inspector or a peer reviewer asking the same questions the reviewer already asked.

This is where AI that shows its work pays off twice. Every Field Agent run in Fieldguide produces a Trace: a record of the inputs it used, the steps it took, and the sources behind each result, captured as the agent runs. The documentation isn't a separate chore the team writes up after the fact; it's a byproduct of how the work got done. That matters under the new quality management standards, which expect a firm to stand behind the technology in its workflow and to get the right information to the people making the calls. A thin, undocumented AI workpaper meets neither expectation. One that explains itself meets both, and the reviewer who insists on it is doing quality-system work, not just clearing an engagement.
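To make "a workpaper that explains itself" concrete, here is a minimal sketch of what such a record could contain and how a reviewer-side check might flag gaps. It is not Fieldguide's actual Trace format: the `AgentTrace` and `TraceResult` types, their field names, and the `documentation_gaps` helper are assumptions made up for illustration, based only on the elements named above (inputs, steps, sources, and the reviewer's decision).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceResult:
    conclusion: str
    sources: List[str] = field(default_factory=list)  # evidence the result cites


@dataclass
class AgentTrace:
    inputs: List[str]
    steps: List[str]
    results: List[TraceResult]
    reviewer_decision: Optional[str] = None  # "confirmed" or "overridden"


def documentation_gaps(trace):
    """List the gaps that would leave this record hard to defend on review."""
    gaps = []
    if not trace.inputs:
        gaps.append("no inputs recorded")
    if not trace.steps:
        gaps.append("no steps recorded")
    for result in trace.results:
        if not result.sources:
            gaps.append(f"unsourced result: {result.conclusion}")
    if trace.reviewer_decision not in ("confirmed", "overridden"):
        gaps.append("no reviewer decision recorded")
    return gaps


# Hypothetical record: one result cites its evidence, one does not,
# and the reviewer has not yet signed off.
trace = AgentTrace(
    inputs=["sales ledger extract", "customer contracts"],
    steps=["matched invoices to contracts", "recomputed revenue cut-off"],
    results=[
        TraceResult("cut-off is appropriate", sources=["contract 0012"]),
        TraceResult("no unusual credit notes"),
    ],
)
gaps = documentation_gaps(trace)
print(gaps)
```

The point of the check is that an unsourced result or a missing sign-off surfaces as a named gap before an inspector finds it, instead of being discovered when someone tries to defend the file.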

Build Your Quality Review Process on a Single Platform

The quality review pressure firms face this year is not a tooling problem. It's an operating model problem, and disconnected systems and manual handoffs are what make new standards and enforcement trends expensive to absorb.

Fieldguide is the industry's only end-to-end AI-native platform, purpose-built for audit and advisory. With document management and workflow visibility built around how engagements actually run, firms get one place to manage review activity from scoping through reporting, with agent execution and human review built into every step.

Half of the Top 100 US CPA firms, including members of the Big Four, already use Fieldguide. Request a demo to see how Fieldguide fits your quality review workflow.

Amanda Waldmann


Increasing trust with AI for audit and advisory firms.
