We hear it on every demo call. A partner has fed a SOC control workpaper into ChatGPT or Claude, watched it produce a tidy chain of reasoning, and asked the obvious question: if a twenty-dollar-a-month chatbot can read this document and write a coherent conclusion, what exactly am I paying an audit-AI vendor for?
It is the right question. This post is the answer, with the mechanism and an illustrative example on the table.
Here's the short version: ChatGPT and Claude can read your evidence. They cannot read your firm. The difference shows up in how an agent walks through a control, where it draws the line on sufficiency, and which exceptions it raises versus dismisses. Agent Knowledge is the part of Fieldguide that closes that gap.
An audit AI has two components, and only one of them is durable.
The first is capabilities, what the agent can do: read a PDF, parse a screenshot, pull dates out of a policy, write structured reasoning. These come from the underlying model. As our engineering vision puts it bluntly: "Capabilities are a commoditizing moat. A new YC startup tomorrow can build these." Every frontier lab is racing here. Any vendor's advantage on raw capability, whether context window, citation discipline, or tool use, erodes within months because the labs ship.
The second is context, what the agent knows about your firm: your prior-year tests; your senior auditor's standard for what counts as a deviation; the way your team handled a logical-access control last year when the policy was clear but one domain's settings were partially blank; the discipline your reviewers expect when an auditor walks bullet-by-bullet through a multi-system control. None of this lives in a public document; all of it lives in your historical workpapers.
“Context, firm-specific audit methodology encoded as agent context, is the durable moat.” That is the spine of how we build. Today, that context comes from your prior-year engagements. Over time, Agent Knowledge will expand to incorporate your firm’s proprietary methodology and procedures, regulatory guidance, and insights from your team’s corrections and feedback on agent output.
Consider a SOC 2 audit team testing whether a client's password policy is actually enforced in their Windows environment. Call the client Acme Corp. The team gives the agent ten-plus screenshots of domain controller settings across half a dozen production and non-production domains, along with the firm's written Control Standards, which spell out the requirements: minimum length, complexity, history, max age, and account lockout after a small number of failed attempts within a short window.
Most settings match policy across most domains. But on one screenshot, a standard-user policy for one of the older domains (call it Domain A), the account lockout fields are empty. Not "set to a weak value." Just blank.
A generalist agent looking at the same evidence sees ninety-five percent of the configuration looking healthy and calls it. On evidence like this, a generalist model reasons something along these lines: one of the standard-user domain screenshots does not explicitly show the failed-logon-attempt threshold for lockout, although a lockout duration is configured; this is a minor observation, not a true exception. And it concludes: no exceptions noted, apart from a minor observation about the explicit display of the lockout threshold; all other evidence supports that the password complexity requirements are implemented as defined.
It notices the blank fields, talks itself out of them, and signs off clean. The firm's own Control Standards document explicitly required lockout enforcement after a small number of invalid attempts, but the model's reasoning never re-anchors against that line.
The same agent, with Agent Knowledge enabled, does something different. Before it evaluates the evidence, it pulls three prior workpapers from the firm's own audit history where similar password-configuration tests had been performed. None of those prior tests had this particular failure. But the prior workpapers showed the agent how the firm's auditors do this kind of analysis: a line-by-line comparison between the written policy and the actual configuration, with any deviation called out as an exception.
On evidence like this, an Agent-Knowledge-equipped agent reasons something like: the prior examples confirm that this test requires extracting each policy requirement, comparing it directly against the actual system configuration, and noting any deviation or lack of evidence as an exception.
That analytical discipline carries over. The agent builds its own bullet-by-bullet comparison, hits the line for lockout enforcement, compares the policy's stated threshold against the blank fields on the Domain A standard-user policy, and concludes: the standard-user policy on Domain A does not have lockout enforced (the relevant fields are empty), which is a deviation from the documented policy. Exception noted.
The human auditor on this engagement had reached the same conclusion. Without Agent Knowledge, the agent missed it. With Agent Knowledge, drawing on the firm's own prior reasoning patterns rather than on a foundation model's generic priors, the agent matched the auditor.
A note on what this example is and is not. None of the three retrieved prior workpapers flagged a lockout gap, because none of their evidence had one. So this is not the story of "the agent remembered last year's verdict on this exact pattern." It is the more interesting story: the prior examples taught the agent the shape of the analysis (extract every requirement, compare it line-by-line to each system's actual configuration, flag any miss), and the agent then applied that discipline to evidence it had never seen before. Method transfer, not verdict transfer. That distinction matters when an auditor signs the workpaper.
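The line-by-line discipline described in this example can be sketched in a few lines of Python. This is an illustration only, not Fieldguide's implementation: the threshold values, the setting names, and the `Deviation` and `compare_to_policy` names are all hypothetical, since the post does not state the firm's actual policy numbers. The point it shows is that a blank field is treated as a lack of evidence and raised as an exception, not waved through.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical thresholds: the post does not state the firm's actual values.
# Each requirement maps to a check that returns True when a setting complies.
POLICY: Dict[str, Callable[[Any], bool]] = {
    "min_length": lambda v: v >= 12,
    "complexity_enabled": lambda v: v is True,
    "password_history": lambda v: v >= 10,
    "max_age_days": lambda v: 1 <= v <= 90,
    # A threshold of 0 is treated here as "lockout not enforced".
    "lockout_threshold": lambda v: 1 <= v <= 5,
}


@dataclass(frozen=True)
class Deviation:
    domain: str
    requirement: str
    observed: Any
    reason: str


def compare_to_policy(
    policy: Dict[str, Callable[[Any], bool]],
    observed_by_domain: Dict[str, Dict[str, Any]],
) -> List[Deviation]:
    """Check every requirement against every domain; blanks count as exceptions."""
    deviations: List[Deviation] = []
    for domain, settings in observed_by_domain.items():
        for requirement, complies in policy.items():
            observed = settings.get(requirement)  # None if blank or absent
            if observed is None:
                deviations.append(
                    Deviation(domain, requirement, None, "no evidence: setting is blank")
                )
            elif not complies(observed):
                deviations.append(
                    Deviation(domain, requirement, observed, "deviates from policy")
                )
    return deviations


# Made-up evidence mirroring the example: Domain A's lockout field is blank.
evidence = {
    "Domain A": {"min_length": 14, "complexity_enabled": True,
                 "password_history": 12, "max_age_days": 60,
                 "lockout_threshold": None},
    "Domain B": {"min_length": 14, "complexity_enabled": True,
                 "password_history": 12, "max_age_days": 60,
                 "lockout_threshold": 5},
}

for d in compare_to_policy(POLICY, evidence):
    print(f"{d.domain}: {d.requirement} -> {d.reason}")
# prints "Domain A: lockout_threshold -> no evidence: setting is blank"
```

The design choice that matters is the `observed is None` branch: a percentage-of-settings-healthy view would score this evidence as nearly clean, while a per-requirement walk cannot skip the blank.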
Agent Knowledge is the system inside Fieldguide that injects your firm’s prior-year work into Testing Agent at runtime, before the agent reasons over new evidence. The mechanism is plain.
When the agent picks up a control on a current-year engagement, it first retrieves the most similar controls your auditors have tested previously: same control objective, similar evidence shape, ideally the same client. The retrieval is two-level: a similarity search narrows the candidate set, then a language-model rerank chooses the best matches. Those prior controls (the actual workpaper, the auditor's reasoning, the conclusion, the evidence accepted as sufficient) are placed into the agent's working context. Only then does the agent walk through the current-year evidence. The auditor does not do anything differently. They open the workpaper, kick off the test, and the agent silently absorbs how this firm tested this control before it produces a word.
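For readers who want the two-level retrieval made concrete, here is a minimal sketch under loud assumptions. It is not Fieldguide's code: the bag-of-words `embed` function stands in for whatever embedding model is actually used, the `rerank_score` callable stands in for the language-model rerank, and every name and record below is hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(
    query: str,
    corpus: List[Dict[str, str]],
    rerank_score: Callable[[str, Dict[str, str]], float],
    shortlist_size: int = 10,
    top_k: int = 3,
) -> List[Dict[str, str]]:
    # Stage 1: cheap similarity search narrows the candidate set.
    q = embed(query)
    shortlist = sorted(
        corpus, key=lambda doc: cosine(q, embed(doc["control"])), reverse=True
    )[:shortlist_size]
    # Stage 2: a more expensive scorer (assumed to be an LLM in practice)
    # reorders only the shortlist and picks the best matches.
    reranked = sorted(shortlist, key=lambda doc: rerank_score(query, doc), reverse=True)
    return reranked[:top_k]


# Made-up prior workpapers for illustration.
corpus = [
    {"id": "wp-1", "client": "Other", "control": "password policy enforced on windows domains"},
    {"id": "wp-2", "client": "Acme", "control": "password policy enforced on windows domain controllers"},
    {"id": "wp-3", "client": "Acme", "control": "change management approvals for production deploys"},
    {"id": "wp-4", "client": "Other", "control": "backup restoration testing performed quarterly"},
]


def prefer_same_client(query: str, doc: Dict[str, str]) -> float:
    """Stand-in reranker: favors prior work for the same client."""
    return 1.0 if doc["client"] == "Acme" else 0.0


matches = retrieve(
    "password policy enforced in windows environment",
    corpus,
    prefer_same_client,
    shortlist_size=2,
    top_k=2,
)
print([doc["id"] for doc in matches])
```

In this toy run the similarity stage shortlists the two password-policy workpapers and drops the unrelated Acme change-management control; the rerank stage then moves the same-client workpaper to the front. Splitting the work this way keeps the expensive scorer off the full corpus.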
Agent Knowledge is firm-specific by design. We build a separate corpus per firm, client, and team, so the agent sees only the prior work your team actually owns.
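One way to picture that separation is a store keyed by the firm, client, and team triple, so retrieval can only ever draw candidates from the caller's own partition. The sketch below is an assumption about shape, not a description of how Fieldguide stores data; `Scope`, `ScopedCorpus`, and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class Scope:
    """Hypothetical partition key: one corpus per firm, client, and team."""
    firm: str
    client: str
    team: str


class ScopedCorpus:
    def __init__(self) -> None:
        self._by_scope: DefaultDict[Scope, List[str]] = defaultdict(list)

    def add(self, scope: Scope, workpaper_id: str) -> None:
        self._by_scope[scope].append(workpaper_id)

    def candidates(self, scope: Scope) -> List[str]:
        # .get avoids creating an empty partition for an unknown scope.
        return list(self._by_scope.get(scope, []))


store = ScopedCorpus()
acme_team = Scope(firm="Firm X", client="Acme Corp", team="SOC 2")
other_team = Scope(firm="Firm Y", client="Acme Corp", team="SOC 2")

store.add(acme_team, "wp-2023-password-policy")
store.add(other_team, "wp-2023-access-review")

print(store.candidates(acme_team))
```

Because the scope is applied before any similarity search runs, a workpaper outside the caller's partition is never a candidate, however similar its content.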
The temptation is to say "I'll just paste last year's workpaper into Claude and ask the same question." It is technically possible. It is also a different job, and a smaller one than what Agent Knowledge does.
A general-purpose chatbot does not live inside your audit workflow. It does not have access to your prior workpapers. It cannot. That is not a capability gap a future model release will close. It is a security and methodology boundary that defines what a general-purpose tool is. Your prior workpapers contain client evidence, partner annotations, sign-off chains, and engagement-period scoping that your firm does not (and should not) hand to a public chatbot.
Even if you tried to bridge that boundary by hand, pasting last year's relevant workpaper into a Claude session for every control, you would be doing the retrieval-and-context-assembly job that Agent Knowledge does automatically at the moment the work happens. You would have to know which prior-year control is the right analog for the current-year control, locate and extract its workpaper text, position it correctly relative to the current evidence, and repeat that for every single control, every single engagement, for every single auditor on the team. That is not AI assistance. That is a person doing retrieval.
And even if a partner did all that by hand, a general-purpose model still would not know how your firm weights evidence, organizes deviation flags, or escalates close calls. Those patterns are not in the workpaper itself. They are in how your team has applied the workpaper across hundreds of engagements. Agent Knowledge is not a smarter model. It is the model plus your firm's institutional memory, retrieved and applied at the moment of reasoning. That combination is structurally unavailable to anyone whose institutional memory we do not hold.
In early deployments, firms using Agent Knowledge have seen testing time per control drop by roughly seventy-five percent, not because the agent skips steps, but because it arrives at the control already knowing how your firm does the work. The retrieval, the context assembly, the line-by-line comparison scaffold: all of that happens before the auditor opens the workpaper. What used to be setup time becomes agent time.
Capabilities commoditize. Every frontier lab ships them. By next quarter, the model under your favorite chatbot will be better at reading a PDF and writing a paragraph than the model under it today. We use those same models. We benefit from every release.
What does not commoditize is the body of audit work your firm has produced: every prior-year test, every reviewer's deviation call, every partner's sign-off pattern. That work is yours. Agent Knowledge is how you put it to use on the next engagement, without anyone on your team doing anything differently than they already do.
The agent handles the procedure. Your auditors own the conclusion. Your firm’s best work becomes the way your firm’s next engagement gets tested. And it compounds: when your reviewers accept or correct an agent’s output, that feedback refines future runs, so the agent’s judgment sharpens with every interaction.
A small caveat for the curious reader: Agent Knowledge is a powerful lever, not a replacement for partner judgment. We treat every agent conclusion as a draft for a reviewer, not as a sign-off.
If you would like to see Agent Knowledge running on a workpaper from your own engagement portfolio, talk to your Fieldguide customer support representative. We will run a paired comparison on controls your team has already concluded on, and you can see for yourself whether the agent walks the work the way your firm does.