Key Insights
- Many AI platforms layer chat or copilots onto legacy workflow software. AI-native platforms are built so AI can execute engagement work, not just help with it.
- SOC 2 Type 2 is the security baseline most firms know. ISO 42001 is the newer AI governance standard. Large firms expect both.
- Only 13% of CPA firms have successfully implemented AI, despite 65% reporting a proactive mindset toward it. The gap is implementation, not enthusiasm.
A managing partner walks out of a strategy offsite with a deck full of AI ambition, a shortlist of vendors, and three pilots greenlit for the next quarter. Six months later, two of the pilots have stalled in procurement, the third is running in a sandbox no one touches during busy season, and the engagement workflow looks exactly like it did before.
This article covers six criteria for evaluating AI platforms, how to weigh them based on where your firm is in its AI adoption, and what to look for in a demo.
1. Agent-native architecture for engagement work
The first question to ask is whether the platform was built around AI from the start, or whether AI is a feature layer on top of legacy workflow software. AI-native means the former: the data model, the user interface, and the underlying workflows are structured so AI can act on them, with clean inputs, traceable outputs, and review checkpoints built in. In practice, that means the platform can evolve as AI capabilities advance, rather than needing to be re-platformed each time the technology shifts. Fieldguide is built this way, with two layers of AI working together: AI Assist (chat, column-level actions, and copilots the practitioner triggers and reviews) and an Agent Workforce of Field Agents that execute multi-step procedures end-to-end and hand the result back for human judgment. Most platforms include some form of the first layer. The harder question is whether the second is actually present, or whether existing features have been rebranded as agents.
What agent-native looks like in practice
In an agent-native platform, engagement and methodology context are built in. The agent knows what framework you're testing against, what evidence has been uploaded and validated, what is still outstanding, and where you are in the engagement lifecycle. Agents operate inside the workflow, not alongside it in a separate window, and can adjust across situations that previously required human direction at every step.
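One way to picture that built-in context is as a small data structure the agent reads before it acts. The sketch below is illustrative only: the class names, fields, and lifecycle stages are assumptions made for this example, not Fieldguide's data model or any vendor's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical lifecycle stages; a real methodology may define others.
    PLANNING = "planning"
    FIELDWORK = "fieldwork"
    REVIEW = "review"
    REPORTING = "reporting"


@dataclass
class EvidenceItem:
    request_id: str
    uploaded: bool = False
    validated: bool = False


@dataclass
class EngagementContext:
    framework: str  # the framework being tested against, e.g. "SOC 2"
    stage: Stage    # where the engagement sits in its lifecycle
    evidence: list[EvidenceItem] = field(default_factory=list)

    def outstanding(self) -> list[str]:
        """Request IDs that are not yet both uploaded and validated."""
        return [e.request_id for e in self.evidence
                if not (e.uploaded and e.validated)]


ctx = EngagementContext(
    framework="SOC 2",
    stage=Stage.FIELDWORK,
    evidence=[
        EvidenceItem("REQ-1", uploaded=True, validated=True),
        EvidenceItem("REQ-2", uploaded=True),  # uploaded, not yet validated
        EvidenceItem("REQ-3"),                 # not yet received
    ],
)
print(ctx.outstanding())  # ['REQ-2', 'REQ-3']
```

In a demo, a useful question is whether the agent has access to something equivalent to this context on its own, or whether the practitioner has to paste it into a prompt each time.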
Red flags in a demo
Red flags include AI features that require switching tools, copilot branding without an underlying execution model, and demos where the "agent" is really just a chat sidebar pointed at a workpaper. Bolted-on AI rarely delivers what firms are actually looking for, because it does not change the underlying process architecture. The work still gets done the same way; the AI just has a new window to live in.
2. Audit and advisory depth
Audit and advisory firms have specific workflows, framework requirements, and review patterns that generic compliance or project management tools weren't built to handle. Pre-built frameworks (SOC 2, PCI DSS, HITRUST, ISO 27001, financial audit), agents trained on practitioner workflows, and a founding team with engagement experience are signals that the platform understands how the work actually moves.
The platform needs to fit your firm's methodology and support the frameworks you use. Does the platform absorb your firm's methodology, or does your firm reshape its methodology to fit the platform? That is the difference between a tool that accelerates how you already work and one that forces you into someone else's process.
Red flags here are easier to spot once you know what to look for: flexible-for-any-industry positioning, demos that show generic task management rather than actual engagement execution, and no named audit and advisory customers. Conversely, dense adoption within a single vertical is a signal of depth.
3. Security, privacy, and AI governance
Audit data is sensitive in specific ways: client financials, working papers, PII, and audit conclusions that have legal and regulatory weight. The evaluation should cover not just whether the platform is secure, but what specifically it does with the data once it's there.
SOC 2 Type 2 attestation is the baseline. Ask which Trust Services Criteria the report covers. Security is mandatory in every SOC 2 report, but the more thorough reports also cover availability (uptime and continuity) and confidentiality (controls on who can see what). Beyond the report itself, the questions worth asking are where client data is stored, how long it is retained, whether it is segregated from other customers, and who at the vendor can access it.
Ask the vendor in writing what happens to your data when AI runs on it. The DPA should state that your data won't be used to train models, including the third-party LLMs the vendor sends data to. Without that clause, your firm's working papers and client information could end up improving a model your competitors will use next year.
The newer bar: ISO 42001 certification
SOC 2 Type 2 covers controls under the Trust Services Criteria, but AI governance is a different question. ISO/IEC 42001 covers AI governance at the management-system level: how an organization assesses AI risk, sets policy, and improves over time. It uses the same management-system structure as ISO 27001 but applies to AI rather than information security. Large firms are pursuing ISO 42001 as part of their AI governance approach.
Fieldguide holds ISO 42001 certification, AIUC-1 certification, and SOC 2 Type 2 attestation, and is among the first audit and advisory platforms to achieve ISO 42001. Where a platform sits on certification (already there, in progress, or not on the roadmap at all) is one of the clearer signals of how seriously a vendor is taking AI governance.
4. Auditability of the AI itself
If you cannot see how the AI reached its conclusion, it becomes much harder to rely on the output in a review or sign-off context. The standard for sufficient appropriate audit evidence does not relax because the work was AI-assisted, so the platform has to make the reasoning inspectable.
In practice, every agent run should produce a trace: what went in, what came out, and the reasoning in between. Citations back to source documents are what let you defend the work in a review. Fieldguide's Agent Workforce is built around this expectation: agent executed, human reviewed, with every run producing a record that holds up under inspection.
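To make "what went in, what came out, and the reasoning in between" concrete, here is a minimal sketch of what a trace record could contain. Every name and the sample data are hypothetical, chosen for illustration; this is not a description of how Fieldguide or any other platform stores its traces.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Citation:
    document: str
    locator: str  # page, cell range, or paragraph reference


@dataclass
class ReasoningStep:
    summary: str
    citations: list[Citation] = field(default_factory=list)


@dataclass
class AgentRunTrace:
    inputs: list[str]           # what went in
    steps: list[ReasoningStep]  # the reasoning in between
    conclusion: str             # what came out

    def uncited_steps(self) -> list[str]:
        """Summaries of steps that cite no source document."""
        return [s.summary for s in self.steps if not s.citations]


# Made-up example data for illustration only.
trace = AgentRunTrace(
    inputs=["access_review_Q3.xlsx"],
    steps=[
        ReasoningStep(
            "All 12 sampled users had manager approval",
            [Citation("access_review_Q3.xlsx", "Sheet1!A2:F13")],
        ),
        ReasoningStep("No terminated users retained access"),
    ],
    conclusion="Control operating effectively",
)
print(trace.uncited_steps())  # ['No terminated users retained access']
```

A reviewer can use a check like `uncited_steps` to find the claims that still need to be tied back to evidence before the work is relied on.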
Even with a clear trace, the human side of the equation matters. People can favor AI suggestions over their own judgment and ignore contradictory information. The PCAOB has flagged this as automation bias in AI-assisted audit work. A well-designed platform puts review checkpoints where they matter.
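A review checkpoint can be thought of as a gate that an agent output cannot pass without a named reviewer. The sketch below assumes one possible design, in which sign-off also requires a written note so the reviewer records their own judgment rather than clicking through. The class, function, and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentOutput:
    procedure: str
    conclusion: str
    reviewed_by: Optional[str] = None
    reviewer_note: str = ""

    def sign_off(self, reviewer: str, note: str) -> None:
        """Record a human review; a blank note is rejected (assumed policy)."""
        if not note.strip():
            raise ValueError("a review note is required")
        self.reviewed_by = reviewer
        self.reviewer_note = note


def can_advance(output: AgentOutput) -> bool:
    """An output moves forward only after a named reviewer has signed off."""
    return output.reviewed_by is not None


out = AgentOutput("Access review", "No exceptions noted")
print(can_advance(out))  # False
out.sign_off("j.doe", "Re-performed 3 of 12 samples; agree with conclusion")
print(can_advance(out))  # True
```

Requiring a note is only one way to push back on automation bias; the point to probe in a demo is whether the platform enforces any checkpoint at all, and where.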
Red flags include black-box AI with no proof-of-work artifact, absent review checkpoints, or a vendor unable to walk through how an agent reached a specific conclusion.
5. Your data, your methodology, your IP
Agents that know your firm's methodology, templates, and prior-year work produce better results than agents working from generic training alone. The first engagement is where that difference shows up.
The flip side: once you've given a vendor your firm's IP, you need to know it stays yours. Ask in writing how customer data is segregated, what controls govern internal employee access, and whether your methodology or workpapers contribute to improvements that benefit other customers. Vague answers are a real warning sign.
6. Implementation, support, and customer outcomes
The gap between AI ambition and AI in production is wide. The 2025 AICPA MAP Survey found only 13% of CPA firms have successfully implemented AI despite 65% reporting a proactive mindset. The top barrier was lack of time to explore or implement. Limited budget and ROI concerns were also cited, among other barriers. Only 5% of firms had formal training programs in place.
Those numbers point to what to look for in a platform partner:
- Structured onboarding through programs like Fieldguide Accelerator
- Role-based training that meets staff, managers, and partners where they are
- Change management support, not just product training
- A named customer success contact who knows your engagements
The last one matters more than firms expect: a contact who knows your engagements is a different kind of partner than a generic help desk. Customer outcomes from comparable firms (similar in size, practice mix, and maturity stage) are the strongest proof points to ask for, alongside a clear view of the onboarding timeline and what "live" actually means.
Build on the right foundation
Together, the six criteria separate AI-native platforms built for engagement work from vendors layering AI features onto existing tools. They're a practical screen for narrowing the field:
- Architecture built for AI from the start
- Audit and advisory depth, not generic compliance
- Security, privacy, and AI governance
- Auditability of the AI itself
- Protection for firm data, methodology, and IP
- Implementation support and customer outcomes
Firms that evaluate partners against the full set tend to move from pilot to production faster, with fewer expensive course corrections later.
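For firms that want to compare vendors side by side, the six criteria can be turned into a simple weighted scorecard. The sketch below is one possible approach, not a prescribed method: the weights, the 1-to-5 scale, and the sample scores are all assumptions a firm would replace with its own, shifting weight toward whichever criteria matter most at its stage of adoption.

```python
CRITERIA = (
    "architecture",
    "audit_depth",
    "security_governance",
    "auditability",
    "data_ip",
    "implementation",
)


def weighted_score(weights: dict[str, int], scores: dict[str, int]) -> float:
    """Weighted average on a 1-5 scale; weights are percentages summing to 100."""
    if set(weights) != set(CRITERIA) or set(scores) != set(CRITERIA):
        raise ValueError("weights and scores must cover all six criteria")
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("scores must be between 1 and 5")
    return sum(weights[c] * scores[c] for c in CRITERIA) / 100


# Hypothetical weights for a firm early in adoption, where implementation
# support carries as much weight as security.
weights = {
    "architecture": 15,
    "audit_depth": 20,
    "security_governance": 20,
    "auditability": 15,
    "data_ip": 10,
    "implementation": 20,
}
# Hypothetical scores for one vendor after a demo.
vendor_a = {
    "architecture": 4,
    "audit_depth": 5,
    "security_governance": 3,
    "auditability": 4,
    "data_ip": 4,
    "implementation": 2,
}
print(weighted_score(weights, vendor_a))  # 3.6
```

A single number hides trade-offs, so it works best as a way to structure the conversation; a low score on one criterion may still be disqualifying regardless of the total.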
Fieldguide for audit and advisory firms
Fieldguide is an end-to-end AI-native platform purpose-built for audit and advisory firms. It holds ISO 42001 certification, AIUC-1 certification, and SOC 2 Type 2 attestation, with a full Agent Workforce that produces a reviewable trace on every run. The platform is built around firm methodology, with audit-grade evidence trails and onboarding support that gets firms from contract to first engagement in weeks. Practitioners review every agent output before it moves forward. Request a demo to see how the six criteria play out in practice.