The major GRC platforms are all repositioning around AI, and they're betting in noticeably different directions. This piece uses Rory Sutherland's explore/exploit framing to argue that the instability is mostly the category doing its job, looks at where Diligent, OneTrust, IBM and Archer are scouting, and sets out how buyers should engage without abandoning decision discipline.

Most of the GRC buyers I talk to are quietly furious with their vendors right now. The pitch changes every quarter. The AI story shipped in spring is gone by autumn. The roadmap slide moves between meetings. Procurement teams, trained to value consistency, read this as a category in disarray and pull back. Some of them are already telling their boards the AI thing was overhyped and they're going to wait it out.
I think they're misreading what they're looking at. And I want to borrow a frame from Rory Sutherland to explain why.
Sutherland, the Ogilvy behavioural scientist, talks about the explore/exploit trade-off, mostly in the context of why so many businesses are bad at growth. The shorthand version: a beehive needs two kinds of bees. Exploit bees work the patch of flowers the colony already knows about. Explore bees fly off in semi-random directions and mostly come back with nothing. Occasionally one finds a richer patch nobody else has spotted, comes home, and performs the waggle dance: a figure-eight with a vibrating middle that encodes direction and distance, telling the rest of the hive where to fly next.
Sutherland's point is that businesses dominated by finance, procurement and management consulting (his list, not mine) systematically over-invest in exploit and starve the explore side. Efficiency wins on the spreadsheet. The scout bee, by definition, looks inefficient: she comes back empty most days. But without her, the colony optimises perfectly into a local maximum and then, when the field next door changes, dies.
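If you prefer the trade-off in runnable form, here's a toy version of the hive as a two-armed bandit. This is my construction, not anything Sutherland has published, and the numbers are arbitrary:

```python
import random

# Toy illustration: two "patches of flowers" as a two-armed bandit.
# The colony starts on the known patch; a richer one exists that only
# exploration can find. All values are invented.
PATCHES = {"known": 1.0, "unseen": 3.0}  # nectar per visit

def run_colony(epsilon: float, visits: int = 1000, seed: int = 0) -> float:
    """Average nectar per visit when a fraction `epsilon` of bees scout."""
    rng = random.Random(seed)
    estimates = {"known": 1.0, "unseen": 0.0}  # colony only knows one patch
    counts = {"known": 1, "unseen": 0}
    total = 0.0
    for _ in range(visits):
        if rng.random() < epsilon:            # scout bee: fly semi-randomly
            patch = rng.choice(list(PATCHES))
        else:                                 # exploit bee: work the best-known patch
            patch = max(estimates, key=estimates.get)
        reward = PATCHES[patch]
        total += reward
        counts[patch] += 1
        # running average: the colony's belief about each patch
        estimates[patch] += (reward - estimates[patch]) / counts[patch]
    return total / visits

print(f"pure exploit: {run_colony(0.0):.2f} nectar/visit")  # stuck at 1.0 forever
print(f"10% scouts:   {run_colony(0.1):.2f} nectar/visit")  # close to 3.0
```

The pure-exploit colony is perfectly efficient on the spreadsheet and never discovers the richer patch. The scouts look wasteful visit by visit, and triple the colony's yield.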
GRC is in its explore phase on AI. And the wiggling that buyers are reading as weakness is, in a lot of cases, the dance.
If you've spent any time in GRC software demos in the last 12 months, you've seen the symptoms. Vendors who pitched "AI-powered control testing" in Q1 are pitching "AI-powered evidence triage" by Q3. The agentic narrative arrives, then mutates, then arrives again wearing a different lanyard. The same product appears in three different domain conversations depending on who's in the room. Roadmap slides change between meetings.
It's easy, tempting, and in some cases correct, to call this opportunism. Some of it is. There are vendors strapping a chatbot onto a legacy schema and calling it transformation, and they deserve the scepticism they're getting.
But a lot of what looks like opportunism is actually scouting. The honest version of "we pivoted our AI feature" is we shipped what we thought customers wanted, watched what they actually used, and the gap was bigger than we expected. That's not a failure of strategy. That's a scout coming home with a different bearing than the one she left with. The vendors who aren't wiggling at all right now are the ones I'd worry about, because either they've genuinely cracked it (rare) or they've decided to stop scouting (common).
I've been making the case for treating the wiggle generously, and I want to be honest that this argument has a soft spot. Not all wiggle is scouting. A meaningful slice of what's currently happening in GRC AI is noise: reactive marketing that follows whichever buzzword the last conference produced, weak product strategy dressed up as iteration, and demos engineered to look impressive on stage rather than survive contact with a customer's data. Treating that as exploration is a category error, and a buyer who can't tell the difference will burn pilot budget on vendors who never had a thesis in the first place.
So how do you tell? A few signals I've found useful:
A real scout can describe the bet they were placing, what they expected to learn, and what surprised them. Noise sounds like the market shifted and we responded. Scouting sounds like we believed X, we shipped, here's what we found out about X, here's the new bet. The presence of a falsifiable hypothesis, before and after, is the tell.
A real scout's pivots are smaller and more frequent than noise's pivots. A vendor who reframes their entire AI story between Q1 and Q3 isn't scouting. They're rebranding. Real scouting looks like the same thesis getting more specific over time, not a different thesis every quarter.
A real scout still has non-negotiables. The product still has an audit trail. The architecture is still defensible. The roadmap still has a recognisable spine. Wiggle in the surface story, stability in the underlying engineering, is a sign of a company that knows the difference between the bit it's exploring and the bit it isn't. Wiggle everywhere is a sign of a company that doesn't.
Noise is also easier to spot in what's missing than what's said. If a vendor cannot show you what their AI does when it's wrong, cannot describe the human-in-the-loop design, cannot quantify the cost of a false positive in a real customer's environment, then whatever they're shipping is not yet a product you can govern with. That isn't a wiggle-phase problem. That's a baseline problem, and it doesn't get a pass because the category is exploratory.
This is the part I want to be careful about, because I've spent the last thousand words making the case for engaging with mess. Engaging with mess is not the same as lowering your standards. The whole point of the explore/exploit frame is that they're complementary, not substitutes. A buyer who reads this article as permission to skip the hard questions has read it wrong.
The four big incumbents are all wiggling, and they're wiggling in noticeably different directions. That's not chaos. That's four scouts flying out of the same hive and reporting four different bearings.
Diligent is dancing toward an agentic, board-down GRC. The Elevate 2026 launch of AI Board Member and a "coordinated network of agents" across the platform is the most aggressive bet of the four, and the language is unambiguous: digital workforce, fewer humans, full-suite. They've put the analyst-rating trophy cabinet behind it (Leader rankings from five major firms) as evidence that the bet is paying off.
OneTrust is dancing toward real-time AI governance as the central category, with the privacy heritage as the credential. Their pitch is that point-in-time compliance is over, and the future is continuous, runtime control across the AI build stack: Bedrock, Azure Foundry, Vertex, Databricks. They're trying to become the control plane for enterprise AI itself, not just a better version of GRC.
IBM OpenPages is dancing toward open, model-agnostic infrastructure. The explicit pitch is no AI lock-in: bring your own model (OpenAI, Gemini, Claude, Microsoft, watsonx) and use MCP to let agents read and write GRC data (there's a sketch of that pattern just after this rundown). It's the most developer-flavoured story, and it's aimed at enterprise buyers who've already been burned once by a single-vendor AI commitment. What I'd add, from a recent review of the product, is that the AI integration into the low-code/no-code build experience is more thoughtfully structured than the headline marketing suggests. The interesting work isn't the model neutrality; it's where the AI shows up inside the configuration flow itself, which is the kind of thing that doesn't demo as well as an agent launch but tends to age better.
Archer is dancing more conservatively. They've made specific AI bets, including the Compliance.ai acquisition for regulatory change and a separate push on risk quantification, but the original differentiation on the first has eroded as the wider market caught up. What's currently visible from the outside is "AI-powered analytics" messaging on the homepage. They may well have a coherent answer internally. From where I'm sitting, it's the hardest of the four directional stories to characterise, which is itself a piece of information about the dance.
These are not the same bet. They are not even close to the same bet. One company is trying to replace headcount, one is trying to become the control plane for enterprise AI, one is wiring AI into the building blocks, and one is harder to summarise than that. A buyer trying to pick between them on a feature comparison matrix is going to produce a spreadsheet that hides the actual decision they're making, which is a directional one about where the category is heading.
This is what scouting looks like when you can see the whole hive at once. None of them is necessarily wrong. Most of them probably are, in the specific direction they're flying, and that's also fine: the value of the dance is the variance.
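Since the MCP part of IBM's pitch is the most concrete of the four, here's roughly what that pattern looks like in practice. This is a hypothetical sketch using the open-source Python MCP SDK; the tool names and the toy control store are mine, not anything OpenPages actually ships:

```python
# Hypothetical sketch of the "agents read and write GRC data" pattern:
# an MCP server exposing GRC records as tools any MCP-capable agent can
# call. The datastore and tool names are invented for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("grc-demo")

# Stand-in for a real GRC datastore
CONTROLS = {
    "CTL-001": {"name": "Access review", "status": "passing", "owner": "IT"},
    "CTL-002": {"name": "Change approval", "status": "failing", "owner": "Eng"},
}

@mcp.tool()
def get_control(control_id: str) -> dict:
    """Read a control record by ID."""
    return CONTROLS.get(control_id, {"error": f"unknown control {control_id}"})

@mcp.tool()
def update_control_status(control_id: str, status: str) -> dict:
    """Write a status change -- exactly the kind of agent action that
    needs an audit trail before you'd allow it in production."""
    if control_id not in CONTROLS:
        return {"error": f"unknown control {control_id}"}
    CONTROLS[control_id]["status"] = status
    return CONTROLS[control_id]

if __name__ == "__main__":
    mcp.run()  # any MCP client can now discover and call these tools
```

The second tool is the one to stare at: an agent that can write to your control data is precisely what the non-negotiables later in this piece are about.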
The bit Sutherland is careful about, and the bit I think GRC is bad at, is that the scout has to come back and share. The dance is the feedback loop. Half the value of explore is in exploit being able to act on what explore found. And the thing the four vendor pitches above have in common, despite all their differences, is that they are heavy on capability claims and analyst recognition and light on independent evidence. Nobody is showing what happens when the AI gets it wrong. Nobody is publishing accuracy benchmarks. Nobody is showing the cost of the false positives. The scouts are dancing, but a lot of the dance is "trust me."
The buyer side is wiggling too, and that's healthier than it gets credit for. GRC teams running three pilots in parallel, abandoning two, doubling down on the one that produced something useful: that's a colony allocating scouts. The teams that aren't doing this, the ones waiting for the category to settle before they engage, are going to wake up in 2027 having learned nothing about how AI changes their own work.
But here's the part that makes me uncomfortable as someone who rates this software for a living: the most interesting AI use cases I'm hearing about from GRC practitioners aren't the ones any of the four vendors above are leading with.
Vendors lead with control testing automation, evidence collection, policy generation. Reasonable, demoable, fits the existing buyer mental model.
Practitioners, when you actually get them talking, are doing things like: using LLMs to draft the intent behind a control before they ever write the control itself, because articulating intent has always been the bottleneck and nobody had time. Using AI to interrogate their own risk register for internal contradictions. Using it to translate between the language auditors want and the language engineers will actually act on. Using it as a thinking partner for the judgment calls that the GRC discipline pretends are deterministic but never were.
None of that fits cleanly into a product domain. None of it shows up well on a feature comparison matrix. Most of it isn't even a product yet. It's a workflow someone built in an afternoon with a chat interface and a copy of their own documentation.
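To make one of those concrete, here's roughly what the risk-register interrogation looks like as an afternoon build. Everything here is illustrative: the register entries, the prompt, and the choice of the OpenAI client are my assumptions, not any specific practitioner's setup:

```python
# Hypothetical afternoon-sized sketch: ask an LLM to flag internal
# contradictions in a risk register. Register format, prompt, and model
# choice are all illustrative assumptions.
import json
from openai import OpenAI

register = [
    {"id": "R-12", "risk": "Vendor data breach", "likelihood": "low",
     "rationale": "All vendors complete annual security reviews"},
    {"id": "R-31", "risk": "Third-party access abuse", "likelihood": "high",
     "rationale": "Vendor security reviews are frequently skipped"},
]

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": (
            "You review risk registers. Flag entries whose rationales "
            "contradict each other, and explain why."
        )},
        {"role": "user", "content": json.dumps(register, indent=2)},
    ],
)
print(response.choices[0].message.content)
# Expected flag: R-12 assumes the reviews happen; R-31 says they don't.
```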
That's the patch of flowers two valleys over. And the dance hasn't fully reached the hive.
Including mine.
Analyst frameworks (Applied Verdict's included) are built for exploit. They reward the things you can score: feature presence, evidence quality, integration depth, performance against a defined job-to-be-done. That's the right tool when a category is mature. It's how you separate the serious vendors from the marketing.
It's the wrong tool when the category is still searching for itself. A scoring rubric tuned to last year's jobs-to-be-done will systematically under-rate a vendor who's correctly bet on next year's, and over-rate a vendor who's polished an answer to a question buyers are about to stop asking. The methodology rewards the bee that's already at the known patch, not the scout mapping the next one. You can tweak the weights. You can add a column for "innovation." It doesn't fix the underlying mismatch, because the underlying mismatch is that scoring is what you do when you know what good looks like, and right now nobody does.
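A toy version of the mismatch, with invented numbers:

```python
# Invented numbers: a vendor polished against last year's jobs beats a
# scout under any weighting, because the rubric can only score the jobs
# it already knows about.
CRITERIA = ["feature_presence", "evidence_quality", "integration_depth"]

polished_incumbent = {"feature_presence": 9, "evidence_quality": 8, "integration_depth": 9}
correct_scout      = {"feature_presence": 5, "evidence_quality": 4, "integration_depth": 6}

def score(vendor: dict, weights: dict) -> float:
    return sum(vendor[c] * weights[c] for c in CRITERIA)

for weights in (
    {"feature_presence": 0.5, "evidence_quality": 0.3, "integration_depth": 0.2},
    {"feature_presence": 0.2, "evidence_quality": 0.2, "integration_depth": 0.6},
):
    print(score(polished_incumbent, weights), score(correct_scout, weights))
# The incumbent wins under every weighting. The scout's actual edge --
# betting correctly on next year's jobs -- never appears as a column,
# so no reweighting can surface it.
```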
The version of this I keep coming back to: sometimes the intention is the thing you have to work on. Not as a soft criterion in a rubric (that road leads to "vision quadrants" and the kind of analysis that's useful to nobody) but as an admission that there are phases of a category where the most useful thing an outside observer can do is describe what's being attempted, what's being learned, and what the field hasn't figured out yet. That's a different job from rating. It might be the more honest job for this moment.
I should also be straight that this argument is convenient for me. Applied Verdict makes its money partly from the rating methodology I've just spent four paragraphs critiquing, and partly from advisory work that fits the wiggle phase better than the scoring does. When I argue that the category is too unsettled for hard scoring alone and that you also need interpretation, that is, transparently, an argument for a posture I happen to sell. You should weight it accordingly. The reason I'm telling you this rather than hiding it is that the alternative is pretending an analyst arguing for their own utility is a neutral observation, and that pretence is exactly the kind of thing that makes buyers right to be sceptical of analyst content in the first place.
I'm not going to pretend I've solved this. I haven't.
The honest answer is that the standard buying playbook (write a requirements doc, score vendors against it, pick the highest score) is built for a category that has stopped wiggling. GRC has not stopped wiggling. Running that playbook today gets you a vendor who's good at answering 2024's questions.
What works better in a wiggle phase looks more like this:
Start from the work, not the feature list. Before you talk to a single vendor, get specific about which judgment calls in your team's week are the actual bottleneck. Not "we need AI for control testing." Which control, which part of the testing, which moment is the one where someone smart spends two hours and produces something a junior could have produced in twenty minutes if the framing had been right. That's your use case. It will not match a product domain cleanly. Good.
Pilot to learn, not to evaluate. A pilot that confirms what you already believed is a pilot that taught you nothing. Design pilots that can surprise you, including by failing in ways that tell you the use case itself was wrong. Three small pilots that kill two of themselves are worth more than one big pilot that limps to a yes.
Hold the line on the non-negotiables. Engaging with the wiggle is not the same as relaxing your bar. Auditability is a non-negotiable. Explainability is a non-negotiable. The cost of a false positive, and what your team does when one occurs, is a non-negotiable. The audit trail when an AI agent acts on a customer's data is a non-negotiable. These are baseline questions for any GRC tool you'd put in production, and they don't get a pass because the category is exploratory. If anything, they matter more in the wiggle phase, because the vendors most likely to skip them are the ones racing to demo features they haven't yet figured out how to govern.
Interrogate the wiggle, and ask for the evidence. When a vendor's pitch has changed in nine months, that's a signal, not a disqualification. Ask what they shipped, what customers actually used, what the gap was, and what they did about it. A vendor who can answer that crisply is scouting. A vendor who pretends nothing changed is either lying or asleep. And then ask the question none of the major platforms is currently answering well: show me what happens when your AI gets it wrong. Accuracy on the happy path is table stakes. The cost of the false positives, the audit trail when the agent acts incorrectly, the human-in-the-loop design: that's where the actual product is, and right now it's mostly absent from the pitches.
Get help that's built for this phase. That's the part Applied Verdict exists for. Our domain ratings are one lens: useful, deliberately narrow, scored against the established jobs-to-be-done. The other half of the work is buyer-side: helping you figure out which of your jobs are actually the ones AI changes, which vendors are scouting in directions that match where your team is going, and which parts of the standard procurement process are quietly wasting your time in a category that hasn't settled. We also do the thing the vendor pitches mostly don't: independent testing, with evidence, including the failure modes.
If you're mid-buying-cycle on a GRC tool right now and the process feels like it's optimised for a different decade, it probably is. Talk to us before you commit. We'll tell you what the rating says, what it doesn't, and where the dance is actually pointing.
The wiggle is the work. The point is to wiggle on purpose.