The sampling illusion
If you've run an expense audit team, you know the drill. You pull a random sample - maybe 3%, maybe 5% if you're thorough - run it through your rules engine, and report on what you find. The implicit assumption is that the sample represents the whole population.
It doesn't.
Random sampling was designed for quality control in manufacturing, where defects are randomly distributed. Expense fraud is the opposite. It's intentional, patterned, and adaptive. The people gaming the system aren't distributing their behavior randomly across reports. They're concentrating it in the places they know you're least likely to look.
Three patterns that live in the 95%
After working with organizations managing hundreds of millions in annual spend, we see certain patterns repeat. These aren't the obvious violations that static rules catch. They're the subtle, behavioral patterns that only surface when you review everything.
1. The slow drip
An employee submits expenses that are consistently 10-15% above peer median - never enough to trigger a threshold alert, always within "reasonable" range. Over a year, this adds up to tens of thousands in overspend. A rules engine set to flag amounts over $500 won't catch someone who submits $480 meals twice a week. But a behavioral model that tracks peer comparisons across time will surface the pattern immediately.
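The peer-comparison idea can be sketched in a few lines. This is a minimal illustration, not a production model: the employee IDs, amounts, and the `flag_slow_drip` function are all hypothetical, and a real system would normalize for role, region, and seasonality.

```python
from statistics import median

# Hypothetical monthly meal spend per employee (figures illustrative).
# emp_003 never crosses a $500 rule but sits consistently above peers.
monthly_meal_spend = {
    "emp_001": [410, 395, 430, 405],
    "emp_002": [400, 390, 415, 398],
    "emp_003": [470, 480, 465, 472],
}

def flag_slow_drip(spend_by_employee, pct_over=0.10, min_periods=3):
    """Flag employees who exceed the peer median by pct_over in at least
    min_periods periods, even though no single amount trips a threshold."""
    flagged = []
    n_periods = len(next(iter(spend_by_employee.values())))
    for emp, amounts in spend_by_employee.items():
        over = 0
        for i in range(n_periods):
            peers = [a[i] for e, a in spend_by_employee.items() if e != emp]
            if amounts[i] > median(peers) * (1 + pct_over):
                over += 1
        if over >= min_periods:
            flagged.append(emp)
    return flagged

print(flag_slow_drip(monthly_meal_spend))  # → ['emp_003']
```

Note that the flag depends on repetition across periods, not on any one amount - exactly the signal a per-transaction threshold can't see.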
2. The split and stagger
A $4,800 dinner gets split across three receipts submitted on different days, each comfortably below the $2,000 approval threshold. The individual transactions look unremarkable. It's only when you connect the vendor, the dates, and the submitter across the full dataset that the pattern emerges. Sampling at 5% makes catching this a matter of pure luck.
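Connecting vendor, date, and submitter is a grouping problem, which is easy once you have the full dataset. A minimal sketch, with hypothetical transactions and a hypothetical `flag_splits` helper - real matching would also handle vendor-name variants and card vs. reimbursement channels:

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical transactions: (submitter, vendor, date, amount).
# Three receipts, each under a $2,000 approval threshold, same vendor.
txns = [
    ("emp_007", "Harbor Grill", date(2024, 3, 4), 1650.00),
    ("emp_007", "Harbor Grill", date(2024, 3, 6), 1600.00),
    ("emp_007", "Harbor Grill", date(2024, 3, 8), 1550.00),
    ("emp_012", "Harbor Grill", date(2024, 3, 5), 300.00),
]

def flag_splits(txns, threshold=2000.0, window_days=7):
    """Group by (submitter, vendor); flag groups whose combined spend inside
    a rolling date window exceeds the threshold each receipt stays under."""
    groups = defaultdict(list)
    for submitter, vendor, d, amt in txns:
        groups[(submitter, vendor)].append((d, amt))
    flagged = []
    for key, items in groups.items():
        items.sort()
        for d0, _ in items:
            in_window = [a for d, a in items
                         if d0 <= d <= d0 + timedelta(days=window_days)]
            if len(in_window) > 1 and sum(in_window) > threshold:
                flagged.append(key)
                break
    return flagged

print(flag_splits(txns))  # → [('emp_007', 'Harbor Grill')]
```

Run against a 5% sample, any one of those three receipts looks fine on its own; the pattern only exists when all three are in the same dataset.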
3. The vendor loop
An employee develops a disproportionate relationship with a single vendor - say, a consulting firm or a travel agency - where a significant share of their discretionary spend is concentrated. Each individual transaction passes policy checks. But the concentration pattern, visible only across 90+ days of complete data, often signals kickbacks, preferential treatment, or undisclosed conflicts of interest.
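Concentration itself is simple to measure: the share of an employee's total discretionary spend that flows to one vendor. The sketch below is illustrative - names, amounts, and the 60% threshold are hypothetical, and a real model would weigh category norms (a road warrior legitimately concentrates spend with one airline).

```python
from collections import defaultdict

# Hypothetical 90-day discretionary spend: (employee, vendor, amount).
spend = [
    ("emp_021", "Acme Consulting", 38000.00),
    ("emp_021", "CityCab", 1200.00),
    ("emp_021", "AirFare Co", 4800.00),
    ("emp_034", "Acme Consulting", 2000.00),
    ("emp_034", "CityCab", 1800.00),
]

def flag_vendor_concentration(spend, share_threshold=0.6):
    """Flag (employee, vendor) pairs where one vendor captures more than
    share_threshold of that employee's total spend in the period."""
    totals = defaultdict(float)
    by_vendor = defaultdict(float)
    for emp, vendor, amt in spend:
        totals[emp] += amt
        by_vendor[(emp, vendor)] += amt
    flagged = []
    for (emp, vendor), amt in by_vendor.items():
        share = amt / totals[emp]
        if share >= share_threshold:
            flagged.append((emp, vendor, round(share, 2)))
    return flagged

print(flag_vendor_concentration(spend))  # → [('emp_021', 'Acme Consulting', 0.86)]
```

Every one of emp_021's individual transactions passes policy; the 86% share is only visible with the complete 90-day picture.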
The common thread: none of these patterns trigger Boolean rules. They don't exceed thresholds. They don't match keyword lists. They only become visible when you analyze 100% of transactions with models that understand behavioral context - not just line-item data.
Why "more rules" doesn't solve it
The instinct, when sampling falls short, is to write more rules. Flag meals over $200. Flag weekend charges. Flag international transactions. The problem is that every rule you write has two effects: it catches some real violations, and it generates a mountain of false positives that bury your team.
We've seen teams running 50+ custom rules that generate a 40% false positive rate. That means nearly half of every reviewer's day is spent clearing transactions that are completely legitimate. Meanwhile, the genuinely problematic spend - the behavioral patterns that don't match any rule - sails through untouched.
The more rules you write, the more your employees learn the boundaries. They know the thresholds. They know which categories get scrutinized. They adapt. Static rules are a snapshot of yesterday's problems applied to today's behavior. By definition, they can't catch what they weren't designed for.
What changes when you see everything
The shift from sampling to 100% coverage isn't incremental. It's a different category of insight entirely. When every transaction is analyzed in context, the conversation changes from "did we catch anything?" to "here's what's actually happening across the organization."
You stop reacting to individual violations and start understanding behavioral trends: which departments have the highest anomaly density, which policies are being stretched systematically, which vendors warrant deeper investigation. This is the difference between auditing and intelligence.
And it doesn't require more headcount. The entire premise of AI-powered expense audit software is that your existing team works on the exceptions that actually matter, instead of manually reviewing random samples and clearing false positives. One auditor with the right tools can cover what previously required a team of ten.
The bottom line
5% sampling made sense when manual review was the only option. It doesn't make sense when AI can analyze every transaction behaviorally, in real time, without adding headcount. The question isn't whether your sampling is missing things. It's how much it's missing, and what that's costing you.
If you're still sampling, you're still guessing.