Your A/B test hit 95% confidence. The variant is up 12%. The team is celebrating. You ship — and conversions stay flat.
This is Sample Ratio Mismatch (SRM), and it affects more experiments than anyone wants to admit. Ronny Kohavi's large-scale study at Microsoft found SRM in roughly 10% of controlled experiments on the Bing platform alone. In smaller organisations with less mature infrastructure, the rate is higher.
What is Sample Ratio Mismatch?
An experiment configured for a 50/50 split should send exactly half its traffic to control and half to variant. SRM occurs when the observed ratio diverges significantly from that expectation.
A 50/50 experiment that ends up 48/52 might not sound alarming, but it is. You cannot trust any metric from that experiment because something in your assignment pipeline affected the two groups differently — and whatever caused the imbalance almost certainly also affected your conversion metric.
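To put a number on that intuition: with 10,000 users, a 48/52 split is already decisive. A quick chi-squared calculation (all figures hypothetical) shows it:

```javascript
// Hypothetical numbers: 10,000 users in a "50/50" experiment that lands 48/52.
const total_users = 10000;
const observed_control = 4800;
const observed_variant = 5200;

const expected = total_users * 0.5; // 5,000 per group
const chi_sq =
  Math.pow(observed_control - expected, 2) / expected +
  Math.pow(observed_variant - expected, 2) / expected;
// chi_sq = 16, far beyond the 3.84 critical value for p < 0.05
```

A chi-squared statistic of 16 corresponds to a p-value well under 0.001: a fair 50/50 assignment essentially never drifts that far at this sample size.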
SRM is not a statistical quirk you can correct for. It indicates a flaw in the experiment itself. The only correct response is to investigate, fix, and re-run.
The most common causes
SRM arises wherever traffic touches your infrastructure between assignment and observation. The usual suspects:
- Bot filtering applied only to one variant (bots get assigned, then scrubbed post-hoc from variant but not control, or vice versa)
- Redirect chains adding latency to one variant, causing browsers to time out and drop the session entirely
- Client-side assignment firing after page load — users who leave before the JS executes are counted in the denominator but not the numerator
- Cache layers serving stale control pages to users already assigned to variant
- A/A test contamination — users previously in an overlapping experiment inherit a biased assignment
- Logging bugs where one variant's event stream is sampled at a different rate
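Most of these causes share one mechanism: both groups start balanced, then one group silently loses sessions. A toy simulation (all numbers invented) shows how a modest 4% post-hoc scrub on one side alone produces an unmistakable mismatch:

```javascript
// Toy simulation: 100,000 users assigned 50/50, then a bot filter
// (or timeout, or cache bug) silently drops 4% of variant sessions only.
const assigned_per_group = 50000;
const observed_control = assigned_per_group;                     // untouched
const observed_variant = Math.round(assigned_per_group * 0.96);  // 4% scrubbed

const total = observed_control + observed_variant;
const expected = total * 0.5;
const chi_sq =
  Math.pow(observed_control - expected, 2) / expected +
  Math.pow(observed_variant - expected, 2) / expected;
// chi_sq comes out around 40.8, far past the 3.84 critical value
```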
How to detect it
Detection is a straightforward chi-squared goodness-of-fit test against the expected split. For a 50/50 experiment:
```javascript
// Expected vs observed traffic for a 50/50 split
const expected_control = total_users * 0.5;
const expected_variant = total_users * 0.5;

const chi_sq =
  Math.pow(observed_control - expected_control, 2) / expected_control +
  Math.pow(observed_variant - expected_variant, 2) / expected_variant;

// chi_sq > 3.84 → p < 0.05 → SRM detected (1 degree of freedom)
const srm_detected = chi_sq > 3.84;
```

ACO runs this check automatically every 15 minutes during a live experiment and surfaces an `invalid_srm` verdict before any conclusion is drawn. The experiment is flagged, not concluded.
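The same test generalises to uneven splits, such as a 90/10 ramp-up. A sketch of a reusable helper (the `srm_check` name is invented, not part of any specific tool):

```javascript
// Generalised SRM check for any expected split.
// 3.84 is the p = 0.05 critical value for 1 degree of freedom (two groups);
// for k arms, use the critical value for k - 1 degrees of freedom instead.
function srm_check(observed_counts, expected_ratios, threshold = 3.84) {
  const total = observed_counts.reduce((sum, n) => sum + n, 0);
  let chi_sq = 0;
  for (let i = 0; i < observed_counts.length; i++) {
    const expected = total * expected_ratios[i];
    chi_sq += Math.pow(observed_counts[i] - expected, 2) / expected;
  }
  return { chi_sq, srm_detected: chi_sq > threshold };
}

// A 90/10 ramp that actually delivered 88/12 on 50,000 users:
const result = srm_check([44000, 6000], [0.9, 0.1]);
// result.srm_detected is true; a 2-point drift on a 10% arm is enormous
```

Note how sensitive the small arm is: the same absolute drift that looks trivial on the 90% side dominates the statistic on the 10% side, because each group's squared deviation is divided by its own expected count.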
Fixing SRM in practice
Once you detect SRM, work backwards through your assignment pipeline:
- Check your logs for drop-off between assignment events and the first page-view event for each variant
- Run a simple ratio test per hour — if the drift is worse at specific times, look at deployment events, cache purges, or bot spikes
- Compare bot / crawler sessions between groups — if one group has 3× the crawler rate, bot filtering is the culprit
- Audit redirect chains — a 301 on variant and a 200 on control will cause differential abandonment
- If you use server-side assignment, confirm the experiment definition was not deployed gradually (rolling deploys contaminate early data)
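The hourly ratio test in the second bullet can be sketched as follows, assuming assignment log records with an hour bucket and a group label (both field names invented for illustration):

```javascript
// Sketch: scan hourly buckets of assignment logs for localised SRM.
// Assumes records shaped like { hour: "2024-01-01T14", group: "control" | "variant" }.
function hourly_srm_scan(assignments) {
  const by_hour = {};
  for (const a of assignments) {
    by_hour[a.hour] = by_hour[a.hour] || { control: 0, variant: 0 };
    by_hour[a.hour][a.group] += 1;
  }

  const flagged = [];
  for (const [hour, counts] of Object.entries(by_hour)) {
    const total = counts.control + counts.variant;
    const expected = total * 0.5;
    const chi_sq =
      Math.pow(counts.control - expected, 2) / expected +
      Math.pow(counts.variant - expected, 2) / expected;
    if (chi_sq > 3.84) flagged.push({ hour, chi_sq });
  }
  return flagged; // hours whose split diverges: check deploys, cache purges, bots
}
```

If the flagged hours cluster around a deployment window or a nightly crawler run, you have found your culprit.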
Why tooling matters
Most visual A/B testing tools do not check for SRM by default. They show you a confidence interval and call it done. This is dangerous.
Proper experimentation infrastructure validates the experiment before it interprets the experiment. SRM detection should gate every result — if the ratio is off, no metric should be presented as conclusive.
This is why ACO runs the chi-squared SRM check as a first-class quality gate in every evaluation cycle, before computing lift, z-scores, or Bayesian posteriors. Bad data in, bad decision out, regardless of how sophisticated the statistics on top are.
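A minimal sketch of that gate-first ordering, with invented function and verdict names (this is not ACO's actual interface):

```javascript
// Illustrative gate-first evaluation: validate the experiment
// before interpreting it. All names here are hypothetical.
function evaluate_experiment(observed_control, observed_variant) {
  const total = observed_control + observed_variant;
  const expected = total * 0.5;
  const chi_sq =
    Math.pow(observed_control - expected, 2) / expected +
    Math.pow(observed_variant - expected, 2) / expected;

  // Quality gate first: an invalid experiment never reaches the stats stage.
  if (chi_sq > 3.84) {
    return { verdict: "invalid_srm", chi_sq };
  }

  // Only now would lift, z-scores, or Bayesian posteriors be computed.
  return { verdict: "valid", chi_sq };
}
```

The key design choice is the early return: downstream statistics are structurally unreachable when the assignment ratio is broken, so no one can quote a lift number from an invalid experiment.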
Key takeaways
- Roughly 10% of A/B tests show SRM severe enough to invalidate their results
- SRM indicates a broken pipeline — you cannot correct for it after the fact
- Chi-squared against your expected split ratio is sufficient for detection
- Any good experimentation platform should surface SRM before showing a conclusion
- The most common cause is differential data loss between control and variant, not a fluke in your random number generator