One of the most common questions in A/B testing: "We have been running this for a week — can we call it?"
Usually, the answer is no. Not because a week is too short as an absolute rule, but because the right answer depends entirely on your traffic volume, your current conversion rate, and the minimum effect size you care about detecting.
The good news: you can calculate the required sample size before you start. The bad news: most teams skip this step, which is why most A/B tests produce unreliable results.
The three inputs you need
Sample size calculation for a two-sample proportions test requires three numbers:
- Baseline conversion rate (p₁): your current measured conversion rate on the control
- Minimum detectable effect (MDE): the smallest lift you actually care about — e.g., a 10% relative improvement on a 5% baseline rate means you want to detect moves from 5% to 5.5%
- Statistical power (1 − β): typically 80% or 90%. This means: if the true effect is exactly your MDE, what fraction of experiments would correctly detect it?
The MDE is the most important input and the most frequently miscalibrated one. Teams often set it too low (wanting to detect a 2% relative improvement), which requires hundreds of thousands of users per variant. Set your MDE based on what would actually change a business decision.
The sample size formula
For a two-sided test at α = 0.05 and 80% power, a reasonable approximation is:
```typescript
// n = required users per variant
// p1 = baseline conversion rate
// mde = minimum detectable effect (absolute, not relative)
// e.g. if baseline is 0.05 and you want a 10% relative lift, mde = 0.005
function sampleSizePerVariant(p1: number, mde: number): number {
  const p2 = p1 + mde;
  const p_bar = (p1 + p2) / 2;
  // z-scores for α=0.05 (two-sided) and β=0.20 (80% power)
  const z_alpha = 1.96;
  const z_beta = 0.842;
  return Math.ceil(
    Math.pow(z_alpha * Math.sqrt(2 * p_bar * (1 - p_bar)) +
             z_beta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)), 2) /
    Math.pow(mde, 2)
  );
}

// Example: 5% baseline, 10% relative MDE → 0.5pp absolute MDE
sampleSizePerVariant(0.05, 0.005); // ≈ 31,000 per variant → ≈ 62,000 total
```
Translating to time
Once you have the required total sample, divide by your daily unique visitor count to get the minimum runtime in days.
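That division can be sketched in one line. The visitor and sample figures below are illustrative, not taken from any particular calculation:

```typescript
// Sketch: minimum runtime in days from required total sample and traffic.
// totalSample and dailyVisitors are inputs you supply; the example
// numbers are illustrative.
function minimumRuntimeDays(totalSample: number, dailyVisitors: number): number {
  return Math.ceil(totalSample / dailyVisitors);
}

// e.g. 60,000 total users at 4,000 unique visitors/day
minimumRuntimeDays(60000, 4000); // 15 days
```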
Some important caveats:
- Always run for at least one full week to capture day-of-week effects — weekday and weekend behaviour differ significantly for most products
- If your traffic is highly seasonal (a sale event, a product launch), wait until a 'normal' traffic period to run the experiment
- Two weeks is a common default because it captures two full weekday/weekend cycles — but it is a floor on runtime, not a guarantee that you will reach your required sample
- If your calculation says 6 months, your MDE is too small relative to your traffic — either accept that you cannot detect small effects, or increase the effect you care about
The peeking problem in practice
If you run a sample size calculation beforehand and then stop the experiment when it hits significance — even before reaching your target sample — you are peeking.
This is the most common source of inflated false positives in practitioner A/B testing. At a nominal 5% alpha, peeking daily and stopping at the first significant result can push the true false positive rate to 20-25% or more.
The fix: commit to your sample size before you start and do not interpret interim results as final. If your platform shows results in real time, use it for monitoring (detecting anomalies and sample ratio mismatch, SRM) — not for early stopping decisions.
Pre-register your sample size. Write it down before you launch. Do not let 'it looks significant after three days' override your pre-specification.
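The inflation is easy to demonstrate with a simulation. The sketch below is illustrative, not a measurement of any real platform: it runs A/A experiments (no true effect between arms), peeks daily with a pooled two-proportion z-test, and stops at the first "significant" result. All names and parameters here are assumptions for the demo.

```typescript
// Pooled two-proportion z-test at α = 0.05 (two-sided), equal n per arm.
function zTestSignificant(convA: number, convB: number, n: number): boolean {
  const pPooled = (convA + convB) / (2 * n);
  const se = Math.sqrt((2 * pPooled * (1 - pPooled)) / n);
  if (se === 0) return false;
  return Math.abs(convA / n - convB / n) / se > 1.96;
}

// Run A/A experiments (both arms share the same true rate), peek once
// per simulated day, and stop at the first peek that looks significant.
// Returns the observed false positive rate — nominally 5% without peeking.
function peekingFalsePositiveRate(
  days: number, usersPerDayPerArm: number, trueRate: number, trials: number
): number {
  let falsePositives = 0;
  for (let t = 0; t < trials; t++) {
    let convA = 0, convB = 0, n = 0;
    for (let d = 0; d < days; d++) {
      for (let u = 0; u < usersPerDayPerArm; u++) {
        if (Math.random() < trueRate) convA++;
        if (Math.random() < trueRate) convB++;
      }
      n += usersPerDayPerArm;
      if (zTestSignificant(convA, convB, n)) {
        falsePositives++; // stopped early on a fluke
        break;
      }
    }
  }
  return falsePositives / trials;
}

// 14 daily peeks at 1,000 users/day/arm, 5% true rate in both arms.
peekingFalsePositiveRate(14, 1000, 0.05, 2000);
```

With 14 looks the observed rate lands well above the nominal 5%, and it keeps climbing toward the 20-25% range quoted above as the number of peeks grows.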
What to do with low-traffic sites
If your calculation says you need 100,000 users per variant and you get 5,000 visitors a month, you have a low-traffic problem, not an A/B testing problem.
In this situation:
- Raise your MDE — focus on testing bold changes that are plausibly capable of producing large effects
- Increase conversion rate first through qualitative research (user interviews, session recordings) before quantitative testing
- Test higher in the funnel where volume is greater, not at the checkout step where you have 50 conversions per month
- Consider Bayesian approaches that can make useful decisions with less data by explicitly incorporating prior knowledge
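To make the first point concrete, here is a sketch of how quickly the required sample falls as the MDE rises. It reuses the same normal-approximation formula as the calculator earlier; the 5% baseline and the result figures are illustrative approximations:

```typescript
// Same two-proportion approximation as above (α = 0.05 two-sided, 80% power).
function sampleSize(p1: number, mde: number): number {
  const p2 = p1 + mde;
  const pBar = (p1 + p2) / 2;
  const numerator = Math.pow(
    1.96 * Math.sqrt(2 * pBar * (1 - pBar)) +
    0.842 * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)), 2);
  return Math.ceil(numerator / (mde * mde));
}

// 5% baseline at increasing relative lifts: required users per variant
// drops from roughly 31,000 (10% lift) to ~5,300 (25%) to ~1,500 (50%).
for (const rel of [0.10, 0.25, 0.50]) {
  console.log(rel, sampleSize(0.05, 0.05 * rel));
}
```

Doubling the relative MDE cuts the required sample by roughly a factor of four, which is what makes bigger bets viable on smaller sites.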
ACO surfaces this directly as a traffic_guide in the dashboard — each plan tier shows the minimum monthly visitors needed to reliably detect improvements of a given size. There is no point running an experiment that is statistically underpowered from the start.
Quick reference
- Calculate sample size before you start — not after
- Use an MDE that matches what would actually change a decision
- Run for at least one full week regardless of sample count
- Do not stop early because results look significant
- Low traffic? Raise your MDE or test higher in the funnel
- SRM check before you interpret any results — if traffic split is off, the data is invalid