Design & Analyze A/B Tests
Design rigorous A/B tests, run them correctly, and analyze results for clear decisions.
Overview
A complete A/B testing playbook from hypothesis to decision. It covers test design, sample sizing, implementation, monitoring, statistical analysis, and decision-making, and is meant to help keep your experiments valid and actionable.
Prerequisites
- Clear hypothesis to test
- A/B testing infrastructure in place
- Sufficient traffic for statistical power
- Metrics tracking set up
Steps
Form Your Hypothesis
Time: 1-2 hours. Create a clear, testable hypothesis with an expected outcome.
Deliverables:
- Hypothesis statement
- Expected outcome and direction
- Primary metric to move
- Rationale for the change
Tips:
- Use the format: "If we [change], then [metric] will [direction] because [reason]"
- Be specific about the expected magnitude
- One hypothesis per test
- Base the hypothesis on research or data, not just opinion
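The template above can be captured as a small record that makes you state a direction and a magnitude before the test starts. This is a minimal sketch that assumes Python 3.7+. The `Hypothesis` class and its field names are hypothetical and do not come from any testing tool.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical record for one A/B test hypothesis (field names are illustrative)."""
    change: str           # what the treatment changes
    metric: str           # the single primary metric expected to move
    direction: str        # "increase" or "decrease"
    expected_lift: float  # expected relative change, e.g. 0.05 for 5%
    reason: str           # research or data motivating the change

    def __post_init__(self) -> None:
        # Reject vague hypotheses: a direction and a positive magnitude are required.
        if self.direction not in ("increase", "decrease"):
            raise ValueError("direction must be 'increase' or 'decrease'")
        if self.expected_lift <= 0:
            raise ValueError("expected_lift must be a positive magnitude")

    def statement(self) -> str:
        # Renders "If we [change], then [metric] will [direction] because [reason]".
        return (
            f"If we {self.change}, then {self.metric} will {self.direction} "
            f"by about {self.expected_lift:.0%} because {self.reason}."
        )
```

Because validation runs at construction time, a hypothesis with no direction or no magnitude fails before it ever reaches the test design.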
Design the Test
Time: 2-3 hours. Define the control, treatment, metrics, and test parameters.
Deliverables:
- Control and treatment definitions
- Primary and secondary metrics
- Guardrail metrics
- Target population
- Exclusion criteria
Tips:
- Change one variable at a time
- Define the primary metric upfront (don't change it mid-test)
- Include guardrail metrics to catch negative effects
- Document exactly what differs between variants
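The design deliverables can be written down as one structured spec and checked automatically before launch. This is a minimal sketch that assumes Python 3.9+ (for `list[str]`). `ExperimentSpec` and its `validate` checks are hypothetical, so map them onto whatever your experimentation platform actually stores.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentSpec:
    """Hypothetical experiment definition; adapt the fields to your own platform."""
    name: str
    control: str                 # the unchanged experience
    treatment: str               # exactly what differs in the variant
    primary_metric: str          # fixed before launch, never changed mid-test
    secondary_metrics: list[str] = field(default_factory=list)
    guardrail_metrics: list[str] = field(default_factory=list)
    target_population: str = "all users"
    exclusions: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of design problems (empty if none were found)."""
        problems = []
        if not self.primary_metric:
            problems.append("no primary metric defined")
        if not self.guardrail_metrics:
            problems.append("no guardrail metrics defined")
        if self.primary_metric in self.secondary_metrics:
            problems.append("primary metric is also listed as secondary")
        if self.control.strip() == self.treatment.strip():
            problems.append("control and treatment descriptions are identical")
        return problems
```

Keeping the spec in version control next to the variant code makes it easy to show later that the primary metric was fixed before launch.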
Calculate Sample Size
Time: 1-2 hours. Determine how many users you need and how long to run the test.
Deliverables:
- Required sample size
- Expected test duration
- Statistical power (typically 80%)
- Significance level (typically α = 0.05, i.e. 95% confidence)
- Minimum detectable effect
Tips:
- Use a sample size calculator (Evan Miller, Optimizely)
- Plan for at least 1 full week to capture weekly patterns
- Don't peek at results and stop early as soon as you see significance; repeated looks inflate the false-positive rate
- Account for your baseline conversion rate
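For a conversion-rate metric, the calculation behind these calculators can be sketched with the standard normal-approximation formula for comparing two proportions in a two-sided test. This is a minimal sketch that assumes Python 3.8+ (`statistics.NormalDist`). The function names are hypothetical. Online calculators may use a different method, so their numbers can differ somewhat. Rounding the duration up to whole weeks is an assumption made here to follow the "at least 1 full week" tip.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, mde_relative: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH variant for a two-sided test of two proportions."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)  # rate under the minimum detectable effect
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("conversion rates must be between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def duration_days(n_per_variant: int, daily_users: int,
                  n_variants: int = 2, min_days: int = 7) -> int:
    """Days needed to reach the sample size, rounded up to whole weeks."""
    days = math.ceil(n_per_variant * n_variants / daily_users)
    days = max(days, min_days)
    return math.ceil(days / 7) * 7
```

With a 5% baseline and a 10% relative MDE at the default settings, this comes to roughly 31,000 users per variant. Halving the MDE roughly quadruples that number, which is why the baseline rate and the MDE drive test duration.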
Implement the Test
Time: 1-3 days. Build the variants and set up the experiment infrastructure.
Deliverables:
- Variants implemented
- Tracking verified
- Randomization working
- QA completed
Tips:
- Verify that tracking fires correctly for both variants
- Check that randomization is truly random
- QA both variants thoroughly
- Test on multiple devices and browsers
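Hashing the user ID together with an experiment name is one common way to make randomization both random-looking and sticky, so a user sees the same variant on every visit. This is a minimal sketch that assumes Python 3.9+ and string user IDs. `assign_variant`, `split_shares`, and the `experiment:user_id` key format are hypothetical choices, not the API of any particular A/B testing tool.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministically map a user to a variant.

    Including the experiment name in the hash keeps assignments independent
    across experiments, while the same user always gets the same answer.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return variants[bucket % len(variants)]


def split_shares(user_ids, experiment: str) -> dict[str, float]:
    """Fraction of users per variant: a quick QA check that the split looks even."""
    counts = Counter(assign_variant(uid, experiment) for uid in user_ids)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}
```

Running `split_shares` over a batch of real or synthetic IDs during QA should give shares close to the planned allocation. It also exercises the assignment code before any user sees it.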
Launch & Monitor
Time: 1-4 weeks. Start the test and monitor it for issues.
Deliverables:
- Test launched
- Daily monitoring in place
- Sample ratio check
- No major issues detected
Tips:
- Check for sample ratio mismatch (the observed split should match the planned allocation, e.g. 50/50)
- Monitor guardrail metrics for red flags
- Don't peek at primary metric results
- Have a plan to stop the test if something breaks
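A sample ratio mismatch check is usually a chi-square goodness-of-fit test of the observed user counts against the planned split. This is a minimal sketch for two variants that assumes Python 3. It uses the identity that, with 1 degree of freedom, the chi-square tail probability equals `erfc(sqrt(chi2 / 2))`. The function names are hypothetical. The 0.001 alarm threshold is an assumed, deliberately strict choice, because this check is run many times during a test.

```python
import math


def srm_p_value(n_control: int, n_treatment: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-variant split."""
    total = n_control + n_treatment
    expected = (total * expected_control_share,
                total * (1 - expected_control_share))
    observed = (n_control, n_treatment)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(n_control: int, n_treatment: int,
            expected_control_share: float = 0.5,
            threshold: float = 0.001) -> bool:
    """Flag a likely sample ratio mismatch (assumed threshold: p < 0.001)."""
    return srm_p_value(n_control, n_treatment, expected_control_share) < threshold
```

A split that looks only slightly off can still be flagged at large sample sizes. When this check fires, treat it as a bug to investigate (assignment, tracking, or bot filtering) rather than as a result.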
Analyze Results
Time: 2-4 hours. Run the statistical analysis once the test reaches its planned sample size.
Deliverables:
- Statistical significance assessment
- Effect size and confidence interval
- Segment analysis
- Guardrail metric results
Tips:
- Wait for the full sample size before analyzing
- Report confidence intervals, not just p-values
- Check for novelty effects (compare early vs. late results)
- Segment results to understand who was affected
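For a conversion-rate primary metric, the significance test and the confidence interval can be sketched as a two-sided two-proportion z-test. This is a minimal sketch that assumes Python 3.8+ and counts of converting users per variant. `analyze_conversion` is a hypothetical helper. It uses the pooled standard error for the p-value and the unpooled standard error for the interval on the absolute difference, which is one standard pairing.

```python
import math
from statistics import NormalDist


def analyze_conversion(conv_c: int, n_c: int, conv_t: int, n_t: int,
                       alpha: float = 0.05) -> dict:
    """Two-sided two-proportion z-test plus a CI for the absolute difference."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    # Pooled standard error under H0 (no difference), used for the p-value.
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    p_value = 2 * NormalDist().cdf(-abs(z))
    # Unpooled standard error, used for the confidence interval of the difference.
    se_diff = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "control_rate": p_c,
        "treatment_rate": p_t,
        "abs_diff": diff,
        "rel_lift": diff / p_c if p_c > 0 else float("nan"),
        "ci_low": diff - z_crit * se_diff,
        "ci_high": diff + z_crit * se_diff,
        "p_value": p_value,
    }
```

Running the same function on subsets of users, such as the first and second half of the test period or individual segments, is a simple way to look for novelty effects and heterogeneous impact. Segment-level p-values need extra caution because of multiple comparisons.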
Make a Decision
Time: 1-2 hours. Decide whether to ship, iterate, or abandon based on the results.
Deliverables:
- Ship / Don't ship decision
- Rationale documented
- Learning captured
- Next steps defined
Tips:
- Statistically significant ≠ practically significant
- Consider effect size, not just significance
- Check guardrail metrics before shipping
- Document decision rationale for future reference
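One way to make "statistically significant ≠ practically significant" concrete is to compare the confidence interval from the analysis step with a minimum effect worth shipping that you chose in advance. This is a minimal sketch that assumes Python 3 and a metric where higher is better. `decide` and `min_practical_effect` are hypothetical. Requiring the whole interval to clear the threshold before shipping is one conservative convention among several.

```python
def decide(ci_low: float, ci_high: float, min_practical_effect: float,
           guardrails_ok: bool) -> str:
    """Hypothetical ship / iterate / abandon rule (higher metric values are better)."""
    if not guardrails_ok:
        return "don't ship: guardrail regression"
    if ci_low >= min_practical_effect:
        # Even the pessimistic end of the interval is worth having.
        return "ship"
    if ci_high < min_practical_effect:
        # Even the optimistic end is too small to matter, significant or not.
        return "abandon"
    # The interval straddles the threshold: refine the idea or retest with more power.
    return "iterate"
```

Note that an interval entirely above zero but below the threshold returns "abandon". That is exactly the significant-but-not-meaningful case the tips warn about.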
Document Learnings
Time: 1-2 hours. Archive the results and share learnings with the team.
Deliverables:
- Test documentation
- Results summary
- Learnings for future tests
- Shared with team
Tips:
- Document regardless of outcome (negative results are valuable)
- Include what you'd do differently
- Share learnings broadly
- Build an organizational knowledge base