Experiment Design Framework
Elements of a Good Experiment
- Clear hypothesis — What do you believe and why?
- Measurable outcome — What metric will change?
- Defined segments — Who sees what?
- Statistical rigor — Sample size and duration
- Decision criteria — What results mean what action?
Hypothesis Formula
Template: "If we [change], then [metric] will [direction] by [amount], because [rationale]."
Example: "If we add progress indicators to onboarding, then completion rate will increase by 15%, because users will have clearer expectations and motivation to continue."
Experiment Design Template
# A/B Test: [Test Name]
## Overview
**Owner:** [Who's responsible]
**Status:** [Proposed / Running / Completed]
**Duration:** [Start date] to [End date]
## Hypothesis
If we [specific change], then [metric] will [increase/decrease] by [expected amount], because [reasoning based on user behavior or prior evidence].
## Changes Being Tested
### Control (A)
[Description of current experience]
### Variant (B)
[Description of changed experience]
[Include mockups or screenshots if available]
## Success Metrics
### Primary Metric
**Metric:** [Name]
**Current baseline:** [Value]
**Target:** [Value or % change]
**Why this metric:** [Connection to hypothesis]
### Secondary Metrics
| Metric | Baseline | Watch for |
|--------|----------|-----------|
| [Metric 1] | [Value] | [Expected direction] |
| [Metric 2] | [Value] | [Expected direction] |
### Guardrail Metrics
Metrics that must not degrade while the experiment runs:
- [Metric] — Acceptable range: [Range]
- [Metric] — Acceptable range: [Range]
## Experiment Setup
### Traffic Allocation
- Control: [X%]
- Variant: [X%]
### User Segments
**Included:** [Who's in the experiment]
**Excluded:** [Who's excluded and why]
### Sample Size & Duration
- **Minimum sample size:** [N per variant]
- **Estimated duration:** [Days/weeks to reach significance]
- **Statistical significance threshold:** [Usually 95%]
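To fill in the sample-size row, a quick estimate helps. A minimal sketch using only Python's standard library (normal approximation for a two-proportion test; the function name and the 30% baseline / +4.5pp lift figures are illustrative, not from any specific product):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, absolute_lift, alpha=0.05, power=0.8):
    """Approximate minimum n per variant to detect an absolute lift in a
    conversion rate, using the normal approximation for two proportions."""
    p1, p2 = baseline, baseline + absolute_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# Illustrative: 30% baseline completion, hoping for a 15% relative lift (+4.5pp)
n = sample_size_per_variant(0.30, 0.045)
# Divide n by expected daily entrants per variant to estimate duration in days.
```

Dividing the per-variant n by daily traffic per variant gives the duration estimate for the template.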
## Decision Framework
| Result | Action |
|--------|--------|
| Variant wins significantly | Ship to 100% |
| Variant wins marginally | Consider extending test or iterating |
| No significant difference | Evaluate cost; may ship simpler version |
| Control wins | Don't ship; analyze why hypothesis was wrong |
| Guardrails violated | Stop test, investigate |
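The "wins significantly" rows presume a significance test behind the scenes. A minimal sketch of a pooled two-proportion z-test (stdlib only; the 95% threshold matches the template default, and the counts in the usage lines are made up):

```python
import math
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.
    Returns (z, p_value); p_value < 0.05 corresponds to 95% significance."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical readout: 300/1000 conversions in control vs 360/1000 in variant
z, p = two_proportion_z_test(300, 1000, 360, 1000)
```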
## Risks & Mitigations
| Risk | Likelihood | Mitigation |
|------|------------|------------|
| [Risk 1] | [H/M/L] | [How to address] |
| [Risk 2] | [H/M/L] | [How to address] |
## Post-Test Analysis Plan
- [What additional analysis we'll do]
- [Segments to investigate]
- [Follow-up experiments to consider]
Multi-Variant Tests
When testing more than one variant:
## Variants
### Control (A): [Name]
[Description]
### Variant B: [Name]
[Description]
### Variant C: [Name]
[Description]
## Traffic Split
- Control (A): [X%]
- Variant B: [X%]
- Variant C: [X%]
## Comparison Plan
- Compare B vs A (primary comparison)
- Compare C vs A
- Compare B vs C (if both beat control)
- Adjust for multiple comparisons (e.g., a Bonferroni correction), since testing several variants against control inflates false-positive risk
Experimentation Tips
Before Running
- Validate tracking — Can you actually measure what you need?
- Check for conflicts — Other tests running on same users?
- Document baseline — Know your starting point precisely
- Align stakeholders — Everyone agrees on decision criteria?
While Running
- Don't peek too often — Multiple looks increase false positives
- Watch for bugs — Variant errors can invalidate results
- Monitor guardrails — Stop if something breaks
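The "don't peek" advice can be made concrete with an A/A simulation: both arms have the identical conversion rate, so every "significant" reading is a false positive. A sketch with all sizes and rates chosen purely for illustration:

```python
import random
from statistics import NormalDist

def peeking_vs_single_look(trials=400, peeks=10, n_per_peek=200,
                           alpha=0.05, seed=1):
    """A/A simulation: both arms convert at 30%, so any significant result
    is a false positive. Returns (false-positive rate when stopping at any
    significant checkpoint, rate with a single look at the end)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    any_look_fp = final_look_fp = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        significant_now = significant_ever = False
        for _ in range(peeks):
            for _ in range(n_per_peek):
                conv_a += rng.random() < 0.30  # control conversions
                conv_b += rng.random() < 0.30  # variant conversions
            n += n_per_peek
            pool = (conv_a + conv_b) / (2 * n)
            se = (pool * (1 - pool) * (2 / n)) ** 0.5
            significant_now = se > 0 and abs(conv_b - conv_a) / n / se > z_crit
            significant_ever = significant_ever or significant_now
        any_look_fp += significant_ever
        final_look_fp += significant_now
    return any_look_fp / trials, final_look_fp / trials
```

With ten interim looks, the "stop on first significant result" policy fires well above the nominal 5% rate, while the single final look stays near it.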
After Running
- Segment analysis — Did it work differently for different users?
- Learn from losses — Failed tests teach more than wins
- Document everything — Future you will thank past you
Common Pitfalls
- Underpowered tests — Not enough traffic to detect real effects
- Too many metrics — With enough metrics, something will be "significant"
- Stopping early — That early winner might regress to the mean
- Ignoring segments — Average hides important differences
- No baseline — Can't measure change without a starting point
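The "too many metrics" pitfall is just arithmetic: with k independent metrics each tested at significance level alpha, the chance that at least one crosses the threshold by luck is 1 − (1 − alpha)^k. A tiny illustration (the function name is ours):

```python
def chance_of_a_fluke(num_metrics, alpha=0.05):
    """Probability that at least one of k independent metrics looks
    'significant' purely by chance at level alpha."""
    return 1 - (1 - alpha) ** num_metrics

# With 20 secondary metrics at 95% significance, odds of a spurious
# "win" somewhere in the dashboard are roughly 64%.
print(round(chance_of_a_fluke(20), 3))
```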
After the Experiment
Once results are in, export your experiment data as CSV and use the Product Data Analyzer skill to interpret results. It can calculate statistical significance, analyze segment effects, and provide ship/no-ship recommendations based on your data.