# Design Experiment

## Requirements
Feature or change to test. Optional: current metrics, baseline rates, daily traffic.
To run this task you must have the following required information:
> Feature or change to test. Optional: current metrics, baseline rates, daily traffic.
If you don't have all of this information, exit here and respond asking for the missing information, along with instructions to run this task again with ALL required information.
---
You MUST use a todo list to complete these steps in order. Never move on to the next step until you have completed the previous one. If you have multiple read steps in a row, read them all at once (in parallel).
Add all steps to your todo list now and begin executing.
## Steps
1. [Read Experimentation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/product.experimentation.guide.md` (Get experiment design framework and output template)
2. Gather experiment context from the user:
1. **What are we testing?**
- The specific change or feature
- Control experience vs variant experience
- Any mockups or screenshots available?
2. **Hypothesis**
Help them formulate: "If we [change], then [metric] will [direction] by [amount], because [rationale]."
- What's the expected behavior change?
- What evidence supports this hypothesis?
3. **Success metrics**
- Primary metric (the one that decides ship/no-ship)
- Secondary metrics (supporting signals)
- Guardrail metrics (must not regress)
- Current baseline values for each
4. **Experiment parameters**
- Target user segments
- Any users to exclude?
- Available daily traffic
- Any timing constraints?
Check `./documents/product/profile.yaml` for relevant product context.
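If it helps to pull that product context programmatically, a minimal Node sketch might look like the following (this assumes the `js-yaml` package is available; the profile's fields are whatever the file actually defines):

```js
// Optional sketch: load product context before designing the experiment.
// The profile's schema is not specified here -- inspect the file for what it contains.
const fs = require("fs");
const yaml = require("js-yaml"); // assumed dependency: npm install js-yaml

const profile = yaml.load(
  fs.readFileSync("./documents/product/profile.yaml", "utf8")
);
console.log(profile); // whatever context the profile provides
```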
3. Calculate sample size using the experiment sampler:
1. Identify the primary metric's baseline rate (e.g., 12% conversion = 0.12)
2. Determine minimum detectable effect (e.g., 15% relative lift = 0.15)
3. Use standard 95% significance and 80% power unless user specifies otherwise
Run `./skills/sauna/[skill_id]/scripts/product.experiment.sampler.js` with:
- baselineRate: the current conversion rate
- mde: the minimum detectable effect as relative change
- significance: 0.95 (default)
- power: 0.8 (default)
- dailyTraffic: if provided, to estimate duration
Report the sample size requirements and estimated duration.
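The sampler script is the source of truth for these numbers, but as a rough cross-check, the standard two-proportion calculation it most likely mirrors is sketched below (assumed formula; the script's exact method may differ, and the daily traffic figure is purely illustrative):

```js
// Sketch of a standard two-proportion sample-size calculation (assumption --
// defer to product.experiment.sampler.js for the authoritative result).
function sampleSizePerVariant(baselineRate, mde) {
  const zAlpha = 1.96; // two-sided z-score for 95% significance
  const zBeta = 0.84;  // z-score for 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + mde); // mde expressed as a relative lift
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p1 - p2) ** 2);
}

// Worked example using the figures above: 12% baseline, 15% relative MDE
const perVariant = sampleSizePerVariant(0.12, 0.15); // ≈ 5,434 users per variant
const total = perVariant * 2;                        // control + variant ≈ 10,868
const dailyTraffic = 2000;                           // hypothetical; use the real figure if provided
const estimatedDays = Math.ceil(total / dailyTraffic);
console.log({ perVariant, total, estimatedDays });
```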
4. Build the complete experiment specification using the template from the guide:
1. **Overview section** with owner, status, dates
2. **Hypothesis** in the standard formula format
3. **Changes being tested** with control and variant descriptions
4. **Success metrics** with baselines and targets
5. **Experiment setup** with traffic allocation, segments, sample size
6. **Decision framework** for each possible outcome
7. **Risks and mitigations**
8. **Post-test analysis plan**
Make the spec complete enough that someone else could run the experiment.
5. After presenting the experiment design:
- Ask if the hypothesis feels right based on their domain knowledge
- Confirm the minimum detectable effect is meaningful (worth detecting)
- Verify excluded segments are correct
- Offer to adjust parameters and recalculate sample size
- Mention that they can use the Product Data Analyzer skill to analyze results when the experiment completes