# Design Experiment

## Requirements
Feature or change to test. Optional: current metrics, baseline rates, daily traffic.
To run this task you must have the following required information:
> Feature or change to test. Optional: current metrics, baseline rates, daily traffic.
If you don't have all of this information, exit here and respond asking for the missing information, along with instructions to run this task again with ALL required information.
---
You MUST use a todo list to complete these steps in order. Never move on to the next step until you have completed the previous one. If you have multiple read steps in a row, read them all at once (in parallel).
Add all steps to your todo list now and begin executing.
## Steps
1. [Read Experimentation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/product.experimentation.guide.md` (Get experiment design framework and output template)
2. Gather experiment context from the user:
1. **What are we testing?**
- The specific change or feature
- Control experience vs variant experience
- Any mockups or screenshots available?
2. **Hypothesis**
Help them formulate: "If we [change], then [metric] will [direction] by [amount], because [rationale]."
- What's the expected behavior change?
- What evidence supports this hypothesis?
3. **Success metrics**
- Primary metric (the one that decides ship/no-ship)
- Secondary metrics (supporting signals)
- Guardrail metrics (must not regress)
- Current baseline values for each
4. **Experiment parameters**
- Target user segments
- Any users to exclude?
- Available daily traffic
- Any timing constraints?
Check `./documents/product/profile.yaml` for relevant product context.
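If it helps to pull that product context programmatically, a minimal Node sketch might look like the following (this assumes the `js-yaml` package is available; the profile's fields are whatever the file actually defines):

```js
// Optional sketch: load product context before designing the experiment.
// The profile's schema is not specified here -- inspect the file for what it contains.
const fs = require("fs");
const yaml = require("js-yaml"); // assumed dependency: npm install js-yaml

const profile = yaml.load(
  fs.readFileSync("./documents/product/profile.yaml", "utf8")
);
console.log(profile); // whatever context the profile provides
```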
3. Calculate sample size using the experiment sampler:
1. Identify the primary metric's baseline rate (e.g., 12% conversion = 0.12)
2. Determine minimum detectable effect (e.g., 15% relative lift = 0.15)
3. Use standard 95% significance and 80% power unless user specifies otherwise
Run `./skills/sauna/[skill_id]/scripts/product.experiment.sampler.js` with:
- baselineRate: the current conversion rate
- mde: the minimum detectable effect as relative change
- significance: 0.95 (default)
- power: 0.8 (default)
- dailyTraffic: if provided, to estimate duration
Report the sample size requirements and estimated duration.
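The sampler script is the source of truth for these numbers, but as a rough cross-check, the standard two-proportion calculation it most likely mirrors is sketched below (assumed formula; the script's exact method may differ, and the daily traffic figure is purely illustrative):

```js
// Sketch of a standard two-proportion sample-size calculation (assumption --
// defer to product.experiment.sampler.js for the authoritative result).
function sampleSizePerVariant(baselineRate, mde) {
  const zAlpha = 1.96; // two-sided z-score for 95% significance
  const zBeta = 0.84;  // z-score for 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + mde); // mde expressed as a relative lift
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p1 - p2) ** 2);
}

// Worked example using the figures above: 12% baseline, 15% relative MDE
const perVariant = sampleSizePerVariant(0.12, 0.15); // ≈ 5,434 users per variant
const total = perVariant * 2;                        // control + variant ≈ 10,868
const dailyTraffic = 2000;                           // hypothetical; use the real figure if provided
const estimatedDays = Math.ceil(total / dailyTraffic);
console.log({ perVariant, total, estimatedDays });
```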
4. Build the complete experiment specification using the template from the guide:
1. **Overview section** with owner, status, dates
2. **Hypothesis** in the standard formula format
3. **Changes being tested** with control and variant descriptions
4. **Success metrics** with baselines and targets
5. **Experiment setup** with traffic allocation, segments, sample size
6. **Decision framework** for each possible outcome
7. **Risks and mitigations**
8. **Post-test analysis plan**
Make the spec complete enough that someone else could run the experiment.
5. After presenting the experiment design:
- Ask if the hypothesis feels right based on their domain knowledge
- Confirm the minimum detectable effect is meaningful (worth detecting)
- Verify excluded segments are correct
- Offer to adjust parameters and recalculate sample size
- Mention that they can use the Product Data Analyzer skill to analyze results when the experiment completes