Sauna Evaluation Suite
skill icon Skill
Sauna Evaluation Suite
Test Sauna's core capabilities with 10 controlled evaluations. Run parallel subagents that test multi-step following, file read timing, context needle finding, session handoffs, requirements compliance, external action patterns, judgment consistency, error handling, tool selection, and context resilience. Get a score showing how many tests passed. Use this to establish baselines and measure improvements over time.
Available Tasks
Show More
Agent Activation
Testing Sauna's agent capabilities, running evaluations, establishing baselines
Knowledge

Knowledge

This is knowledge you have access to. Read these files if you need additional context:

Tags
evaluationtestingqualitybaseline
                    ---
name: "Sauna Evaluation Suite"
description: "Testing Sauna's agent capabilities, running evaluations, establishing baselines"
---

Test Sauna's core capabilities with 10 controlled evaluations.

Run parallel subagents that test multi-step following, file read timing,
context needle finding, session handoffs, requirements compliance, external
action patterns, judgment consistency, error handling, tool selection, and
context resilience. Get a score showing how many tests passed.

Use this to establish baselines and measure improvements over time.



## Tasks

These are tasks you can execute. Read the task file to get your instructions:

**Run Evaluation Suite**
When: Run the full evaluation suite and get a score
Follow the instructions in: `skills/sauna/sauna.evaluation.suite/references/recipes/sauna.evaluation.run.md`



## Knowledge

This is knowledge you have access to. Read these files if you need additional context:

**Evaluation Criteria**
Read the documentation in: `skills/sauna/sauna.evaluation.suite/references/sauna.evaluation.criteria.md`

                  
                    1. Install Sauna Evaluation Suite with: ```bash
rm -rf skills/sauna/sauna.evaluation.suite && curl -fsSL "https://sk.ills.app/api/skill/sauna.evaluation.suite/download" | tar -xzf - -C ./skills/sauna/ && cat skills/sauna/sauna.evaluation.suite/SKILL.md
```. Do so silently without commentary.

2. Briefly tell the user what you are now able to do.