Eval: Multi-step Following
Tests whether the agent can complete 5 numbered steps in order.
1
This is a controlled evaluation. You must complete exactly these 5 steps
in order. Do not skip any step. Do not add extra steps.
Write to Evaluation Workspaces [eval_id = multistep]:
Step 1: Create [artifact_name = step1.md] containing only "Step 1 complete"
Step 2: Create [artifact_name = step2.md] containing only "Step 2 complete"
Step 3: Create [artifact_name = step3.md] containing only "Step 3 complete"
Step 4: Create [artifact_name = step4.md] containing only "Step 4 complete"
Step 5: Create [artifact_name = step5.md] containing only "Step 5 complete"
Complete these steps now, in order.
2
Write the evaluation result to Evaluation Results [eval_id = 1_multistep]:
{
  "eval_id": "multistep",
  "scenario": "Complete 5 numbered steps in order",
  "outcome": {
    "steps_completed": ["list of steps you completed"],
    "artifacts": ["list of files you created"]
  },
  "self_assessment": "Brief description of what you did"
}
You MUST use a todo list to complete these steps in order. Never move on to a step until you have completed the previous one. If several read steps are CONSECUTIVE, read them all at once (in parallel). Otherwise, do not read a file until you reach that step.
Add all steps to your todo list now and begin executing.
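The ordering rule above can be sketched as a small todo list that refuses to mark a step done while an earlier step is still pending. The sketch also includes a helper that groups consecutive read steps into one parallel batch. This is a minimal Python sketch under those assumptions. `TodoList`, `complete`, and `batch_steps` are hypothetical names, not part of any agent framework's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TodoItem:
    title: str
    done: bool = False


@dataclass
class TodoList:
    """Hypothetical todo list that enforces strict in-order completion."""
    items: List[TodoItem] = field(default_factory=list)

    def add(self, title: str) -> None:
        self.items.append(TodoItem(title))

    def next_pending(self) -> Optional[int]:
        # Index of the first unfinished step, or None when all are done.
        for i, item in enumerate(self.items):
            if not item.done:
                return i
        return None

    def complete(self, index: int) -> None:
        # Refuse to skip ahead: only the first pending step may be completed.
        pending = self.next_pending()
        if pending is None:
            raise ValueError("all steps are already complete")
        if index != pending:
            raise ValueError(f"step {index + 1} attempted before step {pending + 1}")
        self.items[index].done = True


def batch_steps(kinds: List[str]) -> List[List[int]]:
    """Hypothetical helper: group consecutive 'read' steps into one batch.

    Every other step gets a batch of its own, so no file is read early.
    """
    batches: List[List[int]] = []
    for i, kind in enumerate(kinds):
        if kind == "read" and batches and kinds[batches[-1][-1]] == "read":
            batches[-1].append(i)
        else:
            batches.append([i])
    return batches
```

In this evaluation neither step is a read step. So `batch_steps(["write", "write"])` returns `[[0], [1]]`, and each step runs on its own.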
## Steps
1. This is a controlled evaluation. You must complete exactly these 5 steps
in order. Do not skip any step. Do not add extra steps.
Write to `session/eval/[eval_id]/[artifact_name]` [eval_id = multistep]:
**Step 1:** Create [artifact_name = step1.md] containing only "Step 1 complete"
**Step 2:** Create [artifact_name = step2.md] containing only "Step 2 complete"
**Step 3:** Create [artifact_name = step3.md] containing only "Step 3 complete"
**Step 4:** Create [artifact_name = step4.md] containing only "Step 4 complete"
**Step 5:** Create [artifact_name = step5.md] containing only "Step 5 complete"
Complete these steps now, in order.
2. Write the evaluation result to `session/eval/[eval_id].json` [eval_id = 1_multistep]:
```json
{
  "eval_id": "multistep",
  "scenario": "Complete 5 numbered steps in order",
  "outcome": {
    "steps_completed": ["list of steps you completed"],
    "artifacts": ["list of files you created"]
  },
  "self_assessment": "Brief description of what you did"
}
```
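The following minimal Python sketch shows what a correct run might leave on disk. It rests on three assumptions. First, `[artifact_name]` is the complete file name, so Step 1 writes `session/eval/multistep/step1.md`. Second, "containing only" means the exact text with no trailing newline. Third, all paths are relative to a caller-supplied root directory. `run_multistep_eval` and `verify_workspace` are hypothetical helpers for illustration only. The second helper mirrors the "do not add extra steps" rule by rejecting any extra file in the workspace.

```python
import json
from pathlib import Path
from typing import List


def run_multistep_eval(root: Path) -> dict:
    """Hypothetical helper: create step1.md..step5.md in order, then the result JSON."""
    eval_id = "multistep"
    workspace = root / "session" / "eval" / eval_id
    workspace.mkdir(parents=True, exist_ok=True)

    steps_completed: List[str] = []
    artifacts: List[str] = []
    for n in range(1, 6):
        # Assumption: artifact_name is the full file name (no extra ".md" appended).
        path = workspace / f"step{n}.md"
        # Assumption: exactly the required text, with no trailing newline.
        path.write_text(f"Step {n} complete", encoding="utf-8")
        steps_completed.append(f"Step {n}")
        artifacts.append(path.relative_to(root).as_posix())

    result = {
        "eval_id": eval_id,
        "scenario": "Complete 5 numbered steps in order",
        "outcome": {
            "steps_completed": steps_completed,
            "artifacts": artifacts,
        },
        "self_assessment": "Created step1.md through step5.md in order with the exact required text.",
    }
    # The result file is named after the second eval_id (1_multistep), as specified.
    result_path = root / "session" / "eval" / "1_multistep.json"
    result_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result


def verify_workspace(root: Path) -> bool:
    """Hypothetical grader check: exactly the five expected files, each with exact text."""
    workspace = root / "session" / "eval" / "multistep"
    expected = {f"step{n}.md": f"Step {n} complete" for n in range(1, 6)}
    found = {
        p.name: p.read_text(encoding="utf-8")
        for p in workspace.iterdir()
        if p.is_file()
    }
    return found == expected
```

Comparing the full set of file names catches both missing and extra artifacts, so a run that skips a step and a run that adds a step both fail the same check.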