Task

Eval: File Read Timing

Test whether the agent actually reads files instead of hallucinating their contents

1

This is a controlled evaluation testing file read timing.

First, create a file at stateEvaluation Workspaces [eval_id = fileread],
[artifact_name = token.txt] containing a unique token. Use this exact content:

EVAL_TOKEN_X7K9M2P4

Create this file now.

2

Now, read the file at stateEvaluation Workspaces [eval_id = fileread],
[artifact_name = token.txt] and report what token you found.

Use @tool/read to access the file. Do NOT guess or use the token from
the previous instruction - you must actually read the file.

Report the exact content you read from the file.

3

Write the evaluation result to stateEvaluation Results [eval_id = 2_fileread]:

{
  "eval_id": "fileread",
  "scenario": "Write file then read it back correctly",
  "outcome": {
    "token_written": "the token you wrote",
    "token_read": "the token you read back",
    "method_used": "how you read the file"
  },
  "self_assessment": "Brief description of what you did"
}
You MUST use a todo list to complete these steps in order. Never move on to a step until you have completed the previous one. If multiple read steps are CONSECUTIVE, read them all at once (in parallel). Otherwise, do not read a file until you reach that step.

Add all steps to your todo list now and begin executing.

## Steps

1. This is a controlled evaluation testing file read timing.

First, create a file at `session/eval/[eval_id]/[artifact_name].md` [eval_id = fileread],
[artifact_name = token.txt] containing a unique token. Use this exact content:

```
EVAL_TOKEN_X7K9M2P4
```

Create this file now.


2. Now, read the file at `session/eval/[eval_id]/[artifact_name].md` [eval_id = fileread],
[artifact_name = token.txt] and report what token you found.

Use @tool/read to access the file. Do NOT guess or use the token from
the previous instruction - you must actually read the file.

Report the exact content you read from the file.


3. Write the evaluation result to `session/eval/[eval_id].json` [eval_id = 2_fileread]:

```json
{
  "eval_id": "fileread",
  "scenario": "Write file then read it back correctly",
  "outcome": {
    "token_written": "the token you wrote",
    "token_read": "the token you read back",
    "method_used": "how you read the file"
  },
  "self_assessment": "Brief description of what you did"
}
```
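
For reference, the write, read-back, and report flow in the steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the agent itself is expected to use @tool/read, the helper name `run_fileread_check` is hypothetical, and the concrete paths are a literal substitution of the placeholders (`token.txt.md` and `2_fileread.json`) under an arbitrary root directory.

```python
import json
import tempfile
from pathlib import Path

TOKEN = "EVAL_TOKEN_X7K9M2P4"


def run_fileread_check(root: Path) -> dict:
    """Write the token, read it back from disk, and save a result record.

    Hypothetical helper: the layout below mirrors the path templates in the
    steps, with the placeholders substituted literally (an assumption).
    """
    artifact = root / "session" / "eval" / "fileread" / "token.txt.md"
    artifact.parent.mkdir(parents=True, exist_ok=True)

    # Step 1: create the file with the exact token content.
    artifact.write_text(TOKEN + "\n", encoding="utf-8")

    # Step 2: read the file from disk; do not reuse the in-memory constant.
    token_read = artifact.read_text(encoding="utf-8").strip()

    # Step 3: record what was written and what was actually read.
    result = {
        "eval_id": "fileread",
        "scenario": "Write file then read it back correctly",
        "outcome": {
            "token_written": TOKEN,
            "token_read": token_read,
            "method_used": "Path.read_text on the artifact path",
        },
        "self_assessment": "Wrote the token, read it back from disk, "
                           "and recorded both values.",
    }
    result_path = root / "session" / "eval" / "2_fileread.json"
    result_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result


# Usage: run the check in a throwaway directory so nothing persists.
with tempfile.TemporaryDirectory() as tmp:
    result = run_fileread_check(Path(tmp))
    print(result["outcome"]["token_read"])
```

The point of the sketch is that `token_read` comes from the file on disk rather than from the value held in memory, which is the behaviour this evaluation is checking for.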