State

Evaluation Results

JSON result files, one per evaluation. Each evaluation writes its results to eval/[eval_id].json (e.g., eval/1_multistep.json). Each file contains the outcome data for LLM-as-judge scoring.

session/eval/[eval_id].json
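
As a rough illustration of how these files might be consumed, the sketch below collects every result file in an eval directory and keys it by eval_id, taken from the file name. The function name `load_eval_results` is hypothetical, and because the JSON schema is not described here, the code returns the parsed content as-is without assuming any fields.

```python
import json
from pathlib import Path


def load_eval_results(eval_dir):
    """Map each eval_id (the file name without .json) to its parsed JSON.

    Hypothetical helper: the directory layout follows the
    eval/[eval_id].json pattern; the file contents are not interpreted.
    """
    results = {}
    for path in sorted(Path(eval_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            results[path.stem] = json.load(f)
    return results
```

Calling `load_eval_results("session/eval")` would return a dictionary such as `{"1_multistep": ...}`, where each value is whatever that evaluation wrote. A judge step could then iterate over the entries without needing to know the file names in advance.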