Evaluation Results
JSON result files, one per evaluation. Each eval writes to eval/[eval_id].json (e.g., eval/1_multistep.json). Each file contains the outcome data used for LLM-as-judge scoring.
session/eval/[eval_id].json
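
As a rough illustration of how these files might be consumed, the sketch below collects every result file under a session directory. The helper name `load_eval_results` is hypothetical. It assumes that the session directory contains an `eval/` subdirectory and that the eval id is the file name without the `.json` extension. The JSON schema is not specified here, so each file is returned as parsed, with no field access.

```python
import json
from pathlib import Path


def load_eval_results(session_dir):
    """Return {eval_id: parsed JSON} for every file in <session_dir>/eval/.

    Hypothetical helper: the eval id is assumed to be the file stem,
    e.g. "1_multistep" for eval/1_multistep.json.
    """
    results = {}
    for path in sorted(Path(session_dir, "eval").glob("*.json")):
        with path.open(encoding="utf-8") as f:
            # The schema is unknown, so keep the parsed object untouched.
            results[path.stem] = json.load(f)
    return results
```

A judge script could then iterate over `load_eval_results("session").items()` and score each outcome in turn.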