Evals Suite Runner
Prepare eval-case.json, run a deterministic local scorer, and send the scorecard to Evals when available.
eval-case.json
{
  "suite": "sandbox-tool-test",
  "checks": [
    {
      "name": "has_ok",
      "expected": true
    }
  ]
}
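The local scoring step can be sketched as below. This is a minimal illustration, not the runner's actual scorer. It makes three assumptions:

- The observed value for each check has already been collected into a dict keyed by check name. How "has_ok" is actually measured is not described here.
- `load_case`, `score` and the scorecard fields (`passed`, `total`, `score`, `results`) are hypothetical names.
- The upload to Evals is left out because its API is not described in this section.

The scorer is deterministic because it only compares the inputs and keeps the check order from the file. The output is serialized with sorted keys.

```python
import json
from typing import Any


def load_case(text: str) -> dict[str, Any]:
    """Parse an eval-case.json document and validate the fields the scorer uses."""
    case = json.loads(text)
    if not isinstance(case.get("suite"), str):
        raise ValueError("eval case needs a string 'suite'")
    checks = case.get("checks")
    if not isinstance(checks, list):
        raise ValueError("eval case needs a list 'checks'")
    for check in checks:
        if not isinstance(check, dict) or "name" not in check or "expected" not in check:
            raise ValueError(f"malformed check: {check!r}")
    return case


def score(case: dict[str, Any], observed: dict[str, Any]) -> dict[str, Any]:
    """Build a scorecard by comparing observed values to expected ones.

    Hypothetical scorecard shape. A check whose name is missing from
    `observed` counts as failed.
    """
    results = []
    for check in case["checks"]:  # keep file order so output is stable
        name = check["name"]
        actual = observed.get(name)
        results.append({
            "name": name,
            "expected": check["expected"],
            "actual": actual,
            "passed": name in observed and actual == check["expected"],
        })
    passed = sum(r["passed"] for r in results)
    total = len(results)
    return {
        "suite": case["suite"],
        "passed": passed,
        "total": total,
        "score": passed / total if total else 0.0,
        "results": results,
    }


if __name__ == "__main__":
    # In practice the text would come from Path("eval-case.json").read_text().
    case_json = '{"suite": "sandbox-tool-test", "checks": [{"name": "has_ok", "expected": true}]}'
    card = score(load_case(case_json), {"has_ok": True})
    # sort_keys makes the serialized scorecard byte-for-byte reproducible.
    print(json.dumps(card, sort_keys=True))
```

Serializing the scorecard with `sort_keys=True` means two runs over the same case and observations produce identical output. That makes the result easy to diff before it is sent anywhere.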