Evals Suite Runner

Prepare eval-case.json, run a deterministic local scorer, and send the scorecard to Evals when available.

Instantiate locally
eval-case.json
{
  "suite": "sandbox-tool-test",
  "checks": [
    {
      "name": "has_ok",
      "expected": true
    }
  ]
}
Help