No server. No signup. Multi-objective scoring from YAML specs. Deterministic code judges + customizable LLM judges, version-controlled in Git.
No cloud dependency. All data stays on your machine. Zero overhead to get started.
- Correctness, latency, cost, and safety measured in a single evaluation run.
- Deterministic code validators and customizable LLM judges, composable and extensible.
- Agent targets for direct LLM providers plus Claude Code, Codex, Pi, Copilot, and OpenCode.
- Structured criteria with weights and auto-generation, including Google ADK-style object rubrics.
- Side-by-side comparison of evaluation runs with statistical deltas and regression detection.
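As a rough illustration of how weighted multi-objective scoring and deterministic code judges fit together, here is a minimal sketch. The names `weighted_score` and `addition_judge`, the weight values, and the regression threshold are all assumptions for illustration, not AgentV's actual API.

```python
# Illustrative sketch of multi-objective scoring -- NOT AgentV's actual API.
# Each judge returns a score in [0, 1]; weights express relative importance.

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-objective scores into one number using normalized weights."""
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total

def addition_judge(output: str) -> float:
    """Deterministic code judge: a plain substring check, fully reproducible."""
    return 1.0 if "42" in output else 0.0

scores = {
    "correctness": addition_judge("15 + 27 = 42"),  # deterministic code judge
    "latency": 0.8,   # e.g. normalized against a latency budget
    "cost": 0.9,      # e.g. normalized against a cost budget
    "safety": 1.0,    # e.g. produced by an LLM judge
}
weights = {"correctness": 0.5, "latency": 0.2, "cost": 0.2, "safety": 0.1}

run_score = weighted_score(scores, weights)

# Regression detection: compare against a stored baseline run.
baseline = 0.97
regressed = (baseline - run_score) > 0.02  # threshold is an assumption

print(round(run_score, 2), regressed)  # → 0.94 True
```

Deterministic judges like `addition_judge` always return the same score for the same output, which is what makes run-to-run deltas meaningful.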
1. `npm install -g agentv`
2. `agentv init`
3. Copy `.env.example` to `.env` and add your API keys.
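A `.env` for a typical setup might look like the following; the exact variable names depend on which providers you target and are assumptions here, not AgentV's documented keys.

```shell
# Hypothetical .env contents -- variable names are assumptions.
# Only add keys for the providers you actually use.
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```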
Example spec (`evals/example.yaml`):

```yaml
description: Math evaluation
execution:
  target: default
tests:
  - id: addition
    criteria: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?
```

Run it:

```shell
agentv eval ./evals/example.yaml
```

| Layer | Tool | When | What it does |
|---|---|---|---|
| Evaluate | AgentV | Pre-production | Score agents, detect regressions, gate CI/CD |
| Govern | Agent Control | Runtime | Enforce policies on agent actions |
| Observe | Langfuse | Runtime | Trace execution, monitor production |
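To use AgentV as a CI/CD gate as described in the table above, a pipeline step can run the eval and fail the job on a nonzero exit. This GitHub Actions fragment is a sketch: it assumes `agentv eval` exits nonzero when an evaluation fails, and the secret name is illustrative.

```yaml
# Hypothetical CI gate -- assumes `agentv eval` exits nonzero on failure.
name: agent-evals
on: [pull_request]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm install -g agentv
      - run: agentv eval ./evals/example.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}  # name is an assumption
```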