Glossary

67 terms.

Agent Variant (variant)

A version of the agent differing by one of the five improvement levers — prompt, tool definitions, router logic, skill structure, or model selection. The discipline is to change one lever at a time: alter the prompt and the model together and you can’t attribute the result to either. Variants are compared head-to-head in an experiment.

Agreement Rate (judge agreement)

The fraction of cases where an LLM judge produces the same label as the ground-truth evaluator. It is the headline meta-evaluation metric — target ≥ 0.90, and below about 0.85 don’t ship the judge. Its blind spot: it hides which way the judge errs, so a high agreement rate can still sit on top of a dangerous false-positive (leniency) rate. Always report it alongside FPR.

AI Agent (agent, agentic system)

A software system that takes actions on behalf of a user using reasoning — deciding what to do next, which tool to call, and when to stop, rather than emitting a single fixed response. It composes three capabilities: reasoning (the LLM), routing (tool selection), and action (execution), then loops on the result. What makes an agent harder to evaluate than a plain LLM application is the third, agent layer: multi-step, dynamic paths whose success depends on choices made at run time.

Alert Threshold (monitoring threshold)

The deviation that fires a production alert — commonly about >10% from a metric’s baseline, set per metric (evaluator scores, convergence, LLM-calls/query, p50/p99 latency, cost/query). The tuning trade-off: too tight and the alert is noisy and ignored, too loose and a silent regression slips past. Thresholds turn the monitored time-series into actionable signals.

Automatic Instrumentation (auto-instrumentation, zero-code instrumentation)

Zero-code tracing: a library hooks the LLM API client and emits a span for every call without touching your code. It delivers instant LLM-level detail — prompts, completions, tokens, latency — but cannot see your router logic, tool dispatch, or business code, since those aren’t API calls. The fast first step; pair it with manual spans to fill the gaps.

Centralised Routing (central router)

A routing architecture where a single router makes every decision. Because all routing flows through one decision point, it is the easiest to trace and to evaluate — but that single point becomes a bottleneck for complex, branching flows. Contrast with distributed routing, which trades evaluation simplicity for flexibility.