Glossary

AI Evaluation

AI evaluation is the practice of systematically measuring how well an AI system performs against defined criteria—accuracy, latency, cost, safety, user satisfaction—before and after deployment. This sounds obvious. It is not widely practiced. Most organizations deploying AI in 2025 evaluate by vibes: someone runs a few test queries, the results look reasonable, and the system ships. Rigorous evaluation requires test datasets that represent real usage, metrics that map to business outcomes (not just model benchmarks), and automated pipelines that run evaluations on every change. The gap between "works in a demo" and "works reliably in production" is almost entirely an evaluation gap. Without good evals, you cannot tell whether a prompt change, model upgrade, or architecture tweak made things better or worse. You are flying blind and calling it agile.

Related terms:

Structured Output

Structured output occurs when a language model returns data in predictable, machine-readable formats—such as JSON, XML, or typed objects—rather than free-form prose, enabling software systems to reliably parse fields like names, dates, and dollar amounts. By using constrained generation to enforce a JSON schema, structured output transforms AI from a conversational interface into a dependable system component.

Prompt Engineering

Prompt engineering involves designing and refining inputs—ranging from simple instructions to detailed system prompts with examples, constraints, personas, and chain-of-thought scaffolding—to elicit desired outputs from a language model. It’s the most accessible way to boost AI performance, requiring no training data or ML expertise, but prompts can be fragile, hard to version-control, and easy to overfit.

Prompt Injection

Prompt injection is an attack where a user or data source inserts instructions that override a language model’s intended behavior. This vulnerability affects any system accepting untrusted text—chatbots, RAG pipelines, email summarizers—and currently has no complete defense.