RAG (Retrieval-Augmented Generation)
Retrieval-augmented generation (RAG) is an architecture pattern that connects a large language model to external knowledge sources—documents, databases, APIs—so its responses draw on real, current information rather than relying solely on what the model memorized during training. At query time, the system retrieves relevant context, and the model generates an answer grounded in that evidence. RAG is one way enterprises make general-purpose AI useful for their specific business, though the approach has limitations, and in many cases simpler agentic tools that read, write, and grep files directly have proven a better fit.
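The retrieve-then-generate flow can be sketched minimally. This is a toy illustration only: keyword overlap stands in for the embedding similarity a real vector store would compute, the corpus strings are hypothetical, and the "generation" step is just the grounded prompt that would be sent to an LLM.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared words with the query; return the top k.
    A production system would rank by embedding similarity instead."""
    q = tokens(query)
    return sorted(corpus, key=lambda doc: len(q & tokens(doc)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the grounded prompt an LLM would receive."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Hypothetical enterprise documents.
corpus = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our headquarters moved to Austin in 2022.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
]

query = "What is the refund policy?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

The key property is that the model's answer is constrained to retrieved evidence rather than whatever it memorized in training, which is why the prompt instructs it to use only the supplied context.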
Related terms:
Foundation Model
A foundation model is a large AI model trained on broad data at massive scale, designed to be adapted to a wide range of downstream tasks rather than built for any single one. Coined in 2021 by Stanford’s Center for Research on Foundation Models, this approach boosts efficiency but concentrates power among providers like OpenAI, Google, Meta, and Anthropic.
LLM (Large Language Model)
A large language model is a neural network with billions of parameters trained on massive text corpora to predict the next word in a sequence, powering tasks from coding and summarization to translation and conversation. Though general-purpose by default, LLMs require prompting, fine-tuning, or data integration to excel at specific tasks.
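Next-word prediction can be illustrated with a toy bigram model: count which word follows each word in a corpus, then predict the most frequent successor. Real LLMs operate on subword tokens with a transformer and billions of parameters, but the training objective is the same shape. The training text here is made up for the example.

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict[str, Counter]:
    """Count, for each word, how often each other word follows it."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(model: dict[str, Counter], word: str) -> str:
    """Return the most frequent word observed after `word`."""
    return model[word.lower()].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # "cat" followed "the" twice, "mat" once
```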
System Prompt
A system prompt is a hidden set of instructions given to a language model—defining its persona, constraints, output format, and behavioral rules—that occupies the “system” role in chat APIs such as OpenAI’s and Anthropic’s. Because it is applied to every request, it shapes every response; encoding business logic there is often the cheapest way to steer model behavior, since it requires no retraining.
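In practice the system prompt is just the first entry in the request payload. The message shape below follows OpenAI's Chat Completions format, where the system role is a message like any other; Anthropic's Messages API instead takes a top-level `system` string parameter. The persona and rules shown are hypothetical.

```python
# Hypothetical system prompt encoding persona, constraints, and format.
system_prompt = (
    "You are a support agent for Acme Corp. "
    "Answer only questions about Acme products. "
    "Respond in at most three sentences."
)

# OpenAI-style chat payload: the system message comes first and is
# never shown to the end user, but shapes every model response.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How do I reset my password?"},
]
```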