RLHF
Reinforcement Learning from Human Feedback (RLHF) is the training technique that transformed large language models from impressive autocomplete engines into useful assistants by systematically aligning their outputs with human preferences. First popularized by OpenAI's InstructGPT paper in 2022, the process trains a reward model on thousands of human comparisons (which of two responses is better?), then uses reinforcement learning to tune the base model toward responses humans actually prefer. This alignment layer is why modern AI assistants can follow complex instructions, refuse harmful requests, and match an organization's tone, making it a largely invisible layer beneath most enterprise AI deployments.
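To make the two stages concrete, here is a minimal PyTorch sketch: stage one fits a toy reward model on pairwise human comparisons with a Bradley-Terry loss, and stage two nudges a policy toward higher reward using a simplified REINFORCE-style objective with a KL penalty against the reference model. Everything here, from the 128-dimensional response embeddings to the `kl_coef` value, is an illustrative assumption rather than InstructGPT's actual recipe, which trains full transformer policies with PPO.

```python
# Minimal two-stage RLHF sketch (illustrative assumptions, not InstructGPT's setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a pooled response embedding to a scalar score."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_model, chosen_emb, rejected_emb):
    """Stage 1: Bradley-Terry loss on human comparisons.

    Pushes the score of the human-preferred ("chosen") response above the
    score of the less-preferred ("rejected") response.
    """
    r_chosen = reward_model(chosen_emb)
    r_rejected = reward_model(rejected_emb)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def rl_step(policy_logprob, ref_logprob, reward, kl_coef=0.1):
    """Stage 2 (simplified): policy-gradient update toward high reward.

    `reward` comes from the frozen reward model; the KL penalty keeps the
    tuned policy close to the original model. Real systems typically use
    PPO with clipping; this is a bare REINFORCE-style objective.
    """
    kl_penalty = policy_logprob - ref_logprob           # per-sample KL estimate
    shaped_reward = reward - kl_coef * kl_penalty.detach()
    return -(shaped_reward * policy_logprob).mean()     # maximize shaped reward

if __name__ == "__main__":
    torch.manual_seed(0)
    rm = RewardModel()
    opt = torch.optim.Adam(rm.parameters(), lr=1e-3)

    # Stage 1: fit the reward model on a fake batch of preference pairs.
    chosen, rejected = torch.randn(8, 128), torch.randn(8, 128)
    preference_loss(rm, chosen, rejected).backward()
    opt.step()

    # Stage 2: use the frozen reward model to score sampled responses.
    with torch.no_grad():
        rewards = rm(torch.randn(8, 128))
    policy_logprob = torch.randn(8, requires_grad=True)  # stand-in for log-probs of sampled tokens
    ref_logprob = torch.randn(8)
    print("RL loss:", rl_step(policy_logprob, ref_logprob, rewards).item())
```

Production RLHF pipelines wrap these same two stages around full language models and far larger preference datasets, but the pairwise preference loss and the KL-shaped reward are the core ideas.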
Referenced in these posts:
Satisficing for LLMs
By applying Herbert Simon’s concept of satisficing to AI, this post argues that language models might prefer logical‐sounding content over emotional appeals,...
Related terms:
Chain-of-Thought
Chain-of-thought prompting, introduced by Google Research in 2022, transforms AI from an answer machine into a reasoning partner by explicitly modeling the...
Generative AI
Generative AI refers to AI systems that learn statistical patterns from training data to create new content—such as text, images, code, audio, or...
AI Strategy
AI strategy is an organization’s plan for how it will use—and what it will not use—AI to achieve business outcomes, answering concrete questions about...