Glossary

RLHF

Reinforcement Learning from Human Feedback (RLHF) is the training technique that transformed large language models from impressive autocomplete engines into useful assistants by systematically aligning their outputs with human preferences. Popularized by OpenAI's 2022 InstructGPT paper, the process trains a reward model on thousands of human comparisons (which of two responses is better?), then uses reinforcement learning to tune the base model toward the responses humans actually prefer. This alignment layer is why modern AI can follow nuanced instructions, refuse harmful requests, and match organizational tone, making it the invisible substrate beneath every enterprise AI deployment.
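
For the technically inclined, the heart of the reward-model step is a pairwise comparison loss: the model is trained so that the response humans preferred scores higher than the one they rejected. The sketch below is a minimal illustration in PyTorch with made-up reward values, not a full training loop.

```python
# Minimal sketch of the pairwise reward-model loss used in RLHF.
# Tensor names and values are illustrative, not from any specific library.
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: push the reward of the
    human-preferred response above the reward of the rejected one."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Scores the reward model assigned to a small batch of response pairs.
chosen = torch.tensor([1.2, 0.7, 2.1])    # responses humans preferred
rejected = torch.tensor([0.3, 0.9, 1.5])  # responses humans rejected

loss = reward_model_loss(chosen, rejected)
print(loss.item())  # lower loss means the reward model agrees with the human rankings
```

Once the reward model is trained this way, reinforcement learning nudges the base model toward outputs that score highly under it.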

Referenced in these posts:

Satisficing for LLMs

Applying Herbert Simon's concept of satisficing to AI, this post argues that language models may prefer logical-sounding content over emotional appeals, an inversion of familiar human biases. It highlights a paradox: humans use emotion to decide rationally, while LLMs use pseudo-rational style to appear helpful.

Related terms:

Chain-of-Thought

Chain-of-thought prompting, introduced by Google Research in 2022, transforms AI from an answer machine into a reasoning partner by explicitly modeling the problem-solving process step by step. By decomposing complex queries into sequential reasoning steps and making implicit thinking explicit, it markedly improves performance on multi-step tasks such as arithmetic, planning, and logic problems.
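
As a rough illustration of the idea, the sketch below contrasts a direct prompt with one that asks the model to lay out its reasoning before answering. The question, prompt wording, and send_to_model helper are illustrative placeholders, not any particular vendor's API.

```python
# Minimal sketch: direct prompt vs. chain-of-thought prompt for the same question.

question = ("A store sells pens in packs of 12. "
            "If a school needs 150 pens, how many packs must it buy?")

# Direct prompt: asks for the answer with no intermediate reasoning.
direct_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompt: asks the model to reason through the steps first.
cot_prompt = (
    f"{question}\n"
    "Let's think step by step:\n"
    "1. Work out how many full packs 150 pens would require.\n"
    "2. Check whether a partial pack is still needed.\n"
    "3. State the final number of packs.\n"
    "Answer:"
)

def send_to_model(prompt: str) -> None:
    # Placeholder: substitute your model client's completion call here.
    print(prompt)

send_to_model(cot_prompt)
```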

Strategic Software

Strategic software combines frontier AI models, custom code, and unique organizational expertise to tackle qualitative, strategic marketing challenges—from analyzing brand positioning and predicting market trends to optimizing customer journey orchestration. Its continuous learning loops compound organizational intelligence over time, delivering consultant-level insights at operational speed and scale.

Zero-Shot Prompting

Zero-shot prompting is the most basic form of AI interaction, in which a question is posed without any examples or guidance, relying entirely on the model's pre-trained knowledge. This baseline approach is a quick test of raw capability, revealing both the breadth and the limits of what the model already knows.
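
A minimal sketch of the difference, using an invented sentiment task: the zero-shot prompt gives the model only the instruction, while the few-shot version (shown for contrast) prepends labeled examples.

```python
# Minimal sketch: zero-shot vs. few-shot prompts for the same task.
# The task text and examples are illustrative placeholders.

task = ("Classify the sentiment of this review as positive or negative: "
        "'The battery died after two days.'")

# Zero-shot: only the instruction, no examples or guidance.
zero_shot_prompt = task

# Few-shot (for contrast): the same instruction preceded by labeled examples.
few_shot_prompt = (
    "Review: 'Great sound quality and easy setup.' -> positive\n"
    "Review: 'Stopped working within a week.' -> negative\n"
    f"{task} ->"
)

print(zero_shot_prompt)
print(few_shot_prompt)
```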