Glossary

Transformer

The transformer is the neural network architecture behind virtually every major language model since 2017. Introduced in the paper "Attention Is All You Need" by Vaswani et al. at Google, it replaced recurrent networks with a mechanism called self-attention, which lets the model weigh the relevance of every token against every other token in parallel. That parallelism is what made training on internet-scale data feasible, and what made GPU capacity the industry's key constraint. Transformers are not limited to text: the same architecture powers image recognition (Vision Transformers), protein structure prediction (AlphaFold), and audio synthesis. The key insight was architectural simplicity: transformers do one thing, attention, and scale it. That turned out to be enough for an extraordinary range of tasks, which is why transformer-based models now dominate AI research and production systems alike.
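
To make the mechanism concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. It is illustrative, not any library's implementation; the shapes and variable names are assumptions chosen for clarity, and real transformers add multiple heads, masking, and projections learned by gradient descent.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers
    v = x @ w_v  # values: what each token passes along
    # Every token scored against every other token, all at once.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # each output is a weighted mix of all values

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))
out = self_attention(x,
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)))
print(out.shape)  # (5, 8): one attended vector per input token
```

Because the scores come from a single matrix product rather than a step-by-step recurrence, the whole sequence can be processed in parallel on a GPU.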

Referenced in these posts:

Things I Think I Think About AI

Noah distills his 2,400+ hours of AI use into a candid, unordered list of 29 controversial takeaways—from championing ChatGPT’s advanced models and token maximalism to predicting enterprise adoption bottlenecks—and invites fellow practitioners to discuss. CMOs can reach out to Alephic for expert guidance on integrating AI into their marketing organizations.

Related terms:

Token

In large language models, a token is the basic unit of text that the model reads and generates, typically a word or word fragment averaging three to four characters of English. Since API costs, context windows, and rate limits are all measured in tokens, understanding tokenization is essential for controlling prompt length, cost, and model behavior.
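
A quick way to see tokenization in action is OpenAI's open-source tiktoken library. The sample text and the printed split below are illustrative; exact token boundaries depend on the encoding.

```python
import tiktoken

# cl100k_base is the encoding used by GPT-4-era OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits text into subword chunks."
tokens = enc.encode(text)

print(len(tokens))  # token count is what cost and context limits meter
print([enc.decode([t]) for t in tokens])  # e.g. ['Token', 'ization', ' splits', ...]
```

Counting tokens this way before sending a prompt is the simplest guard against blowing past a context window or a budget.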

Fine-Tuning

Fine-tuning continues training a pretrained language model on a smaller, task-specific dataset so it internalizes particular behaviors, styles, or domain knowledge. While it yields more consistent formatting and terminology than prompting alone, it requires curated data and additional training time, and it can erode the model's general capabilities (a failure mode known as catastrophic forgetting).
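
In practice, the curated data usually takes the form of prompt-completion pairs. Below is a minimal sketch of the JSONL chat format accepted by hosted fine-tuning services such as OpenAI's; the product name and copy are invented for illustration.

```python
import json

# Each line pairs an input with the exact output the model should learn to produce.
examples = [
    {"messages": [
        {"role": "system", "content": "You write product copy in our house style."},
        {"role": "user", "content": "Describe the Atlas desk lamp."},  # hypothetical product
        {"role": "assistant", "content": "Atlas: warm light, clean lines, zero clutter."},
    ]},
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

Hundreds of consistently formatted examples like this typically matter more than any single clever one.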

Prompt Engineering

Prompt engineering is the practice of designing and refining model inputs, from simple instructions to detailed system prompts with examples, constraints, personas, and chain-of-thought scaffolding, to elicit desired outputs from a language model. It’s the most accessible way to boost AI performance, requiring no training data or ML expertise, but prompts can be fragile, hard to version-control, and prone to overfitting to a particular model version.
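
The sketch below shows what those ingredients look like assembled into the chat-message structure most LLM APIs accept. The persona, constraints, and few-shot example are hypothetical, invented to illustrate the pattern.

```python
def build_prompt(question: str) -> list[dict]:
    """Assemble a structured prompt: persona, constraints, and a worked example."""
    return [
        {"role": "system", "content": (
            "You are a senior marketing analyst. "     # persona
            "Answer in at most three bullet points. "  # constraint
            "Cite a metric in every bullet."           # constraint
        )},
        # One worked example (few-shot) anchors the output format.
        {"role": "user", "content": "Summarize Q2 email performance."},
        {"role": "assistant", "content": (
            "- Open rate rose to 31% (+4 pts)\n"
            "- Click-through held at 2.1%\n"
            "- Unsubscribes fell 12%"
        )},
        {"role": "user", "content": question},
    ]

print(build_prompt("Summarize Q3 paid-social performance."))
```

Keeping templates like this in code, rather than pasted ad hoc into a chat window, is the easiest way to make prompts reviewable and version-controlled.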