Transformer
The transformer is the neural network architecture behind every major language model since 2017. Introduced in the paper "Attention Is All You Need" by Vaswani et al. at Google, it replaced recurrent networks with a mechanism called self-attention, which lets the model weigh the relevance of every token against every other token in parallel rather than processing the sequence one step at a time. That parallelism is what made training on internet-scale data feasible, and what made GPUs the bottleneck. Transformers are not limited to text: the same architecture powers image recognition (Vision Transformers), protein structure prediction (AlphaFold), and audio synthesis.

The key insight was architectural simplicity. Transformers do one thing, attention, and scale it. That turned out to be enough for an extraordinary range of tasks, which is why transformer-based models now dominate AI research and production systems alike.
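To make the mechanism concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation from the paper: softmax(QK^T / sqrt(d_k)) V. It uses plain NumPy, and the dimensions and random projection weights are illustrative placeholders, not anything from a trained model:

```python
# A minimal sketch of scaled dot-product self-attention.
# All weights here are random placeholders for illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) array of token embeddings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # project into queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # every token scored against every other, in parallel
    weights = softmax(scores, axis=-1)     # each row is a relevance distribution summing to 1
    return weights @ V                     # each output is a relevance-weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8        # toy sizes, chosen for readability
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one context-aware vector per input token
```

A real transformer stacks many of these attention layers (with multiple heads, feed-forward layers, and normalization), but the score-weight-mix loop above is the whole trick, and because it is all matrix multiplication, every token is processed at once on a GPU.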
Referenced in these posts:
Things I Think I Think About AI
Noah distills his 2,400+ hours of AI use into a candid, unordered list of 29 controversial takeaways—from championing ChatGPT’s advanced models and token...
Related terms:
Token
In large language models, a token is the basic unit of text—usually chunks of three to four characters—that the model reads and generates.
Fine-Tuning
Fine-tuning continues training a pretrained language model on a smaller, task-specific dataset so it internalizes particular behaviors, styles, or domain...
Prompt Engineering
Prompt engineering involves designing and refining inputs—ranging from simple instructions to detailed system prompts with examples, constraints, personas,...