Context Window
A context window is the maximum amount of text a language model can process in a single call—both your input and its output combined. Measured in tokens (roughly three-quarters of a word), context windows range from 4,000 tokens in early GPT-3.5 to over a million in recent models like Gemini. A larger context window means you can feed the model more documents, longer conversations, or bigger codebases in one shot. But bigger is not automatically better: models tend to pay less attention to information in the middle of long contexts (the "lost in the middle" problem), and cost scales linearly with token count. Context window size is one of the most practical constraints in AI system design—it determines whether you can stuff the answer into the prompt or need a retrieval architecture to find it first.
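The budget arithmetic described above can be sketched in a few lines. This is a rough heuristic, not a real tokenizer: it assumes the "three-quarters of a word per token" rule of thumb from the definition, and the function names (`estimate_tokens`, `fits_in_context`) are illustrative. Production code would use the model provider's actual tokenizer for exact counts.

```python
def estimate_tokens(text: str) -> int:
    # Heuristic: a token is roughly three-quarters of a word,
    # so a text of N words is about N * 4/3 tokens.
    # A real tokenizer (e.g. the provider's own) gives exact counts.
    return max(1, round(len(text.split()) * 4 / 3))

def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int = 4000) -> bool:
    # Input and output share the same window, so reserve room
    # for the model's reply before sending the prompt.
    return estimate_tokens(prompt) + max_output_tokens <= context_window
```

With a 4,000-token window and 500 tokens reserved for the reply, a prompt of about 3,000 words already overflows; this is the point at which you either truncate, summarize, or switch to a retrieval architecture that fetches only the relevant slices.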
Referenced in these posts:
Noah on Bloomberg: Claude Code and the AI Coding Boom
On Bloomberg’s Odd Lots, Noah Brier highlights Claude Code as a “computer within your computer,” using file system access and Unix commands to bypass...
The Magic of Claude Code
Claude Code combines a terminal-based Unix command interface with filesystem access to give LLMs persistent memory and seamless tool chaining, transforming...
Thinking Ahead, Building Ahead
Why the best AI products ship before they're ready—and why that's exactly right. In the AI era, speed and iteration beat waiting for perfection.
Related terms:
Inference
Inference is the process of running a trained model on new input to generate a prediction or output—such as sending a prompt to GPT-4 and receiving a...
Embeddings
Embeddings are numerical representations of text—vectors of hundreds or thousands of floating-point numbers—that capture semantic meaning in a form machines...