Glossary

Context Window

A context window is the maximum amount of text a language model can process in a single call, counting both your input and its output. It is measured in tokens (one token is roughly three-quarters of an English word), and sizes range from about 4,000 tokens in early GPT-3.5 to over a million in recent models like Gemini. A larger context window means you can feed the model more documents, longer conversations, or bigger codebases in one shot. But bigger is not automatically better: models tend to pay less attention to information in the middle of long contexts (the “lost in the middle” problem), and cost scales linearly with token count. Context window size is one of the most practical constraints in AI system design: it determines whether you can stuff the answer into the prompt or need a retrieval architecture to find it first.
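
To make the budget arithmetic concrete, here is a minimal sketch of a pre-flight check built on the rough 0.75 words-per-token heuristic. The function names, the word-to-token ratio, and the reserved-output figures are illustrative assumptions; a production system would count tokens with the model's actual tokenizer.

```python
# Rough sketch: does a prompt fit in a model's context window?
# The 0.75 words-per-token ratio is a common heuristic for English
# text, not an exact value; real systems use the model's tokenizer.

WORDS_PER_TOKEN = 0.75  # rule-of-thumb ratio (assumed)

def estimate_tokens(text: str) -> int:
    """Approximate token count from word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fits_in_context(prompt: str, context_window: int, reserved_for_output: int) -> bool:
    """Check whether a prompt leaves room for the model's reply.

    Input and output share the same window, so the space reserved
    for the reply must be subtracted from the budget up front.
    """
    return estimate_tokens(prompt) <= context_window - reserved_for_output

prompt = "Summarize the attached quarterly report. " * 3000  # ~15,000 words
print(estimate_tokens(prompt))                    # ~20,000 tokens
print(fits_in_context(prompt, 4_000, 500))        # False: early GPT-3.5-sized window
print(fits_in_context(prompt, 1_000_000, 4_000))  # True: million-token window
```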

Referenced in these posts:

Noah on Bloomberg Odd Lots: Why the Tech World Is Going Crazy for Claude Code

On Bloomberg’s Odd Lots, Noah Brier highlights Claude Code as a “computer within your computer,” using file system access and Unix commands to bypass token-heavy workflows and enable direct file manipulation. He argues this reinvention of the computing interface ushers in a new era of structured, human-in-the-loop software development akin to sophisticated pair programming.

The Magic of Claude Code

Claude Code combines a terminal-based Unix command interface with filesystem access to give LLMs persistent memory and seamless tool chaining, turning the terminal into a powerful agentic operating system for coding and note-taking. Its simple, composable approach offers a blueprint for reliable AI agents that leverage the Unix philosophy rather than complex multi-agent architectures.

Thinking Ahead, Building Ahead

Why the best AI products ship before they’re ready—and why that's exactly right.

Related terms:

Inference

Inference is the process of running a trained model on new input to generate a prediction or output—such as sending a prompt to GPT-4 and receiving a response. Unlike training, which is costly and infrequent, inference occurs millions of times per day, with speed (tokens per second) and cost (dollars per million tokens) determining an AI feature’s responsiveness and economic viability.
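
To make the economics concrete, here is a small back-of-the-envelope sketch. The prices, throughput figure, and traffic numbers are illustrative assumptions, not any provider's actual rates.

```python
# Back-of-the-envelope inference economics. All figures below are
# illustrative assumptions, not real provider pricing or throughput.

PRICE_PER_M_INPUT = 3.00    # dollars per million input tokens (assumed)
PRICE_PER_M_OUTPUT = 15.00  # dollars per million output tokens (assumed)
TOKENS_PER_SECOND = 50      # assumed generation speed

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one inference call."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

def generation_latency(output_tokens: int) -> float:
    """Seconds spent generating the reply (ignores time to first token)."""
    return output_tokens / TOKENS_PER_SECOND

# A chat feature serving 1M calls/day at 2,000 input / 500 output tokens:
per_call = request_cost(2_000, 500)
print(f"${per_call:.4f} per call, ${per_call * 1_000_000:,.0f} per day")
print(f"{generation_latency(500):.0f}s to generate each reply")
```

Those two numbers, dollars per call and seconds per reply, are what decide whether a feature that works in a demo survives contact with production traffic.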

Embeddings

Embeddings are numerical representations of text—vectors of hundreds or thousands of floating-point numbers—that capture semantic meaning in a form machines can compare mathematically. They power semantic search, recommendation engines, clustering, anomaly detection, and the retrieval half of RAG architectures.
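
The mathematical comparison is usually cosine similarity. Here is a minimal sketch with toy four-dimensional vectors; real embeddings have hundreds or thousands of dimensions produced by an embedding model, and the example sentences are made up for illustration.

```python
# Minimal sketch of comparing embeddings with cosine similarity.
# The 4-dimensional vectors are toy values; real embeddings come
# from an embedding model and are far higher-dimensional.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Near 1.0 = same direction (similar meaning); near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: the first two sentences share a topic.
cat = [0.8, 0.6, 0.1, 0.0]      # "a cat sat on the mat"
kitten = [0.7, 0.7, 0.2, 0.1]   # "a kitten curled up on the rug"
invoice = [0.0, 0.1, 0.9, 0.8]  # "please pay the attached invoice"

print(cosine_similarity(cat, kitten))   # ~0.98: semantically close
print(cosine_similarity(cat, invoice))  # ~0.12: unrelated topics
```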