The Framing Problem in Language Model Interfaces

The bug discovered in Field Theory's transcript improvement feature—where Claude responds to dictated text as a question rather than rewriting it—illustrates a fundamental challenge in human-computer interaction that dates back to Terry Winograd's SHRDLU system in 1970. SHRDLU could manipulate virtual blocks through natural language, but only because its world was constrained: every utterance was unambiguously a command. Modern LLMs operate in unconstrained contexts, making the distinction between "text to process" and "instruction to follow" inherently ambiguous. When a user dictates "Could you examine our sound implementation," the model faces a classification problem: is this a request (respond with analysis) or data (clean up this transcript)?

The solution—wrapping input with explicit framing like "Transcript to improve:"—mirrors the concept of protocol headers in network communication. Just as TCP packets include headers that tell receiving systems how to interpret the payload, LLM prompts benefit from explicit metadata that disambiguates intent. This pattern appears throughout computing history: MIME types tell browsers how to render content, shebang lines tell Unix which interpreter to use, and BOM markers indicate text encoding. The transcript wrapper serves the same function—a machine-readable signal that says "what follows is data, not instruction."

This framing problem becomes more acute as LLMs grow more capable. A less intelligent system might have failed to answer the "question" at all; Claude's competence becomes a liability when the interface doesn't clearly separate the control plane (system prompt) from the data plane (user content). The fix is trivial—a single line change—but the underlying principle matters: in any system where instructions and data share the same channel, explicit delimiters prevent injection-style confusion.