The Moat Is What the Model Can't See
Sequoia hosted its second AI Ascent recently and somewhere in the middle of the lineup Andrej Karpathy gave a fireside chat that I think has quietly become one of the more useful ways of framing where software is heading, mostly because he manages to describe what’s happening without drifting into either AGI prophecy or the now-standard “everything becomes an agent company” framing that increasingly accompanies enterprise AI discussions. His formulation is that we’ve moved through three broad eras of software: Software 1.0, where humans write explicit code; Software 2.0, where humans train neural networks on datasets; and now Software 3.0, where the “program” is increasingly just the context loaded into the model window, with the LLM itself functioning as the runtime that interprets the instructions and executes the work.
This sounds vaguely abstract until he gets to the example, at which point the whole thing snaps into focus in a slightly unpleasant way.
He talks about building an app called MenuGen, which does roughly what it sounds like: you upload a restaurant menu, the app OCRs the dishes, generates images for each one, and renders the result back onto the menu. Importantly, this was not a toy demo. He actually built the thing. Vercel, auth, Stripe, DNS, all the ordinary infrastructure that surrounds putting even a small application into the world, which is part of what makes the story useful rather than merely illustrative. Then, midway through building it, he realized he could effectively hand the menu image directly to Gemini and ask it to perform the transformation natively using Nano Banana, no application layer required. The app, as he put it, shouldn’t exist.
What’s unsettling about the example is not really the app itself, which by all accounts seems perfectly competently built, but the realization that the underlying assumption beneath the product had quietly dissolved while the product was still being constructed. The “feature,” for lack of a better term, had been absorbed upward into the frontier model itself, leaving behind a kind of hollowed-out shell of infrastructure surrounding something that no longer needed infrastructural support in the first place. And the more I’ve thought about that example over the last few weeks, the more I think it describes a huge percentage of what’s currently happening across the AI startup landscape, albeit in slower motion than most people realize. A lot of products being built right now are not becoming platforms or enduring applications so much as temporary organizational structures around capabilities that frontier models are steadily converging on anyway.
Which raises the obvious question, namely: what survives?
My suspicion is that the durable layer is neither the model itself nor, in most cases, the application layer sitting on top of it, but the awkward operational middle where proprietary context accumulates through repeated workflow usage over long periods of time. “Context” is admittedly doing an enormous amount of work in that sentence,[1] but I think the distinction matters because people often talk about context as though it were merely information sitting statically inside a prompt window, when in practice the more important thing is the set of systems that determine what information is available to load into the prompt in the first place.
Gemini can interpret a publicly photographable restaurant menu. It cannot, at least not on its own, access three years of internal Slack conversations, retailer-by-retailer POS feeds for a CPG company, the weird tribal approval logic embedded inside procurement workflows, customer support histories tied to identity graphs, or the accumulated operational residue that builds up inside organizations after a decade of running actual businesses through software systems. Anything public, reproducible, promptable, or broadly available on the open internet feels vulnerable to model convergence. Anything generated by, and tightly coupled to, day-to-day operational workflows feels considerably harder to commoditize.
And once you start looking at the current AI landscape through that lens, the pattern becomes almost embarrassingly obvious.
Cursor works because the editor is not just a UI wrapper around a model. The editor is the environment where the operational context lives: the file tree, the diffs, the terminal state, the dependency graph, the language server, the execution history. Claude Code works for roughly the same reason. Cognition works for the same reason. Even Google’s recently launched Antigravity, which still sounds vaguely like the name of a discontinued energy drink, is fundamentally valuable because codebases contain dense, continuously updating, highly permissioned context that is difficult for outside systems to replicate.
The same thing happened, mostly accidentally, to products like Notion, Confluence, and Linear. These were not originally conceived as AI-native businesses. They became strategically important because organizations spent the previous decade externalizing institutional memory into them. Once LLMs became serviceable, those products discovered they were sitting on extraordinarily valuable context reservoirs.
And then there’s the category that I still think people systematically under-discuss in AI conversations, which is vertical SaaS. Toast in restaurants. ServiceTitan in the trades. Procore in construction. Crisp in CPG and retail, which is where I happen to work. These companies captured workflow exhaust years before anyone started using the phrase “agentic” in public with a straight face, and now find themselves in unusually strong positions precisely because they own the operational systems where context is continuously generated rather than merely stored.
That distinction matters more than people currently appreciate. Stored data, by itself, is probably a weaker moat than incumbents would prefer to believe. Operational systems that continuously generate fresh context through day-to-day usage are much harder to displace because they are not merely repositories of information; they are the environments through which work itself flows.
This is also why I think a lot of current AI products are in more danger than their founders realize. Karpathy’s MenuGen example is clarifying precisely because it strips away the comforting assumption that “shipping software” automatically implies durable value creation. If a frontier multimodal model with tools can plausibly perform your product’s core transformation directly inside a prompt window, there’s at least a decent chance you do not actually have a company in the long-term sense. You may have a feature sitting on borrowed time until the capability gets absorbed upward into the foundation layer.
This is where integrations start becoming strategically important in a way that I still think most people underweight. Byrne Hobart wrote a couple of years ago that integrations were the most important product strategy nobody talked about, and the argument has aged unusually well because the usefulness of captured context depends entirely on whether agents can actually reach it. The platforms sitting on proprietary workflow context now face a fairly consequential strategic decision: expose that context through APIs, MCP servers, agent-native interfaces, and permission systems, or remain walled gardens optimized primarily around direct human interaction.
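To make "exposing context through permission systems" slightly more concrete, here is a minimal sketch in plain Python. It is not MCP or any real product's API; `Record`, `ContextGateway`, the scope labels, and the sample records are all hypothetical names invented for illustration. The only point is that what an agent can reach is decided by the gateway and its permissions, not by how capable the model is.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One piece of operational context, tagged with the scope needed to read it."""
    source: str          # hypothetical system name, e.g. "procurement"
    required_scope: str  # hypothetical permission label
    text: str


@dataclass
class ContextGateway:
    """Hypothetical agent-facing interface over a platform's private records."""
    records: list[Record]
    audit_log: list[tuple[str, str, int]] = field(default_factory=list)

    def fetch(self, agent_id: str, granted_scopes: set[str], query: str) -> list[str]:
        """Return matching records the agent is permitted to read, and log the call."""
        visible = [
            r.text
            for r in self.records
            if r.required_scope in granted_scopes and query.lower() in r.text.lower()
        ]
        # Every call is recorded so agent behavior can be audited later.
        self.audit_log.append((agent_id, query, len(visible)))
        return visible


gateway = ContextGateway(records=[
    Record("procurement", "procurement:read", "Orders over 10k need CFO approval"),
    Record("support", "support:read", "Customer asked about approval delays"),
])

# An agent granted only the support scope sees one of the two matching records.
support_view = gateway.fetch("agent-1", {"support:read"}, "approval")
# An agent with no scopes reaches nothing, however capable the model behind it.
empty_view = gateway.fetch("agent-2", set(), "approval")
```

A real platform would sit behind authentication, richer search, and an actual protocol, but the shape of the decision is the same: the records exist either way, and the gateway determines whether an agent can ever load them.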
Karpathy touches on this in the conversation and I think he’s directionally right that the “user” of software increasingly looks like an agent acting on behalf of a human rather than the human directly. Most enterprise software still assumes the opposite. It assumes the human is the orchestration layer, manually navigating menus, clicking workflows, moving information between disconnected systems, and stitching together applications through attention and repetition. Agents collapse a surprising amount of that interaction surface.
Which is also why I’m somewhat skeptical that traditional SaaS switching costs survive the transition intact.[2] The historical moat for enterprise software was a combination of data gravity, workflow entrenchment, and user inertia. But agents potentially weaken the inertia component in a fairly profound way. If my agent can read from Notion, write to Coda, update Salesforce, summarize Slack, create Linear tickets, and coordinate across all of them with roughly equivalent fluency, then the interface layer starts mattering less than it did during the pure SaaS era.
At the same time, I don’t think this automatically means incumbents lose. If anything, the more interesting possibility is that the next decade becomes a contest between two different forms of leverage that previously tended to coexist inside the same companies without being particularly separable from one another: operational context ownership on one side, and distribution plus identity infrastructure on the other.
Vertical SaaS companies and workflow-native platforms increasingly own the richest operational context. Microsoft, Google, and Salesforce still control enormous portions of the identity, permissions, and distribution substrate through which organizational work happens. Those are related advantages, but they are not identical ones, and I suspect a large portion of the next decade’s competitive landscape is basically going to be defined by the tension between them.
One possibility is that operational context ultimately wins, because agents become so effective at traversing interfaces and coordinating across systems that distribution starts mattering less than proximity to the underlying workflow itself. Another is that identity, trust, permissions, and distribution remain overwhelmingly powerful, in which case the incumbents end up absorbing much of the value anyway simply because they already sit at the center of enterprise operational gravity. Both theories feel plausible to me at the moment, which is probably another way of saying the transition is still early.
And in practice, the companies that matter most may simply be the ones that successfully combine the two. Microsoft may end up winning enormous portions of the agent era not because Copilot is categorically superior, but because Outlook, Teams, Office, Windows, Entra, and the surrounding enterprise permission infrastructure already constitute the operational substrate through which organizational work flows. The same is true, in different ways, for Google Workspace and Salesforce. Distribution survives platform transitions longer than people expect, particularly when identity and permissions are involved.
Still, where I keep landing after rolling Karpathy’s framing around for a few weeks is that the durable work in this next phase of software probably lives in the least glamorous possible layer. Not the demos. Not the chatbot wrappers. Not even necessarily the models themselves, which feel destined for a brutally capital-intensive consolidation cycle. The durable layer is the operational middle: integrations, permissions, identity graphs, governance systems, workflow software, data cleanliness, interoperability, and the machinery required to make organizational context legible to agents in the first place.
None of this makes for especially compelling keynote material. “We have exceptionally robust permissioning infrastructure” does not tend to bring audiences to their feet. But I increasingly suspect that the companies which matter ten years from now will look less like the ones producing the flashiest model demos and more like the ones quietly sitting at the center of operational context flows inside large organizations.
Karpathy deserves a fair amount of credit for telling this story on himself rather than pretending he saw the shift ahead of time. The useful part of the MenuGen example is precisely that he built the thing sincerely, with real engineering effort, and only realized midway through development that the assumptions underneath the product had shifted. Which, if we’re being honest, is probably the default condition for anyone building model-adjacent software right now, whether they admit it publicly or not.
The uncomfortable possibility is not really that the models stop improving. It’s that they continue improving faster than the assumptions embedded in the roadmap you’re currently building against, which is a different problem and arguably a worse one.
⸻
[1] It’s probably worth admitting that I’m using “context” in a meaningfully broader sense than Karpathy uses “context window.” His framing is primarily about the literal tokens available to the model at inference time; mine is closer to “organizationally accumulated operational state,” which sounds considerably uglier but is probably more economically important. I think the concepts collapse toward each other once you start asking what an agent can actually act upon inside a business environment, though reasonable people could object that I’m smuggling an enterprise software thesis into a discussion nominally about model interfaces.
[2] The incumbent counterargument, which I think is genuinely nontrivial, is that switching costs were never primarily about UI familiarity in the first place; they were about trust, governance, compliance, and the terrifying operational risk associated with moving critical business processes between systems. This may end up mattering more than the pure “agents flatten interfaces” crowd currently acknowledges, particularly in heavily regulated industries where the ability to explain, audit, and permission agent behavior becomes more important than raw model capability.