Andrej Karpathy: The Evolution of Software in the Age of AI
Imagine programming a computer not just with code but with natural language prompts. As AI and LLMs transform the landscape, every developer must adapt to a new era of collaborative intelligence.
Navigating Software’s Vast Change
In the opening, Karpathy reminds us that technology never stands still. For almost seventy years, the core mechanisms of Software 1.0 remained relatively stable—hand-crafted code dictating every function. Within the last few years, though, we’ve seen two seismic shifts: first to neural-network–driven Software 2.0 and now to prompt-based Software 3.0. Each wave requires a distinct blend of skills in programming, data curation, and model optimization. Every layer of the stack—from operating systems to user interfaces—faces rewrites in the wake of scalable transformers and API-driven architectures. The proliferation of cloud-hosted LLM services means that deployment patterns, security models, and performance budgets must all be rethought. Students and professionals entering the industry today must recognize not only the historical cycles of change but also the unprecedented speed at which LLMs are reshaping our expectations of what software can do. Understanding these shifts is not merely advantageous but essential for navigating the rapidly evolving software landscape.
From Code to Weights: The Evolution to Software 3.0 and Beyond
Software 1.0 was defined by explicit instructions: developers wrote conditionals, loops, and data structures that told machines exactly how to behave. Software 2.0 introduced training pipelines—datasets, network architectures, and optimization algorithms—to shape millions or billions of parameters instead of hand-coding every rule. Hugging Face, Model Zoo, and other hubs now serve as the GitHub equivalents for model weights, where anyone can fork, fine-tune, and contribute new checkpoints. Software 3.0 goes a step further: large language models become programmable through natural language prompts. Rather than authoring Python scripts or retraining networks, you describe your intended behavior in English. When Karpathy applied this lens at Tesla, he observed autopilot’s C++ codebases gradually swallowed by neural models, migrating image stitching and sensor fusion to parameter-learned networks and eliminating thousands of lines of handcrafted logic. Today, prompt-driven interfaces democratize innovation: fluent English speakers can prototype features that once demanded years of formal training.
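The contrast between the paradigms can be sketched in a few lines. The word lists, function names, and prompt below are illustrative placeholders, not any real product’s rules or API (Software 2.0, a trained model, is omitted for brevity):

```python
# Software 1.0: explicit, hand-written rules decide the behavior.
def sentiment_v1(text: str) -> str:
    positives = {"great", "love", "excellent"}   # toy word lists,
    negatives = {"bad", "hate", "terrible"}      # not a real lexicon
    words = set(text.lower().split())
    score = len(words & positives) - len(words & negatives)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"

# Software 3.0: the "program" is a natural-language prompt that an
# LLM interprets; no classifier code is written at all.
PROMPT_V3 = (
    "Classify the sentiment of the following review as positive, "
    "negative, or neutral. Reply with exactly one word.\n\n"
    "Review: {text}"
)
```

The 1.0 version encodes behavior in logic a developer must anticipate; the 3.0 version delegates that logic to the model, trading determinism for flexibility.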
LLMs as the New Operating Systems of Intelligence
Karpathy likens modern LLMs to essential utilities—intelligence delivered as a service—with hefty infrastructure investments supporting pay-per-use APIs, low-latency responses, and high-uptime SLAs. As with semiconductor fabs and power grids, labs such as OpenAI, Google, and Anthropic shoulder enormous CAPEX to train these models, followed by ongoing OPEX to serve inference at scale. Yet LLMs transcend simple commodities: they resemble operating systems that orchestrate memory, compute, and external tools. Context windows act as RAM buffers; transformer weights are the CPU cores; and plugins or tool calls emulate system calls. Clients connect over the network in a time-sharing model, echoing 1960s mainframes, while routing layers let users switch seamlessly among multiple providers. This operating-system perspective helps us build robust ecosystems of LLM-powered applications—from coding assistants to research platforms—where modular GUIs and dynamic tool integrations mirror traditional OS architectures.
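The operating-system analogy can be made concrete with a toy dispatch loop. The `TOOLS` table, `run_agent` helper, and JSON tool-request format below are assumptions for illustration, not any vendor’s actual protocol:

```python
import json

# Hypothetical "system calls" the LLM operating system can invoke.
TOOLS = {
    # eval with empty builtins: illustrative only, not production-safe
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(model, prompt, max_steps=5):
    """Drive a generate -> tool-call -> observe loop.

    `model` is any callable that returns either a final plain-text
    answer or a JSON tool request like {"tool": "...", "arg": "..."}.
    """
    transcript = prompt  # the context window acts as working RAM
    for _ in range(max_steps):
        reply = model(transcript)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text: the model is done
        result = TOOLS[call["tool"]](call["arg"])  # the "system call"
        transcript += f"\n[tool:{call['tool']}] -> {result}"
    return transcript
```

Swapping the stub `model` for a real provider’s API turns the same loop into a working agent, which is exactly why routing layers can switch providers behind one interface.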
“AI is the new electricity.” — Andrew Ng
The Psychology and Contextual Limits of LLMs
LLMs simulate a “people spirit” via autoregressive transformers trained on vast text corpora, granting them encyclopedic knowledge and superhuman recall—much like Dustin Hoffman’s savant in Rain Man. Yet they exhibit jagged intelligence: flawless at some tasks but prone to hallucinations, misremembering facts or inventing plausible-sounding errors—insisting, for instance, that 9.11 is larger than 9.9. Unlike continuous learners, they lack long-term memory consolidation: once the context window ends, they suffer anterograde amnesia until prompts reintroduce the necessary background. Films like Memento and 50 First Dates dramatize these reset-every-session memory gaps, illustrating why persistent memory remains an open research frontier. Security and trust present further challenges: LLMs can fall victim to prompt-injection exploits or inadvertently leak private data. Effective LLM applications must therefore accommodate these cognitive quirks—mitigating hallucinations, managing context windows, and reinforcing guardrails to maintain consistency, accuracy, and reliability.
Building Partial Autonomy Apps: Verification and Oversight
The most impactful LLM applications embrace partial autonomy, where humans and machines collaborate within well-defined loops. Tools like Cursor and Perplexity layer intuitive GUIs over LLMs, handling context management and orchestrating multiple models—embeddings for search, chat models for dialogue, diff-applying engines for code changes—behind the scenes. Users benefit from an autonomy slider: from token-level completions and single-file patches to full-repo transformations, each level trading off tight control for speed. Visual interfaces accelerate verification by highlighting diffs, citing sources, and providing click-to-accept actions rather than demanding manual text prompts for every step. Analogous to Tony Stark calibrating his Iron Man suit, developers fine-tune agentic behaviors, keeping AI firmly leashed to prevent runaway modifications. A fast, reliable generate-verify cycle unlocks productivity, ensuring that intelligent suggestions transform into safe, auditable outcomes without compromising human oversight.
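One way to picture the autonomy slider is as a gate on how broad a change may auto-apply before a human must verify it. The `Patch`, `LEVELS`, and `apply_patches` names below are hypothetical, a minimal sketch of the idea rather than any tool’s real interface:

```python
from dataclasses import dataclass

# Autonomy levels, from tightest human control to broadest agent scope.
LEVELS = ["completion", "file", "repo"]

@dataclass
class Patch:
    scope: str   # one of LEVELS
    diff: str    # the proposed change

def apply_patches(patches, autonomy, approve):
    """Auto-apply patches at or below the chosen autonomy level.

    Anything broader is routed to the human `approve` callback,
    keeping the agent on a leash for large modifications.
    """
    limit = LEVELS.index(autonomy)
    applied = []
    for patch in patches:
        if LEVELS.index(patch.scope) <= limit or approve(patch):
            applied.append(patch)
    return applied
```

With `autonomy="file"`, single-file edits flow through automatically while full-repo transformations wait for a click-to-accept decision—the generate-verify cycle the section describes.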
Vibe Coding and the Democratization of Programming
Karpathy’s notion of “vibe coding” captures the thrill of sketching ideas in plain English and watching LLMs translate them into working prototypes. A viral tweet and videos of children “vibe coding” simple games demonstrate how natural language UIs lower entry barriers. In one case, Karpathy spun up an iOS app in Swift in a single afternoon despite lacking prior experience; in another, he launched MenuGen (menugen.app)—a restaurant-menu scanner that generates dish images via LLMs—with just a few hundred words of prompting. These stories highlight how anyone can prototype domain-specific tools without weeks of study. Yet the friction of production—integrating authentication, payments, and hosting—remains a manual DevOps slog. The next frontier of vibe coding lies in automating deployment pipelines and backend workflows so that quick experiments can graduate seamlessly to real-world applications.
Preparing Software for Agents: Docs, APIs, and New Protocols
As software evolves for LLM consumption, documentation and APIs must become agent-friendly by design. Traditional HTML pages, screenshots, and tutorials confuse language models, so platforms like Stripe and Vercel now publish machine-readable Markdown with explicit examples. Just as robots.txt guides web crawlers, an llms.txt file at a domain’s root can orient LLM agents on site structure and usage policies. Vague instructions like “click here” give way to precise curl commands or programmatic endpoints, enabling agents to navigate reliably. Tools such as Gitingest and DeepWiki consolidate GitHub repositories into cohesive text artifacts ready for LLM analysis. Emerging standards like Anthropic’s Model Context Protocol (MCP) formalize how applications expose context, tools, and metadata to agents. By meeting language models halfway—providing structured, parseable artifacts—developers ensure smoother integrations, faster agent onboarding, and more robust LLM-driven workflows.
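A minimal llms.txt might look like the following. The site name, paths, and policies are invented for illustration, loosely following the format proposed for such files:

```markdown
# Example Store
> Machine-readable guide for LLM agents visiting this domain.

## Docs
- [API reference](/docs/api.md): REST endpoints with curl examples
- [Quickstart](/docs/quickstart.md): authentication and a first request

## Policies
- Agents may read everything under /docs
- Write actions require an API key from /docs/quickstart.md
```

Because the file is plain Markdown at a predictable location, an agent can fetch it once and learn where the authoritative, parseable documentation lives instead of scraping HTML.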
Conclusion: Embracing Collaborative Intelligence
We stand at a crossroads in software evolution. From explicit code to neural weights and now to natural-language prompts, LLMs have become the CPUs of collaborative intelligence. To thrive, developers must master hybrid paradigms—balancing autonomy with human oversight, optimizing verification loops, and adapting infrastructure for agent-native consumption. The future belongs not to fully autonomous agents nor to traditional programs alone, but to systems that empower humans and machines to co-author solutions.
Actionable Takeaway
• Start integrating LLMs into your existing tools with a focus on context management and custom GUIs to accelerate the human-AI collaboration cycle.