I Broke Down Anthropic's $2.5 Billion Leak. Your Agent Is Missing 12 Critical Pieces.
Link: https://youtu.be/FtCdYhspm7w Duration: 26:52 Saved: 2026-04-03 Tags: @work @project
TL;DR
Analysis of the Claude Code leak reveals 12 architectural primitives essential for production agents at billion-dollar scale. 80% of an agent's success = solid plumbing (session persistence, permissions, token budgets, workflow state), not just fancy AI. The author is releasing a skill that evaluates and designs agentic harnesses based on the lessons from Claude Code.
Key points
- Tool registry with metadata-first design - 207 user-facing commands, 184 model-facing tools, all defined as data structures before any implementation (see the registry sketch after this list)
- Permission system with 3 trust tiers - built-in (highest trust), plugin (medium), skills (lowest); bash_tool alone has 18 security modules
- Session persistence - survives complete crashes (JSON with messages, tokens, permissions, config)
- Workflow state separate from conversation state - "What step are we in?" vs "What have we said?"
- Token budget tracking - hard limits, auto-compaction threshold, stop before the API call if the projection exceeds the budget
- Structured streaming events - typed events (message_start, tool_match, etc.) + crash reason as the final event
- System event logging - complete audit trail: context loads, routing decisions, permission grants/denials
- Two-level verification - (a) verify the agent's work, (b) test that changes to the harness don't break the guardrails
- Tool pool assemblies - doesn't hand the model all 184 tools; dynamically selects a subset based on mode/permissions/deny lists
- Transcript compaction - automatic after N turns, keeps recent entries, discards old ones
- Permission audit trail - permissions as first-class objects, 3 handlers: interactive (human-in-the-loop), coordinator (multi-agent), swarm worker (autonomous)
- Agent type system - 6 built-in types (explore, plan, verify, guide, general purpose, status_line_setup), each with its own prompt/tools/constraints
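A minimal sketch of the metadata-first registry idea, in Python. The fields mirror what the video describes (name, source hint, responsibility description); the ToolSpec/list_tools names and the on-demand loader mechanism are illustrative assumptions, not code from the leak.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class ToolSpec:
    """Metadata-only entry: says what a capability is without executing it."""
    name: str
    description: str  # responsibility description the model can read
    source: str       # source hint, e.g. "built-in", "plugin", "skill"
    loader: Optional[Callable[[], Callable]] = None  # implementation loads on demand

REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec

def list_tools() -> list[dict]:
    """Introspect every registered capability without triggering side effects."""
    return [{"name": s.name, "source": s.source, "description": s.description}
            for s in REGISTRY.values()]

# The entry exists as data before any implementation code is written or imported.
register(ToolSpec(name="read_file", source="built-in",
                  description="Read a file from the workspace and return its text."))
```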
Anthropic leaks in the same week:
- Claude Mythos (draft blog on a public server)
- Claude Code (build config error)
Velocity vs operational discipline - circulating theory: accidental adaptive reasoning mode switch → session fell back to Sonnet → the model committed the map file
Notable quotes
"Building agents is 80% non-glamorous plumbing work and 20% AI."
"If your agent can take actions in the world... and you don't have a permissions layer, you have just a demo, right? You don't have a product."
"When you resume a conversation, it is not the same thing as resuming a workflow."
"The most common failure mode in agentic systems isn't underengineering. It's actually overengineering."
"This is the architecture of scale. And it's amazing to me that so much of this is essentially a function of good back-end engineering."
"Premature complexity is where frankly most projects go to die."
Applicable ideas
For Echo / OpenClaw
- Permission audit trail - right now trust is implicit; there should be categories (read-only, mutating, destructive)
- Workflow state tracking - distinguish between "conversation history" and "task checkpoints" (approved_tasks.md is a start)
- Token budget per session - hard stop before it overruns (protection for Marius); see the budget sketch after this list
- System event logging - audit trail for Echo's decisions (which job ran, what it approved, what it skipped)
- Tool pool assembly - dynamically filter tools based on context (morning vs evening, work vs self)
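A sketch of what the per-session hard stop could look like for the token budget item above. The class name, limit, and threshold are placeholder assumptions; the real numbers would come from the provider's limits and the spend Marius is comfortable with.

```python
from dataclasses import dataclass

@dataclass
class TokenBudget:
    max_tokens: int = 200_000   # hard per-session limit (placeholder value)
    compact_at: float = 0.8     # auto-compaction threshold, fraction of budget
    used: int = 0

    def decide(self, projected_call_cost: int) -> str:
        """Check BEFORE the API call, so a runaway loop stops instead of billing."""
        projected = self.used + projected_call_cost
        if projected > self.max_tokens:
            return "stop"     # structured stop reason, no API call is made
        if projected > self.max_tokens * self.compact_at:
            return "compact"  # compact the transcript first, then proceed
        return "proceed"

budget = TokenBudget()
budget.used = 190_000
assert budget.decide(15_000) == "stop"
```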
For Ralph / Autonomous agents
- Session persistence - Ralph should survive crashes (save state after each completed story)
- Two-level verification - (a) verify the generated code, (b) test that workflow changes don't break the process
- Agent type constraints - Ralph Planner does NOT execute code, Ralph Executor does NOT make design decisions (see the role sketch below)
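A sketch of how the role constraints above could be enforced structurally rather than by prompt alone. The agent names come from these notes; the tool names and the enforcement mechanism are assumptions.

```python
# Hypothetical role table: each agent type gets its own allowed/denied tool sets.
AGENT_TYPES = {
    "ralph_planner":  {"allowed": {"read_file", "search"},              # plans, never executes
                       "denied":  {"bash_tool", "write_file"}},
    "ralph_executor": {"allowed": {"read_file", "bash_tool", "write_file"},  # executes, never designs
                       "denied":  {"plan", "design_review"}},
}

def tools_for(agent_type: str, requested: set) -> set:
    """The constraint is structural: a denied tool never reaches the model."""
    spec = AGENT_TYPES[agent_type]
    return (requested & spec["allowed"]) - spec["denied"]

print(tools_for("ralph_planner", {"read_file", "bash_tool", "search"}))
# -> {'read_file', 'search'}  (bash_tool is simply never offered to the planner)
```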
Skill released by the author
- agentic_harnesses skill with 2 modes:
- Design mode: describe the product → it recommends an architecture with rationale
- Evaluation mode: point it at a codebase → it identifies what is missing vs the Claude Code primitives
- Available for Claude Code (as a skill package) and for OpenAI Codex
- Bias toward simplicity: single-agent unless there is a good reason, lean architecture, push back on overengineering
Full source (transcript)
=== I Broke Down Anthropic's $2.5 Billion Leak. Your Agent Is Missing 12 Critical Pieces. ===
It does not get bigger than this. Anthropic accidentally leaked Claude Code. This is a two and a half billion dollar run rate product. This is the secret sauce that describes how Claude Code works. And what's inside is actually not what most people are focused on. I have read the blogs, the breathless coverage, the hype, and what I see is a ton of focus on, you know, the feature flags that are not toggled on, what Claude Code is going to release in the next few weeks. That's fine. That's going to last for a few weeks. I wanted to see what the underlying architecture for Claude Code is that sustains this $2.5 billion business, and specifically what the baseline infrastructural insights are for running agents successfully for businesses, insights we can learn from and port over. How can we take some of Claude Code's secret sauce, now that it's leaked and available to everyone, and say: okay, what can we learn, and how can we build this effectively? So I'm going to talk about the things I learned reviewing the repo in detail. I am also releasing a special skill to help you assess your own agentic framework and agentic harness, and to recommend changes to it based on what we've learned from Claude Code. I think that's one of the most solid, useful, actionable takeaways we can get from this. It's not really about the hype of getting to take a peek at the next two weeks of Anthropic's roadmap. They ship fast. Pretty soon we're going to be right through that window and they're going to be on to shipping other stuff. It's really about how we can learn what Anthropic is building under the surface that sustains a successful agentic production system. That's what matters.

If you're wondering, is this just all about Claude Code? The answer is no. What we can learn from Claude Code applies to us all. So I'm also releasing a skill that's tuned to Codex specifically, tying in the lessons we can learn from Claude Code, so we can start to cross-pollinate and really understand how agents drive and sustain work over time regardless of the LLM of choice.

But before we get there, I want to call a spade a spade. This is the second significant leak from Anthropic in the last few days, and it's worth asking ourselves why. Earlier this week, Fortune reported that Anthropic left draft blog materials describing its new model, Claude Mythos, on a server that was open to the public. Now, five days later, a build configuration error has ended up leaking Claude Code. I want to be clear: these are entirely different mechanisms. They're different systems. But it did all happen to the same company in the same week. I'm not trying to say that Mythos or any other AI model caused this leak. Anthropic says it was human error, and there's certainly no public evidence to the contrary. But the pattern raises a question for me that every team building with AI-assisted development should be asking: is your development velocity outrunning your operational discipline? In light of that, it's telling to me that the developer community's default conjecture for how Claude Code got leaked involves an AI model making an error. The theory circulating on X, flagged explicitly as conjecture by Alex Volkov, is that someone inside Anthropic got switched to adaptive reasoning mode by accident, their Claude Code session fell back to Sonnet, and the model committed the map file, which was the leak, as part of a routine build step.
Now, we don't know for sure that that's what happened. But the fact that "the AI committed the build artifact that leaked the AI's own code" is part of the discourse, and is a chain of events that looks genuinely plausible from the outside, tells you everything you need to know about where we are with velocity and build security in 2026. When the AI writes 90% of your code, as Anthropic says it does, and your engineers are shipping multiple releases per engineer per day, maybe up to five, the surface area for configuration drift is really high. And whatever the exact chain of events, it's clear to me that that velocity had some consequences this week. There were two significant leaks for Anthropic, and I'll be curious to see how the team tightens up and goes back to operational discipline. Something tells me they will find a way to do that without significantly adjusting their shipping cadence. The velocity is here to stay, and the operational cadence is going to catch up. Ironically, some of the stuff we're going to talk about when we talk about primitives and AI is exactly the kind of boring stuff that I suspect Anthropic will lean on to tighten up and prevent this happening again. It's stuff like build pipeline configuration and publish-step validation, the stuff you need to do to make sure you're not accidentally leaking things. And I suspect they're going to revisit those boring, basic primitives along the way as they continue to clean up after this mess.

Now, what can we learn about the incredible plumbing that makes Claude Code possible? You might think: incredible plumbing, Nate, what are you talking about? Well, it's actually true. This is the secret sauce. We've never had a peek behind the curtain at Claude Code. I've done the work to organize this in terms of 12 specific primitives organized into multiple tiers. This is not the order of presentation, and it's not the order in the codebase. It's an order that is rational and makes sense for understanding how Anthropic is building. So I did that codebase analysis for you, and these are presented in the order you should think about them when building your own agentic system, which I think makes sense. There are 12 categories across three tiers. And for each primitive, I'm going to name three things: the universal pattern, the design principle that applies to any agentic system; Claude Code's manifestation, one specific production-grade implementation of that pattern; and then how this might look in your system beyond that. So, let's jump in. If you're building an agent from scratch, what can Claude Code tell you about how to do it right? What are some initial non-negotiables?

Number one: think about a tool registry with metadata-first design. This is not super new if you've been following Claude Code closely, but boy, do we get a ton of detail about it from the Claude Code leak. The pattern is really clear. You define your agent's capabilities as a data structure before writing any implementation code. The registry should answer what exists and what it does without executing anything. So how does this look inside Claude Code? What I found is that Claude Code maintains two parallel registries: a command registry with 207 entries for user-facing actions, and a parallel tool registry with 184 entries for model-facing capabilities. Every single entry is like a dictionary.
It carries a name, it carries a source hint, and it carries a responsibility description. The registries are a source of truth, and the implementations load on demand from there. This separation is not something the model has to infer; it's structural. Now, why does this matter? If you don't have a clean tool registry, you can't filter tools by context. You can't introspect your system without triggering side effects. And every new tool is going to require changes in your orchestration code. The registry is the foundation everything else builds on. So if you're thinking about how to apply this yourself, think about a list-tools function that returns metadata for all registered capabilities without having to invoke them. You should be able to support runtime filtering. You should be able to define each tool clearly by a name and a short description. And you should be able to write that function before the model is ever asked to think about executing or picking a tool. So that's the tool registry. Super basic.

Another day-one thing I would look at if I were building an agent based on the Claude Code leak is the permission system. Not all tools carry the same risk. Categorize that risk and apply different approval tiers per category. What I found in the code is that Claude segments its capabilities into three different trust tiers. There are built-in, always-available, highest-trust tools. There are plugin tools, which are medium trust and can be disabled on command. And then there are skills, which are user-defined and lowest trust by default. Yes, a skill is considered a tool in this scheme. Every tier has different loading behavior, different permission requirements, and different failure handling. And the shell execution tool alone, which is called bash_tool, has an 18-module security architecture. That's not a typo: that's 18 separate modules, from pre-approved command patterns to destructive command warnings to git-specific safety checks to sandbox termination. They're really careful with it, because bash_tool, as a shell execution tool, could go very wrong very fast. And this is all relevant because it gets at exactly the security concerns we've seen dominating the conversation since the Mythos leak and OpenClaw. Bottom line: if your agent can take actions in the world, if it can execute code, if it can call APIs, if it can send messages, if it can modify files, and you don't have a permissions layer, you have just a demo, right? You don't have a product. You don't have anything you can execute on safely. And so when you think about an 18-module security stack for a single tool, I don't think Anthropic is being paranoid. I think it's what separates a system that works safely at a two and a half billion dollar run rate from one that works in a little notebook. So what does this imply for how you should think about security and permissions? First, think about pre-classification. Is this action read-only? Is it mutating? Is it potentially destructive? Do you have pre-approved patterns that are known safe? Do you have destruction detection, where you can flag actions that might delete or overwrite ahead of time? Do you have domain-specific safety, targeted checks for the specific risk factors you're worried about? Do you have permission logging? Do you record every decision, granted or denied, with enough context to replay that decision? These are the things you need to be thinking about, and they are already in the Claude Code leak.
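To make those pre-classification questions concrete, here is a minimal sketch of a risk classifier and approval gate. The patterns, markers, and function names are illustrative assumptions; this is nothing like the 18 bash_tool modules, just the shape of the idea.

```python
from enum import Enum

class Risk(Enum):
    READ_ONLY = "read-only"
    MUTATING = "mutating"
    DESTRUCTIVE = "destructive"

PRE_APPROVED = {"git status", "ls", "pwd"}                  # known-safe patterns
DESTRUCTIVE_MARKERS = ("rm -rf", "drop table", "--force")   # flag before execution

def classify(command: str) -> Risk:
    lowered = command.strip().lower()
    if lowered in PRE_APPROVED:
        return Risk.READ_ONLY
    if any(marker in lowered for marker in DESTRUCTIVE_MARKERS):
        return Risk.DESTRUCTIVE
    return Risk.MUTATING

def requires_approval(risk: Risk) -> bool:
    # Anything that can change the world needs a decision, and every decision
    # should be logged with enough context to replay it later.
    return risk is not Risk.READ_ONLY

assert classify("git status") is Risk.READ_ONLY
assert classify("rm -rf build/") is Risk.DESTRUCTIVE
```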
Number three, not glamorous but super important: session persistence that survives crashes. The pattern is really simple. Your agent session is not just the conversation history. It's a recoverable state that includes the conversation, usage metrics, permission decisions, and configuration. It's the whole ball of wax. If any of those are missing when you resume, the session isn't going to work the same as the original. And what I discovered when I got into the code is that Claude Code persists those sessions, in their entirety, in the form of JSON files. It captures the session ID, it captures the messages, it captures the token usage in and out, and the query engine can essentially be fully reconstructed from that stored session. You can reinstantiate an entire session after a crash: load, reconstruct the transcript, restore the counters, and return an essentially fully functional agentic engine. Why would you care about this? Well, agents crash. They crash all the time. Connections drop, users close tabs. If your agent can't reliably resume where it left off, including what tools were available, what permissions were granted, and how many tokens were consumed, then every single interruption is a restart. And every restart ends up being a degraded experience for the customer. So you should build your version of this. Look at a session state structure that captures everything needed to resume. Look at how you can persist after every significant event, not just at shutdown. And you should be able to build a resume-session function that reconstructs the full agentic state, not just the conversation history.

Number four: workflow state. This is a really big deal, but it's not getting talked about at all. The pattern is simple. When you resume a conversation, it is not the same thing as resuming a workflow. A chat transcript answers "What have we said?" A workflow state answers "What step are we in? What side effects have happened as a result of that workflow? Is this operation safe to retry? And what should happen after we restart?" This is very tightly connected to session persistence, but it is a different thing. Almost every agentic framework conflates conversation state with task state, and they're different problems with different solutions. If you don't have a workflow state, you can reinstantiate the agent to be exactly where it was, but it won't remember where it was in the workflow, because the workflow is something that persists beyond the agent. Your agent will not survive a crash mid tool execution without potentially duplicating a write, double-sending a message, or rerunning a very expensive and potentially very destructive operation. You need a clear way to retry a workflow, and you need to know where you were when the agent crashed. So you should be modeling long-running work as very explicit states. "Planned, awaiting approval" is an example of a state. "Executing" is an example of a state. "Waiting on an external party" is an example of a state. You want to persist those checkpoints all the time. It's like when we were in the 1990s and we saved our game every two seconds because we didn't want to lose it if the computer crashed. Same idea. Be paranoid. Save your workflow state.
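A sketch of the two kinds of state side by side: a resumable session and a separate workflow checkpoint. The JSON field names echo what the video lists (session ID, messages, token counters, permissions, config), but the exact schema here is an assumption.

```python
import json
from pathlib import Path

def save_session(path: Path, session: dict) -> None:
    """Persist after every significant event, not just at shutdown."""
    path.write_text(json.dumps(session, indent=2))

def save_checkpoint(path: Path, step: str, side_effects: list) -> None:
    """Workflow state answers 'what step are we in', not 'what have we said'."""
    path.write_text(json.dumps({"step": step, "side_effects": side_effects}))

session = {
    "session_id": "demo-001",
    "messages": [],                     # the conversation transcript
    "tokens": {"in": 0, "out": 0},      # usage counters, restored on resume
    "permissions": [],                  # decisions already granted or denied
    "config": {"model": "sonnet"},      # engine configuration
}
save_session(Path("session.json"), session)
save_checkpoint(Path("workflow.json"), step="awaiting_approval",
                side_effects=["created branch feature/x"])
```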
Number five: where are you at with your token budget? What I discovered is that Claude Code's query engine configuration defines very hard limits on token usage. It has a maximum number of turns in a conversation. It has a maximum token budget for a conversation. And it has a compaction threshold where it will auto-compact the conversation. Every turn, it calculates projected token usage. If the projection exceeds the budget, execution just stops with a structured stop reason before an API call is made. And this is confirmed by how we actually use Claude in the wild. This is critical, because without budget tracking you're going to discover you've exceeded your token limits the hard way. This is just common sense: you're going to have a runaway loop and spend money you didn't intend to spend. So Claude is actually putting in checks that are not beneficial to Anthropic; they're beneficial to the long-term health of the customer, because in the short-term interest of Anthropic, you'd love the customer to burn tokens and spend money with Anthropic. Anthropic is being a really responsible citizen here and saying, "We don't want you to have runaway budget spending that you do not clearly intend." It's the same way that Amazon enables returns, which may not be good for Amazon in the short term but increases customer trust in the long term. Same deal. They are increasing customer trust by making it easy to track tokens. If you are building agents, you should also be building token budgeting. You should have input tokens, output tokens, budgets, hard stops. It's a non-negotiable. It's just responsible building in 2026.

Number six is a really big one. It's something that's more unique to Claude and is a real trust builder over time: Claude is invested in structured streaming events. This is what we mean when we talk about stream of thought from Claude. The pattern is pretty clear, and we see it on the front end all the time. Streaming isn't just about showing text. Every streaming event you see while the model is running is an opportunity for you to find out what is going on with that model. It has to communicate a system state to you. So it needs to be talking about what tools the agent is thinking about using, how many tokens have been consumed, whether the agent is wrapping up. I don't know about you, but I use that streaming state all the time with Claude, because it tells me where the model is going, and I will sometimes intervene in the model's train of thought and type a message because I have seen what it's thinking about and I know it's going off track. So it's extremely useful, but it's not automatic. You have to design for it. What I discovered is that Claude Code's query engine emits typed events into the stream, things like message_start, command_match, tool_match, basically all kinds of typed events it can call upon as it constructs the query stream. And what's critical is what happens if there's a crash, if there's an issue. Do you see how many times I've talked about crashes and issues? Good engineering assumes a failure path and plans for it. In this case, the Claude Code team has assumed that sometimes the agent will crash, and it includes a special typed event with a reason for the crash as the last message the stream sends if there's an issue. It's like a little black box from a crash.
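A sketch of typed streaming events with a crash reason as the final event. The event names message_start, command_match, and tool_match come from the talk; the Event shape and the generator structure here are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Event:
    type: str                      # "message_start", "tool_match", "crash", ...
    data: dict = field(default_factory=dict)

def query_stream() -> Iterator[Event]:
    try:
        yield Event("message_start", {"turn": 1})
        yield Event("tool_match", {"tool": "read_file"})
        yield Event("message_delta", {"text": "Reading the file..."})
    except Exception as exc:
        # The failure path is designed in: the last event is the black box.
        yield Event("crash", {"reason": repr(exc)})

for event in query_stream():
    print(event.type, event.data)
```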
So when you're thinking about how you send messages back, don't just assume you can send raw chain of thought or whatever. Take the time to send reasonable streamed events that communicate real information to the user, that allow the user to understand what is going on, and make sure you plan for crashes.

Okay, number seven is related: system event logging. If number six was about streaming events and what the model is thinking, maybe some refined chain of thought, system event logging is about when something goes wrong. Again, the failure cases; think about engineering into the failure cases. This is one of the meta-lessons. When something goes wrong, the system needs to be able to tell the user what the agent did, not just what it said. And so, separate from the conversation and separate from streaming events, Claude Code maintains a history log of system events. It is a source of truth: what context it loaded, what its registry initialization looked like, what routing decisions Claude made, what execution counts it had, what permission denials or approvals it experienced. Every single event has a category and is presented with structured details, so you can easily reconstruct an agentic run. This is what you do when you are building a system you intend for enterprise. When you are building something you intend to run seriously, this is how you prepare. So if you're trying to build a serious agent, you need to think about event logs. You need to think about how the system maintains a record of not just what was said but what was done, and how you can provably walk that back.

Number eight: Claude Code takes verification seriously, and this happens at two levels. I want to talk about each of them, because we normally only see one. The one you see is really obvious, so it's quick to talk about: you see Claude having a separate step to check its work when you go through the stream of events. That's expected, and it's something Claude Code explicitly provides for. It's part of the harness. Verify that the work done was correct. Yay. Good job. A+. But we're not done yet. Part two, which I think is really critical and which Claude Code also thinks about in the leak, is that you need to be able to verify changes that you, the human, make to the agentic harness. So it's not just "did a given agent run complete successfully?" That's important and that's good. It's also: when I make a change to the harness, and thereby change every subsequent agent run, am I doing so with confidence that I'm not breaking something? That's where you have special verification tests that check whether the system still holds against common guardrails. Things like: do destructive tools still always require approval after we make this change? That's a reasonable guardrail. Or when tokens run out, what happens to the model? Does it gracefully stop, as we would expect, or is there some sort of hard crash? These are things you would want as guardrails on any agentic experience. You should name them. You should log them. And this is that second level of verification in a harness that we don't think about a lot, because you have to provide for the harness evolving.
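To make that second verification level concrete, here is a sketch of harness-level guardrail tests. It assumes the hypothetical classify, requires_approval, and TokenBudget helpers from the earlier sketches are in scope; the test names and contents are illustrative, not tests from the leak.

```python
# Run these after ANY change to the harness itself, before trusting new runs.
def test_destructive_tools_still_require_approval():
    assert requires_approval(classify("rm -rf build/"))

def test_token_exhaustion_stops_gracefully():
    budget = TokenBudget(max_tokens=1_000, used=990)
    assert budget.decide(50) == "stop"   # structured stop, not a hard crash

test_destructive_tools_still_require_approval()
test_token_exhaustion_stops_gracefully()
```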
Okay, those were day-one basics. Those are things I often see teams consider late or never when they're putting a harness together. Now we're going to move toward operational maturity and think about larger, deeper lessons we can learn. I'm going to give you four of them, and if you want more, I've written them up on the Substack. Number one here is tool pool assemblies. Say that five times fast. What we're talking about is the idea that if you have 184 tools, Claude isn't going to assemble all of those tools into a usable pool on every agent run. Instead, it assembles a session-specific tool pool, a group of tools that will be used to get the run done, based on mode flags, based on permission context, based on deny lists, and so on. What you need to learn as a designer is that a general-purpose agent may need to assemble a short list of tools dynamically when preparing for a run. That's something we typically see hard-coded in a lot of enterprise workflows, where they say "these are the tools available." What Claude is suggesting is that if you have a more general-purpose problem-solving agent, you may want to give it a wider tool subset that it can read efficiently, and then let it pick from that tool list what it wants for a given run.

Number two has been talked about a lot: transcript compaction. I want to get into it a little bit more. Conversation history is obviously a token-expensive resource, and Claude Code automatically manages it by compacting the transcript after a configurable number of turns. It keeps recent entries when it compacts, and it tends to discard the older ones. The transcript store tracks whether it has been persisted, to avoid data loss. You want to think about how you build automatic compaction for longer-running agents: what your threshold is, what you're compacting, what you're keeping, and how you know whether what you're keeping is correct. This is a really hot commodity as we think about longer-running agents, because you have to think about how you keep the initial instruction that got the agent started, while also cutting intervening conversational turns or intervening actions that are not relevant to the agent's present state, in a way that allows the agent to save significant space. So compaction is one. We've already talked about it. It's great to see how Claude Code does it behind the scenes. There's going to be a lot more effort going into this for everybody in the next few months.

Number three is a little bit more advanced: think about your permission audit trail. I talk about permissions as something you need to be ready to discuss and audit all the time, but Claude Code actually makes this easy, because they don't make permissions just a boolean gate that is yes or no. Instead, they make permission state a first-class object that is easy to query. And Claude actually builds three separate permission handlers to serve different contexts. It has an interactive handler for a human in the loop. It has a coordinator handler for when you have multi-agent orchestration and the orchestrator agent needs to hand out permissions. And it also has a swarm worker level, where you have autonomous execution being managed by an orchestrator agent. That's three different types of agent that all need different permission structures, and Claude Code thinks about all of them in the permission architecture. So should you.
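A sketch of the three handler contexts just described. The interfaces are assumptions; the point is that the same permission request gets resolved differently depending on who is supervising.

```python
def interactive_handler(tool: str) -> bool:
    """Human in the loop: a person approves or denies at the prompt."""
    return input(f"Allow {tool}? [y/N] ").strip().lower() == "y"

def coordinator_handler(tool: str, policy: dict) -> bool:
    """Multi-agent orchestration: the orchestrator hands out permissions."""
    return tool in policy.get("granted_tools", set())

def swarm_worker_handler(tool: str, preapproved: set) -> bool:
    """Autonomous worker: only pre-approved actions, everything else denied."""
    return tool in preapproved

# The same request flows through a different handler depending on context.
print(coordinator_handler("write_file", {"granted_tools": {"write_file"}}))  # True
print(swarm_worker_handler("bash_tool", preapproved={"read_file"}))          # False
```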
And last but not least, Claude Code has an agent type system that, as far as I know, was not leaked before now. Claude Code defines six built-in agent types: explore, plan, verify, guide, general purpose, and status line setup. Each of these agent types comes with its own prompt, its own allowed tools, its own behavioral constraints. An explore agent, by definition, cannot edit files. A plan agent doesn't execute code. The transferable lesson here is not to just spawn agents randomly like you're cloning minions. It's actually to constrain roles really sharply when you split work out, and to constrain them into a number of observable types, so you can manage those types to control your overall agent population and the efficiency of the work they're able to produce. This is a great way to think about larger multi-agent systems.

Okay, that was a lot. Let's hop briefly into what I'm releasing and what I built. I'm releasing an agentic harnesses skill that helps us operationalize some of this for our own agents that we're running. And yes, if you're wondering whether this works on, like, your OpenClaw agent, whether it's something you can use for any agentic setup: you can absolutely run it, and it will give you some good tips. So what does this thing do? It has two modes. Design mode enables you to describe the product you're building, like a chat assistant or a workflow orchestrator or a code agent, whatever agent you want to build. The skill walks you through a structured design process and recommends a harness shape. It identifies the minimum useful set of primitives, it sequences the implementation into phases, and it defines verification criteria, and all of that happens before you write a line of code for the harness. It doesn't just generate boilerplate here, right? It generates an architecture with rationale that is deeply rooted in what we can learn from how the most successful agent in production today runs. It also has a second mode, evaluation mode. If you already have an existing harness, you can point it at your codebase. You can point it at your CLAUDE.md and your architecture documents, and it's going to tell you what's missing, what's not there, and what you could learn from this Claude Code release that you might not know about. So it will evaluate every dimension of the codebase in light of the principles I've identified in this video: architecture, safety and permissions, state and durability, and so on. And it's going to return findings ordered by severity, a prioritized upgrade path, and specific tests that confirm the fixes work. Now, why is this a skill and not just a document? I'll put it pretty simply. A skill allows you to do something dynamically with the AI. And that's what this is all about. It's about actually implementing and fixing the agentic setups we have today, and designing better agentic setups, based on what Claude Code can teach us. I think that's a much more sustainable path to utility and usefulness from this leak than just trying to hype up the drama. And yes, I built the skill both for Claude Code, as a Claude skill package, and also for OpenAI's Codex, with Codex-specific metadata, path patterns, and agent routing. The core logic is identical: the way it assesses primitives, the way it assesses evaluation dimensions, the way it assesses the design playbook. The reason for that is that I think these primitives scale pretty well. And this was a very deliberate choice on my part, to make sure that we are all thinking about the primitives of agentic development as things we can learn from together and use to build more solid systems, regardless of our LLM of choice. And I'll be honest here: the skill is opinionated. It biases toward a lean, solo-maintainable architecture unless you have a good reason not to.
It starts with single-agent design unless you give it really good reasons that push for a multi-agent design. It biases toward simplicity, because simplicity is maintainable. And this is very intentional, because the most common failure mode that I've seen in agentic systems is not underengineering. It's actually overengineering. Building a really complicated multi-agent coordination layer before you have a working permission system, right? Or implementing a plugin marketplace before your sessions can survive crashes. So the skill is going to push back on unnecessary complexity, because premature complexity is, frankly, where most projects go to die.

If we step back and look at the Claude Code leak after the dust settles, what is the takeaway here? The larger takeaway is that building agents is 80% non-glamorous plumbing work and 20% AI. So much of what I spent time talking about in this video is stuff that most people will roll their eyes at, but it is the exact boring stuff that makes an agent successful at a multi-billion dollar level. This is what makes it possible to serve millions and millions of people. You have to think about failure cases. You have to think about security. You have to think about how the agent recovers from crashes. You have to think about how you have typed events that enable an agent to choose from a limited schema and come back with useful information across a range of scenarios. This is the architecture of scale. And it's amazing to me that so much of this is essentially a function of good back-end engineering. In that sense, it hasn't evolved. We're just applying good back-end engineering to these agentic pipelines and discovering that, hey, that works pretty well. So the plumbing isn't very glamorous, maybe, but I'm talking about it because I believe it's the whole game, and I'm launching a skill to make it easy, because I don't want it to be hard for people to figure this out. You should not have to go through the Claude Code leak yourself to infer these principles. We should be able to just pull them out and have a conversation about them as a community. So, you tell me in the comments: what did you learn from Claude Code that I didn't mention? And what would you like to see included in future agent work that you think we can all pull from Claude Code to make our agents better? I'd love to hear. Cheers.