Shivanath Devinarayanan

The Payback Review AI Teams Need Before They Scale

Shivanath Devinarayanan — Fri, 19 Jun 2026 11:30:42 GMT

The public AI market argument keeps circling one question: is this a bubble?

That question is useful for headlines. It is less useful inside an operating review.

A team deciding whether to scale an AI workload needs a different discipline. It needs to know whether the work earns its compute.

![The Payback Review AI Teams Need Before They Scale](images/substack_cover.png)

That distinction matters because the demand evidence is no longer thin. NVIDIA reported fiscal 2026 revenue of $215.9 billion, with Data Center revenue of $193.7 billion. OpenAI has described annual recurring revenue moving from $2 billion in 2023 to $6 billion in 2024 to more than $20 billion in 2025. Anthropic has reported run-rate revenue above $30 billion and fast growth in million-dollar annualized business accounts.

Those are not small signals.

They still do not answer the operating question.

The operating question is whether the value event sits close enough to the inference bill.

What The Public Thesis Leaves Out

The LinkedIn version of this argument is intentionally simple: demand can be real while payback stays uneven.

Subscribers need the next layer.

When an AI system moves from demo to production, the cost path changes. A prompt becomes a workflow. A workflow becomes retries, tool calls, file reads, long-context passes, permission checks, validation steps, human review, and monitoring. The marginal unit is not a token. It is a completed business action.

That is where many AI business cases get too soft.

They price the model call and forget the loop.

They count the answer and ignore the verification.

They cite adoption and skip the margin map.

The Payback Review

Before scaling an AI workload, I would run a five-part review.

![The Payback Review](images/substack_info_01_payback_review_board.png)

1. Map The Workload

Do not start with the model. Start with the work.

Name the task, the user, the trigger, the input, the decision, the output, the handoff, and the failure mode.

A coding agent that edits a test suite, opens a pull request, and saves two engineering days is not the same economic object as a chatbot answering shallow internal questions from stale retrieval. Both may use tokens. Only one may create enough measurable value to justify premium inference.

2. Price The Inference Path

The first estimate should include more than the visible answer.

Count prompts, tool calls, retries, long-context steps, file reads, retrieval passes, evaluations, moderation, logging, and human review. Also count the work that happens when the system fails: escalations, rework, support tickets, or manual cleanup.
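A minimal sketch of that accounting, in Python. It assumes a simple per-step model where every attempt is paid for and only some attempts complete. The step names, unit costs, failure rate, and cleanup cost are hypothetical placeholders, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    calls_per_action: float  # average invocations per attempted action, retries included
    unit_cost: float         # hypothetical cost per invocation, in dollars


def cost_per_completed_action(steps, failure_rate, cleanup_cost):
    """Total spend divided by actions that actually complete.

    failure_rate: share of attempts that fail (hypothetical, measured per workload).
    cleanup_cost: hypothetical cost of escalation, rework, or manual cleanup per failure.
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    path_cost = sum(s.calls_per_action * s.unit_cost for s in steps)
    spend_per_attempt = path_cost + failure_rate * cleanup_cost
    # Every attempt is paid for, but only (1 - failure_rate) of attempts complete.
    return spend_per_attempt / (1 - failure_rate)


# Placeholder numbers for illustration only.
steps = [
    Step("visible prompt", 1.0, 0.02),
    Step("tool calls", 3.0, 0.005),
    Step("retries", 0.5, 0.02),
    Step("long-context pass", 1.0, 0.10),
    Step("evaluation", 1.0, 0.01),
    Step("human review", 0.2, 4.00),
]
print(round(cost_per_completed_action(steps, failure_rate=0.1, cleanup_cost=10.0), 2))
```

With these made-up numbers, the visible prompt is $0.02 of roughly $2.17 per completed action. Most of the bill sits in human review and failure handling, which is why the loop has to be priced, not just the call.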

This is where model routing becomes an operating control.

Recent Last30Days research surfaced routing as a practical response to AI overspending. That signal is important. Mature teams stop treating intelligence as one premium tier. They decide which work deserves frontier reasoning and which work can move through a cheaper model, a cached answer, a deterministic rule, or no AI at all.
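One way to make that decision explicit is a routing policy that checks the cheapest options first. This is a hypothetical sketch: the task names, the cache, and the 0..1 ambiguity score (assumed to come from some upstream classifier) are illustrations, not any specific product's router.

```python
from enum import Enum


class Tier(Enum):
    NO_AI = "deterministic rule"
    CACHE = "cached answer"
    SMALL = "cheaper model"
    FRONTIER = "frontier reasoning"


# Hypothetical policy inputs; each team would set its own.
DETERMINISTIC_TASKS = {"date_normalization", "field_copy"}
AMBIGUITY_THRESHOLD = 0.6


def route(task_type: str, query: str, cache: dict, ambiguity: float) -> Tier:
    """Send each request to the cheapest tier that can do the work.

    ambiguity is assumed to be a 0..1 score from some upstream classifier.
    """
    if task_type in DETERMINISTIC_TASKS:
        return Tier.NO_AI
    if query in cache:
        return Tier.CACHE
    if ambiguity < AMBIGUITY_THRESHOLD:
        return Tier.SMALL
    return Tier.FRONTIER
```

The order of the checks is the design choice: deterministic rules and cache hits exit before any model is called, and frontier reasoning is the fallback rather than the default.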

3. Locate The Value Event

Every serious AI workload needs a named value event.

![Usage is not payback](images/substack_info_02_value_event_map.png)

The value event might be a resolved support case, a contract risk caught before signature, an engineer day saved, a sales handoff completed, a claim processed, or a compliance exception routed before it becomes expensive.

If the team cannot name the value event, it should not scale the workload yet.

Usage is not enough.

Engagement is not enough.

Answers are not enough.

The value event is the moment where the business can say: this work paid for the compute it consumed.

4. Identify Who Captures Margin

This is the uncomfortable step.

The user may get value. The application owner may get adoption. The cloud provider may get revenue. The chip supplier may capture scarce-supply margin. The model lab may carry heavy compute commitments. The enterprise may pay more than it saves.

All of those can be true in the same ecosystem.

That is why the broad bubble debate hides too much. The buildout can create durable economic value while the return pool concentrates in fewer places than the story suggests.

GMO's framing is useful here: a technology can be transformative and still produce poor returns for many investors. Railroads, telecom fiber, cloud capacity, and internet infrastructure all carry versions of that lesson.

5. Choose The Operating Response

The review should end with a decision, not a mood.

![Scale Decision Matrix](images/substack_info_03_scale_decision_matrix.png)

Use four decisions:

Scale: the workload has paid usage, measured value, and acceptable inference discipline.
Route: the workload is valuable, but too much of it is running on expensive reasoning.
Redesign: the workflow creates value, but the operating path has too many retries, handoffs, or verification loops.
Stop: the workload consumes compute without a credible value event.
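The four decisions can be written as a small function so the review ends in a call rather than a discussion. This is a sketch under stated assumptions: the three signals and the loop budget of 3.0 are hypothetical simplifications of the criteria above, and each team would set its own thresholds.

```python
from dataclasses import dataclass


@dataclass
class WorkloadSignals:
    credible_value_event: bool   # value event is named and measured per completed action
    loops_per_action: float      # retries + handoffs + verification passes per action
    premium_share: float         # share of requests running on expensive reasoning
    premium_needed_share: float  # share that review says genuinely needs it
    loop_budget: float = 3.0     # hypothetical tolerance, set per workload


def operating_response(s: WorkloadSignals) -> str:
    if not s.credible_value_event:
        return "Stop"      # compute without a credible value event
    if s.loops_per_action > s.loop_budget:
        return "Redesign"  # value exists, but the operating path loops too much
    if s.premium_share > s.premium_needed_share:
        return "Route"     # value exists, but too much runs on premium reasoning
    return "Scale"
```

The sketch checks the loop before routing, on the view that routing a workflow that needs redesign mostly makes the waste cheaper. That ordering is a judgment call, not a rule.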

The best teams will not be the ones with the most AI stories. They will be the ones that make these calls early.

A Concrete Scenario

Imagine a legal review agent.

In the weak version, it summarizes contracts and produces plausible notes. Lawyers still reread everything, escalations are unclear, and the business cannot tell whether the agent reduced cycle time or risk.

That workload may have usage, but it has not proved payback.

In the stronger version, the system is scoped to a small set of clauses. It routes simple extraction to cheaper processing, reserves premium reasoning for ambiguous risk, records confidence, escalates exceptions, and measures avoided rework or faster approval.

The same category now has a payback path.

The difference is not belief in AI.

The difference is operating design.

What To Inspect Next

For any AI workload already moving toward production, inspect these five things this week:

The exact business action the system completes.
The full inference path behind that action.
The value event that proves the work was worth doing.
The routing policy that keeps premium reasoning away from cheap work.
The margin map showing who gets paid when usage rises.

That review will not settle the AI bubble debate.

It will do something more useful.

It will tell you whether your AI buildout is becoming an operating model or just a larger bill.

Sources

NVIDIA FY2026 financial results: https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2026
OpenAI CFO Sarah Friar on revenue, compute, and practical adoption: https://openai.com/index/a-business-that-scales-with-the-value-of-intelligence/
Anthropic Google/Broadcom compute partnership and run-rate revenue disclosure: https://www.anthropic.com/news/google-broadcom-partnership-compute
Futurum 2026 AI capex analysis: https://futurumgroup.com/insights/ai-capex-2026-the-690b-infrastructure-sprint/
Axios AI data-center financing coverage: https://www.axios.com/2026/06/10/meta-amazon-oracle-data-centers
GMO AI valuation and capital-bubble analysis: https://www.gmo.com/americas/research-library/valuing-ai-extreme-bubble-new-golden-era-or-both_viewpoints/
Last30Days raw conversation grounding: source_room/last30days_raw/ai-bubble-buildout-payback-inference-economics-raw-v3.md

The Harness Control Review

Shivanath Devinarayanan — Tue, 16 Jun 2026 13:10:56 GMT

The public Edition 20 argument was simple: the OpenAI and Anthropic IPO race is also a harness ownership test.

For subscribers, the more useful question starts one level lower.

Before your company expands another enterprise AI contract, can you prove which controls still belong to you?

That proof should not live in a strategy deck. It should live in a review ritual that a CIO, CISO, product leader, and operating executive can run together.

Call it the Harness Control Review.

The Problem

Most AI buying conversations begin with capability.

Which model is better at coding? Which model handles long context? Which product has the strongest enterprise controls? Which vendor can send people into the business to make the deployment work?

Those questions are useful, but they arrive too early.

The first question is ownership.

If the vendor owns context assembly, routing, evaluations, memory, permissions, review, and rollback, the company may still have a legal contract, a security review, and an admin console. It may not have operating control.

That difference becomes more important as model prices fall.

Cheap tokens do not remove dependency. They move dependency toward the layer that decides how tokens enter work.

Why The Deployment Race Matters

OpenAI's Deployment Company announcement is a useful signal because it says the quiet part out loud. OpenAI is not only selling model access. It is building an organization that embeds Forward Deployed Engineers inside customer environments, redesigns workflows around AI, and connects OpenAI models to customer data, tools, controls, and business processes.

Anthropic is moving through a partner-scale model. Its DXC alliance says DXC will train tens of thousands of Claude-certified Forward Deployed Engineers embedded inside customer organizations.

Those are different go-to-market shapes, but they solve the same problem.

The lab needs customer context.

The customer needs working systems.

Forward deployed engineering becomes the bridge.

That bridge can be good. Many companies need help. The mistake is letting the bridge become the only road back into your own operating system.

The Review Ritual

Run the Harness Control Review before signing, expanding, or renewing a major AI deployment.

The review has seven lanes.

1. Context

List the context sources the agent can use: documents, CRM fields, tickets, code, messages, spreadsheets, product telemetry, contracts, and policy material.

Then ask who controls assembly.

Can your team decide what enters context, what is excluded, what is redacted, and what expires? Can you reconstruct what the model saw after a decision? Can you test a different context recipe without asking the vendor to redesign the product?

If the answer is no, the model does not merely read your context. The vendor controls the shape of your context.
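A minimal sketch of a context recipe the company owns, assuming documents arrive as dictionaries with a source, a timestamp, and fields. The recipe shape, field names, and manifest format are hypothetical. The manifest is what answers the reconstruction question: it records every include, exclude, and expiry decision plus a hash of exactly what was sent.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ContextRecipe:
    include_sources: set  # what may enter context
    redact_fields: set    # what is masked before the model sees it
    max_age: timedelta    # what expires


def assemble(recipe, documents, now):
    """Apply a company-owned recipe; return (context, manifest)."""
    context, manifest = [], []
    for doc in documents:
        if doc["source"] not in recipe.include_sources:
            manifest.append({"id": doc["id"], "decision": "excluded"})
            continue
        if now - doc["updated_at"] > recipe.max_age:
            manifest.append({"id": doc["id"], "decision": "expired"})
            continue
        fields = {k: "[REDACTED]" if k in recipe.redact_fields else v
                  for k, v in doc["fields"].items()}
        # Hash exactly what is sent so the model's view can be reconstructed later.
        digest = hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()
        context.append(fields)
        manifest.append({"id": doc["id"], "decision": "included", "sha256": digest})
    return context, manifest


now = datetime(2026, 6, 16, tzinfo=timezone.utc)
recipe = ContextRecipe({"crm", "tickets"}, {"ssn"}, timedelta(days=90))
docs = [
    {"id": "c1", "source": "crm", "updated_at": now - timedelta(days=3),
     "fields": {"account": "Acme", "ssn": "123-45-6789"}},
    {"id": "h1", "source": "hr", "updated_at": now, "fields": {"note": "private"}},
    {"id": "t9", "source": "tickets", "updated_at": now - timedelta(days=400),
     "fields": {"summary": "old outage"}},
]
context, manifest = assemble(recipe, docs, now)
```

Testing a different context recipe then means changing `ContextRecipe`, not asking the vendor to redesign the product.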

2. Evaluations

Every agent deployment needs tests that represent the work.

Not benchmark tests.

Your tests.

Renewal risk summaries. Support escalations. Contract clause extraction. Lead routing. Code review. Financial variance explanations. Safety checks.

The review question is direct: who owns the pass/fail definition?

If the vendor owns the evaluation harness, your team may get impressive demo scores and still lack proof that the system behaves inside your business.
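A sketch of what owning the pass/fail definition can look like. The agent here is a toy stand-in, and the case names, owners, and checks are hypothetical; the point is that each case encodes a business judgment and has a named owner inside the company.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # the company's own pass/fail definition
    owner: str                     # who wrote and maintains the case


def run_evals(agent: Callable[[str], str], cases: list) -> tuple:
    if not cases:
        raise ValueError("no eval cases: nothing proves the system works here")
    results = {c.name: c.passes(agent(c.prompt)) for c in cases}
    return results, sum(results.values()) / len(results)


# A stand-in agent; in practice this would call the deployed system.
def toy_agent(prompt: str) -> str:
    return "RISK: HIGH" if "termination notice" in prompt else "RISK: LOW"


cases = [
    EvalCase("flags termination notice", "Customer sent a termination notice.",
             lambda out: out.startswith("RISK: HIGH"), owner="revenue-ops"),
    EvalCase("does not flag routine ticket", "Password reset request.",
             lambda out: out.startswith("RISK: LOW"), owner="revenue-ops"),
    EvalCase("flags silent churn", "No logins for 60 days.",
             lambda out: out.startswith("RISK: HIGH"), owner="customer-success"),
]
results, pass_rate = run_evals(toy_agent, cases)
```

The third case fails against the toy agent. That is the kind of gap a generic benchmark is unlikely to surface, because the business decides that silent usage decline counts as renewal risk.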

3. Permissions

Claude Enterprise and similar products correctly emphasize governance, data controls, identity, audit, and admin infrastructure. Those controls matter. They are the entry ticket for enterprise use.

The deeper review asks whether permissions are tied to actual work.

Can the agent read as one role and act as another? Which actions require approval? Which tools are read-only? Which writes can be rolled back? Which systems are off-limits no matter what the model recommends?

The dangerous deployment is not the one with no permission model.

Security teams usually catch that.

The dangerous deployment is the one where permissions exist, but no operating leader knows how they map to the work.
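The permission questions above can be tied to the work with an explicit per-tool map. A hypothetical sketch: tool names, systems, and the three flags are illustrations, and a real deployment would likely enforce this in the harness or gateway rather than only in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    writes: bool          # False means read-only
    needs_approval: bool  # a human must approve before the action runs
    reversible: bool      # can the write be rolled back?


# Hypothetical map from tools to work-level rules.
OFF_LIMITS_SYSTEMS = {"payroll"}
POLICIES = {
    "crm.read_account": ToolPolicy(writes=False, needs_approval=False, reversible=True),
    "crm.update_stage": ToolPolicy(writes=True, needs_approval=False, reversible=True),
    "email.send_customer": ToolPolicy(writes=True, needs_approval=True, reversible=False),
}


def authorize(tool: str, system: str, approved: bool = False) -> str:
    """Return "allow", "needs_approval", or "deny"."""
    if system in OFF_LIMITS_SYSTEMS:
        return "deny"  # off-limits whatever the model recommends
    policy = POLICIES.get(tool)
    if policy is None:
        return "deny"  # unknown tools are denied by default
    if not policy.writes:
        return "allow"
    if policy.needs_approval or not policy.reversible:
        return "allow" if approved else "needs_approval"
    return "allow"
```

Unknown tools and off-limits systems are denied by default, so a new capability has to be mapped to work before the agent can use it.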

4. Routing

Routing is the economic control surface.

The last 30 days of operator conversation kept returning to cost pressure, model routing, caching, and token waste. That signal should change the buying conversation.

Your company should know which tasks require the frontier model, which tasks can use a smaller model, which tasks should hit cache, and which tasks should stop for review.

If every request goes through the vendor's preferred path, you are not optimizing. You are trusting.

The router decides cost.

The router also decides dependency.

5. Memory And Cache

Memory is where the harness starts to feel like infrastructure.

A company should know what the system remembers, why it remembers it, how long it keeps it, how it refreshes stale memory, and how a human can correct it.

Caching has a similar operating question. Are repeat calls cheaper because your company owns stable, verified context? Or are repeat calls cheaper only inside the vendor's product surface?

The answer changes the economics.

It also changes portability.
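A minimal sketch of the memory questions above, assuming a simple key-value store. The class, keys, and time-to-live values are hypothetical; what matters is that each entry records why it exists, expires on a schedule, and can be corrected by a named human with an audit trail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryEntry:
    value: str
    reason: str          # why the system remembers it
    written_at: datetime
    ttl: timedelta       # how long it is kept


class GovernedMemory:
    """Hypothetical store: every entry has a reason, an expiry, and an audit trail."""

    def __init__(self):
        self._entries = {}
        self.audit = []

    def remember(self, key, value, reason, now, ttl):
        self._entries[key] = MemoryEntry(value, reason, now, ttl)
        self.audit.append(("write", key, reason))

    def recall(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        if now - entry.written_at > entry.ttl:
            # Stale memory is dropped so the caller must refresh from the source system.
            del self._entries[key]
            self.audit.append(("expire", key, entry.reason))
            return None
        return entry.value

    def correct(self, key, value, by, now, ttl=timedelta(days=30)):
        # A human correction replaces the entry and is recorded with its author.
        self._entries[key] = MemoryEntry(value, f"corrected by {by}", now, ttl)
        self.audit.append(("correct", key, by))


t0 = datetime(2026, 6, 1)
mem = GovernedMemory()
mem.remember("acme.renewal_owner", "Dana", "copied from CRM owner field", t0, timedelta(days=7))
fresh = mem.recall("acme.renewal_owner", t0 + timedelta(days=1))
stale = mem.recall("acme.renewal_owner", t0 + timedelta(days=8))
mem.correct("acme.renewal_owner", "Lee", by="sales-lead", now=t0 + timedelta(days=8))
corrected = mem.recall("acme.renewal_owner", t0 + timedelta(days=9))
```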

6. Review And Rollback

Anthropic's containment post is useful because it frames risk as both likelihood and damage. As agents gain access, the possible blast radius grows.

That should force a review lane.

Can a human inspect the evidence before an action completes? Can the system explain which sources, memories, tools, and evaluations affected the action? Can you reverse the action? Can you pause the agent when the failure mode is business ambiguity rather than model error?

If rollback is a support ticket, your harness is weaker than your ambition.
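A sketch of a review-and-rollback wrapper, assuming each agent write can be paired with an undo step. `ActionLog`, the CRM dictionary, and the evidence strings are hypothetical. Writes that cannot be paired with an undo would need a different control, such as requiring human approval before the action runs.

```python
class ActionLog:
    """Hypothetical harness wrapper: every write carries evidence and an undo step."""

    def __init__(self):
        self.entries = []
        self.paused = False

    def execute(self, name, do, undo, evidence):
        if self.paused:
            raise RuntimeError("agent paused pending human review")
        do()
        self.entries.append({"name": name, "undo": undo, "evidence": evidence})

    def rollback_last(self):
        entry = self.entries.pop()
        entry["undo"]()
        return entry["name"]


crm = {"acme": {"stage": "Renewal"}}
previous = crm["acme"]["stage"]
log = ActionLog()
log.execute(
    "flag acme at-risk",
    do=lambda: crm["acme"].update(stage="At Risk"),
    undo=lambda: crm["acme"].update(stage=previous),
    evidence=["support escalation", "usage decline"],
)
after_write = crm["acme"]["stage"]
undone = log.rollback_last()
log.paused = True  # e.g. the failure mode is business ambiguity, not model error
```

Pausing is a plain flag here on purpose: stopping the agent should not depend on the model agreeing that something went wrong.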

7. Supplier Swap

This is the final test.

Pick one high-value workflow. Imagine replacing the model tomorrow.

What breaks?

If the workflow breaks because the model has different syntax, that is manageable. If it breaks because context, evals, memory, routing, permissions, and review all live inside one vendor's surface, the company does not own the harness.

It rents it.
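A sketch of the swap test in code, assuming the company keeps the prompt template, context, and workflow on its side of a narrow client interface. `VendorA` and `VendorB` are placeholders that echo the prompt, not real SDK calls.

```python
from typing import Protocol


class ModelClient(Protocol):
    def complete(self, prompt: str) -> str: ...


class VendorA:
    def complete(self, prompt: str) -> str:
        return "A> " + prompt[:40]  # stand-in for vendor A's SDK call


class VendorB:
    def complete(self, prompt: str) -> str:
        return "B> " + prompt[:40]  # stand-in for vendor B's SDK call


# Owned by the company, not the vendor: the template, the context, the workflow.
TEMPLATE = "Summarize renewal risk for {account}: {context}"


def renewal_risk_summary(client: ModelClient, account: str, context: str) -> str:
    prompt = TEMPLATE.format(account=account, context=context)
    return client.complete(prompt)


out_a = renewal_risk_summary(VendorA(), "Acme", "usage down, two escalations")
out_b = renewal_risk_summary(VendorB(), "Acme", "usage down, two escalations")
```

If swapping `VendorA()` for `VendorB()` is the whole change, the harness is owned. If the swap also means rebuilding context, evals, memory, and routing, it is rented.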

A Concrete Scenario

Take a renewal-risk workflow.

The agent reads CRM history, support tickets, contract language, product usage, and account notes. It drafts a risk summary, recommends next action, routes the account to a manager, and writes an update back to the CRM.

The Harness Control Review asks:

Which fields can the agent read?
Which tickets are excluded?
Who wrote the evaluation cases?
Which model handles the summary?
Which model handles the routing recommendation?
Is the prior account history cached?
Can the sales leader inspect why the agent flagged the account?
Can the CRM write be reversed?
Could the company move this workflow from one model provider to another?

That review is slower than a demo.

It is also where the real strategy appears.

What To Inspect Next

When the OpenAI and Anthropic S-1s become public, read them for the normal things: revenue, losses, compute commitments, risk factors, customer concentration, and gross margin.

Then read them for the hidden split.

How much value comes from model access?

How much comes from deployment labor?

How much comes from enterprise product surfaces?

How much depends on customers rebuilding work around the lab's harness?

Inside your own company, run the same inspection before the vendor renewal.

If you own context, evals, permissions, routing, memory, review, rollback, and supplier swap paths, you can rent powerful models without surrendering the work layer.

If you do not, the contract may still look like software.

The operating reality is different.

You rented the factory floor.

Sources

OpenAI Deployment Company: https://openai.com/index/openai-launches-the-deployment-company/
OpenAI API pricing: https://openai.com/api/pricing/
Claude Opus pricing: https://www.anthropic.com/claude/opus
Anthropic and DXC alliance: https://www.anthropic.com/news/dxc-anthropic-alliance
Claude Enterprise: https://www.anthropic.com/product/enterprise
Anthropic containment engineering: https://www.anthropic.com/engineering/how-we-contain-claude
Anthropic Economic Index: https://www.anthropic.com/research/economic-index-march-2026-report
Anthropic IPO reporting: https://www.theguardian.com/technology/2026/jun/01/anthropic-ai-ipo
OpenAI IPO reporting: https://www.theverge.com/ai-artificial-intelligence/946335/openai-ipo-s-1-confidential

MuleSoft as the Anti-Corruption Layer: DataWeave, Canonical Models, and Multi-Source Mediation

Shivanath Devinarayanan — Sun, 14 Jun 2026 18:02:15 GMT

Newsletter note: This is part of a five-part MuleSoft operator series focused on practical integration, governance, and Agent Fabric readiness.

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl

The connector is not the product.

The mapping alone is too small.

The product is the business contract that survives backend drift.

MuleSoft earns its keep when one stable business contract can survive many unstable backends through DataWeave, bounded canonical models, tenant-aware configuration, and source adapters.

https://learn.microsoft.com/en-us/azure/architecture/patterns/anti-corruption-layer

https://docs.mulesoft.com/dataweave/latest/

https://docs.mulesoft.com/mule-sdk/latest/static-dynamic-configs

Anti-Corruption Is A Boundary

The anti-corruption layer pattern protects one model from another model. In MuleSoft terms, that means a consumer should not inherit every backend identifier, status code, null convention, relationship shape, or protocol compromise.

The API contract should express the business capability, not the backend mess.
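A minimal sketch of that boundary. In a Mule application this logic would normally live in DataWeave; it is written in Python here for readability. The backend field names (`CUST_NO`, `STAT_CD`, and so on) and the status codes are hypothetical.

```python
# Hypothetical backend quirks: numeric status codes, "" meaning null, padded names.
STATUS_CODES = {"1": "ACTIVE", "2": "SUSPENDED", "9": "CLOSED"}


def to_canonical_customer(raw: dict) -> dict:
    """Translate one backend record into the canonical customer contract."""
    return {
        "customerId": f"crm:{raw['CUST_NO']}",               # namespaced, not the raw backend key
        "name": (raw.get("CUST_NM") or "").strip() or None,
        "email": raw.get("EMAIL_ADDR") or None,              # "" becomes an explicit null
        "status": STATUS_CODES.get(str(raw.get("STAT_CD")), "UNKNOWN"),
    }


record = to_canonical_customer(
    {"CUST_NO": 42, "CUST_NM": " Acme Corp ", "EMAIL_ADDR": "", "STAT_CD": 2}
)
```

Unknown status codes map to an explicit `UNKNOWN` instead of leaking the backend value to consumers.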

DataWeave Is Contract Logic

DataWeave is not filler between connectors. It is where parsing, transformation, defaults, field precedence, and shape decisions become executable.

Treat that logic like product code. Test it. Version it. Review it when a source system changes. Give it an owner.

Bound The Canonical Model

A canonical model should not become a universal enterprise theology. Keep it bounded by durable capability: customer, candidate, order, invoice, worker, policy, case.

MuleSoft CIM is useful evidence that canonical models can reduce friction across Process and System APIs. The operator still has to decide where the boundary stops.

Tenant And Credential Resolution

Multi-source mediation becomes risky when tenant and credential logic is casual. Dynamic configuration patterns exist for multi-tenancy, dynamic endpoints, and dynamic settings. Use them deliberately.

Do not trust a tenant header just because it is convenient. Derive tenant context from authenticated client identity when the risk level requires it.

What The Docs Do Not Say

DataGraph can compose queries across an application network, but it does not decide business meaning. It will not choose identity precedence, resolve tenant boundaries, or design compensation for conflicting backends.

The mediation boundary still needs human judgment. That is where the integration team becomes a product team.

Operator Notes

A canonical contract starts with language, not code. Ask the team what a customer, candidate, order, worker, or invoice means. If two systems disagree, write the disagreement down before writing the mapping. DataWeave can express the decision, but it cannot make the decision for you.

Field precedence is usually where the real argument lives. Which source wins for display name? Which status is authoritative? Which timestamp tells freshness? Which identifier can cross tenant boundaries? These are product decisions disguised as transformation logic.
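Writing precedence down as data makes those product decisions reviewable. A hypothetical sketch: the source names, the precedence lists, and the freshest-value fallback are assumptions, and the fallback is itself a decision someone should own.

```python
from datetime import datetime

# Hypothetical precedence decisions, kept as data rather than buried in conditionals.
PRECEDENCE = {
    "displayName": ["hris", "crm"],
    "status": ["billing", "crm"],
}


def resolve(field_name, records):
    """records: {source: {"value": ..., "updated_at": datetime}}; returns (value, source)."""
    for source in PRECEDENCE.get(field_name, []):
        rec = records.get(source)
        if rec is not None and rec["value"] is not None:
            return rec["value"], source
    # Fallback (itself a product decision): the newest non-null value wins.
    candidates = [(r["updated_at"], s, r["value"])
                  for s, r in records.items() if r["value"] is not None]
    if not candidates:
        return None, None
    _, source, value = max(candidates)
    return value, source


names = {
    "crm": {"value": "ACME Corp", "updated_at": datetime(2026, 6, 10)},
    "hris": {"value": None, "updated_at": datetime(2026, 6, 1)},
    "billing": {"value": "Acme Corporation", "updated_at": datetime(2026, 6, 12)},
}
```

Returning the winning source alongside the value gives the audit trail a place to record which system was authoritative.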

The tenant path should be explicit. A canonical API that serves multiple tenants needs to resolve tenant context, credentials, source routing, and audit labels before the backend call. If the tenant is inferred from an untrusted header, the integration layer can become the place where isolation breaks.
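A sketch of explicit tenant resolution, assuming the client ID has already been authenticated upstream (for example by a client ID enforcement policy). The registry contents, credential references, and route names are hypothetical.

```python
from typing import Optional


class TenantError(Exception):
    pass


# Hypothetical registry, populated at client onboarding: client id -> tenant context.
CLIENT_TENANTS = {
    "client-7f3a": {"tenant": "acme", "credential_ref": "secrets/acme/crm", "route": "crm-eu"},
    "client-91bc": {"tenant": "globex", "credential_ref": "secrets/globex/crm", "route": "crm-us"},
}


def resolve_tenant(authenticated_client_id: str, tenant_header: Optional[str] = None) -> dict:
    """Derive tenant context from the authenticated client, never from the header alone."""
    ctx = CLIENT_TENANTS.get(authenticated_client_id)
    if ctx is None:
        raise TenantError("unknown client")
    if tenant_header is not None and tenant_header != ctx["tenant"]:
        # A mismatched header is treated as a tenant-crossing attempt, not a routing hint.
        raise TenantError("tenant header does not match authenticated client")
    return {**ctx, "audit_label": f"{ctx['tenant']}:{authenticated_client_id}"}
```

A tenant header can still be accepted as a consistency check, but it never selects the tenant on its own.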

Source adapters keep the system from turning into a pile of conditionals. Each backend should have a narrow adapter contract. Add a source by adding a mapping and route, not by rewriting the consumer contract. That is the anti-corruption layer in practice: backend drift stays behind the boundary.
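A sketch of the adapter pattern, where adding a source means registering one mapping. The payload fields are illustrative, loosely modeled on SAP sales-document naming (`VBELN`, `NETWR`) and Shopify's order JSON (`id`, `total_price`); real payloads need their own mapping and tests.

```python
from typing import Callable, Dict

# Each source gets one narrow adapter that returns the same canonical order shape.
ADAPTERS: Dict[str, Callable[[dict], dict]] = {}


def adapter(source: str):
    def register(fn):
        ADAPTERS[source] = fn
        return fn
    return register


@adapter("sap")
def from_sap(raw: dict) -> dict:
    return {"orderId": f"sap:{raw['VBELN']}", "total": float(raw["NETWR"])}


@adapter("shopify")
def from_shopify(raw: dict) -> dict:
    return {"orderId": f"shopify:{raw['id']}", "total": float(raw["total_price"])}


def get_order(source: str, raw: dict) -> dict:
    fn = ADAPTERS.get(source)
    if fn is None:
        raise ValueError(f"no adapter registered for source {source!r}")
    return fn(raw)
```

The consumer only ever sees `orderId` and `total`, so a third backend is a new adapter, not a contract change.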

Testing needs to match the risk. Unit tests should cover mappings, default behavior, missing fields, enum translation, and source precedence. Integration tests should prove tenant resolution and credential scope. Contract tests should protect the consumer shape. A mapping that works for the happy path can still corrupt the business object when a backend sends an unexpected null or status.

This is where DataGraph fits carefully. It can help consumers query across an application network, but it does not replace mediation. Someone still has to decide what the fields mean, which system wins, and how the response should behave when one source is stale or unavailable.

The Working Artifact

What To Inspect Next

Next week: once the contract is clean, we expose selected operations as governed tools through Agent Fabric and MCP Bridge.

Question for the comments: Where does your canonical model fail first: identity, status codes, ownership, tenant routing, or source freshness?

Sources

1. https://learn.microsoft.com/en-us/azure/architecture/patterns/anti-corruption-layer

2. https://docs.mulesoft.com/dataweave/latest/

3. https://docs.mulesoft.com/mule-sdk/latest/static-dynamic-configs

4. https://docs.mulesoft.com/accelerators-cim/latest/

5. https://docs.mulesoft.com/datagraph/

Headless MuleSoft: Build and Deploy Without Studio

Shivanath Devinarayanan — Sat, 13 Jun 2026 18:24:51 GMT

Newsletter note: This is part of a five-part MuleSoft operator series focused on practical integration, governance, and Agent Fabric readiness.

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl

The fastest way to expose a fragile MuleSoft practice is to remove the console.

If the build only works from one laptop, you do not have a delivery system.

You have a ritual.

Headless MuleSoft is a platform discipline: a tighter operator loop where MCP can assist, but Maven and Anypoint CLI remain the repeatable proof path.

https://docs.mulesoft.com/mulesoft-mcp-server/reference-mcp-tools

https://docs.mulesoft.com/anypoint-cli/latest/install

https://docs.mulesoft.com/cloudhub-2/ch2-deploy-cli

https://docs.mulesoft.com/mulesoft-mcp-server/mulesoft-mcp-server-release-notes

MCP Assists, Maven Proves

MuleSoft DX MCP Server can help with project validation, local runs, deployment actions, API governance, policy management, and usage insights. That is useful. It should not become the only proof path.

The repeatable path still needs source control, Maven, Anypoint CLI, explicit credentials, and environment promotion rules.

The Practical Toolchain

A terminal-first MuleSoft loop starts from Git, not Studio export. The repo should hold `pom.xml`, Mule XML, MUnit tests, config templates, and deployment configuration. Developers can use MCP in the inner loop, but CI should run deterministic commands.

The current CLI install path uses Node 22 or later, npm 7 or later, and `anypoint-cli-v4-public`. The current CloudHub 2.0 docs expose deploy, list, describe, logs, download logs, modify, start, stop, and delete commands. Mule Maven Plugin 4.10.0, dated June 2, 2026, also matters because it supports applications targeting Mule runtime engine 4.12.0.

https://docs.mulesoft.com/release-notes/mule-maven-plugin/mule-maven-plugin-4.10.0-release-notes

The Credential Wall

The operator lesson is that credentials are architecture. Platform credentials, connected apps, Maven settings, Exchange publishing, and secure deployment properties all shape whether headless work is real.

Do not bury secrets in source. Do not treat a local settings file as a deployment strategy. If a build needs enterprise Maven access, solve the entitlement problem instead of blaming the tool.

The Verification Loop

A green deploy is not enough. Check the app list, app description, runtime version, logs, health endpoint, and expected API reachability. Then compare the deployed artifact and configuration to the repo.

This is where console craft becomes software delivery.
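The comparison step can be made explicit and repeatable. A hypothetical sketch: the keys and values below are placeholders, not Anypoint CLI output fields. In practice `expected` would come from the repo and CI metadata, and `observed` from CLI JSON output, logs, and a health probe.

```python
def verify_deployment(expected: dict, observed: dict) -> list:
    """Compare what the repo says should be running with what was observed.

    Returns human-readable mismatches; an empty list means every check passed.
    """
    problems = []
    for key, want in expected.items():
        got = observed.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, observed {got!r}")
    return problems


# Hypothetical values for illustration only.
expected = {"app": "orders-api", "status": "running", "runtime": "4.12.0",
            "git_sha": "3f9c2e1", "health": 200}
observed = {"app": "orders-api", "status": "running", "runtime": "4.11.0",
            "git_sha": "3f9c2e1", "health": 200}

for line in verify_deployment(expected, observed):
    print(line)
```

Here the runtime mismatch is caught even though the deploy itself succeeded.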

What The Docs Do Not Say

Docs describe commands. They do not decide where agent assistance stops and deterministic CI/CD begins. My rule is simple: MCP assists, Maven and CLI prove.

The dangerous path is a deployment that succeeds but cannot be reproduced. That is worse than a failed deploy because it teaches the team to trust a process it cannot inspect.

Operator Notes

The first headless test should be deliberately boring. Clone the repo on a clean machine or clean runner. Install the documented versions of Node, npm, Maven, and Java. Authenticate through the same path CI will use. Run tests. Package. Deploy to a sandbox target. Then inspect the deployed app from Anypoint CLI without opening the console.

That test exposes hidden dependencies fast. A missing Maven setting, a plugin version mismatch, a runtime target nobody documented, or a secret that only exists on one developer machine will break the loop. That is good news. The failure is cheaper in the terminal than in a production release window.

The MCP server fits best as an assistant inside this loop. Let it validate the project, inspect structure, help run local actions, and call deployment tools when the task is bounded. Keep the promotion path deterministic. CI should still run the same Maven commands, publish the same artifact, and use the same deployment configuration every time.

The artifact boundary is important. A team should be able to answer which source revision produced the deployed app, which runtime version it targets, which environment properties were supplied, which secure properties were injected, and which smoke test proved the endpoint. Without those answers, headless deployment becomes faster console craft.

The build pipeline also changes review culture. Mule XML, DataWeave, properties, MUnit tests, and deployment config become code review subjects. A change to a secure property name can be as dangerous as a change to a flow. A runtime upgrade can be as meaningful as a connector change. A skipped test can hide a mapping failure.

The payoff is not speed alone. The payoff is reproducibility. Once the app can be built and deployed without Studio, governance becomes easier to inspect. That is the handoff to the next article: deployment is only the start. The front door still needs API Manager, autodiscovery, client contracts, policy checks, and runtime evidence.

The Working Artifact

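```shell
# Install Anypoint CLI v4 (Node 22+ and npm 7+, per the current install docs)
npm install -g anypoint-cli-v4-public
anypoint-cli-v4 --version

# Run the test suite, then build the deployable artifact
mvn -B clean test
mvn -B clean package

# Deploy through the Mule Maven Plugin (deployment settings live in pom.xml)
mvn -B clean deploy -DmuleDeploy

# Confirm the deployment from the terminal, not the console
anypoint-cli-v4 runtime-mgr:application:list --environment Sandbox --output json
```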

What To Inspect Next

Next week: the front door. A deployed app is not governed until API Manager, autodiscovery, contracts, and gateway mode line up.

Question for the comments: Where does your MuleSoft delivery path break first when Studio is removed: build, credentials, deploy, logs, or promotion?

Sources

https://docs.mulesoft.com/mulesoft-mcp-server/reference-mcp-tools
https://docs.mulesoft.com/anypoint-cli/latest/install
https://docs.mulesoft.com/cloudhub-2/ch2-deploy-cli
https://docs.mulesoft.com/cloudhub-2/ch2-deploy-maven
https://docs.mulesoft.com/release-notes/mule-maven-plugin/mule-maven-plugin-release-notes
https://docs.mulesoft.com/release-notes/mule-maven-plugin/mule-maven-plugin-4.10.0-release-notes
https://docs.mulesoft.com/mulesoft-mcp-server/mulesoft-mcp-server-release-notes

Governing the Front Door: API Manager, Omni Gateway, and the Autodiscovery Trap

Shivanath Devinarayanan — Sat, 13 Jun 2026 09:08:24 GMT

Newsletter note: This is part of a five-part MuleSoft operator series focused on practical integration, governance, and Agent Fabric readiness.

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl

The policy can exist.

The API can exist.

Traffic can still enter through the wrong door.

API Manager is not policy dust. The front door governs traffic only when runtime, API instance, credentials, client contract, flow binding, and gateway mode line up.

https://docs.mulesoft.com/gateway-home/

https://docs.mulesoft.com/mule-gateway/mule-gateway-config-autodiscovery-mule4

https://docs.mulesoft.com/gateway/latest/policies-included-client-id-enforcement

Autodiscovery Is A Binding, Not A Badge

In Mule 4, autodiscovery binds an API Manager instance to a Mule flow through `apiId` and `flowRef`. The API must exist in API Manager, and the runtime needs valid Anypoint Platform credentials.

That is a runtime binding. It is not a decorative metadata sync.

Gateway Names Matter

Current MuleSoft docs distinguish Omni Gateway, formerly Flex Gateway, from Anypoint Mule Gateway embedded in Mule runtime. Mixing those names creates bad operating assumptions.

Policy support also depends on gateway mode. A policy that works in one path may not work in another.

The Decision Tree

Use this sequence before blaming API Manager.

Are consumers hitting the governed endpoint or bypassing it?
Is the API instance active in API Manager?
Does the Mule app have the right `apiId` and `flowRef`?
Did the runtime start with credentials for the right org, business group, and environment?
Is the policy supported by the selected gateway mode?
Does the client app have an approved contract?
Are headers or query params matching the policy extraction rules?
Are logs showing `401`, `429`, or only backend responses?
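The sequence above can be encoded as ordered checks so the diagnosis stays consistent across people. A sketch: the evidence keys are hypothetical labels, and each value is a yes/no that someone has to prove from logs, API Manager, or the app configuration. Missing evidence counts as a failed check.

```python
# The diagnostic sequence, in order. Unproven counts as failed.
CHECKS = [
    ("governed_endpoint", "Consumers hit the governed endpoint, not a bypass"),
    ("api_instance_active", "API instance is active in API Manager"),
    ("binding_matches", "Mule app carries the right apiId and flowRef"),
    ("runtime_credentials", "Runtime credentials target the right org, business group, and environment"),
    ("policy_supported", "Policy is supported by the selected gateway mode"),
    ("client_contract", "Client app has an approved contract"),
    ("extraction_matches", "Headers or query params match the policy extraction rules"),
    ("policy_responses_seen", "Logs show policy responses (401, 429), not only backend responses"),
]


def first_failed_check(evidence: dict):
    """Return the first check without positive evidence, or None if all pass."""
    for key, description in CHECKS:
        if evidence.get(key) is not True:
            return key, description
    return None
```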

CloudHub Health Is Not Policy Enforcement

Runtime Manager tells you whether the app is deployed, running, logging, and sized correctly. API Manager tells you whether the API is governed. Those are related surfaces, not the same surface.

A healthy CloudHub app can still be reachable through a path that does not enforce the policy you thought was protecting it.

What The Docs Do Not Say

The dangerous failure mode is not policy failure. The dangerous failure mode is traffic finding another door.

A green API instance does not prove every consumer enters through that instance. A policy attached to the wrong flow does not protect the business action. A rate limit scoped per replica can surprise a team that expected one global bucket.

Operator Notes

When a policy does not enforce, resist the urge to stare at the policy screen first. Follow the traffic. Which hostname did the client call? Which base path did it use? Did the request enter the managed API instance or go directly to the app? The fastest diagnosis often comes from proving the path before debating the setting.

Then check the binding. The `apiId` points to a specific API Manager instance. The `flowRef` points to a specific Mule flow. A wrong value can create the worst kind of failure: something appears configured, but the protected flow is not the flow handling traffic.

Credentials add another layer. The runtime needs platform credentials for the correct organization, business group, and environment. The client app needs an approved contract when the policy depends on client identity. The request needs credentials in the place the policy expects. Headers, query params, and DataWeave extraction expressions become operating standards, not cosmetic choices.

Rate limiting needs special care. Teams often talk about rate limits as if there is one universal bucket. The actual behavior depends on the policy, gateway mode, replicas, and storage configuration. If you do not know the scope of the counter, you do not know what protection you have.
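A worked example of why counter scope matters. This is a simplified simulation, not how any specific policy is implemented: it assumes round-robin load balancing and an independent in-memory counter per replica unless the counter is shared.

```python
def admitted(requests: int, limit: int, replicas: int, shared_counter: bool) -> int:
    """Count requests admitted in one window under round-robin load balancing.

    Assumes each replica keeps its own in-memory counter unless shared_counter is True.
    """
    counters = [0] * (1 if shared_counter else replicas)
    ok = 0
    for i in range(requests):
        idx = 0 if shared_counter else i % replicas
        if counters[idx] < limit:
            counters[idx] += 1
            ok += 1
    return ok


print(admitted(1000, limit=100, replicas=3, shared_counter=False))
print(admitted(1000, limit=100, replicas=3, shared_counter=True))
```

With three replicas and a configured limit of 100 per window, the first call prints 300 and the second prints 100. Same policy setting, three times the admitted traffic.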

CloudHub health can mislead teams here. A running app proves the runtime is alive. It does not prove the traffic is governed. Logs, API Manager status, policy responses, and client error behavior need to be inspected together. A backend `200` might mean success, or it might mean the request bypassed the front door entirely.

The operating standard should be simple: no API is considered governed until someone can prove the request path, API instance, flow binding, client contract, policy response, and log evidence. That proof matters even more when the next consumer is an agent tool instead of a human-triggered integration.

The Working Artifact

```properties
apiId=12345678
```

What To Inspect Next

Next week: after the request enters the right door, we deal with backend semantics, canonical models, and DataWeave mediation.

Question for the comments: When a policy does not enforce, which check do you run first: endpoint path, API instance, runtime credentials, flowRef, client contract, or gateway mode?

Sources

1. https://docs.mulesoft.com/gateway-home/

2. https://docs.mulesoft.com/mule-gateway/mule-gateway-config-autodiscovery-mule4

3. https://docs.mulesoft.com/gateway/latest/policies-included-client-id-enforcement

4. https://docs.mulesoft.com/gateway/latest/policies-included-rate-limiting-sla

5. https://docs.mulesoft.com/cloudhub/managing-applications-on-cloudhub

The 2026 MuleSoft Mental Model: From API-Led Connectivity to Agent Fabric

Shivanath Devinarayanan — Fri, 12 Jun 2026 16:15:01 GMT

I would not start a 2026 MuleSoft architecture review with a connector list.

I would start with one question: which APIs are ready to become agent tools?

Most estates do not like the answer.

API-led connectivity is not dead. It becomes the substrate for governed agent networks, where APIs, MCP servers, brokers, gateways, identity, observability, and LLM governance are managed assets.

https://www.salesforce.com/news/stories/agent-fabric-control-plane-announcement

https://docs.mulesoft.com/general/exp-release-notes

https://docs.mulesoft.com/general/agent-fabric-release-notes

The Old Map Became The Base Layer

The familiar System, Process, and Experience API map still matters. System APIs protect systems of record. Process APIs hold reusable business verbs. Experience APIs shape consumption for a channel or audience.

Agents do not remove that discipline. They expose where it was missing.

The Consumer Changed

A portal user, a mobile app, or a partner integration usually calls a known endpoint through a known path. An agent can ask, decide, call, retry, delegate, and trigger a second action. That shifts the risk from integration reach to integration control.

The hard question is no longer whether an agent can call an API. The hard question is whether we know what it called, why it was allowed, which contract it exercised, which policy applied, which identity it used, and how we would prove that later.

Exchange Turns Into Inventory

The newest MuleSoft direction points to a broader catalog. Exchange now sits in an operating model where agents, APIs, brokers, LLMs, MCP servers, policies, templates, and integration assets can all become managed assets.

For agent work, the catalog is not documentation. It is inventory. Inventory is where governance starts.

The Checklist

Before an API becomes a tool, inspect it like an operator.

Is the capability published in Exchange?
Is it managed in API Manager?
Does it have a current spec?
Is the owner clear?
Is the operation read-only, write, destructive, or privileged?
Is the identity model explicit?
Can we observe failures and business outcomes?
Is there an approval path for high-risk action?
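The checklist works well as a per-capability record on an inventory board. This is a minimal sketch that assumes each question has a yes or no answer. `ApiCapability`, `OperationClass`, and `tool_blockers` are hypothetical. Treating every operation that is not read-only as high-risk is also an assumption, and you may want to tighten it.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class OperationClass(Enum):
    READ_ONLY = "read-only"
    WRITE = "write"
    DESTRUCTIVE = "destructive"
    PRIVILEGED = "privileged"


@dataclass
class ApiCapability:
    name: str
    in_exchange: bool
    managed_in_api_manager: bool
    current_spec: bool
    owner: str | None
    operation: OperationClass
    identity_model_explicit: bool
    observable: bool
    has_approval_path: bool


def tool_blockers(api: ApiCapability) -> list[str]:
    """Return the checklist items that block exposing `api` as an agent tool."""
    blockers = []
    if not api.in_exchange:
        blockers.append("not published in Exchange")
    if not api.managed_in_api_manager:
        blockers.append("not managed in API Manager")
    if not api.current_spec:
        blockers.append("spec is not current")
    if not api.owner:
        blockers.append("no clear owner")
    if not api.identity_model_explicit:
        blockers.append("identity model is implicit")
    if not api.observable:
        blockers.append("failures and outcomes not observable")
    # Assumption: anything beyond a read-only call counts as high-risk action.
    if api.operation is not OperationClass.READ_ONLY and not api.has_approval_path:
        blockers.append(f"{api.operation.value} operation has no approval path")
    return blockers


lookup = ApiCapability("getAccountStatus", True, True, True, "accounts-team",
                       OperationClass.READ_ONLY, True, True, False)
cancel = ApiCapability("cancelOrder", True, True, False, None,
                       OperationClass.DESTRUCTIVE, True, True, False)
print(tool_blockers(lookup))  # []
print(tool_blockers(cancel))
# ['spec is not current', 'no clear owner', 'destructive operation has no approval path']
```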

What The Docs Do Not Say

The docs show the happy path: discover assets, create MCP servers, govern traffic, visualize networks, and route tasks. What they do not say is that the cleanup comes first.

You need to know which APIs are real products and which are accidental endpoints. You need to know which specs are current and which are archaeology. You need to know which policies are enforced and which only exist in a slide. A successful HTTP 200 can still be a failed business operation.

Operator Notes

Start with the inventory meeting, not the agent demo. Put the top twenty API capabilities on a board and mark which ones have a current spec, an owner, a policy path, and useful telemetry. The answers will tell you more about agent readiness than any architecture diagram.

The strongest candidates are boring. Read-only lookups, bounded status checks, narrow workflow actions, and well-owned process APIs make better first tools than large destructive endpoints. A tool that updates a record, opens a case, sends a message, or starts an order needs a different review path than a tool that retrieves account status.

The naming work matters too. Agents choose from names and descriptions. A vague operation name like `updateCustomer` is not safe enough when the backend action can alter billing, service, legal identity, or ownership. The tool name should describe the business action and the risk. The description should tell the agent when not to call it.
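Naming rules like these can be linted before a tool is published. This is a rough sketch. The vague-verb list, the two-word threshold, and the requirement that a description contain the phrase "do not call" are all hypothetical conventions, not MuleSoft or MCP requirements.

```python
import re
from dataclasses import dataclass

# Hypothetical verbs too generic to name an agent tool without more detail.
VAGUE_VERBS = {"update", "process", "handle", "manage", "change"}


@dataclass
class ToolDescriptor:
    name: str         # e.g. "update_customer_billing_address"
    description: str  # what the tool does, and when an agent must not call it


def lint_tool(tool: ToolDescriptor) -> list[str]:
    """Flag names and descriptions that make a tool risky for an agent to choose."""
    problems = []
    # Split camelCase and snake_case names into lowercase words.
    words = [w.lower() for w in re.findall(r"[A-Z]?[a-z]+", tool.name)]
    if words and words[0] in VAGUE_VERBS and len(words) < 3:
        problems.append("generic verb plus entity; name the business action")
    if "do not call" not in tool.description.lower():
        problems.append("description does not say when not to call the tool")
    return problems


vague = ToolDescriptor("updateCustomer", "Updates a customer record.")
specific = ToolDescriptor(
    "update_customer_billing_address",
    "Changes the billing address on an existing account. Write operation. "
    "Do not call to change legal name, ownership, or service plan.",
)
print(len(lint_tool(vague)))  # 2
print(lint_tool(specific))    # []
```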

Identity is the second inspection point. A human user, an agent, a connected app, and a backend service account may all be involved in one action. If the platform cannot explain whose authority was used, the action is not ready for autonomous execution. That is not an AI issue. It is an access-control issue.

Observability comes next. A trace that says a tool was called is useful, but it is not enough. You need to know the business operation, the input shape, the policy decision, the downstream system, the response, and the exception path. Otherwise the audit trail proves activity without proving intent.

This is why API-led connectivity still matters. It gives us the controlled surface. Agent Fabric makes that surface visible to agents, brokers, MCP clients, and governance workflows. The transition is not from integration to AI. The transition is from API reuse to governed action reuse.

Publishing Angle

The article should land as a map before the build series. Readers do not need another MuleSoft overview. They need a way to inspect their own estate before agent teams start wrapping endpoints as tools.

Use the comments to pull operators into the real constraint. Some teams will say ownership. Some will say identity. Some will say API Manager coverage. Some will admit the spec is stale. That is useful because the next four articles each take one of those constraints into the field.

The best reader outcome is a small inventory exercise. Pick one API. Ask whether it is discoverable, governed, owned, observable, and safe to expose as a tool. If the team cannot answer those five questions, Agent Fabric becomes an aspiration rather than an operating model.

The Working Artifact

```yaml
mulesoft_2026_mental_model:
  foundation: [system_apis, process_apis, experience_apis]
  governance_plane: [exchange, api_manager, omni_gateway, identity, policies]
  agent_fabric: [agent_registry, agent_broker, mcp_servers, a2a_agents]
```

What To Inspect Next

Next week: headless MuleSoft, where the architecture diagram has to survive Maven, Anypoint CLI, MCP tools, and deployment reality.

Question for the comments: If you had to turn one API in your estate into an agent tool tomorrow, what would block you first: ownership, policy, identity, observability, or the contract itself?

Sources

1. https://www.salesforce.com/news/stories/agent-fabric-control-plane-announcement/

2. https://docs.mulesoft.com/general/exp-release-notes

3. https://docs.mulesoft.com/general/agent-fabric-release-notes

4. https://docs.mulesoft.com/general/agent-fabric-overview

5. https://docs.mulesoft.com/exchange/

6. https://www.mulesoft.com/lp/reports/connectivity-benchmark

7. https://www.salesforce.com/blog/api-led-connectivity/

The Operating Manual For Fable-Class Work

Shivanath Devinarayanan — Wed, 10 Jun 2026 13:22:18 GMT

Claude Fable 5 creates a routing problem.

Not a prompting problem.

When a model is clearly more capable, teams tend to make the wrong move first. They promote it to default. Every draft, every ticket, every summary, every question starts flowing through the new premium lane because the answers feel better.

That is exactly how a capable model becomes an expensive habit.

The better move is to treat Fable as an escalation layer: a model for long, messy, consequential work that deserves extra reasoning and review. Routine work should still go to cheaper models. Sensitive work should still go through governance. High-stakes work should still come back to a human before it changes the business.

That is the operating manual.

The Fable Escalation Test

Use Fable when at least two of these are true:

- The context spans many files, documents, teams, or systems.

- The task would take a senior person several hours or days to structure.

- A weak output would create real rework or risk.

- The output will become a reviewed artifact.

- The task needs visual, chart, table, PDF, or code understanding.

- The work needs tests, validation, or self-checking.

- Cheaper models keep losing the thread.
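Encoding the test makes each escalation decision explicit and easy to log. This is a minimal sketch. The field names are hypothetical labels for the seven bullets above, and the threshold of two comes from the text.

```python
from dataclasses import astuple, dataclass


@dataclass
class EscalationSignals:
    """The seven conditions of the escalation test, answered True or False."""

    wide_context: bool                 # many files, documents, teams, or systems
    hours_of_senior_structuring: bool  # a senior person would need hours or days
    weak_output_is_costly: bool        # real rework or risk
    becomes_reviewed_artifact: bool
    needs_multimodal_reading: bool     # visual, chart, table, PDF, or code understanding
    needs_validation: bool             # tests, validation, or self-checking
    cheaper_models_lose_thread: bool


def should_escalate(signals: EscalationSignals, threshold: int = 2) -> bool:
    """Escalate to Fable only when at least `threshold` conditions hold."""
    return sum(astuple(signals)) >= threshold


typo_fix = EscalationSignals(False, False, False, False, False, False, False)
migration = EscalationSignals(True, True, True, True, False, True, False)
print(should_escalate(typo_fix), should_escalate(migration))  # False True
```

Keeping the threshold as a parameter lets a team tighten it after the first-week review without rewriting the rule.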

Do not use Fable just because it is available.

Use it when the job has enough consequence to justify the escalation.

The Model Routing Table

| Model Lane | Use It For | Avoid It For |
| --- | --- | --- |
| Haiku | Fast routine work, simple cleanup, lightweight extraction | Deep planning, high-risk reasoning |
| Sonnet | Everyday build work, drafting, summarization, practical implementation | Long-horizon work with many dependencies |
| Opus | Complex reasoning, higher-quality review, sensitive work where Fable retention or fallback is not appropriate | Premium long-running work where Fable can remove multiple loops |
| Fable | Long-horizon, messy, reviewed escalation work | Routine work, casual drafts, sensitive material bound by zero data retention (ZDR) |

This table is the simplest way to keep Fable useful.

It makes the decision visible.

If a task cannot explain why it needs Fable, it probably does not need Fable.

Developer Playbook

For developers, Fable belongs at the point where coding becomes system reasoning.

Use it for:

- codebase migrations

- architectural refactors

- hard PR review

- test-plan generation

- visual implementation QA

- legacy modernization

- long-running Claude Code sessions

Example:

A team needs to migrate an old service from one internal API shape to another. Sonnet can inspect files, draft a plan, and make the first set of edits. Opus can review the risky parts. Fable enters when the work crosses a threshold: too many modules, uncertain test coverage, ambiguous edge cases, and a final plan that needs to survive senior review.

The prompt should not be "fix this repo."

It should be:

Review the migration plan, inspect the highest-risk paths, identify missing tests, propose a phased rollout, and return a merge-readiness checklist.

That is Fable work.

A typo fix is not.

Founder Playbook

For founders, Fable is best used as a judgment partner for messy strategic inputs.

Use it for:

- market synthesis

- investor memo critique

- product strategy tradeoffs

- customer feedback clustering

- pricing assumption review

- "kill our plan" sessions

- prototype judgment before committing engineering time

Example:

A founder has customer calls, churn notes, competitor pages, revenue segments, product constraints, and a half-written board memo. A cheaper model can summarize each input. Fable should be used for the higher-order task: turn the mess into decision options, identify weak assumptions, separate evidence from opinion, and tell the founder what must be verified before the next board meeting.

The output should not be treated as truth.

It should be treated as a sharper agenda.

Project Manager Playbook

For PMs, Fable is useful when the project state is too messy for a meeting summary but too important for an ordinary checklist.

Use it for:

- release readiness reviews

- dependency maps

- risk registers

- stakeholder tradeoff options

- acceptance criteria

- test plans

- launch packet assembly

Example:

A launch has scattered Jira tickets, two meeting transcripts, design changes, customer commitments, unresolved engineering risks, and legal review still open. Sonnet can summarize the meetings. Fable should produce the operating artifact: phases, owners, dependencies, open decisions, acceptance criteria, rollback triggers, and the questions leadership must answer before launch.

The PM still owns the plan.

Fable helps expose the missing joints.

Cost Governance After June 22

The usage window matters.

Anthropic says Fable is included on Pro, Max, Team, and seat-based Enterprise plans from June 9 through June 22, 2026. Starting June 23, usage on those plans requires usage credits unless Anthropic extends the window.

That means the first period should be treated like a controlled trial.

Track:

- who used Fable

- what task they used it for

- which cheaper model was tried first

- whether Fable removed a review loop

- whether the output became a real artifact

- how many retries were needed

- whether usage credits will be needed after June 22

Fable costs 2x Opus on the listed API rates. That does not mean it is too expensive. It means it must be routed to work where the extra reasoning reduces total cost.

If Fable saves three review meetings, it may be cheap.

If it rewrites a simple status update, it is waste.
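A worked comparison makes the point concrete. Every dollar figure below is a hypothetical placeholder, not Anthropic's pricing. The only relationship taken from the text is that Fable's model spend is twice Opus's. The sketch counts the review loops a task still needs as part of its cost.

```python
def total_cost(model_cost: float, review_loops: int, cost_per_loop: float) -> float:
    """Model spend plus the human review loops the output still needs."""
    return model_cost + review_loops * cost_per_loop


# Hypothetical numbers for illustration only.
OPUS_RUN = 4.00            # model spend for the task on Opus, in dollars
FABLE_RUN = 2 * OPUS_RUN   # the text: Fable costs 2x Opus on listed API rates
REVIEW_LOOP = 150.00       # assumed loaded cost of one review meeting

opus_total = total_cost(OPUS_RUN, review_loops=4, cost_per_loop=REVIEW_LOOP)
fable_total = total_cost(FABLE_RUN, review_loops=1, cost_per_loop=REVIEW_LOOP)
print(opus_total, fable_total)  # 604.0 158.0

# A simple status update needs no review loop either way, so Fable only doubles spend.
print(total_cost(OPUS_RUN, 0, REVIEW_LOOP), total_cost(FABLE_RUN, 0, REVIEW_LOOP))  # 4.0 8.0
```

Under these assumptions, Fable pays off whenever the review time it removes is worth more than the $4 difference in model spend. When it removes nothing, the premium is pure cost.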

Retention And Fallback Review

Fable is also a governance decision.

Anthropic says Mythos-class model traffic requires 30-day retention for safety monitoring. Anthropic also says requests in certain categories, including cybersecurity, biology and chemistry, and distillation-related requests, can be routed away from Fable to Opus 4.8 or refused.

For most individual users, this may feel invisible.

For enterprise teams, it should become a routing rule.

Before using Fable, ask:

- Is this data allowed under 30-day safety retention?

- Does the workspace normally require zero data retention?

- Would an Opus fallback be acceptable?

- Does the workflow need to log which model actually answered?

- Is the output reviewed before action?

If those answers are unclear, do not start with Fable.

Start with governance.
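The five questions can act as a preflight gate. This is a sketch under stated assumptions. `RetentionAnswers`, `Route`, and `preflight` are hypothetical names, and an unknown answer is modeled as None. Sending retention-blocked work to Opus follows the routing table, not any Anthropic policy. The logging question is simplified to whether the workflow already records which model answered.

```python
from __future__ import annotations

from dataclasses import astuple, dataclass
from enum import Enum


class Route(Enum):
    FABLE = "fable"
    OPUS = "opus"
    GOVERNANCE_FIRST = "governance first"


@dataclass
class RetentionAnswers:
    """Answers to the pre-use questions. None means nobody knows yet."""

    allowed_under_30_day_retention: bool | None
    workspace_requires_zdr: bool | None
    opus_fallback_acceptable: bool | None
    answering_model_logged: bool | None   # simplification: logging already exists
    output_reviewed_before_action: bool | None


def preflight(answers: RetentionAnswers) -> Route:
    # Any unclear answer means governance comes before model choice.
    if any(value is None for value in astuple(answers)):
        return Route.GOVERNANCE_FIRST
    # Data that cannot sit under 30-day retention, or a ZDR workspace, stays off Fable.
    if not answers.allowed_under_30_day_retention or answers.workspace_requires_zdr:
        return Route.OPUS
    # Fable traffic can be routed to Opus or refused, so the workflow must tolerate
    # that, record which model answered, and review output before acting on it.
    if not (answers.opus_fallback_acceptable
            and answers.answering_model_logged
            and answers.output_reviewed_before_action):
        return Route.GOVERNANCE_FIRST
    return Route.FABLE


print(preflight(RetentionAnswers(True, False, True, True, True)))  # Route.FABLE
print(preflight(RetentionAnswers(True, True, True, True, True)))   # Route.OPUS
print(preflight(RetentionAnswers(True, None, True, True, True)))   # Route.GOVERNANCE_FIRST
```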

The First-Week Inspection Ritual

At the end of the first week, run a Fable usage review.

Do not ask, "Did people like it?"

Ask:

- Which Fable tasks became real artifacts?

- Which tasks could have been Sonnet or Opus?

- Which tasks saved human time?

- Which tasks created extra review work?

- Which prompts touched sensitive data?

- Which outputs needed a model fallback?

- Which use cases deserve a budget?

- Which use cases should be blocked?

The goal is to build a routing policy while the included window is still fresh.

Good teams will not win by using Fable the most.

They will win by knowing exactly when to use it.

What To Inspect Next

Create a lightweight `Fable Escalation` tag in your work tracker.

For every Fable use, capture:

- role

- task type

- source material

- cheaper model attempted first

- reason for escalation

- output artifact

- review result

- retention sensitivity

- estimated loop saved
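A spreadsheet is enough for this. Storing the same fields as a typed record makes the 10-to-20-task review easy to compute. This is a sketch: `FableEscalation` and `summarize` are hypothetical, and the fields follow the list above. "Estimated loop saved" is stored as a whole number of loops, and the sample rows are placeholders.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class FableEscalation:
    role: str
    task_type: str
    source_material: str
    cheaper_model_attempted: str | None  # None if nobody tried a cheaper model first
    reason: str
    output_artifact: str | None          # None if nothing reusable came out
    review_result: str                   # e.g. "accepted", "reworked", "rejected"
    retention_sensitive: bool
    loops_saved: int


def summarize(log: list[FableEscalation]) -> dict:
    """Roll the log up into the questions asked at the first-week review."""
    return {
        "tasks": len(log),
        "skipped_cheaper_model": sum(e.cheaper_model_attempted is None for e in log),
        "became_artifacts": sum(e.output_artifact is not None for e in log),
        "retention_sensitive": sum(e.retention_sensitive for e in log),
        "loops_saved": sum(e.loops_saved for e in log),
        "tasks_by_type": Counter(e.task_type for e in log),
    }


log = [
    FableEscalation("developer", "migration plan", "repo and tickets", "sonnet",
                    "too many modules", "merge-readiness checklist", "accepted", False, 2),
    FableEscalation("pm", "status update", "one email thread", None,
                    "habit", None, "rejected", False, 0),
]
summary = summarize(log)
print(summary["skipped_cheaper_model"], summary["loops_saved"])  # 1 2
```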

After 10 to 20 tasks, the pattern will be obvious.

Some work deserves Fable.

Most work does not.

That is the point.

Source Notes

Launch and model behavior:

- Anthropic launch post: https://www.anthropic.com/news/claude-fable-5-mythos-5

- Fable model page: https://www.anthropic.com/claude/fable

- API launch docs: https://platform.claude.com/docs/en/about-claude/models/introducing-claude-fable-5-and-claude-mythos-5

Pricing and usage credits:

- Claude pricing docs: https://platform.claude.com/docs/en/about-claude/pricing

- Paid plan usage credits: https://support.claude.com/en/articles/12429409-manage-usage-credits-for-paid-claude-plans

- Team and Enterprise usage credits: https://support.claude.com/en/articles/12005970-manage-usage-credits-for-team-and-seat-based-enterprise-plans

Governance:

- Mythos-class data retention: https://support.claude.com/en/articles/15425996-data-retention-practices-for-mythos-class-models

Developer workflow:

- Claude Code product page: https://claude.com/product/claude-code

The Trust Boundary Is the Product

Shivanath Devinarayanan — Tue, 09 Jun 2026 15:51:01 GMT

WWDC26 is easy to read as a catch-up keynote.

That reading is too thin.

The more useful reading is that Apple is trying to move AI from a separate assistant surface into the operating system. Siri AI is the visible part. Liquid Glass is the interface metaphor. App Intents and Foundation Models are the developer surface. Private Cloud Compute and on-device models are the trust claim.

The hard part is the boundary between them.

When an assistant can read context, understand what is on screen, reason over personal information, and act through apps, the product is no longer the answer. The product is the control surface around the action.

That is what operators should inspect.

Why the public thesis is incomplete

The LinkedIn article argues that WWDC26 was about the operating layer. That is the public thesis.

For subscribers, the deeper question is more practical: what breaks when the operating layer is poorly designed?

Three things break first.

First, context becomes invisible. The assistant uses something the user did not know it was using.

Second, action becomes ambiguous. The assistant suggests, edits, sends, schedules, files, or changes something without a clear action boundary.

Third, accountability becomes mushy. The user cannot inspect what happened, why it happened, or how to reverse it.

That is not a model problem alone.

It is an operating-design problem.

What Apple is testing

Apple's WWDC26 announcements put several pieces into the same frame: Siri AI with personal context and onscreen awareness, Apple Foundation Models, App Intents, Spotlight semantic indexing, View Annotations, and Private Cloud Compute.

Source: Apple Newsroom, Siri AI release, June 8, 2026

https://www.apple.com/newsroom/2026/06/apple-introduces-siri-ai-a-profoundly-more-capable-and-personal-assistant/

Source: Apple Developer, WWDC26 Apple Intelligence guide

https://developer.apple.com/wwdc26/guides/apple-intelligence/

The interesting part is not any single feature.

It is the dependency chain.

The assistant needs context. The model needs a way to reason. The app needs typed actions. The user needs a permission surface. The system needs a trace.

If one of those pieces is weak, the whole experience feels either magical or unsafe.

Magic is not enough for production.

The five-part trust boundary review

Here is the review ritual I would use before calling any assistant production-ready.

1. Context review

Ask what the assistant can see.

Can the user tell whether the assistant is reading the current screen, prior messages, calendar events, files, photos, or public web information?

The important point is not whether the assistant has access. The important point is whether access is legible.

If the user cannot see which context was used, the answer may be correct but still untrustworthy.

2. Action review

Ask what the assistant is allowed to do.

There is a large difference between summarizing an email and sending a reply. There is a large difference between finding a calendar slot and booking it. There is a large difference between drafting a note and filing it into a system of record.

App Intents matter because they move actions from vague screen behavior into named capabilities.

Source: Apple Developer, WWDC26 Apple Intelligence guide

https://developer.apple.com/wwdc26/guides/apple-intelligence/

Typed actions are inspectable. Screen guessing is not.
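App Intents is a Swift framework. The Python below is only a language-neutral sketch of the underlying idea: a typed action can be checked before it runs, and a guessed screen interaction cannot. `ActionSpec`, `validate_call`, and the `BookCalendarSlot` action are hypothetical and are not Apple's API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ActionSpec:
    """A named capability with typed parameters (hypothetical, App Intents-like)."""

    name: str
    parameters: dict[str, type]


def validate_call(spec: ActionSpec, args: dict) -> list[str]:
    """Return problems with a proposed call. An empty list means it is well-formed."""
    problems = []
    for key, expected in spec.parameters.items():
        if key not in args:
            problems.append(f"missing parameter: {key}")
        elif not isinstance(args[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    for key in sorted(args.keys() - spec.parameters.keys()):
        problems.append(f"unexpected parameter: {key}")
    return problems


book_slot = ActionSpec(
    name="BookCalendarSlot",
    parameters={"title": str, "start": datetime, "duration_minutes": int},
)
good = {"title": "Meeting prep", "start": datetime(2026, 6, 10, 9, 0), "duration_minutes": 30}
print(validate_call(book_slot, good))  # []
print(validate_call(book_slot, {"title": "Meeting prep", "start": "tomorrow-ish"}))
# ['start should be datetime', 'missing parameter: duration_minutes']
```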

3. Permission review

Ask where the user approves the boundary.

Permission is not a checkbox buried in settings. It is the moment where the user understands what context is being used, what action is being proposed, and what app or system will receive the result.

This is where Liquid Glass becomes a useful metaphor.

Apple described Liquid Glass controls as adjustable from ultra-clear to fully tinted. That is a design setting, but it is also a trust metaphor.

Source: Apple Newsroom, WWDC26 overview, June 8, 2026

https://www.apple.com/newsroom/2026/06/apple-unveils-next-generation-of-apple-intelligence-siri-ai-and-more/

The interface should reveal the right layer at the right moment.

4. Trace review

Ask what the system records.

If an assistant moves across context, model reasoning, and app action, the user needs a readable trace. Not a developer log. A user-facing explanation.

What did it see?

What did it decide?

What action did it take?

Which app changed?

This is the difference between a helpful assistant and an unaccountable automation.

5. Reversal review

Ask what can be undone.

Undo is not a convenience feature in agentic systems. It is a trust primitive.

If the user cannot reverse the action, the approval moment needs to be stronger. If the user can reverse the action, the system can afford more fluidity.

That is the operating tradeoff.
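The tradeoff can be written as a small policy. This is a hypothetical sketch. The three approval levels and the examples in the comments illustrate the rule above and do not describe any Apple behavior.

```python
from enum import Enum


class Approval(Enum):
    NONE = "no prompt; record it in the trace"
    CONFIRM = "lightweight confirmation"
    EXPLICIT = "show context, action, and recipient; require deliberate approval"


def approval_for(changes_state: bool, reversible: bool) -> Approval:
    """The harder an action is to undo, the stronger the approval moment."""
    if not changes_state:
        return Approval.NONE      # e.g. summarizing an email
    if reversible:
        return Approval.CONFIRM   # e.g. drafting a note the user can delete
    return Approval.EXPLICIT      # e.g. sending a reply


print(approval_for(changes_state=True, reversible=False).name)  # EXPLICIT
```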

A concrete scenario

Imagine a user asks an assistant to prepare for a meeting.

A weak system summarizes a few emails and gives a confident answer.

A stronger operating-layer system shows the source context, separates public knowledge from private messages, identifies which app actions are available, asks before creating or sending anything, and leaves a trace of what changed.

That second version is not only more capable.

It is more governable.

What to inspect next

After WWDC26, the useful question is not whether Apple has the best model.

The useful question is whether Apple can make the operating boundary legible enough for users and developers to trust AI inside the system.

Builders should ask the same question of their own products.

Can the user see the layer?

Can the system separate context?

Can the app expose typed actions?

Can the permission moment be understood?

Can the result be traced?

Can the action be reversed?

If those answers are weak, the assistant is still a demo.

If those answers are strong, the operating layer becomes a product surface.

That is the real WWDC26 takeaway.

The Loop Readiness Review

Shivanath Devinarayanan — Tue, 09 Jun 2026 08:24:11 GMT

The LinkedIn version made the public argument: loop engineering is not a prompting trick. For Salesforce teams, it is release control.

The subscriber question is more practical.

Before a team lets Cursor, Claude Code, and Agentforce participate in real delivery, what should leaders inspect?

The answer is not another tool list.

It is a review ritual.

Call it the Loop Readiness Review.

The purpose is simple: prove that the loop can work inside a bounded Salesforce release path before it starts changing code, metadata, permissions, or Agentforce behavior.

The Problem

Salesforce work has never been only code.

That is why loop engineering gets dangerous if it is imported from generic software engineering without translation. A loop that works cleanly in a small TypeScript service can fail quietly in Salesforce because the repo is only one part of the system.

The real system includes Apex, Lightning Web Components, Flow, permission sets, profiles, custom metadata, deployment manifests, package versions, sandbox drift, and production release rules.

Agentforce adds another layer. Salesforce describes Agentforce DX as extending Salesforce DX to work with agents, and says agents are metadata like other Salesforce customizations.

https://developer.salesforce.com/docs/ai/agentforce/guide/agent-dx.html

That means agent behavior enters the same release conversation as Apex and Flow. If an Agentforce action calls Apex, launches a Flow, uses a prompt template, or changes how a support rep escalates a renewal risk, the loop is no longer just a coding loop.

It is a business-behavior loop.

The Review Ritual

Run the Loop Readiness Review before a loop is allowed to operate on a Salesforce branch or sandbox.

The review has five questions.

1. Is The Loop Bounded?

A bounded loop has a trigger, a scope, and a no-touch list.

Good trigger examples:

A pull request opens.
A deployment validation fails.
A scheduled sandbox drift check runs.
A human release owner starts a specific inspection.

Bad trigger example:

"Improve the renewal process."

That instruction gives the agent permission to wander. In Salesforce, wandering is expensive. It can turn a small Apex fix into Flow edits, profile changes, and metadata churn nobody asked for.

The no-touch list matters as much as the scope. If the loop can edit profiles, broad Flow metadata, package boundaries, or production-connected assumptions without review, the loop is not ready.
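A bounded loop can check its own proposed changes against the scope and the no-touch list before anything is committed. This is a minimal sketch that assumes the default Salesforce DX `force-app/main/default` source layout. The specific patterns and the `review_changes` helper are hypothetical, so replace them with your project's paths.

```python
from fnmatch import fnmatchcase

# Hypothetical contract for one bounded change.
IN_SCOPE = [
    "force-app/main/default/classes/RenewalRisk*",
    "force-app/main/default/lwc/renewalRiskPanel/*",
]
NO_TOUCH = [
    "force-app/main/default/profiles/*",
    "force-app/main/default/flows/*",
    "sfdx-project.json",  # package directories and versions
]


def review_changes(changed_paths: list[str]) -> dict[str, list[str]]:
    """Sort proposed changes into allowed, no-touch, and out-of-scope buckets."""
    result = {"allowed": [], "no_touch": [], "out_of_scope": []}
    for path in changed_paths:
        if any(fnmatchcase(path, p) for p in NO_TOUCH):
            result["no_touch"].append(path)  # checked first: no-touch always wins
        elif any(fnmatchcase(path, p) for p in IN_SCOPE):
            result["allowed"].append(path)
        else:
            result["out_of_scope"].append(path)
    return result


changes = review_changes([
    "force-app/main/default/classes/RenewalRiskService.cls",
    "force-app/main/default/profiles/Admin.profile-meta.xml",
    "force-app/main/default/classes/AccountTriggerHandler.cls",
])
# The loop may proceed only if nothing lands outside "allowed".
print(not changes["no_touch"] and not changes["out_of_scope"])  # False
```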

2. Is The Evidence Real?

Anthropic's agent guidance makes the control point clear: agents need ground truth from the environment at each step, and stopping conditions help maintain control.

https://www.anthropic.com/engineering/building-effective-agents

In Salesforce, ground truth is not a model's confidence.

It is command output, analyzer output, test output, deployment validation, retrieved metadata, and human review notes.

For an Agentforce escalation action, the evidence packet should include:

Changed-component map.
Salesforce Code Analyzer output.
Apex test output.
LWC test output when relevant.
`sf project deploy validate` result.
Agentforce metadata notes.
Behavior-review script for the business owner.

Salesforce Code Analyzer gives the loop a static inspection layer across engines such as CPD, ESLint, Flow Scanner, PMD, RetireJS, Regex, and Salesforce Graph Engine.

https://developer.salesforce.com/docs/platform/salesforce-code-analyzer/guide/engines.html

Salesforce CLI deployment validation gives the loop a release-shaped check instead of a local-only check.

https://github.com/salesforcecli/cli

3. Is Org State Represented?

This is the Salesforce trap.

The repository may look clean while the org is not.

A Flow can exist in UAT and not in source. A permission set can be missing field access. A managed package version can differ. A prompt template can be connected to an Agentforce action in one sandbox and not another. A test can pass locally while deployment validation fails because the target org has different constraints.

The loop needs a rule for org-state failures.

It should not automatically keep editing.

It should classify the failure:

Code issue inside scope: fix and rerun.
Metadata missing from source: stop and report retrieval need.
Permission ambiguity: stop and route to admin.
Package or org dependency: stop and route to architect or release manager.
Business-rule ambiguity: stop and route to business owner.

The loop earns trust by stopping at the right boundary.

Not by pretending every failure is code.
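The classification rule reduces to a lookup table in which only one failure class lets the loop keep editing. This is a sketch. The enum names are hypothetical, and the next steps come straight from the list above. Stopping on a failure the loop cannot classify is an added assumption.

```python
from __future__ import annotations

from enum import Enum


class Failure(Enum):
    CODE_IN_SCOPE = "code issue inside scope"
    MISSING_SOURCE = "metadata missing from source"
    PERMISSION = "permission ambiguity"
    ORG_DEPENDENCY = "package or org dependency"
    BUSINESS_RULE = "business-rule ambiguity"


NEXT_STEP = {
    Failure.CODE_IN_SCOPE: ("continue", "fix and rerun"),
    Failure.MISSING_SOURCE: ("stop", "report retrieval need"),
    Failure.PERMISSION: ("stop", "route to admin"),
    Failure.ORG_DEPENDENCY: ("stop", "route to architect or release manager"),
    Failure.BUSINESS_RULE: ("stop", "route to business owner"),
}


def next_step(failure: Failure | None) -> tuple[str, str]:
    """Map a classified failure to the loop's next move."""
    if failure is None:
        # Assumption: anything the loop cannot classify is not treated as code.
        return ("stop", "report unclassified failure to a human")
    return NEXT_STEP[failure]


print(sum(action == "continue" for action, _ in NEXT_STEP.values()))  # 1
```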

4. Is Agentforce Behavior Reviewed?

Agentforce actions are building blocks that let agents perform tasks and interact with data. Salesforce documents custom action paths through Apex, Flow, prompt templates, Apex REST actions, AuraEnabled actions, and named query actions.

https://developer.salesforce.com/docs/ai/agentforce/guide/get-started-actions.html

https://developer.salesforce.com/docs/ai/agentforce/guide/agent-invocablemethod.html

That means a loop can validate code and still miss the business consequence.

The Apex compiles.

The Flow deploys.

The metadata moves.

The agent still routes the wrong renewal risk to the wrong team.

That is why the business owner owns behavior review. The loop can prepare the review script:

What should the agent detect?
What action should it call?
What record should change?
What should the user see?
When should the agent escalate to a human?

But the loop cannot approve the business meaning.

5. Is There A Stop Condition?

The most important line in any loop is not the first prompt.

It is the stop rule.

Stop when validation passes.

Stop when the iteration budget is reached.

Stop when the loop detects missing source.

Stop when a permission decision is needed.

Stop when the action changes customer or employee behavior.

Stop when the loop is about to touch a no-touch area.
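One way to make the stop rule visible is to put it at the center of the loop. This is a sketch. `StepResult` and `run_loop` are hypothetical, and `step` stands in for whatever edits the code, runs the analyzer and tests, and reports back. Guard conditions are checked before success, so a change that validates but touches a no-touch area still stops for the right reason.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    """What one loop iteration observed (hypothetical fields)."""

    validation_passed: bool = False
    missing_source: bool = False
    needs_permission_decision: bool = False
    changes_user_behavior: bool = False
    touches_no_touch_area: bool = False


def run_loop(step: Callable[[int], StepResult], max_iterations: int) -> str:
    """Run `step` until a stop rule fires; return the reason the loop stopped."""
    for i in range(1, max_iterations + 1):
        result = step(i)
        if result.touches_no_touch_area:
            return f"stopped at iteration {i}: no-touch area"
        if result.missing_source:
            return f"stopped at iteration {i}: missing source"
        if result.needs_permission_decision:
            return f"stopped at iteration {i}: permission decision needed"
        if result.changes_user_behavior:
            return f"stopped at iteration {i}: behavior change needs owner review"
        if result.validation_passed:
            return f"stopped at iteration {i}: validation passed"
    return f"stopped: iteration budget of {max_iterations} reached"


# Stand-in steps: one passes validation on the third try, one never does.
print(run_loop(lambda i: StepResult(validation_passed=(i == 3)), max_iterations=5))
# stopped at iteration 3: validation passed
print(run_loop(lambda i: StepResult(), max_iterations=4))
# stopped: iteration budget of 4 reached
```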

Claude Code and Cursor make the work surface faster. Claude Code can run commands, inspect output, and continue through terminal feedback. Cursor can help engineers reason across files and encode project rules.

https://code.claude.com/docs/en/overview

https://docs.cursor.com/agent

https://docs.cursor.com/context/rules

That speed is useful only when the stop rule is visible.

What To Inspect Next

Run this review on one bounded Salesforce change before scaling it:

Pick one Agentforce action or Apex/LWC change.
Write the loop trigger.
Write the exact success condition.
Write the no-touch list.
Write the evidence packet.
Write the stop conditions.
Assign the five owners: developer, admin, architect, release manager, business owner.

Then run the loop in a sandbox and inspect the evidence, not the vibe.

The point is not to slow the team down.

It is to make AI speed legible enough to release.

Loop engineering will become a real Salesforce skill when teams stop asking whether the agent can make the change and start asking whether the loop can prove the change is ready.

That is the operating shift.

Sources

Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
Claude Code overview: https://code.claude.com/docs/en/overview
Cursor Agent: https://docs.cursor.com/agent
Cursor Rules: https://docs.cursor.com/context/rules
Salesforce Agentforce DX: https://developer.salesforce.com/docs/ai/agentforce/guide/agent-dx.html
Salesforce Agentforce Actions: https://developer.salesforce.com/docs/ai/agentforce/guide/get-started-actions.html
Salesforce Apex InvocableMethod actions: https://developer.salesforce.com/docs/ai/agentforce/guide/agent-invocablemethod.html
Salesforce Code Analyzer engines: https://developer.salesforce.com/docs/platform/salesforce-code-analyzer/guide/engines.html
Salesforce CLI: https://github.com/salesforcecli/cli

Build The Room Before You Draft The Memo

Shivanath Devinarayanan — Mon, 08 Jun 2026 12:53:55 GMT

The public edition makes the visible argument: the next hallucination firewall is not a better prompt. It is the source room.

For subscribers, the more useful question is operational:

What does that room actually contain?

Most teams already understand the surface advice. Verify sources. Check citations. Do not paste sensitive material into unsafe tools. Have a policy. Train the team.

The Sullivan & Cromwell incident shows why that advice is not enough by itself. The firm told Chief Bankruptcy Judge Martin Glenn that an April 9 emergency motion in the Prince Global Holdings Chapter 15 matter included inaccurate citations and other errors, including AI hallucinations. The same letter said the firm's AI policies were not followed and that citation review did not identify the inaccurate citations generated by AI.

A policy can describe the behavior.

A room has to force the behavior.

The operating problem

In serious knowledge work, the agent is usually asked to do two jobs at once.

First, figure out the source material.

Second, produce the artifact.

That is backwards.

When source material is messy, contradictory, duplicated, or incomplete, the agent spends part of the run guessing the shape of the project. It has to decide what matters, what is stale, what is authoritative, what conflicts, and what is missing while also trying to write the final output.

That is how polished work gets dangerous.

The problem is not that the model cannot write. The problem is that the writing starts before the evidence surface is visible.

The Source Room Release Ritual

Use a source room for any work that could create legal, financial, market, operational, or reputational exposure.

The minimum ritual has seven artifacts.

Source inventory.

Every file, link, transcript, note, model output, spreadsheet, deck, and prior draft gets listed. Path, type, date, owner if known, and purpose. This is not bookkeeping. It is the first moment where the agent tells the human what it believes the project contains.

Authority ranking.

The room separates signed, filed, approved, published, and current sources from background material. A final policy outranks a draft. A court filing outranks a recap. A signed contract outranks a sales deck. A live system export outranks a remembered chat message.
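An authority ranking can be as simple as an ordered lookup that the agent applies before drafting. This is a sketch with hypothetical status labels and ranks, and each team should set its own order. Unknown statuses sort last so they never borrow authority silently.

```python
# Hypothetical ranks: lower numbers are more authoritative.
AUTHORITY = {
    "signed": 0, "filed": 0, "approved": 1, "published": 1,
    "current": 2, "draft": 3, "recap": 4, "background": 5,
}


def rank_sources(sources: list[dict]) -> list[dict]:
    """Order sources so authoritative material is read and cited first."""
    unknown = max(AUTHORITY.values()) + 1  # unknown statuses sort last
    return sorted(sources, key=lambda s: AUTHORITY.get(s["status"], unknown))


sources = [
    {"path": "decks/pricing_overview.pptx", "status": "background"},
    {"path": "contracts/msa_signed.pdf", "status": "signed"},
    {"path": "notes/remembered_chat.txt", "status": "unverified"},
    {"path": "policy/approvals_draft.docx", "status": "draft"},
]
for source in rank_sources(sources):
    print(source["status"], source["path"])
```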

Conflict log.

The agent does not resolve conflicts silently. It surfaces them. Two decks disagree on the number. A policy and a FAQ describe different approval paths. A source says the tool was unspecified while commentary guesses a vendor. The conflict log keeps ambiguity visible until a human decides.

Missing context list.

Missing material is often more dangerous than available material. The agent should say what it does not have: the final redline, the signed addendum, the current pricing sheet, the raw transcript, the export behind the chart, the approval record. This is where hallucination pressure drops because the gap has a name.

Duplicate and version-family report.

Duplicates are reasoning debt. A repeated transcript can overweight one source. A stale memo with the same title as a current memo can contaminate the answer. A redline and a final version can blend into a false compromise. Do not delete duplicates automatically. Quarantine them and ask for a decision.

Claim ledger.

Every material claim in the draft gets a source. Not every sentence needs a citation. Every claim that would matter if wrong does. The ledger should show the sentence, the source, the source rank, and the verification owner.

Release check.

Before the artifact leaves the room, someone signs the release. Not for ceremony. Because accountability has to land somewhere. The release check asks: What changed? What claims are unsupported? What conflicts remain? What missing context did we accept? Who is comfortable with this going outside the room?
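The seven artifacts map naturally onto a few small records. The sketch below is a minimal, hypothetical Python illustration, not any product's schema. The `AUTHORITY` tiers, the `Source` and `Claim` fields, and the helper names are assumptions drawn from the artifact descriptions above. It ranks sources by authority, reports exact-content duplicates for quarantine, and gathers the release-check questions into one report for a named owner.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical authority tiers (lower outranks higher). Signed, filed, approved,
# published, and current sources sit above drafts, which sit above background.
AUTHORITY = {"signed": 0, "filed": 0, "approved": 0, "published": 0,
             "current": 0, "draft": 1, "background": 2}


@dataclass
class Source:
    """One row of the source inventory."""
    path: str
    kind: str         # e.g. "export", "deck", "transcript"
    date: str         # ISO date such as "2026-05-01", so string order is date order
    status: str       # a key of AUTHORITY; unknown values fall back to background
    text: str         # extracted content, used here only for duplicate detection
    owner: str = "unknown"
    purpose: str = ""

    @property
    def rank(self) -> int:
        return AUTHORITY.get(self.status, AUTHORITY["background"])


@dataclass
class Claim:
    """One row of the claim ledger."""
    sentence: str
    source_path: Optional[str]  # None when no source was found
    material: bool              # would it matter if this were wrong?
    verifier: Optional[str] = None


def rank_sources(sources: List[Source]) -> List[Source]:
    """Authority ranking: higher tiers first, newest first within a tier."""
    newest_first = sorted(sources, key=lambda s: s.date, reverse=True)
    return sorted(newest_first, key=lambda s: s.rank)  # stable sort keeps date order


def duplicate_groups(sources: List[Source]) -> List[List[str]]:
    """Exact-content duplicates, reported for quarantine rather than deleted."""
    groups = defaultdict(list)
    for s in sources:
        digest = hashlib.sha256(s.text.strip().encode("utf-8")).hexdigest()
        groups[digest].append(s.path)
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


def release_report(sources, claims, open_conflicts, accepted_gaps, release_owner):
    """Answers to the release-check questions, for a named human to sign."""
    known = {s.path for s in sources}
    material = [c for c in claims if c.material]
    return {
        "unsupported_claims": [c.sentence for c in material if c.source_path not in known],
        "unverified_claims": [c.sentence for c in material if c.verifier is None],
        "open_conflicts": list(open_conflicts),
        "accepted_gaps": list(accepted_gaps),
        "quarantined_duplicates": duplicate_groups(sources),
        "release_owner": release_owner,
    }


def ready_to_release(report) -> bool:
    """Block on unsupported or unverified material claims, or no release owner."""
    return (not report["unsupported_claims"]
            and not report["unverified_claims"]
            and bool(report["release_owner"]))
```

Open conflicts and accepted gaps are reported but do not block release, because the ritual keeps ambiguity visible for a human decision rather than resolving it in code. Hashing catches only exact duplicates; version families, such as a stale memo sharing a title with the current one, still need a human-confirmed grouping.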

What breaks in practice

The most common failure is starting with the deliverable.

Write the memo.

Draft the filing.

Create the board deck.

Summarize the diligence room.

That prompt feels efficient because it skips the slow part. But the slow part is often the control. The source room makes the agent do the boring work first, which is exactly the work that prevents the expensive mistake later.

The second failure is treating source organization as a junior task.

It is not.

In an agentic workflow, source organization becomes reasoning architecture. The layout of the folder changes the answer. The labels change the answer. The stale file left beside the current file changes the answer. The missing export changes the answer.

The third failure is using policy as a substitute for instrumentation.

Policies matter. Training matters. ABA Formal Opinion 512 matters in the legal context because existing duties still apply when lawyers use generative AI. But the operational lesson is broader: a policy that says "verify everything" still needs a workspace where verification is visible, repeatable, and interruptible.

A concrete enterprise scenario

Imagine an enterprise team preparing a board memo on AI productivity.

The folder contains a current KPI export, three old operating decks, two vendor benchmarks, a transcript from a buyer interview, a finance spreadsheet, a draft narrative from last quarter, and meeting notes from a leadership review.

The fastest bad prompt is:

"Write the board memo."

The better first prompt is:

"Build the source room. Inventory every file. Mark authority. Identify duplicates and version families. Surface conflicts. List missing context. Summarize each source. Do not draft the memo yet."

The board memo becomes easier after that, not harder. The writing prompt can shrink because the room carries the complexity:

"Draft from the approved KPI export and current finance spreadsheet. Treat old decks as background only. Preserve the unresolved conflict about retention impact. Flag any productivity claim without a source."

That is the real shift.

Prompt quality improves when the room quality improves.

What to inspect next

Leaders should not start by asking how many AI tools are deployed.

They should ask how many source rooms exist for high-stakes work.

Inspect five things:

Does the team separate authoritative sources from background sources before drafting?
Does the agent produce a conflict log before it synthesizes?
Are duplicates quarantined instead of silently blended?
Does the final artifact have a claim ledger for material assertions?
Is there a named release owner when AI-assisted work leaves the workspace?

If those controls do not exist, the organization is relying on memory, habit, and polished formatting.

That can work for casual drafting.

It cannot be the operating model for serious work.

The point is not to slow AI down.

The point is to make AI speed legible enough to trust.

Sources

Sullivan & Cromwell letter to Chief Judge Martin Glenn, In re Prince Global Holdings Ltd.: https://websitedc.s3.amazonaws.com/documents/InrePrinceUSA18April2026.pdf
Legal AI Governance case tracker: https://legalaigovernance.com/tracker/cases/in-re-prince-global/
Mata v. Avianca sanctions order: https://law.justia.com/cases/federal/district-courts/new-york/nysdce/1%3A2022cv01461/575368/54/
ABA Formal Opinion 512: https://www.americanbar.org/content/dam/aba/administrative/professional_responsibility/ethics-opinions/aba-formal-opinion-512.pdf
Stanford RegLab, Hallucination-Free?: https://reglab.stanford.edu/publications/hallucination-free-assessing-the-reliability-of-leading-ai-legal-research-tools/
Stanford Impact Labs, Large Legal Fictions: https://impact.stanford.edu/article/large-legal-fictions-profiling-legal-hallucinations-large-language-models
Thomson Reuters Institute hallucinations report: https://www.thomsonreuters.com/en-us/posts/ai-in-courts/hallucinations-report-2026/
OpenAI Codex: https://openai.com/codex/
OpenAI Agents SDK update: https://openai.com/index/the-next-evolution-of-the-agents-sdk/
Anthropic Claude Code overview: https://docs.anthropic.com/en/docs/claude-code/overview

Execution Got Cheap. Judgment Did Not.

Shivanath Devinarayanan — Wed, 20 May 2026 17:20:38 GMT

Organizations built rituals to protect scarce execution. Now execution is cheaper, but the rituals remain. The costliest work may be the meeting that prevents the prototype from existing.

This is the Substack sibling to the LinkedIn article, "The Bottleneck Moved. Your Calendar Didn't". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

The Old Insurance Policy Costs Too Much

Planning rituals were rational when building the wrong thing was expensive. If a rough version can be built quickly, the cost of preventing all wrong turns can exceed the cost of learning from one.

Containerization Is The Better Analogy

When shipping got cheap, winners did not merely load ships faster. They learned what to move, where demand would be, and how to coordinate the network. AI does the same to knowledge work. It moves the constraint.

The New Constraints Are Human

Clarity, ambition, distribution, and trust do not become abundant because a model writes faster. They become more visible as the true bottlenecks. Teams that spend cheap execution on timid ideas will still underperform.

The Subscriber Takeaway

The calendar is a diagnostic tool. Count the time spent aligning before testing. If the meeting exists to protect work that could be explored directly, the organization is paying an old tax.

Subscriber Operating Lens

The calendar test should become a recurring operating review. Take a two-week sample and classify time into alignment, approval, planning, building, testing, distribution, and relationship work. Then ask which meetings existed only because building used to be expensive.
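The calendar test is simple arithmetic once the sample is logged. A minimal sketch, assuming the two-week sample is already recorded as hypothetical `(category, hours)` pairs using the seven categories named above. The `pre_build_ratio` metric is an illustrative assumption for "time spent aligning before testing," not an established measure.

```python
from collections import Counter

# The seven categories named in the operating review.
CATEGORIES = ("alignment", "approval", "planning", "building",
              "testing", "distribution", "relationship")


def calendar_mix(entries):
    """entries: iterable of (category, hours). Returns each category's share of total hours."""
    hours = Counter()
    for category, h in entries:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        hours[category] += h
    total = sum(hours.values())
    if total == 0:
        return {}
    return {c: hours[c] / total for c in CATEGORIES}


def pre_build_ratio(mix):
    """Hypothetical metric: share spent aligning, approving, and planning
    divided by share spent building and testing."""
    before = sum(mix.get(c, 0.0) for c in ("alignment", "approval", "planning"))
    doing = sum(mix.get(c, 0.0) for c in ("building", "testing"))
    return before / doing if doing else float("inf")
```

The ratio only prompts the question the paragraph asks. It cannot tell which meetings still protect important judgment.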

The answer will be uncomfortable. Some rituals still protect important judgment. Others protect execution that can now be explored directly. The point is not to cancel every meeting. It is to move human attention toward the new constraints: clarity, ambition, distribution, and trust.

For subscribers, the management challenge is reallocating attention. Cheap execution only creates advantage when leaders spend the saved time on better questions, braver bets, and faster learning loops.

What To Inspect Next

The practical test is not whether the argument sounds right in a strategy discussion. It is whether the organization has a repeatable way to capture the reasoning that would make the next similar decision better informed.

That means asking three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

The deeper story across this thread is that AI does not only make work faster. It raises the value of context. The firms that learn how to preserve reasoning will get better with each cycle. The firms that do not will move faster without learning faster.

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

The Agent Economy Breaks Seats Before It Breaks Software

Shivanath Devinarayanan — Wed, 20 May 2026 16:05:56 GMT

The useful lesson from the current SaaS pricing debate is not that enterprise software became worthless. It is that a pricing model built around human logins looks fragile when work starts flowing through agents.

This is the Substack sibling to the LinkedIn article, "Your Software Didn't Break. Your Pricing Model Did". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

The Product Survives. The Toll Booth Changes.

Data, workflow logic, integrations, reliability, and accountability still matter. What looks weaker is charging by the number of humans who click through a user interface. Agents do not respect that accounting unit.

The Buyer Has A New Argument

Even before full automation arrives, AI gives buyers leverage. They can ask why a fee, seat count, or service price still reflects old labor assumptions. The negotiation changes before the operating model fully changes.

The Market Can Be Incoherent And Still Right

Investors may rotate between contradictory AI stories. That does not erase the underlying signal: if agents can move work away from the UI, vendors must explain what they actually sell. Data access, governed workflows, and accountability become more defensible than seats.

The Subscriber Takeaway

For SaaS firms and professionals, bolting AI onto the old workflow is not enough. The question is what unit of value remains durable when the human login is no longer the center of work.

Subscriber Operating Lens

A vendor facing agent-mediated work has to name its durable unit of value. Is it data access, regulated workflow, accountability, integration depth, risk transfer, or measurable outcome? If the answer is merely a human seat, the pricing story is exposed.

The same logic applies to professional work. A person should ask which parts of their contribution are tied to effort and which parts are tied to judgment, trust, or accountability. AI compresses effort. It does not erase the need for accountable judgment, but it does force that judgment to be made explicit.

For subscribers, the practical question is pricing architecture. If agents complete more work through fewer logins, what does the buyer still need to pay for, and how will the seller prove it?

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

Architecture Decays When Context Cannot Fit In One Head

Shivanath Devinarayanan — Wed, 20 May 2026 14:41:02 GMT

Bad architecture is often blamed on bad judgment. In mature systems, the more common culprit is missing context. Good people make locally sensible changes that accumulate into global decay.

This is the Substack sibling to the LinkedIn article, "Your Architecture Isn't Failing Because of Bad Engineers". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

Entropy Is A Context Problem

The failure pattern is familiar: each change passes review, each optimization seems reasonable, each exception has a reason. Then the system slows, fragments, or contradicts itself because nobody could see the full interaction surface.

Human Limits Are Not Moral Failures

Working memory constraints matter. Architecture asks people to hold performance, maintainability, security, ownership, history, and business pressure together. No seniority level removes the biological limit.

Where AI Is Actually Useful

The practical opportunity is not to crown AI as chief architect. It is to use AI for tireless cross-checking, pattern enforcement, historical recall, and local-global comparison. That is vigilance work. Humans are not built to do it evenly forever.

The Subscriber Takeaway

Human architects should own novel design, tradeoffs, and risk judgment. AI should patrol entropy, surface precedent, and teach at the moment of change. The partnership works only when those responsibilities are explicit.

Subscriber Operating Lens

The best use of AI in architecture review is not a grand verdict. It is a persistent patrol for drift. Ask the system to compare new changes against accepted patterns, old incident notes, known bottlenecks, and cross-service constraints. The output should be a reasoned warning, not an automatic veto.

Human architects should still own the tradeoff. A rule can say a pattern adds risk. A person has to decide whether the risk is acceptable given timing, market pressure, and reversibility. That split of responsibilities keeps AI from becoming a brittle standards machine.

For subscribers, the operating model is partnership by cognitive strength. Let AI hold breadth and repetition. Let humans hold novelty, stakes, and business judgment. Make the handoff explicit.

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

Careers Are Compressing Into Agent Orchestration

Shivanath Devinarayanan — Wed, 20 May 2026 13:15:53 GMT

The scary version of the AI career story is disappearance. The more useful version is compression. Roles, timelines, and learning cycles are collapsing into a shorter path, and the people who engage early get more repetitions.

This is the Substack sibling to the LinkedIn article, "AI Is Collapsing Futures - And Most of Us Are Misreading What That Means". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

The Horizontal Collapse

Functions still matter, but they are increasingly mediated through the same operating skill: directing AI systems with domain judgment. The marketer, analyst, engineer, and operator are not becoming identical. They are all becoming more software-shaped.

The Temporal Collapse

The leverage people expected to build over years is compressing into months. Waiting for stability feels prudent, but the learning curve belongs to people who are already building workflows, norms, and instincts.

Speed Creates Stability

The bicycle metaphor works because a bicycle is steadiest in motion: slow engagement keeps every new tool feeling alien, while faster engagement creates pattern recognition across systems. You learn what repeats, what fails, and what needs human judgment.

The Subscriber Takeaway

Do not treat AI literacy as a course to finish. Treat it as an operating rhythm. The durable asset is the habit of learning while the ground is moving.

Subscriber Operating Lens

The compression argument changes how people should learn. A quarterly training plan is too slow for a toolchain that shifts monthly. The better unit is a weekly practice loop: pick a real task, apply a new AI workflow, inspect the failure, and keep the part that improves judgment.

The goal is not tool collecting. It is transfer. A person who has used three agent systems begins to see common patterns: context boundaries, memory failures, permissions, evaluation gaps, and handoff risks. That pattern recognition becomes more durable than any single product tutorial.

For subscribers, the career question becomes concrete. What work are you doing this week that gives you repetitions in agent orchestration, and what evidence will tell you that your judgment improved?

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

The Next Enterprise Moat Sits Above The System Of Record

Shivanath Devinarayanan — Wed, 20 May 2026 11:51:00 GMT

Systems of record are not going away. They are becoming infrastructure. The value is moving to the layer that decides what to do with the record, why, and under which precedent.

This is the Substack sibling to the LinkedIn article, "Your System of Record Is Becoming the Least Valuable Thing You Own". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

Records Belong To Functions. Decisions Cross Them.

A finance system, sales system, and people system each preserve a narrow view of reality. Important decisions cross those boundaries. The missing layer is not another database. It is a reasoning layer above the databases.

Why Context Beats Semantic Purity

Semantic layers tried to standardize business truth. Context graphs can preserve legitimate disagreement instead. Marketing and finance may calculate the same metric differently for valid reasons. The reasoning behind both versions is the useful artifact.

Vertical Context Will Win

The graph that supports hiring decisions is not the graph that supports incident response or pricing exceptions. The infrastructure can be horizontal. The decision product has to be vertical because precedent is domain-shaped.

The Subscriber Takeaway

The system of record tells an agent where to look. The decision layer tells it how prior people reasoned when the answer was not obvious. That is where enterprise value compounds.

Subscriber Operating Lens

A decision layer should not try to replace the systems underneath it. It should read across them, preserve context, and explain why a decision crossed functional boundaries. That makes it different from a dashboard. A dashboard reports state. A decision layer preserves judgment.

The first product question is vertical. Which domain has enough repeated judgment to justify a graph: recruiting, pricing, incident response, procurement, release governance, or field operations? The wrong answer is everything at once. The right answer is one domain where precedent already matters.

For subscribers, the strategy implication is clear. The enduring asset is not the generic graph infrastructure. It is the accumulated domain reasoning that makes a future decision better than a cold read of the records.

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

The New Manager Is The Person Who Teaches The System

Shivanath Devinarayanan — Wed, 20 May 2026 10:25:05 GMT

When execution gets cheap, management stops being only a title. Anyone whose decision shapes the next human or digital action is managing part of the system. Most firms have not admitted this yet.

This is the Substack sibling to the LinkedIn article, "Every Employee Is Already a Manager". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

Cheap Execution Changes Status

The old hierarchy rewarded people who could get work done through scarce resources. AI compresses that scarcity. The scarce resource becomes judgment: what to attempt, when to deviate, and why the exception matters.

Orchestration Is Already Distributed

People are already directing agents, drafting with models, summarizing with tools, and making micro-decisions through AI systems. The risk is not only tool use. It is unrecorded reasoning at scale.

The Outcome Layer Belongs To Humans

Tasks can be automated. Outcomes still require someone to understand tradeoffs, incentives, timing, and trust. The people who stay valuable are the ones who can explain why an action is worth taking.

The Subscriber Takeaway

The next workforce architecture should treat every serious judgment call as a management act. Capture it, connect it, and let the system learn from it without pretending the human role disappeared.

Subscriber Operating Lens

In an agent-shaped workplace, management is not only supervision. It is the act of setting intent, checking outputs, resolving exceptions, and leaving traces that guide future work. That means individual contributors can manage meaningful parts of the operating system without having direct reports.

The risk is invisible delegation. A person uses an agent, accepts a recommendation, edits the result, and moves on. The organization sees the final artifact but loses the judgment that shaped it. Multiply that across hundreds of workers and the firm becomes more automated while becoming less explainable.

For subscribers, the practical move is to make orchestration visible. Record which decisions were delegated, which were overridden, and why the human accepted or changed the agent output. That is where the new management signal lives.
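A minimal sketch of what making orchestration visible could look like, assuming each delegated task is logged with the agent's output, the output the human kept, and a free-text reason. The `Delegation` record and `orchestration_summary` helper are hypothetical, and comparing outputs by exact string equality is a deliberate simplification.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Delegation:
    """Hypothetical log row for one piece of work handed to an agent."""
    task: str
    agent_output: str
    final_output: str
    why: Optional[str] = None  # the human's reason for accepting or changing the output

    @property
    def disposition(self) -> str:
        # Simplification: any difference counts as changed (edited or overridden).
        return "accepted" if self.final_output == self.agent_output else "changed"


def orchestration_summary(log: List[Delegation]) -> dict:
    counts = Counter(d.disposition for d in log)
    return {
        "accepted": counts["accepted"],
        "changed": counts["changed"],
        # The judgment signal is lost wherever no reason was recorded.
        "missing_why": [d.task for d in log if not d.why],
    }
```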

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

Why Memory Programs Fail Before The Database Is Chosen

Shivanath Devinarayanan — Wed, 20 May 2026 09:00:42 GMT

The hard part of organizational memory is not storage. It is getting people to make their reasoning visible while the decision is still fresh. That is a behavioral system before it is a technical one.

This is the Substack sibling to the LinkedIn article, "You Can't Install Organizational Memory". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

Documentation As A Byproduct

The Toyota Kata lesson is not that every team needs more forms. It is that learning routines survive when they are embedded in the work itself. Separate documentation becomes optional. Optional work disappears.

Reasoning Requires Trust

Recording why you made a call creates an audit trail of judgment. In low-trust cultures, that feels like risk. A context graph stays empty when people believe the organization will weaponize the record.

Structure Beats Motivation

Decision hygiene works because it decomposes judgment into component assessments before the final call. That structure captures reasoning while improving the decision itself. The capture is part of the act, not a report afterward.

The Subscriber Takeaway

Start with one workflow, one added "why" field, and one review ritual. Software can make reasoning searchable. It cannot create the habit of saying the reasoning out loud.

Subscriber Operating Lens

A memory program fails when it asks for hero behavior. People will not consistently write long explanations after hard decisions just because a launch email asked them to. The capture has to be tiny, timed correctly, and tied to a decision they already need to make.

The manager role changes in this model. The manager is not the person chasing documentation. The manager is the person creating a review rhythm where reasoning can be spoken without punishment and sharpened without blame. The ritual creates the artifact.

For subscribers, the sequence matters. Start with trust, then structure, then tooling. Reversing the order produces an empty system with impressive search.

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

The Compound Loop Hidden Inside Organizational Memory

Shivanath Devinarayanan — Wed, 20 May 2026 07:35:53 GMT

A single decision trace is not impressive. A hundred connected traces begin to reveal patterns. A thousand can change how the organization learns. The loop matters more than the archive.

This is the Substack sibling to the LinkedIn article, "The Loop Nobody Sees Coming". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

Archives Sit Still. Loops Improve.

Most knowledge systems store artifacts. The more valuable design is circular: decisions generate reasoning, reasoning becomes precedent, precedent improves future decisions, and future decisions generate better traces.

Why Consumer Loops Feel Smarter

Traffic apps improve when driver behavior and reports feed the next recommendation. Marketplace platforms often improve through similar feedback patterns. Enterprises often stop at logging the outcome. They miss the feedback mechanism that would make the next action better.

Emergence Is The Point

Kauffman's autocatalytic sets are useful because no individual element carries the whole system. The value appears when enough parts connect. Context graphs work the same way. Connections between traces become intelligence.

The Subscriber Takeaway

The moat is not a graph database. It is years of connected reasoning that rivals cannot download, scrape, or buy. That is why loop design deserves as much attention as model selection.

Subscriber Operating Lens

The loop only works if each cycle changes the next one. Capturing a trace is not enough. The trace has to be findable at the next decision, connected to similar cases, and reviewed after the outcome is known. Without that return path, the graph is an archive with better search.

The first metric should not be graph size. It should be reuse. How often did a past decision inform a new one? How often did a surfaced precedent change the recommendation? How often did a new outcome correct an old assumption? Those measures tell you whether compounding is happening.
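The three reuse questions can be computed from an ordinary decision log. A minimal sketch, assuming a hypothetical `DecisionEvent` record that notes which past traces were consulted, whether a surfaced precedent changed the recommendation, and which old traces the new outcome contradicted. Graph size is deliberately absent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionEvent:
    """Hypothetical log row for one decision made with the trace store available."""
    decision_id: str
    precedents_used: List[str] = field(default_factory=list)   # past trace ids consulted
    recommendation_changed: bool = False                       # a surfaced precedent changed the call
    corrected_traces: List[str] = field(default_factory=list)  # old traces this outcome contradicted


def loop_metrics(events: List[DecisionEvent]) -> dict:
    n = len(events)
    reused = [e for e in events if e.precedents_used]
    return {
        # How often did a past decision inform a new one?
        "reuse_rate": len(reused) / n if n else 0.0,
        # Of the decisions that surfaced precedent, how often did it change the recommendation?
        "change_rate": sum(e.recommendation_changed for e in reused) / len(reused) if reused else 0.0,
        # How often did a new outcome correct an old assumption?
        "correction_rate": sum(1 for e in events if e.corrected_traces) / n if n else 0.0,
    }
```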

For subscribers, the design challenge is to make learning circular. A linear workflow can move faster with AI. A loop gets smarter because the work changes the memory that guides the next pass.

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

Precedent Is The Training Data Enterprises Already Produce

Shivanath Devinarayanan — Wed, 20 May 2026 06:15:06 GMT

Every judgment call creates two outputs. One is the immediate result. The other is a precedent. Most organizations keep the result and throw away the precedent.

This is the Substack sibling to the LinkedIn article, "Every Decision Trains the Next One". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

The Oldest Version Of The Problem

Ernest Codman wanted medicine to compare decisions with outcomes. The radical part was not measurement. It was accountability for reasoning. Enterprises now face the same problem at digital speed.

Tacit Knowledge Needs A Capture Moment

Polanyi was right that people know more than they can tell. That does not mean reasoning is impossible to capture. It means capture has to happen close to the decision, while the context is still live and the tradeoffs are still visible.

Precedent Without Review Is Dangerous

A context graph should not preserve every decision as wisdom. Some decisions encode bias, politics, or stale assumptions. The operating discipline is double-loop learning: inspect the target, not just the miss.

The Subscriber Takeaway

Treat experienced workers as generators of decision intelligence. Their judgment does not only resolve today's exception. It teaches the organization how to approach the next one.

Subscriber Operating Lens

Decision traces should be reviewed before they are reused. Otherwise the organization risks turning old bias into automated precedent. The question is not only what worked. It is whether the reasoning was sound, whether the assumptions still hold, and whether the outcome proved the judgment or merely rewarded luck.

A good precedent system needs a quality loop. Tag traces that were later contradicted. Mark cases where the outcome was good but the reasoning was weak. Keep examples where the team made the right call for the wrong reason, because those are exactly the cases that teach humility.
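One way to picture the quality loop is as a reuse filter. A minimal sketch, assuming hypothetical review tags: only reviewed, untagged traces are offered to agents as precedent, while weak-reasoning cases are routed to people as teaching material instead of being discarded. The tag names and the rule that any tag blocks reuse are assumptions, not a prescribed policy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Tag(Enum):
    CONTRADICTED = "contradicted"            # later evidence contradicted the trace
    WEAK_REASONING = "weak_reasoning"        # good outcome, weak reasoning
    RIGHT_CALL_WRONG_REASON = "wrong_reason"


@dataclass
class Trace:
    trace_id: str
    reviewed: bool = False
    tags: Set[Tag] = field(default_factory=set)


def reusable_as_precedent(trace: Trace) -> bool:
    """Assumed rule: only reviewed traces with no quality flags may guide agents."""
    return trace.reviewed and not trace.tags


def teaching_cases(traces: List[Trace]) -> List[str]:
    """Weak-reasoning cases are kept for people to study, not deleted."""
    lessons = {Tag.WEAK_REASONING, Tag.RIGHT_CALL_WRONG_REASON}
    return [t.trace_id for t in traces if t.tags & lessons]
```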

For subscribers, this is the difference between memory and training data. Memory preserves. Training data shapes future behavior. Once decision traces start guiding agents, trace quality becomes operational governance.

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.

The Memory Layer Your Systems Never Built

Shivanath Devinarayanan — Wed, 20 May 2026 04:50:44 GMT

Most enterprise systems are very good at remembering events. They remember the renewal, the approval, the moved date, and the closed ticket. What they forget is the reasoning that made those events make sense.

This is the Substack sibling to the LinkedIn article, "Every Company Has Amnesia". It uses the same local source base, but it changes the reader promise: less public thesis, more operating judgment for subscribers.

State Is Not Memory

A system of record tells you what happened. Organizational memory tells you why it happened. Confusing those two is how teams end up with pristine databases and no idea why a policy exists.

The Missing Unit Is The Decision Trace

A decision trace records the inputs, alternatives, exception logic, precedent, and accountable judgment behind a choice. It is not a meeting recap. It is the reasoning artifact that lets a future person or agent understand the decision without finding the person who made it.

AI Makes The Gap More Expensive

People can sometimes recover missing context through proximity. Agents cannot. They can search the CRM and the wiki, but if the reasoning was never captured, they have only state. That is how technically correct actions become operationally wrong.

How To Start Without Boiling The Ocean

Choose one judgment-heavy workflow. Record the why when an exception happens. Connect similar cases over time. The first goal is not perfect architecture. It is making reasoning durable enough to be found again.

Subscriber Operating Lens

A memory layer should begin where reasoning is already being generated under pressure. Discount approvals, hiring calls, vendor selection, escalation handling, and architecture exceptions are useful starting points because they already contain tradeoffs. The capture surface should be small enough that people will actually use it.

The minimum viable trace has four parts: the situation, the options considered, the reason for the chosen path, and the condition that would make the decision wrong later. That final condition matters. It prevents memory from becoming nostalgia. It tells the next person when precedent should not apply.
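The four-part trace fits in one small record. A minimal sketch, assuming a hypothetical `DecisionTrace` type whose invalidating condition is stored both as text for people and as a predicate an agent could check before reusing the precedent. The discount-approval example is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DecisionTrace:
    situation: str
    options: List[str]                   # options considered
    chosen: str                          # the path taken
    reason: str                          # why it was taken
    wrong_if: str                        # condition that would make it wrong later
    invalidated: Callable[[Dict], bool]  # hypothetical checkable form of wrong_if

    def __post_init__(self):
        if self.chosen not in self.options:
            raise ValueError("chosen path must be one of the options considered")


def applicable_precedents(traces: List[DecisionTrace], context: Dict) -> List[DecisionTrace]:
    """Return only the traces whose invalidating condition is not met in this context."""
    return [t for t in traces if not t.invalidated(context)]


# Hypothetical discount-approval trace, for illustration only.
example = DecisionTrace(
    situation="Renewal at risk after a service outage",
    options=["hold list price", "one-time 10% discount"],
    chosen="one-time 10% discount",
    reason="The outage caused the risk; the discount is tied to that event",
    wrong_if="No outage affected the customer in the last 90 days",
    invalidated=lambda ctx: ctx.get("days_since_outage", 10**6) > 90,
)
```

The predicate is what keeps precedent from becoming nostalgia: when the stated condition holds, the trace is withheld rather than reused.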

For subscribers, the practical move is to stop treating knowledge loss as a search problem. Search cannot retrieve reasoning that was never recorded. The first system to build is the habit of making the why durable.

What To Inspect Next

Ask three questions after the relevant decision:

What assumption did we rely on?
What exception did we allow?
What would a future person or agent need to know before making the next call?

If the answer disappears after the meeting, the organization has not built memory. It has only produced activity.

Closing Note

Shivanath Devinarayanan, Chief Digital Labor and Technology Officer at Asymbl. These views are my own.