Skip to content
Ryan de Melo
Go back

Agentic Transaction Systems: Moving Money With Machines You Can Audit

Picture the meeting where someone signs off on an agent that can move money. It is never a debate about whether the model is smart enough. The model is plenty smart. The question that decides it, every time, is the one a regulator or a board member asks six weeks later: why did it pay that, and who is accountable. If you cannot answer that in under a minute, the agent does not ship. It does not matter how good the demo was.

The whole industry has been arguing about the wrong axis. People keep asking how much autonomy to hand an agent, as if the dial goes from “asks permission for everything” to “does whatever it wants” and the trick is finding the middle. That framing gets agents banned from finance entirely. The thing that actually lets a machine touch money is not how free it is. It is whether you can reconstruct, completely and honestly, what it did and why.

The boring layer is the product

I will say the unpopular part plainly. For agentic finance, the model is a commodity and the audit layer is the moat. You can swap the model. You will swap the model, probably twice a year, as the frontier moves and prices drop. What you cannot swap, and what took the longest to build on the reconciliation systems I have shipped, is the apparatus that records every decision in a form you can stand behind in front of an auditor.

Nobody puts that in the pitch deck. The deck shows the agent narrating a cash-flow anomaly in fluent prose, and the room loves it. (The room always loves the prose.) Then the platform goes into a regulated environment and the first real question is not about the prose. It is “show me the decision from last Tuesday at 14:03 and prove it could not have used data it did not have yet.” If your architecture cannot do that, you built a toy.

So the design problem is not “make a clever agent.” It is “make every action the agent takes explainable, replayable, and attributable.” Three words, and each one is a hard constraint that shapes the system.

Explainable, replayable, attributable

Explainable means the decision came with its reasons attached, captured at the moment it was made, not reconstructed later from logs that happen to be lying around. Not a vibe. The actual inputs, the tool calls, the model’s stated rationale, and the rule that gated the final action. If a payment went out, I want the record to say which invoice it matched, which tolerance it cleared, and what it was told about the account at that instant.

Replayable means I can take that captured state and run the decision again against exactly what was known at the time. This is the one people skip, and it is the one that saves you. An agent that paid a vendor in March looks insane in June when you know the vendor was fraudulent. The only fair question is whether it was a defensible decision given what was true in March. Replay against point-in-time data answers that. Replay against today’s data just relitigates the past with a stacked deck.

Attributable means every action traces to a responsible party. A human who approved it, a policy that authorized it, a specific agent identity operating under a named owner. “The system did it” is not an answer anyone accepts when money is missing. Someone owns every dollar that moves, and the record has to name them.

Deterministic boundaries around a non-deterministic core

Here is the part nobody tells you about putting an agent near money. The agent itself can be as probabilistic as it likes, but the edge where it touches the world cannot be. The model reasons freely. The tools it is allowed to call are a small, fixed, deterministic set with hard schemas and hard limits, and every call through that boundary is logged before it executes, not after.

An agent decision: inputs and tool calls flow through a deterministic boundary into an immutable audit record, with a replay path that rebuilds the decision from point-in-time state

The agent reasons however it wants. The boundary it acts through is deterministic, logged, and immutable, and the replay path can rebuild any past decision from the state that was true at the time.

That boundary is where the autonomy debate actually lives, and it is a much more useful place to have it. The agent does not “decide to wire funds.” It proposes a call to a tool named release_payment with a bounded amount, against an invoice, under a policy ID, and that tool checks the limit, checks the approval state, and writes the decision record before it does anything irreversible. If the record write fails, the action does not happen. The audit log is not a side effect of the transaction. It is a precondition for it.

This inverts the usual instinct, which is to act first and log for observability. In a normal service that is fine. When the action is irreversible and denominated in dollars, logging after the fact means your most important record is the one most likely to be missing exactly when you need it: right after the thing went wrong.

Immutable, or it does not count

A decision record you can edit is not evidence. It is a note. The records have to be append-only and tamper-evident, because the entire point is to be trusted by someone who does not trust you, the auditor, the regulator, the counterparty’s lawyer. On the systems I have built for this, the decision log is content-addressed and chained, so any after-the-fact edit is detectable. You do not need a blockchain and you do not need ceremony. You need an honest hash chain and storage that nobody, including your own ops team, can quietly rewrite.

I learned this one the unglamorous way. Early on we had beautiful structured logs in a store that engineers could mutate to fix “bad data.” Entirely well-intentioned. It also meant the logs proved nothing, because anyone with access could have changed them, so from an auditor’s view they might as well not exist. The fix was not more logging. It was removing the ability to edit. The discipline is the feature.

What this costs you

None of this is free, and I would be lying if I pretended the audit layer is cheap. Capturing point-in-time state for replay means you are versioning the world your agent saw, which is real storage and real plumbing. Deterministic tool boundaries mean you write more code and the agent feels more constrained than the unfenced demo. Immutable records mean you give up the convenience of fixing things in place.

I will take all of it. Because the alternative is an agent that is brilliant and unaccountable, and an unaccountable system that moves money is not a product you can sell into any business that has ever met a regulator. The autonomy is not the hard part and it was never the selling point. The selling point is that when someone asks why the machine paid that, you can answer, completely, and prove you are not making it up.

So when a team shows me their agentic finance platform, I have stopped watching the agent reason. I ask to see last Tuesday at 14:03, replayed. The ones who can show me that built the right thing. The ones who change the subject built a demo, and they are about to learn the difference in a room they would much rather not be in.


Share this post:

Previous Post
Eval-Driven Development: How I Actually Build LLM Features Now
Next Post
Resolving Stuck Receivables With RAG and Agents