Resolving Stuck Receivables With RAG and Agents

Six browser tabs open, and a person on the finance team is reading a remittance advice that someone at another company typed straight into the body of an email, trying to line up which payment settled which invoice. Do that across receivables in the tens of millions and you get the state nobody there wanted to name: not bad debt, not a dispute, just stuck. The cash had arrived, or some of it had, and no one could match it to invoices by hand.

The difficulty has nothing to do with the model or the agents or the retrieval. The truth about a single payment is scattered across an invoice in one system, a contract clause in a PDF, a bank remittance file in a fixed-width format from 1998, and a one-line email that says “paid the March stuff less the credit, see attached.” Reconciliation is a reading comprehension task that finance has been doing by hand because the data was never clean enough to automate the dumb way.

We built something to read it: RAG over the documents, anomaly detection to find the mismatches, an agent to propose a resolution and draft the cash-flow note, and a human who approves anything that touches money. The lessons worth writing down came from its least glamorous corners.

The shape of it

[Diagram: documents and ledgers feed retrieval and matching; an agent proposes a resolution; a human approves before posting.]

Everything left of the approval gate is reversible and cheap. Everything right of it moves money, so a person signs.

Two ledgers disagree: what we invoiced and what the bank says cleared. A normalization job lifts every payment, invoice, contract, and remittance email into one store with consistent keys. A matcher tries the easy cases first (exact reference, exact amount) and routes the rest to retrieval. For a stuck item, the agent pulls every document that plausibly touches that customer and that amount, reasons about what reconciles to what, and writes a proposed resolution plus a plain-language narration of the cash effect. Then it stops. A human reads the proposal, sees the evidence inline, and approves or rejects. Only on approval does anything post to the ledger.
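To make the "easy cases first" step concrete, here is a minimal sketch of a matcher that only accepts an exact reference and an exact amount, and routes everything else onward. The `Invoice` and `Payment` shapes, the field names, and the `route` function are assumptions for illustration, not the schema we actually ran.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    customer_id: str
    amount_cents: int


@dataclass(frozen=True)
class Payment:
    payment_id: str
    customer_id: str
    reference: str       # free text the customer put on the payment
    amount_cents: int


def exact_match(payment: Payment, open_invoices: list[Invoice]) -> Optional[Invoice]:
    """Return the one invoice this payment obviously settles, else None."""
    hits = [
        inv for inv in open_invoices
        if inv.customer_id == payment.customer_id
        and inv.invoice_id == payment.reference.strip()
        and inv.amount_cents == payment.amount_cents
    ]
    # Zero hits or several hits both mean this is not an easy case.
    return hits[0] if len(hits) == 1 else None


def route(payment: Payment, open_invoices: list[Invoice]):
    """Easy cases are matched here; everything else goes to retrieval."""
    match = exact_match(payment, open_invoices)
    if match is not None:
        return ("matched", match.invoice_id)
    return ("needs_retrieval", None)
```

The point of keeping this step so strict is that it never guesses: a payment with a free-text reference or a short amount falls through untouched.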

The agent never moves money. It only writes a recommendation and shows its work, and that one rule decided most of the architecture.

What was genuinely hard

Nothing on the project was harder than the remittance data, and I cannot overstate that. The model that reads a contract clause and decides whether a two percent early-payment discount applies is doing something almost easy next to parsing what a customer’s accounts-payable clerk actually sent us. We saw remittance advice as PDF tables, as scanned images of PDF tables, as Excel attachments with merged cells, as plain text in an email body, and in one memorable case as a photo of a printout taken at an angle. The information was all there, just never in the same place twice.

RAG earned its keep here, though not the textbook version. Chunking a contract works fine, but a remittance email is different: the meaning lives in the relationship between the line items, and splitting them loses exactly what you came for. Remittances stay whole, each one a single structured record with the raw text attached, and retrieval pulls the whole record or none of it. The index covers the messy stuff (emails, contract clauses, dispute notes), while the structured ledgers stay in Postgres where joins and money belong. Embeddings are for finding the relevant paragraph, not for deciding what equals what, so the moment a number matters it comes from the database rather than the vector.
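A rough sketch of that indexing rule, with contracts chunked and remittances kept whole. Every name here (`Remittance`, `RemittanceLine`, `chunk_contract`, `index_remittance`, the dictionary keys) is hypothetical, and the paragraph-based chunking is only a stand-in for whatever splitter you prefer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemittanceLine:
    reference: str
    amount_cents: int


@dataclass(frozen=True)
class Remittance:
    remittance_id: str
    customer_id: str
    raw_text: str
    lines: tuple[RemittanceLine, ...] = ()


def chunk_contract(contract_id: str, text: str) -> list[dict]:
    """Contracts are split by paragraph; each chunk is its own index document."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"doc_id": f"{contract_id}#{i}", "kind": "contract_clause", "text": p}
        for i, p in enumerate(paragraphs)
    ]


def index_remittance(rem: Remittance) -> dict:
    """A remittance is never split: one index document per record.

    Only the raw text is embedded. Amounts are deliberately left out of the
    index document, so anything numeric has to be read from the database
    via record_id.
    """
    return {
        "doc_id": rem.remittance_id,
        "kind": "remittance",
        "text": rem.raw_text,
        "record_id": rem.remittance_id,
    }
```

Retrieval then returns a `record_id`, and the caller loads the full structured record, line items included, from the relational store.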

The second hard thing was matching across systems that were never designed to be matched. The invoice ID in our ledger is not the reference the customer put on their wire. A customer batches four invoices into one payment, applies a credit against one of them, and short-pays another over a delivery dispute nobody logged. Where a naive matcher gives up, the agent holds the partial evidence and proposes the most likely allocation, with a confidence and a reason, the way a senior person on the finance team would. The win was never in getting every allocation right. It was in clearing the obvious cases cheaply and handing the human a clean starting point on the hard ones instead of a blank screen.
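The arithmetic behind a batched payment can be sketched as a brute-force search over invoice and credit combinations, ranked by how much of the payment is left unexplained. This is an illustration of the candidate-finding step only, not the agent's actual reasoning; the function name and inputs are assumptions, and the search is exponential, so it is only sensible over the handful of open items for one customer.

```python
from itertools import combinations


def candidate_allocations(payment_cents, invoices, credits, max_results=5):
    """Rank invoice/credit combinations by the gap they leave unexplained.

    invoices and credits map an ID to a positive amount in integer cents.
    A gap of 0 explains the payment exactly. A positive gap is a possible
    short-pay that a person still has to account for.
    """
    results = []
    inv_ids = sorted(invoices)
    credit_ids = sorted(credits)
    for n in range(1, len(inv_ids) + 1):
        for inv_combo in combinations(inv_ids, n):
            billed = sum(invoices[i] for i in inv_combo)
            for m in range(0, len(credit_ids) + 1):
                for cr_combo in combinations(credit_ids, m):
                    expected = billed - sum(credits[c] for c in cr_combo)
                    gap = expected - payment_cents
                    if gap < 0:
                        continue  # overpayment: out of scope for this sketch
                    results.append((gap, inv_combo, cr_combo))
    # Smallest gap first, then the explanation with the fewest moving parts.
    results.sort(key=lambda r: (r[0], len(r[1]) + len(r[2])))
    return results[:max_results]
```

For example, a payment of 28,500 cents against invoices of 10,000, 20,000, and 5,000 with a 1,500 credit outstanding ranks "first two invoices less the credit" at the top with a gap of zero. The retrieved documents are what let the agent pick between candidates that tie on arithmetic.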

Counterintuitively, the value was not in the hard cases. It was in the long tail of boring matches, the ones stuck only because no human had time to open the tabs, and the agent clears that tail in bulk. The genuinely ambiguous allocations still go to a person, and they should, because those are the ones where being confidently wrong costs you a customer.

What was boring but essential

Idempotency looks like a footnote and is not. Every proposal carries a deterministic key derived from the items it touches: the invoice IDs, the payment ID, the resolution type. Run the agent twice on the same stuck item and you get the same proposal key, and the system refuses to create a second one. That key is what lets you re-run the whole pipeline after a crash, or after you improve a prompt, without doubling up on resolutions or, worse, double-posting to a ledger. An agent that loops and calls tools will get interrupted, so the retry is the case you design for, not the happy path.

import hashlib

def resolution_key(proposal):
    # Same stuck item + same fix must always hash the same, so a retry
    # (or a re-run after we tweak the prompt) can't create a second
    # proposal or, god forbid, post the cash twice.
    parts = (
        proposal.customer_id,
        tuple(sorted(proposal.invoice_ids)),   # order-independent
        proposal.payment_id,
        proposal.resolution_type,        # apply | short_pay | credit | writeoff
        round(proposal.amount_cents),    # cents, never floats near money
    )
    return hashlib.sha256(repr(parts).encode()).hexdigest()

def propose(stuck_item, evidence):
    draft = agent.draft_resolution(stuck_item, evidence)
    key = resolution_key(draft)
    existing = store.get_proposal(key)
    if existing:
        return existing            # already proposed; do not duplicate
    return store.put_proposal(key, draft, status="awaiting_approval")
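One caveat worth sketching: reading with `get_proposal` and then writing with `put_proposal` is only safe if a single worker runs at a time, or if the store itself rejects a duplicate key. Assuming neither is guaranteed, a put-if-absent operation closes the gap. The in-memory class below is hypothetical and only illustrates the contract; in a relational store the equivalent would typically be a unique constraint on the key, for instance with PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING`.

```python
import threading


class InMemoryProposalStore:
    """Hypothetical store whose put_if_absent is atomic.

    Two workers racing on the same key cannot both create a proposal:
    the second caller gets back the record the first one stored.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._proposals = {}

    def put_if_absent(self, key, draft, status="awaiting_approval"):
        with self._lock:
            if key not in self._proposals:
                self._proposals[key] = {
                    "key": key,
                    "draft": draft,
                    "status": status,
                }
            return self._proposals[key]
```

With a store like this, the check and the write collapse into one call, and the loser of a race simply receives the existing proposal.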

The other boring essential is the audit trail, which in regulated finance is not optional. Every proposal records exactly which documents the agent retrieved, the matcher’s score, the model’s stated reasoning, who approved it, and when. This amounts to a first-class record you can put in front of an auditor and a regulator, not a log line. When the agent proposes applying a payment across three invoices with a credit, the approver sees the remittance email, the contract clause that justifies the credit, and the model’s allocation, all on one screen. Rather than rubber-stamping a black box, the approver is checking a worked answer against the evidence the system already pulled.

The design philosophy comes down to that split: the agent does the reading and the arithmetic of finding candidates, and the human keeps the judgment and the signature. Drawing that line in the right place is most of the work.
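As a sketch of what "a first-class record, not a log line" could look like, here is an immutable audit record carrying the fields listed above. The class name, field names, and JSON serialization are assumptions for illustration; the real record would follow whatever your auditors require.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditRecord:
    """One immutable row per approved proposal (hypothetical shape)."""

    proposal_key: str
    retrieved_doc_ids: tuple[str, ...]   # exactly what the agent read
    matcher_score: float
    model_reasoning: str
    approved_by: str
    approved_at: datetime                # pass a timezone-aware datetime

    def to_json(self) -> str:
        """Serialize with stable key order so stored rows diff cleanly."""
        row = asdict(self)
        row["approved_at"] = self.approved_at.isoformat()
        return json.dumps(row, sort_keys=True)
```

Freezing the dataclass means nothing downstream can quietly edit who approved what; a correction has to be a new record.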

def post_resolution(key, approver):
    proposal = store.get_proposal(key)
    if proposal.status != "awaiting_approval":
        raise StaleProposal(key)                 # someone got here first
    if not approver.can_post(proposal.amount_cents):
        raise NeedsHigherApproval(proposal.amount_cents)

    with ledger.transaction() as txn:
        txn.apply(proposal.entries, idempotency_key=key)   # key again here
        audit.record(
            proposal=proposal,
            retrieved_docs=proposal.evidence_ids,   # what the agent read
            model_reasoning=proposal.reasoning,
            approved_by=approver.id,
        )
    store.mark_posted(key)
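The `idempotency_key` passed to the ledger is the last line of defense: if the process dies after the ledger commits but before the proposal is marked posted, a retry must not post again. Below is a minimal sketch of the ledger-side behavior that argument implies. `InMemoryLedger` is hypothetical; in a database the same guarantee would usually come from a unique constraint on the key, written in the same transaction as the entries.

```python
class InMemoryLedger:
    """Hypothetical ledger that posts each idempotency key at most once."""

    def __init__(self):
        self.entries = []            # posted (account, amount_cents) pairs
        self._applied_keys = set()   # keys that have already been posted

    def apply(self, entries, idempotency_key):
        """Post entries unless this key was posted before.

        Returns True when the entries were posted, False on a replay.
        """
        if idempotency_key in self._applied_keys:
            return False
        self.entries.extend(entries)
        self._applied_keys.add(idempotency_key)
        return True
```

A replayed call is a quiet no-op rather than an error, so a retry after a crash can run the whole posting path again and end up in the same state.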

What I would tell someone building this

Whatever you do, don’t start with the agent. Build the boring half first: get the normalization, the idempotency keys, and the audit record right while the matcher is still dumb if-statements, because those are expensive to retrofit and impossible to fake in front of a regulator. The agent is the easy piece to add last, on top of a system that already cannot double-post and already remembers everything it did.

Resist the urge to let the agent close the loop, too. Every quarter someone asks why a person still has to approve, since the agent is right most of the time, and being right most of the time is exactly the problem. The cases where it is wrong are the ones where money goes to the wrong place and a customer relationship pays for it. The human is not in the loop because the agent is bad, but because the downside is asymmetric, and no eval number I can show you changes that math.