Archives
All the articles I've archived.
-
What Eighteen Years of Platform Builds Taught Me About AI Hype
After eighteen years of watching big data, mobile, microservices, cloud, and now AI agents arrive on the same script, here is how I separate the durable capability from the narrative.
-
Building GenAI for Regulated Industries Without Getting Fired
Two ways to fail at GenAI in a regulated business: never ship at all, or ship something nobody can audit. The narrow path between them, from building this in financial services and industrial operations.
-
MCP a Year In: What Held Up, What Didn't
Sixteen months of building production systems on the Model Context Protocol. The interoperability bet paid off. Auth, versioning, and the demo-to-production gap are still where teams bleed.
-
Fine-Tuning Small Models in 2026: A Practical Pipeline
An end-to-end pipeline for fine-tuning a small model in 2026: distill the data, train adapters, hold an eval bar, ship behind a canary, and watch for the drift that quietly eats your accuracy.
-
Human-in-the-Loop Is a Design Problem, Not a Safety Net
Bolting a human approval step onto a badly designed agent does not make it safe. It makes a rubber stamp. The human-in-the-loop work is UX and architecture, not a checkbox.
-
The Cost of Agents: A FinOps Model for Token-Hungry Systems
An agent that loops and calls tools spends tokens like a single prompt never does. Here is the cost model that explains the bill, and the levers that cut it without making the agent dumber.
-
Eval-Driven Development: How I Actually Build LLM Features Now
My day-to-day loop for LLM features in 2026: write the eval first, then the prompt, then the code, and fold every production failure back in as a case.
-
Agentic Transaction Systems: Moving Money With Machines You Can Audit
An agent that moves money is only acceptable if every decision it makes can be explained, replayed, and pinned to a responsible party after the fact. The audit layer is the product.
-
Resolving Stuck Receivables With RAG and Agents
A production system that resolves stuck accounts-receivable mismatches by retrieving over invoices, contracts, remittances, and email, then proposing a fix a human approves before any money moves.
-
The Enterprise AI OS: My Thesis for the Next Five Years
The durable enterprise AI layer is not a model or a chatbot. It is an operating system that gives agents identity, permissions, tools, memory, and an audit trail over the systems a company already runs.
-
Multi-Agent Systems: When One Agent Isn't Enough
Most multi-agent designs are one agent's job split across five processes that now have to argue with each other. The few cases where splitting actually pays, and the complexity to refuse.
-
The LLM Observability Stack I Wish I'd Built Sooner
What to instrument for LLM and agent apps before the first incident: full request and tool-call tracing, token and cost per request, latency breakdown, eval scores in production, and turning real failures into eval cases.
-
Anomaly Detection for Cash Flow: Less Magic, More Plumbing
Catching reconciliation anomalies across institutional financial records, where the model is a rounding error and the matching and normalization are the whole job.
-
Why I Left a Director Seat to Build Again
I left a senior leadership seat at a large company to found an AI venture and write code again. The pull, the discomfort, and what eighteen years of platform work left me wanting to do.
-
Multimodal AI in the Field: Voice, Image, Form, Action
A closed-loop field inspection system that turns voice, a photo, and a half-filled form into a structured action, built for places where the network drops for hours.
-
RAG Over Enterprise Records: The Boring Parts That Matter
Enterprise RAG is trustworthy because of the unglamorous parts: per-user permissions enforced at retrieval, freshness, lineage, and handling records that change. Retrieval is an access-control problem wearing a search costume.
-
Sovereign AI: Running GPUs On-Prem When the Cloud Isn't an Option
For regulated workloads where the data legally cannot leave a building, on-prem GPU inference is back. The build-vs-rent math, the constraints nobody prices in, and the software that makes a fixed fleet feel like a platform.
-
Small Fine-Tuned Models Are Beating Frontier on My Workloads
On narrow, high-volume tasks a fine-tuned small model matches frontier quality at a fraction of the cost and latency. Here is the pipeline, the eval bar, and the maintenance bill nobody quotes you.
-
Agentic Workflows Need Guardrails, Not Vibes
How to put real constraints around an agent that touches money or production: bounded tools, approval gates on irreversible actions, dry-run modes, spend limits, and a tool-call audit trail you can actually read.
-
Building an MCP Server Fabric for Financial Operations
Instead of one large agent wired to every financial system, a fabric of small MCP servers, each wrapping one system with tightly scoped tools and an approval gate on anything that writes.
-
MCP Is the USB-C of AI Tools. Here's Why I'm Betting on It.
The Model Context Protocol standardizes how models reach tools and data, the way a connector standard kills a drawer full of adapters. The ecosystem is thin. I'm betting on the protocol anyway.
-
Shipping ML to Twenty Teams: The Platform Bet That Paid Off
Two years of running a self-service ML platform across twenty-odd product teams. What the paved paths got right, what we built too early, and the only success metric that turned out to matter.
-
Data Residency Is an Architecture Constraint, Not a Checkbox
A national regulator's residency and sovereignty rules redrew the topology of a payments system. Where data lives, what can leave, and where the keys sit are architecture decisions, not a config flag you toggle at the end.
-
Agents Are Coming. Most Demos Are Lying.
A skeptical look at agent reliability in late 2024, where the impressive demos quietly fall apart in production, and the narrow places agents already pull their weight.
-
Post-Merger Tech Integration: Ten Systems, Nine Months, Zero Downtime
Two large platforms merged, and now we owned two of everything. How we collapsed the overlap into one stack without a single customer feeling it, and why the politics were harder than the code.
-
Getting JSON Out of LLMs Without Crying
Function calling and JSON mode get you syntactically valid JSON. They do nothing about a model that fills the right shape with confident nonsense. The validation-and-repair layer you still have to write.
-
Negotiating a Nine-Figure Cloud Deal: What Engineers Should Know
A multi-year hyperscaler commitment is an architecture decision wearing a procurement costume, and engineers who skip the room get the bill.
-
The Lakehouse Won. Here's the Migration Nobody Warns You About.
Moving a multi-petabyte warehouse to an open table format over object storage. The format is a weekend. The operations are the project, and nobody puts that in the deck.
-
Hybrid Search: BM25 and Embeddings Are Better Together
Pure vector search quietly fails on the exact terms, codes, and acronyms users actually type. Combining BM25 with dense retrieval, fusing the two, and paying the latency bill that comes with it.
-
vLLM, Quantization, and Serving LLMs on a Budget
Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, KV-cache, what quantization actually costs you, and when to just call a hosted API instead.
-
Stop Fine-Tuning. Start Retrieving. (Usually.)
A decision framework for RAG versus fine-tuning that is not "it depends." Three questions settle most of it, and the cases where fine-tuning actually wins are narrower than the budget requests suggest.
-
Evals Are the New Unit Tests (And You're Not Writing Them)
Shipping an LLM feature with no evals is shipping with no tests, and almost everyone is doing it. A small, hand-written harness you run on every change, plus the honest limits of grading with another model.
-
Your RAG Is Bad Because Your Chunking Is Bad
A year into production RAG, the retrieval problems teams keep blaming on the model are almost always chunking, metadata, and document structure. Concrete fixes, with the splitting code I actually run.
-
Five Hundred Engineers, Four Countries, and Conway's Law
Across four countries and a few hundred engineers, the system came to look exactly like the org chart. After years of fighting that, I started using it on purpose.
-
BNPL From Scratch: Underwriting, Disbursement, Collections
Building a buy-now-pay-later product end to end and selling it to banks. The credit model is the easy part. Disbursement, reconciliation, and collections are where it lives or dies.
-
Fraud Detection at Sub-200ms: The Latency Budget Nobody Talks About
Real-time fraud scoring that lives inside the payment authorization path has a tiny latency budget, and a slightly worse model that fits it beats a better one that does not.
-
A Forecasting Ensemble That Actually Ships
A demand-forecasting ensemble (a classical statistical model, a sequence model, and gradient boosting) that took accuracy far enough to cut inventory hard, plus the boring data problems that mattered more than the model.
-
Llama 2 Is Here. Should You Self-Host?
The week Llama 2 dropped, half my inbox asked whether to pull inference in-house. The break-even math, the GPU scarcity, and the on-call tax nobody puts in the spreadsheet.
-
Platform Engineering: Paving Roads vs Building Cages
An internal ML platform that dozens of teams actually used, and the one test that told me whether I was paving a road or building a cage around it.
-
The Real Cost of a Customer Data Platform
Unifying a customer record across several business units sounds like a data project. It is mostly an ownership fight, and the maintenance never ends.
-
pgvector vs the Vector DB Gold Rush
Most teams adding semantic search this year should start in the Postgres they already run, not a new vector database. Where pgvector holds, where it doesn't, and how to tell which side of the line you are on.
-
Building a RAG Pipeline Before LangChain Was Cool
A production retrieval pipeline over a few hundred thousand internal documents, hand-rolled in early 2023. The model is the easy part. Retrieval is where the quality lives or dies.
-
Your Recommendation Engine Doesn't Need Deep Learning (Yet)
Collaborative filtering and plain co-occurrence carried a marketplace recommender to hundreds of millions of recs a day. The exact point where deep ranking earned its complexity, and why most teams reach for it a year early.
-
FinOps Is Just Capacity Planning With a Better Hat
Owning a large cloud P&L taught me that cost is an engineering-culture problem, not a dashboard. Where the tooling earns its keep and where it is theater.
-
Everyone Wants ChatGPT in Their Product. Most Should Wait.
Weeks after ChatGPT launched, every exec wants it shipped into the product. Here is the production math most teams have not done yet, and the short list of who should not wait.
-
Alternative Credit Scoring When the Bureau Has No File
Scoring hundreds of thousands of small merchants with no credit-bureau file, where the ensemble was easy and the fairness, reason codes, and feedback loops were the part that took a year.
-
ClickHouse Saved Us Real Money. Here's What It Cost.
Moving a large analytics workload to ClickHouse cut query latency and the bill hard, but only after we stopped designing tables the way Postgres taught us.
-
Payment Orchestration When Every Method Fails Differently
A payment orchestration layer across dozens of methods is not a router. It is a failure-handling system, because every method breaks in its own way and you only find out at checkout.
-
Migrating Petabytes Across Four Clouds Without a War Room
A long migration of hundreds of services and multi-petabyte data across four clouds plus colo, run as boring reversible waves instead of a heroic weekend.
-
The Feature Store Nobody Asked For
A feature store fixes training/serving skew and lets teams share features, but most teams adopt it a year early and pay the operational tax for nothing. Here is the line where it flips.
-
Data Mesh Is an Org Chart, Not an Architecture
Rolling out data mesh across dozens of business units taught me that domain ownership of data is a reporting-line and incentive change first, and a technology choice a distant second.