Ryan de Melo

Archives

All the articles I've archived.

2026 10
June 2
May 2
  • MCP a Year In: What Held Up, What Didn't

    Sixteen months of building production systems on the Model Context Protocol. The interoperability bet paid off. Auth, versioning, and the demo-to-production gap are still where teams bleed.

  • Fine-Tuning Small Models in 2026: A Practical Pipeline

    An end-to-end pipeline for fine-tuning a small model in 2026: distill the data, train adapters, hold an eval bar, ship behind a canary, and watch for the drift that quietly eats your accuracy.

April 2
March 2
February 2
  • Resolving Stuck Receivables With RAG and Agents

    A production system that resolves stuck accounts-receivable mismatches by retrieving over invoices, contracts, remittances, and email, then proposing a fix a human approves before any money moves.

  • The Enterprise AI OS: My Thesis for the Next Five Years

    The durable enterprise AI layer is not a model or a chatbot. It is an operating system that gives agents identity, permissions, tools, memory, and an audit trail over the systems a company already runs.

2025 11
November 1
  • Multi-Agent Systems: When One Agent Isn't Enough

    Most multi-agent designs are one agent's job split across five processes that now have to argue with each other. The few cases where splitting actually pays, and the complexity to refuse.

October 1
  • The LLM Observability Stack I Wish I'd Built Sooner

    What to instrument for LLM and agent apps before the first incident: full request and tool-call tracing, token and cost per request, latency breakdown, eval scores in production, and turning real failures into eval cases.

September 1
August 1
  • Why I Left a Director Seat to Build Again

    I left a senior leadership seat at a large company to found an AI venture and write code again. The pull, the discomfort, and what eighteen years of platform work left me wanting to do.

July 1
June 1
  • RAG Over Enterprise Records: The Boring Parts That Matter

    Enterprise RAG is trustworthy because of the unglamorous parts: per-user permissions enforced at retrieval, freshness, lineage, and handling records that change. Retrieval is an access-control problem wearing a search costume.

May 1
April 1
March 1
  • Agentic Workflows Need Guardrails, Not Vibes

    How to put real constraints around an agent that touches money or production: bounded tools, approval gates on irreversible actions, dry-run modes, spend limits, and a tool-call audit trail you can actually read.

February 1
January 1
2024 12
December 1
November 1
October 1
  • Agents Are Coming. Most Demos Are Lying.

    A skeptical look at agent reliability in late 2024, where the impressive demos quietly fall apart in production, and the narrow places agents already pull their weight.

September 1
August 1
  • Getting JSON Out of LLMs Without Crying

    Function calling and JSON mode get you syntactically valid JSON. They do nothing about a model that fills the right shape with confident nonsense. The validation-and-repair layer you still have to write.

July 1
June 1
May 1
April 1
  • vLLM, Quantization, and Serving LLMs on a Budget

    Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, KV-cache, what quantization actually costs you, and when to just call a hosted API instead.

March 1
  • Stop Fine-Tuning. Start Retrieving. (Usually.)

    A decision framework for RAG versus fine-tuning that is not "it depends." Three questions settle most of it, and the cases where fine-tuning actually wins are narrower than the budget requests suggest.

February 1
January 1
  • Your RAG Is Bad Because Your Chunking Is Bad

    A year into production RAG, the retrieval problems teams keep blaming on the model are almost always chunking, metadata, and document structure. Concrete fixes, with the splitting code I actually run.

2023 12
December 1
November 1
October 1
September 1
  • A Forecasting Ensemble That Actually Ships

    A demand-forecasting ensemble (a classical statistical model, a sequence model, and gradient boosting) that took accuracy far enough to cut inventory hard, plus the boring data problems that mattered more than the model.

August 1
  • Llama 2 Is Here. Should You Self-Host?

    The week Llama 2 dropped, half my inbox asked whether to pull inference in-house. The break-even math, the GPU scarcity, and the on-call tax nobody puts in the spreadsheet.

July 1
June 1
May 1
  • pgvector vs the Vector DB Gold Rush

    Most teams adding semantic search this year should start in the Postgres they already run, not a new vector database. Where pgvector holds, where it doesn't, and how to tell which side of the line you are on.

April 1
March 1
February 1
January 1
2022 6
December 1
November 1
October 1
September 1
August 1
  • The Feature Store Nobody Asked For

    A feature store fixes training/serving skew and lets teams share features, but most adopt it a year early and pay the operational tax for nothing. Here is the line where it flips.

July 1
  • Data Mesh Is an Org Chart, Not an Architecture

    Rolling out data mesh across dozens of business units taught me that domain ownership of data is a reporting-line and incentive change first, and a technology choice a distant second.