Posts

All the articles I've posted.

Negotiating a Nine-Figure Cloud Deal: What Engineers Should Know

16 Jul, 2024

A multi-year hyperscaler commitment is an architecture decision wearing a procurement costume, and engineers who skip the room get the bill.

The Lakehouse Won. Here's the Migration Nobody Warns You About.

18 Jun, 2024

Moving a multi-petabyte warehouse to an open table format over object storage. The format is a weekend. The operations are the project, and nobody puts that in the deck.

Hybrid Search: BM25 and Embeddings Are Better Together

14 May, 2024

Pure vector search quietly fails on the exact terms, codes, and acronyms users actually type. Combining BM25 with dense retrieval, fusing the two result lists, and paying the latency bill that comes with it.

vLLM, Quantization, and Serving LLMs on a Budget

16 Apr, 2024

Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, KV-cache, what quantization actually costs you, and when to just call a hosted API instead.

Negotiating a Nine-Figure Cloud Deal: What Engineers Should Know