Tag: cost-optimization
All the articles with the tag "cost-optimization".
-
I Asked My AI Agent for an Itemized Bill. It Got Awkward.
A month of heavy AI agent use, itemized: only a fifth of the tokens wrote code, almost half could run on a cheaper model, and the prompt cache quietly bills you for every coffee break.
-
The Cost of Agents: A FinOps Model for Token-Hungry Systems
An agent that loops and calls tools spends tokens in ways a single prompt never does. Here is the cost model that explains the bill, and the levers that cut it without making the agent dumber.
-
Small Fine-Tuned Models Are Beating Frontier on My Workloads
On narrow, high-volume tasks a fine-tuned small model matches frontier quality at a fraction of the cost and latency. Here is the pipeline, the eval bar, and the maintenance bill nobody quotes you.
-
vLLM, Quantization, and Serving LLMs on a Budget
Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, the KV cache, what quantization actually costs you, and when to just call a hosted API instead.