Tag: cost-optimization

All the articles with the tag "cost-optimization".

I Asked My AI Agent for an Itemized Bill. It Got Awkward.

13 Jun, 2026

A month of heavy AI agent use, itemized: only a fifth of the tokens wrote code, almost half could run on a cheaper model, and the prompt cache quietly bills you for every coffee break.

The Cost of Agents: A FinOps Model for Token-Hungry Systems

7 Apr, 2026

An agent that loops and calls tools spends tokens like a single prompt never does. Here is the cost model that explains the bill, and the levers that cut it without making the agent dumber.

Small Fine-Tuned Models Are Beating Frontier on My Workloads

15 Apr, 2025

On narrow, high-volume tasks a fine-tuned small model matches frontier quality at a fraction of the cost and latency. Here is the pipeline, the eval bar, and the maintenance bill nobody quotes you.

vLLM, Quantization, and Serving LLMs on a Budget

16 Apr, 2024

Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, the KV cache, what quantization actually costs you, and when to just call a hosted API instead.

I Asked My AI Agent for an Itemized Bill. It Got Awkward.