Primary levers
- Model selection by task: simple classification, extraction, synthesis, and dense reasoning differ in difficulty and do not all need the same model.
- Prompt size, instruction compression, and retrieved context: every irrelevant chunk increases cost and can worsen the answer.
- Caching, routing, and batching reduce unnecessary recomputation; environment limits and test-traffic controls cap spend that serves no real user.
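
Two of these levers, model selection by task and caching, can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the tier names (`small-model`, `medium-model`, `large-model`), the task labels, `pick_model`, `CachedClient`, and `fake_call_model` are all hypothetical, and the cache only catches exact repeats of the same prompt sent to the same model.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical tier names; substitute the models actually available to you.
MODEL_BY_TASK: Dict[str, str] = {
    "classification": "small-model",
    "extraction": "small-model",
    "synthesis": "medium-model",
    "dense_reasoning": "large-model",
}
DEFAULT_MODEL = "medium-model"  # assumed fallback for unlisted task types


def pick_model(task: str) -> str:
    """Return the tier assumed adequate for the given task type."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)


class CachedClient:
    """Wraps a model-calling function with an exact-match response cache."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[str, str] = {}
        self.calls_made = 0  # counts only calls that reached the model

    def complete(self, task: str, prompt: str) -> str:
        model = pick_model(task)
        # Key on both model and prompt so tiers never share cached answers.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls_made += 1
            self._cache[key] = self._call_model(model, prompt)
        return self._cache[key]


def fake_call_model(model: str, prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    return f"[{model}] {len(prompt)} chars"


if __name__ == "__main__":
    client = CachedClient(fake_call_model)
    client.complete("extraction", "Pull the invoice number from this text.")
    client.complete("extraction", "Pull the invoice number from this text.")
    print(client.calls_made)  # the repeated request is served from the cache
```

In practice the routing table would be tuned against measured quality per task, and the cache would need an eviction policy and a decision about whether cached answers are acceptable for non-deterministic outputs.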


