AI Inference Costs Are Crushing SaaS Gross Margins – Here's What to Do About It
Is your AI SaaS company skating on thin ice because of exploding compute costs you're not tracking?
In episode #365, Ben Murray tackles one of the most pressing financial challenges facing AI-first SaaS companies: the structural margin compression caused by LLM inference costs. Traditional SaaS was built on near-zero marginal cost per customer – that era is over. If you're building on top of AI, every prompt, query, and agentic workflow is a hard COGS line that scales with revenue, and if you're not managing it, it will quietly destroy your unit economics.
- Why AI-first SaaS companies are running 50–60% gross margins (vs. 70–80% for legacy SaaS) – and what Bessemer data shows about AI supernovas with margins as low as 25%
- How inference and compute costs differ fundamentally from traditional SaaS COGS β and why they won't scale down the way hosting costs did
- Why token costs vary wildly (from $1–2 per million tokens to $30–180+ for frontier models) and how that variability makes feature-level economics a CFO priority
- 5 tactical ways to reduce LLM spend: model routing, prompt caching, context compaction, semantic caching, and batch processing
- How to set up your GL accounts and COGS tracking to allocate inference costs by feature – so you actually understand the economics of what you've built
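To make the first tactic above concrete, here is a minimal sketch of model routing: cheap requests go to a small model, hard ones to a frontier model. The model names, per-token prices, and routing rule are illustrative placeholders, not figures from the episode.

```python
# Hypothetical per-million-token prices; real prices vary by provider and change often.
MODELS = {
    "small":    {"price_per_m_tokens": 0.50},   # cheap workhorse model
    "frontier": {"price_per_m_tokens": 30.00},  # expensive frontier model
}

def route(prompt: str, needs_reasoning: bool) -> str:
    """Send simple requests to the cheap model, complex ones to the frontier model."""
    return "frontier" if needs_reasoning or len(prompt) > 4000 else "small"

def estimate_cost(model: str, tokens: int) -> float:
    """Dollar cost of a call, given the model's price per million tokens."""
    return MODELS[model]["price_per_m_tokens"] * tokens / 1_000_000
```

With this routing rule, a short summarization prompt lands on the cheap model at a 60x lower per-token price than the frontier model – which is why routing is usually the first lever to pull.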
Tune in before your next board meeting – because if you're not tracking AI inference costs at the feature level, you're flying blind on your most important unit economics.
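Feature-level tracking can start very simply: tag every inference call with the product feature that triggered it, then roll up token usage into dollars before booking COGS. The blended price and feature names in this sketch are illustrative assumptions, not numbers from the episode.

```python
from collections import defaultdict

# Hypothetical blended price: dollars per million tokens (input + output combined).
PRICE_PER_M_TOKENS = 10.00

def allocate_inference_cogs(calls):
    """Roll up inference spend by product feature.

    `calls` is an iterable of (feature, tokens) records, e.g. pulled from
    an API usage log. Returns a {feature: dollars} mapping suitable for
    posting to feature-level COGS accounts.
    """
    cogs = defaultdict(float)
    for feature, tokens in calls:
        cogs[feature] += tokens * PRICE_PER_M_TOKENS / 1_000_000
    return dict(cogs)

calls = [
    ("chat_assistant", 2_000_000),
    ("doc_summarizer", 500_000),
    ("chat_assistant", 1_000_000),
]
# chat_assistant: 3M tokens -> $30.00; doc_summarizer: 0.5M tokens -> $5.00
```

Once spend is attributed this way, mapping each feature to its own GL sub-account is a bookkeeping exercise rather than a data problem.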
Resources Mentioned
- The SaaS CFO: https://www.thesaascfo.com/
- Ray Rike – AI to ROI Newsletter: https://ai2roi.substack.com/
- Tomas Tunguz: https://tomtunguz.com/
- Fungies.io – 5 Ways to Save on LLM Costs: https://fungies.io