AI Inference Costs Are Crushing SaaS Gross Margins: Here's What to Do About It

Is your AI SaaS company skating on thin ice because of exploding compute costs you're not tracking?

In episode #365, Ben Murray tackles one of the most pressing financial challenges facing AI-first SaaS companies: the structural margin compression caused by LLM inference costs. Traditional SaaS was built on near-zero marginal cost per customer; that era is over. If you're building on top of AI, every prompt, query, and agentic workflow is a hard COGS line that scales with revenue, and if you're not managing it, it will quietly destroy your unit economics.

  • Why AI-first SaaS companies are running 50–60% gross margins (vs. 70–80% for legacy SaaS), and what Bessemer data shows about AI supernovas with margins as low as 25%
  • How inference and compute costs differ fundamentally from traditional SaaS COGS, and why they won't scale down the way hosting costs did
  • Why token costs vary wildly (from $1–2 per million tokens on the low end to $30–180+ per million for frontier models) and how that variability makes feature-level economics a CFO priority
  • 5 tactical ways to reduce LLM spend: model routing, prompt caching, context compaction, semantic caching, and batch processing (two of these are sketched in code after this list)
  • How to set up your GL accounts and COGS tracking to allocate inference costs by feature, so you actually understand the economics of what you've built (see the allocation sketch below)
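
To make two of those tactics concrete, here's a minimal Python sketch of model routing combined with a prompt cache. Everything in it is illustrative: the model names, the per-million-token prices, and the call_llm wrapper are hypothetical stand-ins for whatever provider SDK you use, and real routing logic would classify prompts far more carefully than a single boolean flag.

```python
from dataclasses import dataclass

# Hypothetical per-million-token prices -- real rates vary by provider and date.
PRICE_PER_MTOK = {"small-model": 1.50, "frontier-model": 30.00}

@dataclass
class Route:
    model: str
    est_cost: float

def route_prompt(prompt: str, needs_reasoning: bool) -> Route:
    """Model routing: send routine requests to the cheap model and reserve
    the expensive frontier model for prompts that need deep reasoning."""
    model = "frontier-model" if needs_reasoning else "small-model"
    est_tokens = len(prompt) / 4  # rough 4-chars-per-token heuristic
    return Route(model, est_tokens / 1_000_000 * PRICE_PER_MTOK[model])

def call_llm(model: str, prompt: str) -> str:
    # Stand-in for a real provider SDK call.
    return f"[{model}] answer to: {prompt[:40]}"

_cache: dict[str, str] = {}  # exact-match prompt cache

def cached_completion(prompt: str, needs_reasoning: bool = False) -> str:
    """Prompt caching: identical prompts never trigger a second paid call."""
    if prompt in _cache:
        return _cache[prompt]  # cache hit: zero marginal inference cost
    route = route_prompt(prompt, needs_reasoning)
    _cache[prompt] = call_llm(route.model, prompt)
    return _cache[prompt]

if __name__ == "__main__":
    print(cached_completion("Summarize this support ticket"))
    print(cached_completion("Summarize this support ticket"))  # served from cache
```

A true semantic cache would go further and match near-duplicate prompts (typically via embedding similarity) rather than exact strings; the exact-match version above just keeps the sketch dependency-free.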
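And on the last bullet, one possible way to instrument feature-level COGS tracking: tag every inference call with the product feature that triggered it, price the tokens, and roll spend up per feature so each total can post to its own GL sub-account. The feature names, models, and prices below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical blended per-million-token prices; many providers price input
# and output tokens differently, but one rate keeps the sketch simple.
PRICE_PER_MTOK = {"small-model": 1.50, "frontier-model": 30.00}

def inference_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    return (tokens_in + tokens_out) / 1_000_000 * PRICE_PER_MTOK[model]

# Every usage event carries a feature tag recorded at call time.
usage_log = [
    {"feature": "ai_summary", "model": "small-model",    "in": 1200, "out": 300},
    {"feature": "ai_summary", "model": "small-model",    "in": 900,  "out": 250},
    {"feature": "agent_flow", "model": "frontier-model", "in": 8000, "out": 2000},
]

# Roll spend up by feature so each total maps to its own COGS sub-account.
cogs_by_feature: dict[str, float] = defaultdict(float)
for event in usage_log:
    cogs_by_feature[event["feature"]] += inference_cost(
        event["model"], event["in"], event["out"]
    )

for feature, spend in sorted(cogs_by_feature.items()):
    print(f"{feature}: ${spend:.4f}")
```

The point of the feature tag is that it travels with the usage data from the application layer all the way to the GL, so gross margin can be reported per feature instead of as one blended AI line.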

Tune in before your next board meeting, because if you're not tracking AI inference costs at the feature level, you're flying blind on your most important unit economics.
