A Voice Cost Calculator

Model the cost of every voice AI stack before you build.

Compare cascaded pipelines — speech-to-text, language model, text-to-speech wired in series — against end-to-end voice-native models. Move the sliders, tune your stack, watch the monthly bill move in real time.

· Turn-based LLM accounting · Prefix caching modeled · 18 providers across 3 architectures
Min / month · Providers in mix · Combinations · $/min range (cheapest → priciest)
01 · Assumptions

Set the shape of a conversation.

Per-minute cost is a useful headline; the real bill scales with session length, system-prompt size, and how much of the audio is actually user speech. Tune all of it here.

Sessions / day: 80 (range 1 to 1,000)
Session length: 20 min (range 30 s to 60 min)
User speaks: 45% (range 5% to 80%)
AI speaks: 40% (range 5% to 80%)
System prompt: 1,500 tok (range 200 to 8,000)
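
As a rough illustration of how these sliders turn into monthly volume, here is a minimal Python sketch. The `Conversation` class, its field names, and the 30-day month are assumptions made for this example; the page does not state how the calculator defines a month or names its inputs.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # assumption: the page does not state its month length


@dataclass
class Conversation:
    """Hypothetical container for the slider defaults shown above."""
    sessions_per_day: int = 80
    session_minutes: float = 20
    user_speaks: float = 0.45        # share of session audio that is user speech
    ai_speaks: float = 0.40          # share of session audio that is AI speech
    system_prompt_tokens: int = 1_500

    def minutes_per_month(self) -> float:
        # Total session minutes across all sessions in one month.
        return self.sessions_per_day * self.session_minutes * DAYS_PER_MONTH

    def user_minutes_per_session(self) -> float:
        # Minutes of user speech in one session.
        return self.session_minutes * self.user_speaks

    def ai_minutes_per_session(self) -> float:
        # Minutes of AI speech in one session.
        return self.session_minutes * self.ai_speaks


defaults = Conversation()
print(defaults.minutes_per_month())  # 48000 with the defaults above
```

With the defaults, 80 sessions of 20 minutes over 30 days come to 48,000 minutes a month, of which only the speaking shares feed the per-speech-minute parts of the bill.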
02 · Pick the stack

Mix & match providers.

Every checked combination contributes one row to the comparison below. Cascaded combinations are STT × LLM × TTS — selections fan out fast.
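
To see how quickly selections fan out, the sketch below counts cascaded combinations as a Cartesian product. The provider names are placeholders invented for this example, not entries from the calculator's list.

```python
from itertools import product

# Placeholder provider names, not the calculator's actual providers.
stt = ["stt_a", "stt_b", "stt_c"]
llm = ["llm_a", "llm_b", "llm_c", "llm_d"]
tts = ["tts_a", "tts_b"]

# One comparison row per (STT, LLM, TTS) triple.
combinations = list(product(stt, llm, tts))
print(len(combinations))  # 24, i.e. 3 * 4 * 2
```

Checking one more provider in any column multiplies the row count rather than adding to it.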

Cascaded pipeline · STT → LLM → TTS · Three services wired in series

Speech-to-text

$/min, streaming

Language model

$/1M tok · in · cached · out

Text-to-speech

$/1M chars
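
The three billing units above can be combined into a per-session cost roughly as follows. This is a sketch under stated assumptions, not the calculator's actual formula: it assumes the LLM re-reads the whole conversation every turn, that the previous turn's prompt is billed at the cached rate, that STT is billed on user speech minutes only, and that TTS is billed on the characters of AI speech. All function names, parameters, and rates are hypothetical.

```python
from typing import NamedTuple


class LlmRates(NamedTuple):
    """Dollars per 1M tokens. Values used below are placeholders."""
    input_rate: float
    cached_rate: float
    output_rate: float


def llm_session_cost(turns: int, system_tokens: int, user_tokens_per_turn: int,
                     ai_tokens_per_turn: int, rates: LlmRates) -> float:
    """Turn-based accounting: each turn resends system prompt plus history."""
    cost = 0.0
    cached_prefix = 0        # assumption: the previous turn's prompt is cacheable
    history = system_tokens  # tokens the model must re-read at the start of a turn
    for _ in range(turns):
        prompt = history + user_tokens_per_turn
        fresh = prompt - cached_prefix  # tokens not seen on the previous turn
        cost += (fresh * rates.input_rate
                 + cached_prefix * rates.cached_rate
                 + ai_tokens_per_turn * rates.output_rate) / 1_000_000
        cached_prefix = prompt
        history = prompt + ai_tokens_per_turn  # the reply becomes history
    return cost


def cascaded_session_cost(user_minutes: float, ai_chars: int, stt_per_min: float,
                          tts_per_million_chars: float, llm_cost: float) -> float:
    stt = user_minutes * stt_per_min  # assumption: STT billed on user speech only
    tts = ai_chars * tts_per_million_chars / 1_000_000
    return stt + llm_cost + tts


example_rates = LlmRates(input_rate=1.0, cached_rate=0.5, output_rate=2.0)  # placeholders
llm = llm_session_cost(2, 1000, 100, 200, example_rates)
total = cascaded_session_cost(9.0, 7200, 0.01, 15.0, llm)
```

Because every turn resends the growing history, the LLM share grows faster than linearly with session length, and a cached-token discount matters most for long sessions with large system prompts.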
Voice native · end-to-end · Audio in, audio out: one model

Token-based

Audio in / cached / out · text in / cached / out

Flat-rate

$/min bundled
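
The two voice-native billing styles can be contrasted with a small sketch. The bucket names mirror the six token categories listed above, but the functions, the token counts, and the rates are hypothetical, and billing flat-rate models on full session minutes is an assumption.

```python
# The six token buckets a token-based voice-native model may bill separately.
BUCKETS = ("audio_in", "audio_cached", "audio_out",
           "text_in", "text_cached", "text_out")


def token_based_cost(tokens: dict, rates_per_million: dict) -> float:
    """Sum tokens * rate over every bucket; rates are dollars per 1M tokens."""
    return sum(tokens.get(b, 0) * rates_per_million[b] for b in BUCKETS) / 1_000_000


def flat_rate_cost(session_minutes: float, rate_per_minute: float) -> float:
    """Bundled pricing; assumes every session minute is billed, speech or silence."""
    return session_minutes * rate_per_minute


# Placeholder numbers for illustration only.
example_tokens = {"audio_in": 1000, "audio_out": 2000, "text_in": 500}
example_rates = {b: 10.0 for b in BUCKETS}

per_session_tokens = token_based_cost(example_tokens, example_rates)
per_session_flat = flat_rate_cost(20, 0.05)
```

Token-based pricing tracks how much is actually said, while flat-rate pricing tracks only the clock, so the cheaper option depends on how much of a session is speech.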
03 · Compare

Cost comparison.

Move a slider, deselect a provider — everything below updates live.

04 · Scale

The cheapest stack, at scale.

Same configuration, four volume floors. The cost curve is steepest between 10k and 100k, which is where production discounts and orchestration costs start to matter.

05 · Under the hood

Assumptions, rates, caveats.

Every number above traces back to a published rate and a clearly stated assumption. Open the panels below to inspect either.

Calculation assumptions

All rates
