
Compare cascaded pipelines — speech-to-text, language model, text-to-speech wired in series — against end-to-end voice-native models. Move the sliders, tune your stack, watch the monthly bill move in real time.
Per-minute cost is a useful headline; the real bill scales with session length, system-prompt size, and how much of the audio is actually user speech. Tune all of it here.
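To see why a flat per-minute figure can mislead, here is a minimal sketch of a cascaded session cost. Everything in it is an assumption for illustration: the `Rates` values are made-up placeholders rather than published prices, STT is assumed to bill only for user speech, TTS is assumed to bill for the rest of the session, and the LLM is assumed to re-read the full system prompt on every turn. The calculator's own model may differ.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # Hypothetical placeholder rates in USD; NOT real provider prices.
    stt_per_audio_min: float = 0.01        # per minute of user speech transcribed
    tts_per_audio_min: float = 0.02        # per minute of synthesized agent speech
    llm_per_1k_input_tokens: float = 0.001


def session_cost(minutes, user_speech_fraction, system_prompt_tokens, turns, rates):
    """Estimate the cost of one cascaded session under the assumptions above."""
    user_minutes = minutes * user_speech_fraction
    agent_minutes = minutes - user_minutes  # simplification: no silence

    stt = user_minutes * rates.stt_per_audio_min
    tts = agent_minutes * rates.tts_per_audio_min
    # Assumption: the system prompt is sent again on each turn, so this
    # term grows with turn count and prompt size, not with audio minutes.
    llm = turns * (system_prompt_tokens / 1000) * rates.llm_per_1k_input_tokens
    return stt + tts + llm


rates = Rates()
short_prompt = session_cost(10, 0.5, 1_000, 20, rates)
long_prompt = session_cost(10, 0.5, 8_000, 20, rates)

# Same 10-minute session, different effective per-minute cost.
print(f"short prompt: ${short_prompt / 10:.4f}/min")
print(f"long prompt:  ${long_prompt / 10:.4f}/min")
```

Both sessions last ten minutes, yet the second costs more per minute purely because of the larger system prompt. That is the gap between the headline rate and the real bill.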
Every checked combination contributes one row to the comparison below. Cascaded combinations are STT × LLM × TTS — selections fan out fast.
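As a rough illustration of how quickly the row count grows, the sketch below enumerates cascaded combinations as a Cartesian product and appends one row per voice-native model. The provider names and the `comparison_rows` helper are hypothetical, chosen only for this example.

```python
from itertools import product

# Hypothetical selections; the names are placeholders, not real providers.
stt = ["stt-a", "stt-b", "stt-c"]
llm = ["llm-a", "llm-b", "llm-c", "llm-d"]
tts = ["tts-a", "tts-b"]
voice_native = ["native-a"]


def comparison_rows(stt, llm, tts, voice_native):
    """One row per STT x LLM x TTS combination, plus one per voice-native model."""
    rows = [("cascaded", combo) for combo in product(stt, llm, tts)]
    rows += [("voice-native", (model,)) for model in voice_native]
    return rows


rows = comparison_rows(stt, llm, tts, voice_native)
print(len(rows))  # 3 * 4 * 2 cascaded rows + 1 voice-native row = 25
```

Checking one more LLM here adds six rows at once (3 STT × 2 TTS), which is why deselecting a provider trims the table so quickly.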
Move a slider, deselect a provider — everything below updates live.
Same configuration, four volume floors. The bill climbs most steeply between 10k and 100k, where production discounts and orchestration costs start to matter.
Every number above traces back to a published rate and a clearly stated assumption. Open the panels below to inspect either.