ASR and LLM cost assumptions

 

LLM model Llama3-8B
Real-time simultaneous voice streams 64,000
Average tokens spoken per second 5
Average length of call (seconds) 30
GPU hardware A100 80GB SXM
GPU software TensorRT-LLM v0.13
Input/output length 128:128
Cloud API costs
LLM input tokens (GPT-4o mini) $0.15 / M tokens
LLM output tokens (GPT-4o mini) $0.60 / M tokens
ASR (Deepgram Whisper) $0.0048 / minute