| Parameter | Value |
|---|---|
| LLM | Llama3-8B |
| Real-time simultaneous voice streams | 64,000 |
| Average tokens spoken per second | 5 |
| Average length of call (seconds) | 30 |
| GPU hardware | A100 80GB SXM |
| GPU software | TensorRT-LLM v0.13 |
| Input/output length (tokens) | 128:128 |
| **Cloud API costs** | |
| LLM input tokens (GPT-4o mini) | $0.15 / M tokens |
| LLM output tokens (GPT-4o mini) | $0.60 / M tokens |
| ASR (Deepgram Whisper) | $0.0048 / minute |
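The cloud API prices above can be combined into a rough hourly estimate for the stated workload. The sketch below is hypothetical and makes assumptions that the table does not state: every one of the 64,000 streams is active for the full hour, ASR is billed on all audio minutes, and LLM input tokens equal output tokens, mirroring the 128:128 input/output ratio. The function name `hourly_cloud_cost` is illustrative only.

```python
# Hypothetical cloud-API cost sketch built from the table's parameters.
# Assumptions (not stated in the source): all streams are active for the
# whole hour, ASR is billed on every audio minute, and LLM input tokens
# equal output tokens (mirroring the 128:128 input/output ratio).

STREAMS = 64_000             # real-time simultaneous voice streams
TOKENS_PER_SECOND = 5        # average tokens spoken per second
LLM_INPUT_USD_PER_M = 0.15   # GPT-4o mini input price per million tokens
LLM_OUTPUT_USD_PER_M = 0.60  # GPT-4o mini output price per million tokens
ASR_USD_PER_MINUTE = 0.0048  # Deepgram Whisper price per audio minute


def hourly_cloud_cost(streams: int = STREAMS) -> dict[str, float]:
    """Return the estimated USD cost per hour for ASR and LLM tokens."""
    seconds = 3600
    output_tokens = streams * TOKENS_PER_SECOND * seconds
    input_tokens = output_tokens  # assumed 1:1 ratio, matching 128:128
    asr = streams * (seconds / 60) * ASR_USD_PER_MINUTE
    llm_in = input_tokens / 1e6 * LLM_INPUT_USD_PER_M
    llm_out = output_tokens / 1e6 * LLM_OUTPUT_USD_PER_M
    return {
        "asr": asr,
        "llm_input": llm_in,
        "llm_output": llm_out,
        "total": asr + llm_in + llm_out,
    }


if __name__ == "__main__":
    for item, usd in hourly_cloud_cost().items():
        print(f"{item:>10}: ${usd:,.2f} per hour")
```

Under these assumptions ASR dominates the hourly bill: roughly $18,432 for audio transcription versus about $864 for LLM tokens. Changing the input/output token assumption only shifts the smaller LLM terms.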