TPOT and TBT
Time per output token (TPOT) and time between tokens (TBT) are two metrics commonly used to measure the decode performance of large language model (LLM) inference.
The two metrics differ in a subtle way, though. Let’s look at their definitions, formulas, and uses.
TPOT
- TPOT is also known as inter-token latency (ITL).
- TTFT is the time to first token: the time from when the request arrives until the first token is rendered.
- For a single request that decodes \(n\) tokens, the formula is: \(\frac{e2elatency_{n} - TTFT}{n-1}\), where \(e2elatency_{i}\) is the time from request arrival until token \(i\) is emitted.
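As a rough sketch of the formula above, here is TPOT computed for one request. The function name `tpot` and the sample latencies are made up for illustration; the only assumption is that we have, for each token, its end-to-end latency measured from request arrival.

```python
def tpot(token_latencies):
    """TPOT for a single request.

    token_latencies[i] is the end-to-end latency (time since the request
    arrived) at which token i+1 was emitted, so token_latencies[0] is TTFT.
    """
    n = len(token_latencies)
    if n < 2:
        raise ValueError("TPOT needs at least two output tokens")
    ttft = token_latencies[0]
    return (token_latencies[-1] - ttft) / (n - 1)


# Hypothetical latencies in milliseconds for a 5-token response.
latencies_ms = [200, 230, 260, 350, 380]
print(tpot(latencies_ms))  # 45.0
```

Note that the result is one number per request, no matter how many tokens the request produced.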
TBT
- TBT is the time between decoding successive tokens.
- For two consecutively decoded tokens \(i-1\) and \(i\), the formula is: \(e2elatency_{i} - e2elatency_{i-1}\)
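A minimal sketch of the same idea in code, assuming we have the per-token end-to-end latencies for a request (the helper name `tbt` and the numbers are hypothetical):

```python
def tbt(token_latencies):
    """Gaps between successive tokens of one request.

    token_latencies[i] is the end-to-end latency at which token i+1 was
    emitted; the result has one entry per pair of consecutive tokens.
    """
    return [b - a for a, b in zip(token_latencies, token_latencies[1:])]


# Hypothetical latencies in milliseconds for a 5-token response.
latencies_ms = [200, 230, 260, 350, 380]
gaps = tbt(latencies_ms)
print(gaps)       # [30, 30, 90, 30]
print(max(gaps))  # 90
```

Unlike a per-request average, this yields one sample per token gap, so the 90 ms gap stays visible as its own data point.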
Comparison
Now you might think that these metrics are equivalent, since \(e2elatency_{1} = TTFT\) and the sum of gaps telescopes: \(TPOT = \frac{e2elatency_{n} - TTFT}{n-1}\) \(= \frac{e2elatency_{n} - e2elatency_{1}}{n-1}\) \(= \frac{\sum^{n}_{i=2} \left( e2elatency_{i} - e2elatency_{i-1} \right)}{n-1}\) \(= TBT_{avg}\)
That is, TPOT looks like TBT averaged over the \(n-1\) token gaps within the request.
BUT this is incorrect as a statement about the metrics a server reports.
The identity only holds inside a single request: TPOT aggregates within a request, but TBT does not.
Given 3 requests R1, R2, R3, we get 3 TPOT numbers.
But each request decodes a certain number of tokens; say P, Q, R for requests 1, 2, 3 respectively. TBT spans multiple requests.
Inference server metrics are reported as some aggregation over a pool: \(|TPOT_{pool}| = 3\) and \(|TBT_{pool}| = P + Q + R\).
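Both are useful for monitoring the performance of an inference server, though TBT is more fine-grained. We can look carefully at tail TBT to identify stalls. We have to be careful with TPOT: each TPOT value is already an average over a request, so aggregating TPOT across requests is an “average of averages”, which can dilute its usefulness by smoothing over a single long stall.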
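To make the pool difference concrete, here is a small sketch with three made-up requests. The helper names and latencies are hypothetical, and P, Q, R are counted here as the number of inter-token gaps per request, which is an assumption about how the decoded tokens are counted.

```python
def tpot(latencies):
    # One value per request: average gap from first to last token.
    return (latencies[-1] - latencies[0]) / (len(latencies) - 1)


def tbt(latencies):
    # One value per pair of consecutive tokens.
    return [b - a for a, b in zip(latencies, latencies[1:])]


# Hypothetical per-token end-to-end latencies in milliseconds.
requests_ms = {
    "R1": [100, 120, 140],                               # 2 gaps
    "R2": [100, 120, 140, 160, 180, 200, 220],           # 6 gaps
    "R3": [100, 120, 140, 160, 180, 200, 700, 720, 740]  # 8 gaps, one stall
}

tpot_pool = [tpot(lat) for lat in requests_ms.values()]
tbt_pool = [gap for lat in requests_ms.values() for gap in tbt(lat)]

print(len(tpot_pool), len(tbt_pool))     # 3 16
print(sum(tpot_pool) / len(tpot_pool))   # 40.0
print(sum(tbt_pool) / len(tbt_pool))     # 50.0
print(max(tpot_pool), max(tbt_pool))     # 80.0 500
```

In this toy data the mean over the TPOT pool (40 ms) and the mean over the TBT pool (50 ms) disagree, because the TPOT mean weights every request equally regardless of length. The 500 ms stall in R3 is the maximum of the TBT pool, while the worst TPOT is only 80 ms.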
Citation
If you want to cite this blog post please use:
Atkinson, Adam. (Mar 2025). TPOT and TBT. adama's blog. https://adama.dev/2025/03/14/tpot-and-tbt.html.
Or the following BibTeX citation:
@article{atkinson2025tbt,
title = "TPOT and TBT",
author = "Atkinson, Adam",
journal = "adama.github.io",
year = "2025",
month = "mar",
url = "https://adama.dev/2025/03/14/tpot-and-tbt.html"
}
N.B.
I’ve stood on the shoulders of giants and benefitted from their technical blogs for over a decade: time to give back with a blog of my own :)