The ChatGPT Tokenizer counts how many tokens your text uses for OpenAI models. Paste any text and it returns the exact token count for that text, the number that drives what the API bills you for, plus a colored, token-by-token view so you can see where the model splits your words. It runs entirely in your browser using OpenAI's tiktoken byte-pair encodings, so it is exact, instant, free, and private.
What is a token?
A token is the unit OpenAI models read and are billed in. It is usually a short chunk of a word — sometimes a whole short word, sometimes a few characters, sometimes just a space plus the start of a word. A rough rule of thumb for English is 1 token ≈ 4 characters ≈ 0.75 words, but the only accurate way to know is to run the real encoder, which is what this tool does.
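To make the rule of thumb concrete, here is a minimal sketch of the two heuristics in Python. The function name `estimate_tokens` is hypothetical and used only for illustration; it is an approximation, not the encoder this tool runs.

```python
import math


def estimate_tokens(text: str) -> dict:
    """Rough token estimates from the rules of thumb for English text.

    Not a tokenizer: it only applies ~4 characters per token and
    ~0.75 words per token, rounding up.
    """
    chars = len(text)
    words = len(text.split())
    return {
        "by_chars": math.ceil(chars / 4),     # 1 token ~ 4 characters
        "by_words": math.ceil(words / 0.75),  # 1 token ~ 0.75 words
    }


print(estimate_tokens("Tokenization is fun!"))
# {'by_chars': 5, 'by_words': 4}
```

The two heuristics already disagree on a three-word sentence, which is why an estimate is only a starting point and the real encoder is the reference.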
Which encoding does my model use?
OpenAI models share a small number of encodings. Pick the one that matches your model:
| Encoding | Models | Typical use |
|---|---|---|
| o200k_base | GPT-4o, GPT-4o mini, GPT-4.1, o1 / o3 / o4 (and newer) | Current chat and reasoning models; newest, most efficient tokenizer |
| cl100k_base | GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, text-embedding-3, text-embedding-ada-002 | Previous-generation chat models and the current embedding models |
If you are unsure, use o200k_base — it powers the models most people use today (GPT-4o and GPT-4.1).
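If you select the encoding in your own scripts, the table above can be expressed as a small lookup. This is a sketch only: the prefix list mirrors the table, while the name `pick_encoding`, the lowercase model identifiers, and the prefix-matching approach are assumptions made for illustration.

```python
# Prefixes taken from the table above. Order matters: "gpt-4o" and "gpt-4.1"
# must be checked before the shorter "gpt-4" prefix.
ENCODING_BY_PREFIX = [
    ("gpt-4o", "o200k_base"),
    ("gpt-4.1", "o200k_base"),
    ("o1", "o200k_base"),
    ("o3", "o200k_base"),
    ("o4", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5-turbo", "cl100k_base"),
    ("text-embedding-3", "cl100k_base"),
    ("text-embedding-ada-002", "cl100k_base"),
]


def pick_encoding(model: str, default: str = "o200k_base") -> str:
    """Return the encoding name for a model id, falling back to the default."""
    name = model.lower()
    for prefix, encoding in ENCODING_BY_PREFIX:
        if name.startswith(prefix):
            return encoding
    return default  # same advice as above: when unsure, use o200k_base


print(pick_encoding("gpt-4o-mini"))  # o200k_base
print(pick_encoding("gpt-4-turbo"))  # cl100k_base
```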
How to count your tokens
- Paste or type your text into the box.
- Choose the model family (o200k_base for GPT-4o / GPT-4.1 / o-series, cl100k_base for GPT-4 / GPT-3.5 Turbo).
- Read the token count at the top, alongside characters, words, and characters-per-token.
- Scan the colored chips below to see exactly how the text is split into tokens; toggle Show token IDs to view the raw integer id of each token.
- Click Copy count to copy just the number, or Copy token IDs to copy the full id list.
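The figures shown next to the token count are simple to derive once the count is known. The sketch below assumes the token count comes from a real encoder; `text_stats` is a hypothetical helper, and splitting words on whitespace is an assumption about how words are counted.

```python
def text_stats(text: str, token_count: int) -> dict:
    """Summarize a text given its token count from a real encoder."""
    chars = len(text)
    words = len(text.split())  # assumption: words are whitespace-separated
    per_token = round(chars / token_count, 2) if token_count else 0.0
    return {
        "tokens": token_count,
        "characters": chars,
        "words": words,
        "chars_per_token": per_token,
    }


print(text_stats("Tokenization is fun!", 5))
# {'tokens': 5, 'characters': 20, 'words': 3, 'chars_per_token': 4.0}
```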
Example: input → output
Input:
Tokenization is fun!
With o200k_base this encodes to 5 tokens: "Token", "ization", " is", " fun", "!". Notice that "Tokenization" splits into two tokens and that the leading space is part of the " is" and " fun" tokens, which is why token counts don't line up with word counts.
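A quick way to see the role of the leading spaces is to join the pieces back together. The token strings below are copied from the example above rather than produced by an encoder, so treat this as an illustration of the split and not as tokenizer output.

```python
# Token pieces from the example; note the leading space on " is" and " fun".
tokens = ["Token", "ization", " is", " fun", "!"]

# Concatenating the pieces, with no separator, gives back the input text.
text = "".join(tokens)
print(repr(text))  # 'Tokenization is fun!'

# Five tokens versus three whitespace-separated words.
print(len(tokens), len(text.split()))  # 5 3
```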
Why count tokens?
- Cost: OpenAI bills per token, so token count is your true cost driver — far more accurate than counting characters.
- Context limits: Every model has a maximum context window measured in tokens. Counting first tells you whether a prompt plus its expected response will fit.
- Prompt engineering: Trimming a prompt from 1,200 to 800 tokens is a measurable win you can see live as you edit.
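The cost and context checks above come down to simple arithmetic. In this sketch the prices and the context window are placeholder numbers, not real OpenAI rates or limits, and both function names are hypothetical; substitute the values for your model.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Cost for one request, given per-million-token prices."""
    total = (prompt_tokens * input_price_per_million
             + completion_tokens * output_price_per_million)
    return total / 1_000_000


def fits_context(prompt_tokens: int, expected_response_tokens: int,
                 context_window: int) -> bool:
    """True if the prompt plus the expected response fits in the window."""
    return prompt_tokens + expected_response_tokens <= context_window


# Placeholder prices (per million tokens) and window size, for illustration.
before = estimate_cost(1200, 300, 2.0, 8.0)
after = estimate_cost(800, 300, 2.0, 8.0)
print(f"saved per request: {before - after:.4f}")
print(fits_context(1200, 300, 8192))
```

Trimming the prompt from 1,200 to 800 tokens only changes the input term, so the saving scales directly with the input price and the number of requests.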
Is it exact and private?
Yes to both. The tool uses OpenAI's real tiktoken encodings (o200k_base and cl100k_base) through the open-source gpt-tokenizer library — not a "divide by four" estimate — so the count matches the API. The tokenizer code is loaded once from a public CDN and then runs on your device; the text you paste is never uploaded, which makes it safe for private prompts and confidential data.
A small note on the colored view: some tokens are partial byte sequences of a multibyte character (common with emoji and Japanese or other non-Latin scripts). On its own, such a fragment shows a � replacement character, but the neighboring tokens still reconstruct the correct text and the count stays exact.
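The effect is easy to reproduce with plain UTF-8 bytes. The split point below is chosen by hand to stand in for a token boundary; a real encoder may cut the bytes differently.

```python
emoji = "😀"
raw = emoji.encode("utf-8")  # this emoji takes 4 bytes in UTF-8

# Hypothetical split into two fragments, standing in for two tokens.
fragments = [raw[:2], raw[2:]]

# Decoded on its own, each fragment is invalid UTF-8, so the decoder
# substitutes the U+FFFD replacement character.
shown = [piece.decode("utf-8", errors="replace") for piece in fragments]
print(shown)

# Joined in order, the bytes decode to the original character.
rebuilt = b"".join(fragments).decode("utf-8")
print(rebuilt == emoji)  # True
```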


