Computers can't read text the way humans do; they only work with numbers. A tokenizer converts your text into numbers called "tokens". Think of it like breaking a word into syllables, but for computers.
1. Every text starts as individual characters.
2. Count which adjacent pair of tokens appears most often.
3. Merge that pair into a single new token, then repeat from step 2.
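The three steps above are the core of byte-pair encoding (BPE). Here is a minimal sketch in plain Python; the function names (`most_frequent_pair`, `merge_pair`, `bpe`) are illustrative, not from any real tokenizer library, and production tokenizers add byte-level handling, special tokens, and a learned merge table:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Step 2: count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(tokens, pair):
    """Step 3: replace every occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def bpe(text, num_merges):
    tokens = list(text)  # step 1: start from individual characters
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
    return tokens

print(bpe("low lower lowest", 3))
```

After a few merges, frequent character sequences like `low` collapse into single tokens, which is why common words usually cost one token while rare words get split into pieces.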
| Model | Vocabulary | What it means |
|---|---|---|
| GPT-4 | ~100K | About 100,000 unique tokens in its dictionary |
| Llama 3 | 128K | Clean power-of-2 vocabulary |
| Qwen 2.5 | 152K | Optimized for Chinese + English |