3 Comments
Kubi

This article left me even more confused than I was before.

In the initial example, you started with 7 distinct tokens and ended up with 3 distinct tokens. Later on, you said you need to perform 20 merges with the BPE algorithm to add 20 additional tokens.

How can the vocabulary size decrease in one case but increase in the other?

I read in another source that the merged tokens are still retained in the vocabulary, so your first example is misleading.
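A minimal sketch of the merge loop may help untangle the two numbers (it uses a hypothetical two-word toy corpus, not the article's exact example): each merge appends exactly one new token to the vocabulary, so 20 merges add 20 tokens, while the number of pieces needed to spell any single word only goes down.

```python
from collections import Counter

# Toy corpus: word -> frequency, with "</w>" marking the end of a word.
corpus = {
    ("l", "o", "w", "</w>"): 5,
    ("l", "o", "w", "e", "s", "t", "</w>"): 2,
}
vocab = {sym for word in corpus for sym in word}  # start from single characters

def most_frequent_pair(corpus):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in corpus.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(corpus, pair):
    # Replace every occurrence of the chosen pair with its concatenation.
    merged = {}
    for word, freq in corpus.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

num_merges = 4  # the article's 20 merges would grow the vocab by 20 in the same way
for _ in range(num_merges):
    pair = most_frequent_pair(corpus)
    if pair is None:
        break
    corpus = merge_pair(corpus, pair)
    vocab.add(pair[0] + pair[1])  # one new token per merge; existing tokens are kept

print(len(vocab))  # 7 base characters + 4 merged tokens = 11
```

If the article follows this standard convention, the 3 tokens at the end of its first example describe how one word is spelled after the merges, while the 20 merges count the entries added to the vocabulary.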

Klaus

In Step 2, we iteratively merge tokens, starting from single characters. But do we remove the single characters from our vocabulary after merging them? The final vocab consists of only 3 tokens, 'low', 'est', and '</w>'; shouldn't it additionally still contain the original characters?
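For reference, the standard BPE convention is that merged tokens are added alongside the characters they were built from, so the single characters are never removed; under that convention, 'low', 'est', and '</w>' would be the pieces used to spell the example word rather than the entire vocabulary. A minimal sketch (with a hypothetical merge list, not necessarily the article's):

```python
# Single characters observed in the training text.
base_chars = {"l", "o", "w", "e", "s", "t", "</w>"}

# Hypothetical learned merges that eventually produce 'low' and 'est'.
merges = [("l", "o"), ("lo", "w"), ("e", "s"), ("es", "t")]

vocab = set(base_chars)
for a, b in merges:
    vocab.add(a + b)  # a merge adds a new token; it does not delete 'a' or 'b'

print(sorted(vocab))
# The vocabulary ends up holding the 7 single characters plus 'lo', 'low', 'es', 'est'.
```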

Sachin Singh

Extremely well written. Covers all the points and gives a very good idea of tokenisation.
