Tokenization breaks text into pieces that LLMs use to process language
Instead of understanding the meaning of words, large language models break sentences into pieces called tokens (words, subwords, or sometimes single characters) and map each token to a numerical ID. Grammar and usage patterns are then "learned" statistically, by finding associations among these sequences of numbers.
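As a rough sketch of what this looks like in practice (the article itself doesn't name a tokenizer; this assumes OpenAI's open-source tiktoken library and its cl100k_base encoding), a few lines of Python show how a sentence becomes subword pieces and the integer IDs a model actually works with:

```python
import tiktoken

# Load a byte-pair-encoding tokenizer (used by several OpenAI models).
enc = tiktoken.get_encoding("cl100k_base")

sentence = "Tokenization breaks text into pieces."
token_ids = enc.encode(sentence)               # a list of integers
tokens = [enc.decode([i]) for i in token_ids]  # the text piece behind each ID

print(token_ids)  # the numbers the model sees
print(tokens)     # the corresponding word and subword fragments
```

Common words typically map to a single ID, while longer or rarer words are split into several smaller pieces.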





