<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>BPE on Hitesh Pattanayak</title><link>/tags/bpe/</link><description>Recent content in BPE on Hitesh Pattanayak</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 02 Oct 2024 00:00:00 +0000</lastBuildDate><atom:link href="/tags/bpe/index.xml" rel="self" type="application/rss+xml"/><item><title>Sub-Word Tokenization: Breaking Words Like a Pro</title><link>/posts/tokenization/</link><pubDate>Wed, 02 Oct 2024 00:00:00 +0000</pubDate><guid>/posts/tokenization/</guid><description>Take a detour before diving into transformers and explore sub-word tokenization techniques like Byte-Pair Encoding, WordPiece, and Unigram models. Learn how they handle rare words, reduce vocabulary size, and make models more efficient!</description></item></channel></rss>