Your String is Not What You Think It Is

Source: DEV Community
A Tour Through the Encoding Wars, and Why len("café") Returns 4

Reading time: ~13 minutes

You called len("café") and Python told you 4. You passed that string to a function that encoded it to bytes, and the bytes were 5 long. You stared at the screen for longer than you'd admit.

Then you got a bug report from a user in Brazil whose name broke your database. Your colleague on a Windows machine opened the CSV you exported and saw Ã© where there should have been é. You fixed it by guessing: an .encode('utf-8') here, a .decode('utf-8') there. It stopped crashing. But if someone asked you why, the honest answer is probably: "Something about encodings."

Let's close that gap.

In Pressing a Key, I traced a keypress from the keyboard matrix to your shell's stdin. The scan code 0x04 became the letter a somewhere in the stack. But what is the letter a? It turns out the answer is deeper than you'd expect.

The Core Confusion

Here's the thing that trips everyon