The Data Structure That's Okay With Being Wrong

Source: DEV Community
The Million-Row Problem

You're building a URL shortener. Every time someone creates a short link, you generate a random code and check if it already exists in the database. One database query per attempt. At 1,000 URLs, this is fine: the query takes a millisecond, the index is tiny, nobody notices. At 100 million URLs, you're generating codes that collide more often (birthday paradox), each collision triggers another database round trip, and those round trips add up under high throughput. You're not slow because your code is bad. You're slow because you're asking the database a question it doesn't need to answer.

What if you could check "does this code already exist?" without touching the database at all?

A Bit Array With an Attitude

A Bloom filter is a bit array (say, 20 bits, all starting at 0) combined with a handful of hash functions. Let me walk through the full lifecycle:

Starting state, empty bit array:

[0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0]
 0  1  2  3  4  5
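
To make the "bit array plus a handful of hash functions" idea concrete, here is a minimal Python sketch. It rests on assumptions the article does not state: the class name `BloomFilter`, the method name `might_contain`, the choice of 3 hash functions, and the trick of deriving them by salting SHA-256 with an index are all illustrative. Only the 20-bit array comes from the text above.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits=20, num_hashes=3):
        # num_hashes=3 is an assumption; the article only says "a handful".
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # every bit starts at 0

    def _positions(self, item):
        # Assumption: simulate k independent hash functions by prefixing
        # the item with the hash index before feeding it to SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not added".
        # True means "possibly added" (false positives can happen).
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("abc123")  # hypothetical short code
print(bf.might_contain("abc123"))  # True: an added item is never missed
```

A code that was never added may still come back as "possibly added" if other codes happened to set the same bits, so a positive answer would still need the database check. Only a negative answer lets you skip the round trip.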