Building a Database: Intro
The database series I wanted and couldn't find. “Do NOT build a database!” is the internet's advice. But what if everyone listened? There is still plenty of opportunity to improve the space, and I don't believe the last database has been built yet.
Every senior engineer you ask will tell you the same thing: do not build a database. It's a tarpit. It's decades of work. The graveyard is full of clever people who thought they could do better than Postgres.
They're mostly right. And yet — the last database has not been built. The developer experience around storage is still clumsy. We paper over it with ORMs and pretend the abstraction holds.
§What this series is
We are going to build a small, honest storage engine from first principles. Not a toy — a thing with real on-disk structure, a real B-tree, a real write path. You will be able to open the bytes in a hex editor and understand every one of them.
Below is a live look at the page format we'll use for the rest of the series. Drag the slider; insert a row; watch the slotted page fill from both ends.
§Why pages, why slots
A page is the unit of I/O. The disk doesn't care about your rows; it cares about blocks. So we pack variable-length records into fixed-size pages, and we track where each one lives with a small slot array. Each slot holds a record's offset and length. The slot array grows downward from the header while the record data grows up from the floor, the end of the page.
When the two meet, the page is full. That's the whole trick. Everything sophisticated is built on this boring, beautiful invariant.
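Here is a minimal sketch of that invariant, in case it helps to see it as bytes rather than prose. Everything specific in it is an assumption for illustration, not the format this series will settle on: the 4096-byte page size, the 4-byte header (slot count plus the offset where record data begins), the 4-byte slots (offset plus length), and the names `SlottedPage`, `insert`, `read`, and `free_space`.

```python
import struct

PAGE_SIZE = 4096                  # assumed page size, purely illustrative
HEADER = struct.Struct("<HH")     # assumed header: slot_count, free_end
SLOT = struct.Struct("<HH")       # assumed slot: record offset, record length


class SlottedPage:
    """Hypothetical slotted page: slots grow down from the header,
    record bytes grow up from the end of the page."""

    def __init__(self, size=PAGE_SIZE):
        self.buf = bytearray(size)
        self._write_header(0, size)   # no slots yet; data region starts at the end

    def _read_header(self):
        return HEADER.unpack_from(self.buf, 0)

    def _write_header(self, slot_count, free_end):
        HEADER.pack_into(self.buf, 0, slot_count, free_end)

    def free_space(self):
        # Gap between the end of the slot array and the start of the record data.
        slot_count, free_end = self._read_header()
        slots_end = HEADER.size + slot_count * SLOT.size
        return free_end - slots_end

    def insert(self, record):
        # A new record costs its own bytes plus one slot entry.
        slot_count, free_end = self._read_header()
        if len(record) + SLOT.size > self.free_space():
            return None               # the two regions would meet: page is full
        offset = free_end - len(record)
        self.buf[offset:free_end] = record
        SLOT.pack_into(self.buf, HEADER.size + slot_count * SLOT.size,
                       offset, len(record))
        self._write_header(slot_count + 1, offset)
        return slot_count             # the slot id is the record's stable handle

    def read(self, slot_id):
        slot_count, _ = self._read_header()
        if not 0 <= slot_id < slot_count:
            raise IndexError(slot_id)
        offset, length = SLOT.unpack_from(
            self.buf, HEADER.size + slot_id * SLOT.size)
        return bytes(self.buf[offset:offset + length])


page = SlottedPage(32)                # tiny page so it fills quickly
page.insert(b"hello")
page.insert(b"world!!")
print(page.free_space())              # bytes left between the two regions
```

The point of the indirection is that callers hold a slot id, not a byte offset, so a real engine could later move records around inside the page without invalidating references. This sketch leaves out deletion and compaction entirely.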
§The query has to find the page
Storing bytes is half the job. The other half is answering questions about them quickly. Here is the same query taking two very different paths — a sequential scan versus an index seek. Step through it.
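To make the difference concrete, here is a rough sketch that counts how many data pages each path touches. It is a stand-in under stated assumptions, not the engine we will build: the pages are plain Python lists, the data set (100 pages of 10 rows) is made up, and the "index" is a sorted key list searched with `bisect` rather than a B-tree. The names `seq_scan` and `index_seek` are hypothetical.

```python
import bisect

# Made-up data: 100 pages of 10 rows each, keys 0..999.
ROWS_PER_PAGE = 10
pages = [
    [(k, f"row-{k}") for k in range(p * ROWS_PER_PAGE, (p + 1) * ROWS_PER_PAGE)]
    for p in range(100)
]


def seq_scan(pages, key):
    """Read pages in order until the key turns up. Returns (value, pages_read)."""
    pages_read = 0
    for page in pages:
        pages_read += 1
        for k, value in page:
            if k == key:
                return value, pages_read
    return None, pages_read


# Stand-in index: sorted keys with a parallel list of page numbers.
# A real B-tree lives on disk and costs a few page reads of its own;
# this sketch only counts data pages.
index_keys = [k for page in pages for k, _ in page]
index_pages = [p for p, page in enumerate(pages) for _ in page]


def index_seek(pages, key):
    """Binary-search the index, then read only the one data page it points to."""
    i = bisect.bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return None, 0                # the index alone proves the key is absent
    for k, value in pages[index_pages[i]]:
        if k == key:
            return value, 1
    return None, 1


print(seq_scan(pages, 987))           # walks almost the whole table
print(index_seek(pages, 987))         # goes straight to one page
```

The scan's cost grows with the size of the table, while the seek's cost grows with the depth of the index, which is why the index is worth the extra write work.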
Next issue: the B-tree that makes the index seek possible. We'll build it node by node and watch it split.