How Notion Solved Its Scalability Crisis
In 2021, Notion's popularity skyrocketed, but its service became unbearably slow. The company's Postgres database had grown to terabytes of data even when compressed, and it was reaching its breaking point. Faced with losing customers, Notion needed to solve this problem fast.
At its core, everything in Notion is a block, which can be a piece of text, an image, or an entire page itself. Each block is stored as a row in PostgreSQL with its own unique ID. Blocks can be nested within other blocks to create complex tree-like structures, allowing for incredible versatility. However, this structure also means that even a simple document can result in hundreds or thousands of database rows.
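To make the model concrete, here is a hypothetical sketch of a block record in TypeScript; the field names are illustrative and not Notion's actual schema:

```typescript
// Hypothetical sketch of a block record; field names are illustrative,
// not Notion's actual schema.
interface Block {
  id: string;                          // unique ID for this block
  type: string;                        // e.g. "text", "image", "page"
  properties: Record<string, unknown>; // the block's content, e.g. text or an image URL
  parentId: string | null;             // the block (or page) this block is nested under
  content: string[];                   // ordered IDs of child blocks, forming the tree
  spaceId: string;                     // the workspace this block belongs to
}

// A page is itself a block whose children are more blocks, so even a short
// document fans out into many rows.
const page: Block = {
  id: "b1",
  type: "page",
  properties: { title: "Meeting notes" },
  parentId: null,
  content: ["b2", "b3"], // a text block and an image block, each its own row
  spaceId: "ws1",
};
```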
With millions of users, Notion's database was overwhelmed, leading to increased latency and delayed responses. The company's single monolithic database could no longer handle the load, and costs were increasing exponentially.
The Solution: Horizontal Scaling
Notion decided to shard every table linked to the block table via foreign keys. This included workspaces, discussions, and comments, keeping related data in the same shard. The partition key was the workspace ID, since users typically request data for a single workspace at a time.
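A minimal sketch of the co-location idea, using hypothetical row shapes: every sharded table carries the workspace ID, so all of a workspace's related rows resolve to the same shard.

```typescript
// Hypothetical row shapes; the point is that every sharded table carries the
// workspace ID, which serves as the partition key.
interface BlockRow      { id: string; spaceId: string; }
interface DiscussionRow { id: string; spaceId: string; blockId: string; }
interface CommentRow    { id: string; spaceId: string; discussionId: string; }

// Because all three row types share the same partition key, loading a page
// (blocks plus its discussions and comments) touches a single shard instead
// of fanning out across the fleet.
function partitionKeyFor(row: BlockRow | DiscussionRow | CommentRow): string {
  return row.spaceId;
}
```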
The new sharding setup needed to handle existing data and scale to meet projected growth for at least two years. The instance type needed at least 60K total IOPS, and to keep RDS replication healthy, they capped each table at 500 GB and each physical instance at 10 TB.
After careful consideration, Notion chose to create 32 physical database instances, with each instance containing 15 separate logical shards. This resulted in a total of 480 shards across the 32 physical databases.
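A quick back-of-the-envelope check of that layout against the limits above; this is only a sketch, and it assumes the block table is the dominant table on each instance.

```typescript
// Sketch: sanity-check the shard topology against the stated limits.
const PHYSICAL_INSTANCES = 32;
const LOGICAL_SHARDS_PER_INSTANCE = 15;
const TOTAL_LOGICAL_SHARDS = PHYSICAL_INSTANCES * LOGICAL_SHARDS_PER_INSTANCE; // 480

const MAX_TABLE_SIZE_GB = 500;   // per-table cap to keep RDS replication healthy
const MAX_INSTANCE_SIZE_TB = 10; // per-instance cap

// Worst case for a single sharded table (e.g. blocks) on one instance:
// 15 logical shards, each holding up to 500 GB of that table.
const worstCasePerInstanceTB =
  (LOGICAL_SHARDS_PER_INSTANCE * MAX_TABLE_SIZE_GB) / 1000; // 7.5 TB

console.log(TOTAL_LOGICAL_SHARDS);                           // 480
console.log(worstCasePerInstanceTB <= MAX_INSTANCE_SIZE_TB); // true, leaving headroom
```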
The Routing Mechanism
Routing is handled at the application level: given a workspace ID, the application determines which shard holds that workspace's data. Distributing the load this way also let Notion provision smaller instances for the new shards, with the goal of reducing CPU utilization, IOPS, and cost.
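A minimal routing sketch, assuming a hash-then-modulo scheme over the workspace ID; the exact hash function and shard assignment Notion uses aren't specified here.

```typescript
import { createHash } from "crypto";

const TOTAL_LOGICAL_SHARDS = 480;
const LOGICAL_SHARDS_PER_INSTANCE = 15; // 32 physical instances

// Map a workspace ID to a logical shard. The hash-mod scheme is an
// assumption; the real mapping may differ.
function logicalShardFor(workspaceId: string): number {
  const digest = createHash("md5").update(workspaceId).digest();
  return digest.readUInt32BE(0) % TOTAL_LOGICAL_SHARDS;
}

// Map a logical shard to the physical instance that hosts it.
function physicalInstanceFor(logicalShard: number): number {
  return Math.floor(logicalShard / LOGICAL_SHARDS_PER_INSTANCE); // 0..31
}

// Example: every query for this workspace is routed to a single shard.
const shard = logicalShardFor("example-workspace-uuid");
console.log(`logical shard ${shard} on instance ${physicalInstanceFor(shard)}`);
```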
The Migration Plan
Notion's migration plan involved adding 96 new entries to PgBouncer and temporarily mapping them to the 32 existing shards. This allowed data to be migrated to the new shards gradually while writes continued to flow to the new databases.
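One way to picture that transitional setup, assuming each old database simply backs three of the new PgBouncer entries; the naming and assignment below are illustrative only.

```typescript
// Sketch of the transition: 96 new PgBouncer entries, each initially pointing
// at one of the 32 existing physical databases. The even 1-to-3 split and the
// names are assumptions for illustration.
const OLD_INSTANCES = 32;
const NEW_PER_OLD = 3; // 96 new entries in total

interface PgbouncerEntry {
  name: string; // logical database name exposed to the application
  host: string; // physical database it currently points to
}

const entries: PgbouncerEntry[] = [];
for (let oldDb = 0; oldDb < OLD_INSTANCES; oldDb++) {
  for (let i = 0; i < NEW_PER_OLD; i++) {
    entries.push({
      name: `shard-new-${oldDb * NEW_PER_OLD + i}`, // hypothetical naming
      host: `db-old-${oldDb}`, // temporarily routed to the existing shard
    });
  }
}
// Once the data has been copied, each entry's host is flipped to its
// dedicated new database.
console.log(entries.length); // 96
```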
However, testing uncovered a critical issue: since each old shard mapped to three new shards, they either needed to reduce the number of connections per PgBouncer instance or increase it by 3x. The solution was to shard the PgBouncer cluster itself, creating four new clusters, each managing 24 databases. This allowed them to increase connections per PgBouncer per shard to 8 while limiting total connections per Postgres instance to 200.
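A rough sketch of the resulting connection budget; the number of PgBouncer instances per cluster and logical shards per database are assumed inputs, not figures from Notion.

```typescript
// Sketch of the PgBouncer split: 4 clusters, each fronting 24 of the 96
// databases, and a check against the per-instance connection cap.
const CLUSTERS = 4;
const DATABASES_PER_CLUSTER = 24;             // 4 * 24 = 96 databases
const CONNS_PER_PGBOUNCER_PER_SHARD = 8;
const MAX_CONNS_PER_POSTGRES_INSTANCE = 200;

// Both arguments are assumptions used only to illustrate the arithmetic.
function withinConnectionBudget(
  pgbouncersPerCluster: number,
  logicalShardsPerDb: number
): boolean {
  const connsPerDb =
    pgbouncersPerCluster * logicalShardsPerDb * CONNS_PER_PGBOUNCER_PER_SHARD;
  return connsPerDb <= MAX_CONNS_PER_POSTGRES_INSTANCE;
}

console.log(CLUSTERS * DATABASES_PER_CLUSTER); // 96
console.log(withinConnectionBudget(5, 5));     // example inputs: 5 * 5 * 8 = 200
```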
Dark Reads and Testing
Before rolling out these changes to production, Notion implemented dark reads for testing. They added functionality to fetch data from both the old and new databases as requests came in, comparing the results for consistency and logging any discrepancies, all without impacting the user's experience.
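A simplified dark-read sketch; the helper functions below are hypothetical placeholders for the real data-access layer.

```typescript
// Placeholder data-access helpers; the real implementations are hypothetical here.
async function fetchFromOldShard(workspaceId: string): Promise<unknown> {
  return { workspaceId, blocks: [] }; // stand-in for a real query
}
async function fetchFromNewShard(workspaceId: string): Promise<unknown> {
  return { workspaceId, blocks: [] }; // stand-in for a real query
}
function logDiscrepancy(workspaceId: string, oldData: unknown, newData: unknown): void {
  console.warn("dark-read mismatch", workspaceId, { oldData, newData });
}

// Dark read: serve the user from the old database as usual, but also read
// from the new database and compare, logging mismatches instead of failing.
async function readBlocks(workspaceId: string): Promise<unknown> {
  const oldData = await fetchFromOldShard(workspaceId);

  // Fire-and-forget comparison so the user-facing request isn't slowed down.
  void fetchFromNewShard(workspaceId)
    .then((newData) => {
      if (JSON.stringify(newData) !== JSON.stringify(oldData)) {
        logDiscrepancy(workspaceId, oldData, newData);
      }
    })
    .catch(() => { /* dark-read failures must never affect the user */ });

  return oldData;
}
```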
The Failover Process
The failover process involved three steps:
- Traffic pause: halting new connections while allowing ongoing queries to complete.
- Replication check: ensuring the new databases were fully caught up.
- Configuration updates: application access to the old databases was revoked, and PgBouncer was updated to point to the new databases, with replication reversed to stream changes from the new databases back to the old ones as a safety net.
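A highly simplified sketch of that sequence, with hypothetical placeholders standing in for Notion's actual tooling:

```typescript
// Hypothetical orchestration helpers; each is a stand-in for real tooling.
async function pauseNewConnections(): Promise<void> { /* stop accepting new queries */ }
async function waitForOngoingQueries(): Promise<void> { /* drain in-flight work */ }
async function replicationLagSeconds(): Promise<number> { return 0; /* stand-in */ }
async function revokeOldDbAccess(): Promise<void> { /* drop application grants on old DBs */ }
async function repointPgbouncerToNewDbs(): Promise<void> { /* update PgBouncer config */ }
async function reverseReplication(): Promise<void> { /* stream new DBs -> old DBs as a fallback */ }
async function resumeTraffic(): Promise<void> { /* allow new queries again */ }

async function failover(): Promise<void> {
  // 1. Traffic pause
  await pauseNewConnections();
  await waitForOngoingQueries();

  // 2. Replication check: only proceed once the new databases are caught up
  if ((await replicationLagSeconds()) > 0) {
    throw new Error("new databases not fully caught up; aborting failover");
  }

  // 3. Configuration updates
  await revokeOldDbAccess();
  await repointPgbouncerToNewDbs();
  await reverseReplication(); // keep old DBs in sync in case a rollback is needed

  await resumeTraffic();
}
```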
Outcome
The resharding project was a significant success for Notion. Key outcomes included:
- Increased capacity
- Improved performance: CPU and IOPS utilization decreased dramatically, hovering around 20% during peak traffic compared to roughly 90% before.
This new architecture positions Notion to handle continued user growth and data demands.