How Discord Reduced WebSocket Traffic By 40%
Discord, the popular communication platform, had a growing problem. Gateway, its real-time communication service, was consuming more and more bandwidth, and the associated costs were estimated in the hundreds of thousands of dollars. The team eventually found a way to cut that traffic by 40%.
Discord's Real-Time Communication Service
Gateway, the service that pushes real-time updates to clients over WebSocket connections, has compressed those connections with zlib since 2017. zlib is a widely used library for lossless data compression. It is designed to be compact, fast, and portable, which makes it suitable for a wide range of applications.
How zlib Compression Works
zlib implements the DEFLATE format, which combines LZ77 and Huffman coding. LZ77 maintains a sliding window holding the last N bytes of previously seen data and a look-ahead window holding the next M bytes. At each position it searches the sliding window for the longest match with the start of the look-ahead window. The output is a sequence of literals and backreferences, each backreference being a (distance, length) pair that points into the sliding window.
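The LZ77 step can be sketched as a toy compressor over strings. This is a minimal illustration, not zlib's implementation: the parameter names `window` (N), `lookahead` (M), and `min_match` are hypothetical, and the search is a brute-force scan rather than zlib's hash chains.

```python
from typing import List, Tuple, Union

# A token is either a literal character or a (distance, length) backreference.
Token = Union[str, Tuple[int, int]]


def lz77_compress(data: str, window: int = 32, lookahead: int = 8,
                  min_match: int = 3) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force scan of the sliding window for the longest match
        # with the start of the look-ahead window.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < lookahead and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1  # matches may overlap the current position
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])  # too short to be worth a backreference
            i += 1
    return tokens


def lz77_decompress(tokens: List[Token]) -> str:
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            # Copy one character at a time so overlapping matches work.
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(tok)
    return "".join(out)
```

For example, `lz77_compress("abcabcabcabc")` yields `['a', 'b', 'c', (3, 8), 'c']`: three literals, one backreference that copies eight characters from three positions back, and a final literal because a one-character match is shorter than `min_match`.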
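Huffman coding encodes data with variable-length, prefix-free codes. It assigns shorter codes to more frequent symbols and longer codes to rarer ones, which reduces the overall size of the data. In DEFLATE, this second stage encodes the literals and backreferences produced by LZ77.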
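A minimal sketch of building Huffman codes from symbol frequencies with a priority queue is below. It illustrates the general technique only; DEFLATE additionally uses canonical, length-limited codes, which this sketch does not implement. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict


def huffman_codes(text: str) -> Dict[str, str]:
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps the heap from ever comparing the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; their symbols get one more bit.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def huffman_decode(bits: str, codes: Dict[str, str]) -> str:
    reverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in reverse:  # prefix-free, so the first match is the only one
            out.append(reverse[cur])
            cur = ""
    return "".join(out)
```

For `"aaaaaaabbbc"`, the frequent `a` gets the one-bit code `1` while `b` and `c` get two-bit codes, so the whole string encodes in 15 bits instead of 88 bits at 8 bits per character.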
Streaming Compression
While evaluating Zstandard (zstd) as a replacement for zlib, the Discord team experimented with streaming compression, in which the compressor keeps its context across multiple messages. Instead of starting fresh with each message, it can reuse data seen in earlier messages on the same connection, which improves compression for repetitive traffic.
However, Gateway is written in Elixir, and the team couldn't find existing Zstandard bindings for it that supported streaming. To work around this, they forked an existing bindings library and added streaming support, and they later contributed this enhancement back to the original project.
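The effect of streaming compression can be sketched with Python's standard `zlib` module. This is an illustration under stated assumptions: it uses zlib rather than Zstandard (which is not in Python's standard library), and the JSON messages are hypothetical gateway-style events, not Discord's actual payloads. The idea is the same for both algorithms: one long-lived compressor per connection, flushed after each message so the client can decode it immediately.

```python
import json
import zlib
from typing import List


def sample_messages(n: int) -> List[bytes]:
    # Hypothetical gateway-style JSON events: repeated keys, varying values.
    return [
        json.dumps({"op": 0, "t": "TYPING_START", "s": i,
                    "d": {"channel_id": "123456789",
                          "user_id": str(1000 + i)}}).encode()
        for i in range(n)
    ]


def compress_independently(messages: List[bytes]) -> List[bytes]:
    # A fresh compressor per message: no history is shared between messages.
    return [zlib.compress(m) for m in messages]


def compress_streaming(messages: List[bytes]) -> List[bytes]:
    # One compressor for the whole connection. Z_SYNC_FLUSH emits all of a
    # message's bytes immediately but keeps the window for later messages.
    comp = zlib.compressobj()
    return [comp.compress(m) + comp.flush(zlib.Z_SYNC_FLUSH) for m in messages]


def decompress_streaming(chunks: List[bytes]) -> List[bytes]:
    # The client mirrors the server with one long-lived decompressor.
    decomp = zlib.decompressobj()
    return [decomp.decompress(c) for c in chunks]


msgs = sample_messages(100)
print("independent:", sum(map(len, compress_independently(msgs))), "bytes")
print("streaming:  ", sum(map(len, compress_streaming(msgs))), "bytes")
```

Because every message after the first can refer back to keys and values already sent on the connection, the streaming total comes out well below the per-message total. The cost is that both sides must hold compression state for the lifetime of each connection.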
Experiments and Optimizations
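To get the best compression ratio out of Zstandard, the team experimented with its compression level and tuning parameters, focusing on three: chainLog, hashLog, and windowLog. windowLog sets the maximum back-reference distance (the window size, as a power of two), hashLog sets the size of the hash table used to find match candidates, and chainLog sets the size of the table used to search for further candidates. Together they trade off compression speed, memory usage, and compression ratio.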
After experimenting with different settings, they settled on values slightly above the defaults, which improved compression while still fitting comfortably within the memory constraints of their Gateway nodes.
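The memory side of this trade-off can be sketched with zlib, whose knobs are loose analogues of Zstandard's: `wbits` sets the window size (like windowLog), `memLevel` sets the hash table size (like hashLog), and the compression `level` controls, among other things, how long a match chain is searched (roughly like chainLog). The sketch below is an assumption-laden illustration, not Discord's tuning: the per-stream memory formula comes from zlib's `zconf.h`, while the sample messages, the settings tried, and the `connections` count per node are hypothetical.

```python
import json
import zlib
from typing import List, Tuple


def deflate_memory_bytes(window_bits: int, mem_level: int) -> int:
    # Per-stream deflate memory from zlib's zconf.h, excluding a few KB of
    # small allocations: (1 << (windowBits + 2)) + (1 << (memLevel + 9)).
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9))


def streamed_size(messages: List[bytes], level: int,
                  window_bits: int, mem_level: int) -> int:
    # Total bytes on the wire for one streaming connection with these settings.
    comp = zlib.compressobj(level=level, wbits=window_bits, memLevel=mem_level)
    return sum(len(comp.compress(m) + comp.flush(zlib.Z_SYNC_FLUSH))
               for m in messages)


def sweep(messages: List[bytes], settings: List[Tuple[int, int, int]],
          connections: int) -> List[Tuple[int, int, int, int, float]]:
    results = []
    for level, wbits, mem in settings:
        size = streamed_size(messages, level, wbits, mem)
        # Memory needed if every connection on a node holds one compressor.
        node_gib = deflate_memory_bytes(wbits, mem) * connections / 2**30
        results.append((level, wbits, mem, size, node_gib))
    return results


messages = [json.dumps({"op": 0, "s": i, "d": {"user_id": str(1000 + i),
                                               "status": "online"}}).encode()
            for i in range(200)]
for row in sweep(messages, [(1, 10, 4), (6, 15, 8), (9, 15, 9)],
                 connections=100_000):  # hypothetical connections per node
    print("level=%d wbits=%d memLevel=%d -> %d bytes, %.1f GiB/node" % row)
```

The default zlib settings (`wbits=15`, `memLevel=8`) work out to 256 KiB per stream, so even a modest increase multiplies quickly across every open connection on a node. That is why a setting "slightly above the defaults" has to be checked against node memory, not just compression ratio.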
Passive Update v2
The team also noticed that passive update v1 messages, which accounted for over 35% of Gateway bandwidth, were carrying unnecessary information. They created passive update v2, which sends only the information clients need, cutting passive updates from over 35% to only 5% of total Gateway traffic.
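The difference between sending everything and sending only what is required can be sketched as full snapshots versus deltas. This assumes, for illustration, that a v1-style update resends a whole list of entries while a v2-style update carries only the entries that changed; the channel structure and field names are hypothetical and do not reflect Discord's actual schema.

```python
import json
from typing import Any, Dict

Channels = Dict[str, Dict[str, Any]]


def full_snapshot(channels: Channels) -> Dict[str, Any]:
    # v1-style (as assumed here): resend every entry on any change.
    return {"channels": channels}


def delta(old: Channels, new: Channels) -> Dict[str, Any]:
    # v2-style (as assumed here): only entries that changed or disappeared.
    changed = {cid: ch for cid, ch in new.items() if old.get(cid) != ch}
    removed = [cid for cid in old if cid not in new]
    return {"changed": changed, "removed": removed}


def apply_delta(old: Channels, update: Dict[str, Any]) -> Channels:
    # The client rebuilds the new state from its cached copy plus the delta.
    removed = set(update["removed"])
    state = {cid: ch for cid, ch in old.items() if cid not in removed}
    state.update(update["changed"])
    return state


def wire_size(payload: Dict[str, Any]) -> int:
    return len(json.dumps(payload).encode())


old = {str(i): {"name": f"channel-{i}", "last_message_id": str(i * 10)}
       for i in range(500)}
new = dict(old)
new["42"] = {**old["42"], "last_message_id": "999999"}  # one field changed

print("snapshot:", wire_size(full_snapshot(new)), "bytes")
print("delta:   ", wire_size(delta(old, new)), "bytes")
```

When only one of 500 entries changes, the delta is a tiny fraction of the snapshot. The trade-off is that the client must keep the previous state so it can apply each delta correctly.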
Implementation and Rollout
The team implemented Zstandard support for desktop users, which involved finding and integrating suitable Zstandard bindings for each platform. For Rust, they had to write their own bindings.
To minimize the risk of such a large change, the team rolled it out behind a feature flag. The flag allowed a quick rollback if issues arose, helped validate the results, and made it possible to monitor metrics against a baseline to confirm the change was not hurting the user experience.
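A common way to implement a gradual rollout behind a flag is deterministic percentage bucketing, sketched below. This is a generic technique, not Discord's flag system: the flag name `gateway_zstd`, the codec labels, and the function names are all hypothetical. Hashing the flag name together with the user ID keeps each user's assignment stable across sessions, and raising the percentage only ever adds users, so comparisons against the baseline group stay clean.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    # Stable bucket in [0, 10000) per (flag, user); different flags
    # bucket users independently because the flag name is hashed too.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(flag: str, user_id: str, percent: float) -> bool:
    # percent is 0..100; 0.01% granularity.
    return rollout_bucket(flag, user_id) < percent * 100


def choose_compression(user_id: str, percent: float) -> str:
    # Hypothetical flag and codec names. Setting percent to 0 is an
    # instant rollback to the existing codec for everyone.
    if is_enabled("gateway_zstd", user_id, percent):
        return "zstd-stream"
    return "zlib-stream"
```

Starting at a small percentage and raising it step by step limits how many users a regression can reach, while the users outside the flag provide the baseline for comparing bandwidth and client metrics.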
Results
After all of these changes, Discord reduced its Gateway bandwidth by nearly 40%, a major win that saves the company hundreds of thousands of dollars a year.