2018-11-27 00:13 |
tl;dr: We published some cool gems written at Coinbase for locking and rate-limiting using Redis. Check them out!
One of the technologies that powers many of our services at Coinbase is the blazing fast key-value data store, Redis. We would like to share some of the libraries we have built that make use of some of Redis’s unique features.
MasterLockAt Coinbase, ensuring that no one can withdraw more Bitcoin than they are entitled to is among our highest priorities. To guarantee correctness in many of our systems, we make heavy use of locking around critical sections of code. For example, if someone makes a request to send Bitcoin, we must ensure that multiple processes do not concurrently generate and broadcast separate transactions to the network. A mutual exclusion lock (mutex) is a well understood concurrency primitive that synchronizes access to a resource between threads and processes. If you are running a multithreaded application in a single process, you can use a native mutex implementation, but if you are running thousands of processes on dozens of servers like we are, you need a distributed lock manager (DLM). MasterLock is the result of our desire for a lightweight DLM and dissatisfaction with the many existing Ruby locking libraries that use Redis.
The basic idea behind locking with Redis is that if a process wants to obtain exclusive access to a resource called “Account 13”, it sets the key “lock:account:13” in Redis. As long as the key was not set beforehand, the process can access the resource and unset the key when it is done. If the key was already set, however, the process must wait until the process that acquired the lock unsets the key. Conveniently, the Redis SET command has an option NX that only sets the key if it does not already exist.
What happens if a process fails to release a lock? If you are running many processes on many machines, at least one of them is bound to die spontaneously at some point. The simplest solution is to set an expiration time on all locks, and if any lock is held longer than that duration, it is released automatically. Again, Redis makes this easy to implement with the EX/PX options to the SET command, which set an expiration time on the provided key. The main problem we have seen with this approach is that execution times of critical sections of code can be highly variable and hard to predict. If a lock is released automatically while the owner process still thinks it has the lock, then you may end up with concurrent access to the protected resource, which should never happen. That is how you lose money.
Another approach is to have a session with the locking service and ensure that execution stops if the session is dropped. This is the approach taken by PostgreSQL advisory locks and Google’s internal Chubby service. We can simulate a session in Redis by periodically extending the lock expiration times as long as the owner process is alive, similar to keepalive signals in some session protocols. This allows us to keep the default expiration times short but maintain control of the lock even if the critical section of code runs longer than the default expiration time. This is the approach taken by MasterLock.
When MasterLock is initialized, it starts a background thread that is responsible for extending the lock lifetimes. When a lock is obtained by another thread, it registers it with the background thread. As long as the thread that acquired the lock is alive and has not explicitly released the lock, the background thread renews the lease on the lock periodically. If the thread or whole process dies, then the lock is allowed to expire.
What about Redlock?Some of you may be thinking “what happens if Redis goes down”? High availability locking in distributed systems is a notoriously hard problem and generally requires deployment of heavyweight infrastructure like ZooKeeper. The Redis team has proposed an algorithm called Redlock that registers locks with a quorum of Redis instances for higher availability. We may implement this in MasterLock in the future if we find ourselves in need of a higher availability DLM.
TrafficJamAnother use case we have for Redis is rate limiting. Across the site, we rate limit login attempts, identity verification attempts, API requests, etc. The rate limit checks have to happen quickly, and a user should not be able to skirt the limits if their requests happen to be processed by different servers.
We chose to rate limit various actions using rolling limits. Let’s say we want to limit the number of login attempts made from a particular IP address to 10 per hour. If a user makes 10 login attempts within a short period of time, they will now be unable to login. Instead of requiring the user to wait an hour before making any more attempts, however, we allow a new attempt every 6 minutes. So the number of attempts remaining replenishes itself continuously over the course of the next hour until the number of remaining attempts hits 10.
To implement this, TrafficJam needs to track two pieces of information for each IP: the number of login attempts used and the timestamp of the last attempt. Based on that, TrafficJam can calculate the number of attempts remaining right now with the following equation. Note that the limit parameters like the maximum number of attempts and the time period are configured in the client code.
remaining = MIN(max, max - used + max * (now — lastUpdate) / period)If every server is trying to limit the same action concurrently, how do we avoid race conditions in the check-and-set logic? Instead of having the client code check and set, we run an atomic increment operation on the Redis server. The operation increments the number of attempts used if there are any remaining and return false otherwise. This way the clients optimistically attempt to increment the rate limit, and reject the request if the limit has already been hit.
With Redis, clients can submit custom Lua scripts to be evaluated synchronously on the server using the EVAL command. The TrafficJam library invokes a custom Lua script to atomically perform the increment operation described above on the Redis server each time a rate limit is checked. Custom scripting is extremely powerful and is also used by MasterLock to perform safe releases of locks.
Show me the code alreadyThese are just two examples of nifty ways that we use Redis at Coinbase. Feel free to check out the MasterLock and TrafficJam code on GitHub or use the gems in your Ruby applications.
Thanks to Rob Witoff, Varun Srinivasan, and Jori Lallo for reading drafts of this post.
Open Sourcing Our Distributed Toolbox was originally published in The Coinbase Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
Similar to Notcoin - Blum - Airdrops In 2024