This is part one of a series I'll be writing about database concurrency. Since this first post is a broad overview, I have simplified many concepts here.
High performance databases must support concurrency. As in many other software systems, databases can use read/write locks to maintain consistency under concurrent use (MyISAM in MySQL does this, for example). Conceptually - this is pretty simple: 1) There can be multiple readers. 2) Readers block writers. 3) Writers block each other as well as readers.
Modern database concurrency control has taken this concept pretty far, and protocols like strict 2-phase locking can give you concurrency and strong serializability guarantees. However, any system that depends on this sort of concurrency control suffers an inherent scalability problem: read/write locks prevent readers and writers from running simultaneously - by design. As the volume of reads/writes scale up, you run into situations where you have unnecessary queueing, or either the readers or writers get starved. MyISAM for example prioritizes writes ahead of reads; get enough write volume and your reads will block forever, because write operations will perpetually "cut in line."
There's no easy solution here. Prioritize readers ahead of writers? Now you're going to suffer the opposite problem . Set up a read-only slave? Enjoy dealing with your replication lag. Sharding? That almost makes NoSQL look attractive.
Most reasonably large systems have a lot of read and write transactions going on at once, so its not something we can really sweep under the rug.
MVCC is MVP
An interesting concurrency strategy in modern database systems is called multiversion concurrency control (MVCC). The fundamental insight behind MVCC is that readers and writers in different transactions will never block each other if we allow reality to briefly diverge and converge. Why is this useful?
- Every transaction starts isolated from every other transaction, they can all pretend they are the only ones running.
- We can now perform multiple operations and commit them as an all-or-nothing operation, guaranteeing the operations succeed or fail together.
- We can now read data from a consistent snapshot of the database, even as it continues to change in the background.
Allowing multiple versions of the database to exist simultaneously means we can provide all these guarantees under high concurrency. This is actually pretty incredible, if you think about it. But it's not all rainbows and sunshine.
"Eventual" consistency? No, sorry, we need a production database
Astute readers will probably realize supporting multiple consistent versions of reality complicates a lot of things that would otherwise be simple. For example, here are just a few complications to account for read queries:
- You can no longer quickly count the rows in a table. This seems to make many people both confused and angry, because it seems so unbelievably simple. The reality is that MVCC is very, very complicated. Remember, any number of transactions could have made INSERTS or DELETES that are currently invisible to your SELECT query, so the actual count depends on what operations are visible to the current transaction. The database index is no silver bullet because you still have to find and ignore all those pesky invisible rows.
- In addition, a DELETE statement doesn't necessarily delete a row, even after you commit. What if there is another transaction that started when that row still existed, and hasn't finished yet?
- UPDATE statements don't update - it writes a new row. The old row has to be kept around for transactions that haven't seen the update yet, or in case the transaction that wrote the UPDATE rolls back.
- If you do anything involving range queries, such as
SELECT * from accounts where balance > 1000,the database has to do all kinds of acrobatics with range locking, next-key locking, etc to ensure that gosh darnit, no insert or update operation in any other transaction can change this result until the transaction completes.
Which brings me to the elephant in the room: how to reconcile different versions of reality. Because eventually, you're going to encounter the database equivalent of a merge conflict:
- What happens if you try to UPDATE a row that a different transaction has updated with more recent data?
- What is the most recent data? The most recent commit? The most recent uncommitted change?
- How should constraint violations be handled? For example, what happens if two transactions try to claim the same unique username?
And the worst part of it is, under the default settings in Postgres and MySQL/InnoDB, these anomalies can silently corrupt your data with lost updates (an update gets reverted accidentally by a transaction that never knew about it) or write skew (two transactions read data and then write consistent updates that conflict when merged).
Next up: the different transaction isolation levels available in MVCC. Update: read part 2 here.
 Yes, you could use some sort of fairness algorithm, but that still doesn't solve the queueing problem.