This is part two of a series on database concurrency. Read the introduction at Database Concurrency, Part 1 .

Last time, I talked about multi-version concurrency control, or MVCC, and how it enables highly concurrent database systems while still guaranteeing transaction isolation. This is because MVCC allows reality (from the perspective of two distinct transactions) to diverge, giving us the unique advantage that readers and writers don't have to block each other. But how does it achieve this in practice, and what are the caveats?

Let me step back a bit and define some terms:

Transaction: A unit of work with well defined start and end boundaries, composed of a number of operations.

Isolation: The property that makes concurrent transactions appear as if they were executing serially.

Isolation in practice

Because the isolation property requires concurrent transactions to appear as if they are executing serially (one after the other), they must not interfere with each other by definition. Unfortunately, isolation in SQL is not a boolean property, but one of degree; some workloads don't benefit from full isolation, which is pretty slow. Here are some common isolation levels:

Read uncommitted: I like to think of this as "effectively, no isolation" because dirty read is allowed: uncommitted changes are globally visible. For example, you might have transactions:

Subtract $50 from A
Add $50 to B
Read balances from A and B

In this case its possible T2 executes in the middle of T1, and finds that $50 has disappeared without explanation. This makes it nearly impossible to reason about anything under concurrency.

Read committed: This level disallows dirty read, so it can only see committed transactions. However, it still allows non-repeatable read, an anomaly in which a transaction re-reads data and find that it has changed, if another transaction committed changes in between the two reads.

Repeatable read: This level disallows non-repeatable read. Unfortunately, it still allows phantom read, a different phenomenon where the same SELECT might return a different set of rows.

Wait - "non-repeatable" and "phantom" reads?

These two anomalies might at first seem identical, but according to the ANSI SQL spec, a read is "repeatable" or "non-repeatable" at the row level. Repeatable reads guarantee the same row will always have the same data.

A "phantom" read is a phenomenon at the result set level. This means the set might be different, even if no single row has changed. The simplest case is when an INSERT is committed in another transaction, causing it to be returned in a new read query.

Confusion here is justified, because this particular definition of "non-repeatable read" is not obvious: repeatable reads do not guarantee repeatable results. And there are a lot of sources where "phantom read" is said to be a special case of "non-repeatable read," including Wikipedia[1], in contradiction to the spec. Aren't databases fun?

Sometimes you need a sledgehammer

Ideally, we'd have an isolation level where dirty read, non-repeatable read, and phantom read weren't allowed. So where does this lead us?

The good news: it turns out there are actually two distinct isolation levels that guarantee this, snapshot isolation and serializable.

The bad news: disallowing all the above anomalies is not enough to guarantee serializability, which is why there are two. We used to think snapshot isolation would prevent read anomalies, until some folks proved it couldn't. Also, both Oracle and PostgreSQL have at one point or another called snapshot isolation "serializable," even though it isn't, and the SQL spec itself seems assume that preventing these three phenomena is equivalent to serializable isolation, even though this is easy to disprove.

So far this seems pedantic, so let's look at an example. Imagine two empty tables A and B, and two concurrent transactions that insert a count of the rows in the other table:

INSERT into B count(*) from A;
INSERT into A count(*) from B;

If these two transactions run at the same time under snapshot isolation, they will both insert a single row with a 0. This makes sense, because each transaction has its own snapshot of the database, in which it sees the other table is empty. However, of the two possible serial orderings [T1,T2], and [T2, T1], neither of them is consistent with what actually happened. So these transactions are not serializable; but they also didn't suffer any of the three anomalies we've defined so far. =(

We're seeing a new anomaly not defined in the spec: write skew.

Enforcing isolation

The ANSI SQL isolation levels don't actually prescribe how to implement each level, it only describes "prohibited phenomena." In practice, most databases use a combination of version tracking and row locking. And most row locking implementations use some variation of reader-writer locks, which are common in many software systems. Readers don't interfere with other readers, so they don't block each other, but they block writers. Writers block readers, as well as each other.

Here is a table that shows when locks are needed at each isolation level, when using lock-based concurrency control:

Isolation level Write Operation Read Operation Range Operation
Read Uncommitted Statement Statement Statement
Read Committed Until Commit Statement Statement
Repeatable Read Until Commit Until Commit Statement
Serializable Until Commit Until Commit Until Commit

As you might have guessed, stronger isolation guarantees need more locking, decreasing performance. Note that at the lowest level of isolation, we release locks after every statement, which is very performant, but as we've seen, extremely unsafe. At the high end of serializable isolation, every lock we acquire to execute a statement is held until commit. This strategy has a special name: two phase locking.

To guarantee serializability, two phase locking has (you guessed it) two phases: an acquiring phase (releasing locks is not allowed), and a releasing phase (acquiring new locks is not allowed). Strict two phase locking goes further, and waits until the very end to release all locks at once during commit, which is helpful for preventing cascading aborts, although it costs concurrency and is more prone to deadlocking.

The big picture

The point of all this is that transaction isolation in MVCC is not easy to get right. Even at snapshot isolation -- the highest[2] isolation level provided by some databases -- it's easy to run into bugs that will silently corrupt your data.

One mindset is to (stick your fingers in your ears and) 1) allow data to be randomly corrupted, or 2) forfeit performance by running your database under true serializable isolation. The other is to solve these problems at the application level, by deliberately structuring your transactions to avoid these problems.

Next up: How to work at read committed / repeatable read isolation and still prevent common anomalies at these levels!

[1] And some databases even treat "repeatable read" as equivalent to "serializable."
[2] Technically though, snapshot isolation is not a superset of repeatable read, because SI allows anomalies that do not occur under RR. See "A5B Write Skew" in A Critique of of ANSI SQL Isolation Levels.