This section introduces the concept of transactions in SQL databases. We will be exploring how they ensure data integrity and reliability, especially in multi-user environments.

The SQL examples in this section will be using PostgreSQL syntax unless otherwise noted. Each DBMS’ language implementation will differ.

§1 Transaction Basics #

A transaction in a database is a single logical unit of work, which may consist of one or more SQL statements. The key characteristic of a transaction is that it must be treated as a single, indivisible operation: either all of its changes are successfully applied to the database, or none of them are. This “all or nothing” principle is useful for maintaining data integrity during complex operations.

For example, consider a bank transfer:

moving money from account A to account B involves two operations:

Debit account A.
Credit account B.

If only the first operation succeeds, money is lost. If only the second succeeds, money is created. A transaction prevents inconsistent data: either all of its operations are successfully applied, or the entire transaction is undone, restoring the database to its previous state before the transaction began.

§1.1 ACID Properties #

Transactions are designed to adhere to a set of properties known as ACID, which stands for Atomicity, Consistency, Isolation, and Durability. These properties guarantee reliable processing of database transactions.

Atomicity: This property ensures that a transaction is treated as a single, indivisible unit of work. Either all of its operations are completed successfully, or none of them are. If any part of the transaction fails, the entire transaction is rolled back, and the database returns to its state before the transaction began.

Example: In a money transfer, either both the debit and credit operations complete, or neither does.

Consistency: A transaction must bring the database from one valid state to another valid state. It ensures that all data integrity rules (e.g., primary keys, foreign keys, check constraints) are maintained. If a transaction attempts to violate these rules, it is rolled back.

Example: A transaction cannot result in an account having a negative balance if a constraint prevents it.

Isolation: This property ensures that concurrent transactions execute independently without interfering with each other. The intermediate state of one transaction is not visible to other transactions. This makes it appear as if transactions are executed serially, even if they are running concurrently.

Example: If two users try to update the same inventory count at the same time, isolation ensures that the final count is correct, as if one updated then the other.

Durability: Once a transaction has been committed, its changes are permanent and will survive any subsequent system failures (e.g., power outages, crashes). The committed data is written to persistent storage.

Example: After a bank transfer is committed, even if the database server crashes immediately after, the transfer will still be recorded when the system recovers.

§2 Transaction Management: `COMMIT` and `ROLLBACK` #

SQL provides specific commands to manage transactions:

BEGIN or START TRANSACTION: Initiates a new transaction. All subsequent SQL statements until a COMMIT or ROLLBACK are part of this transaction.
COMMIT: Saves all changes made during the transaction permanently to the database. Once committed, the changes are durable and visible to other transactions.
ROLLBACK: Undoes all changes made during the current transaction. The database reverts to the state it was in before the transaction began. This is used when an error occurs or when a transaction needs to be canceled.

Example:

-- Start a new transaction
BEGIN;

-- Debit account A
UPDATE Accounts SET balance = balance - 100.00 WHERE account_id = 1;

-- Credit account B
UPDATE Accounts SET balance = balance + 100.00 WHERE account_id = 2;

-- If both updates succeed, commit the transaction
COMMIT;

In this example, neither account will be updated unless both are able to update. This does not prevent business logic errors such as an account having a negative balance, that requires explicitly checks.

Rollback Example:

-- use a DO block to declare a temporary variable
DO $$
DECLARE
    account1_balance NUMERIC;
BEGIN  -- Start transaction 
  -- Check balance
  SELECT balance INTO account1_balance FROM Accounts WHERE account_id = 1;

  -- Validate balance
  IF account1_balance < 100.00 THEN
    ROLLBACK;  -- abort transaction if insufficient balance
    -- terminate the DO block
    RAISE NOTICE 'Transaction failed: Insufficient funds in account 1.';  
  END IF;

  -- Debit account 1
  UPDATE Accounts SET balance = balance - 100.00 WHERE account_id = 1;
  -- Credit account 2
  UPDATE Accounts SET balance = balance + 100.00 WHERE account_id = 2;
  
  -- (implicit, not required) completes transaction only if all steps above succeed
  COMMIT; 
END $$;

Here, the transaction will only complete when the account has sufficient funds AND all table updates are completed successfully.

§3 Concurrency Control #

In a multi-user database or with asynchronous applications, many transactions often run concurrently. Concurrency control mechanisms are essential to ensure that these simultaneous transactions do not interfere with each other and that the ACID properties are maintained. Without proper concurrency control, data inconsistencies can occur.

§3.1 Potential Concurrency Issues #

All of these examples will assume an existing table with the following schema/contents:

CREATE TABLE Inventory (
  product_id INT PRIMARY KEY,
  stock_quantity INT,
  price DECIMAL
);

INSERT INTO Inventory (product_id, stock_quantity, price) VALUES (101, 50, 5.00);

Lost Updates: Occurs when two transactions read the same data, both modify it, and then one transaction’s changes are overwritten by the other’s, effectively “losing” the first update.

Example: Imagine two users (User A and User B) are simultaneously trying to reduce the stock of product_id = 101:

-- User A reads the current stock
SELECT stock_quantity FROM Inventory WHERE product_id = 101;
-- Result: 50

-- User A calculates new stock (50 - 10 = 40)
-- User A updates the stock
UPDATE Inventory SET stock_quantity = 40 WHERE product_id = 101;

At the same time:

-- User B reads the current stock
SELECT stock_quantity FROM Inventory WHERE product_id = 101;
-- Result: 50 (User A's update is not yet committed, so B sees the original 50)

-- User B calculates new stock (50 - 5 = 45)
-- User B updates the stock
UPDATE Inventory SET stock_quantity = 45 WHERE product_id = 101;

Now if you now query the Inventory table:

SELECT stock_quantity FROM Inventory WHERE product_id = 101;
-- Result: 40

Both users intended to reduce the stock. User A by 10, User B by 5. The expected final stock should be 50 - 10 - 5 = 35. However, the final stock is 40. User B’s update (reducing stock to 45) was “lost” because User A read the initial value of 50 and overwrote it with their calculated value of 40. This “lost update” demonstrates a valid update from one transaction being effectively undone by another concurrent transaction. Utilizing Transactions can prevent this problem entirely (more on this later).

Non-Repeatable Reads: Occurs when a transaction reads the same row multiple times and gets different values each time because another committed transaction has modified that row in between the reads. This is possible when utilizing the default isolation level in PostgreSQL (more on isolation levels below).

-- Transaction A starts
BEGIN TRANSACTION;
-- Transaction A reads the stock_quantity for product_id 101
SELECT stock_quantity FROM Inventory WHERE product_id = 101;
-- Result: 50
-- some other operations continue during Transaction A...

-- At the same time, Transaction B starts
BEGIN TRANSACTION;
-- Transaction B updates the stock_quantity for product_id 101
UPDATE Inventory SET stock_quantity = 45 WHERE product_id = 101;
-- Transaction B commits its changes
COMMIT;
-- The database now officially has stock_quantity = 45 for product_id 101.

-- Transaction A later reads the stock_quantity for product_id 101 again
SELECT stock_quantity FROM Inventory WHERE product_id = 101;
-- Result: 45 (Transaction A gets a different value on its second read)

-- Transaction A now has inconsistent data within its own transaction, as the same query yielded different results.

Phantom Reads: Similar to non-repeatable reads but involves sets of rows. Occurs when a transaction re-executes a query returning a set of rows and finds that the set of rows has changed (e.g., new rows inserted or existing rows deleted by another committed transaction). This is possible when utilizing the default isolation level in PostgreSQL (more on isolation levels below).

-- Transaction A starts
BEGIN TRANSACTION;
-- Transaction A queries for all products with stock_quantity greater than 40
SELECT product_id, stock_quantity FROM Inventory WHERE stock_quantity > 40;
-- Result: (101, 50)

-- At the same time, Transaction B starts
BEGIN TRANSACTION;
-- Transaction B inserts a new product that also meets the criteria
INSERT INTO Inventory (product_id, stock_quantity, price) VALUES (102, 60, 10.00);
-- Transaction B commits its changes
COMMIT;
-- The database now officially has product_id 102 with stock_quantity 60.

-- Transaction A re-executes the same query
SELECT product_id, stock_quantity FROM Inventory WHERE stock_quantity > 40;
-- Result: (101, 50), (102, 60) (Transaction A now sees a "phantom" row that wasn't there before)

-- Transaction A now has an inconsistent view of the dataset, as the same query produced a different set of rows.

§4 Locking Mechanisms #

To prevent concurrency issues like the ones listed above, database management systems employ locking mechanisms. Locks temporarily restrict access to data, ensuring that only one transaction can modify or read specific data at a time (depending on the lock type).

PostgreSQL automatically creates and manages locks when you execute queries within transactions or when running standalone queries. While most database management systems employ automatic locking, the specific types of locks and their granularity (e.g., row-level vs. table-level) can vary significantly between different database systems. Refer to the documentation for each database to understand its specific locking behavior.

§4.1 Types of Locks: #

Shared Locks (reading lock):

Purpose: Allows concurrent reads but prevents writes that would change the data being read.
Acquired when a transaction or query only needs to read data.
Multiple transactions can hold a shared lock on the same data simultaneously.
A shared lock prevents any other transaction from acquiring an exclusive lock on the same data (i.e., prevents writes while reads are ongoing).

Exclusive Locks (writing lock):

Purpose: Ensures that only one transaction can modify data at a time, preventing lost updates and dirty reads.
Acquired when a transaction or query needs to modify (insert, update, delete) data.
Only one transaction can hold an exclusive lock on a piece of data at any given time (generally a row, in some cases an entire table).
An exclusive lock prevents any other transaction from acquiring either a shared or an exclusive lock on the same data.

When a transaction or query tries to acquire an exclusive lock while any shared locks are held, it will typically wait until the existing shared locks are released, new shared locks will wait until the exclusive lock is released.

NOTE: because locks prevent writes or reads from other queries, be mindful to avoid running queries that may take a long time within a transaction, as it can block other database operations until the query completes.

§4.2 Setting Transaction Isolation Level #

The specific behavior of the database concurrency control and locking mechanisms can be adjusted in most DBMS. In PostgreSQL, the default isolation level is “Read Committed,” which completely prevents dirty reads. However this does not necessarily prevent other concurrency anomalies such as non-repeatable reads and phantom reads. PostgreSQL offers alternative isolation levels, like “Repeatable Read” and “Serializable,” which can prevent these types of issues but come with trade-offs in terms of concurrency, performance, and potential transaction retries. Similar controls exist in other DBMS systems as well.

Transaction levels can be set on individual queries by using the ISOLATION LEVEL keywords.

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM my_table WHERE id = 1;
COMMIT;

Exact details on the differences and ideal use cases for each isolation type is a more advanced topic and out of scope for this lecture. I just wanted you to be aware of these mechanisms, as they can have a massive impact for certain workloads.

More Resources #

Questions #

What SQL command permanently saves all changes made during a transaction to the database?

What command is used to undo all changes made during the current transaction and revert the database to its previous state?

Which ACID property ensures that a transaction brings the database from one valid state to another valid state, maintaining all data integrity rules?

What concurrency issue occurs when a transaction reads the same row multiple times and gets different values because another committed transaction modified that row in between reads?

What type of lock allows multiple transactions to read data simultaneously but prevents any transaction from modifying that data?

What concurrency issue occurs when a transaction re-executes a query and finds that the set of rows has changed due to another committed transaction inserting or deleting rows?

What type of lock ensures that only one transaction can modify data at a time, preventing lost updates?

In this code, what happens if the account referenced by p_from has less balance than p_amt? Be specific.

CREATE OR REPLACE FUNCTION SafeTransfer(
  p_from INT, p_to INT, p_amt NUMERIC
) RETURNS VOID LANGUAGE plpgsql AS $$
DECLARE
  v_balance NUMERIC;
BEGIN
  SELECT balance INTO v_balance FROM Accounts WHERE account_id = p_from;
  IF v_balance < p_amt THEN
    ROLLBACK;
    RAISE NOTICE 'Insufficient funds';
    RETURN;
  END IF;
  UPDATE Accounts SET balance = balance - p_amt WHERE account_id = p_from;
  UPDATE Accounts SET balance = balance + p_amt WHERE account_id = p_to;
  COMMIT;
END;
$$;

In this code, what happens if the account referenced by p_to does not exist? does the transaction COMMIT?

CREATE OR REPLACE FUNCTION SafeTransfer(
  p_from INT, p_to INT, p_amt NUMERIC
) RETURNS VOID LANGUAGE plpgsql AS $$
DECLARE
  v_balance NUMERIC;
BEGIN
  SELECT balance INTO v_balance FROM Accounts WHERE account_id = p_from;
  IF v_balance < p_amt THEN
    ROLLBACK;
    RAISE NOTICE 'Insufficient funds';
    RETURN;
  END IF;
  UPDATE Accounts SET balance = balance - p_amt WHERE account_id = p_from;
  UPDATE Accounts SET balance = balance + p_amt WHERE account_id = p_to;
  COMMIT;
END;
$$;

In this code, what happens to the values of first_read and second_read if another session commits an UPDATE to the same row just after the first SELECT executes but before the second SELECT executes?

CREATE OR REPLACE FUNCTION demo_non_repeatable()
RETURNS TABLE(product_id INT, first_read INT, second_read INT)
LANGUAGE plpgsql AS $$
BEGIN
  START TRANSACTION ISOLATION LEVEL READ COMMITTED;

  first_read := (SELECT stock_quantity FROM Inventory WHERE product_id = 101);
  PERFORM pg_sleep(5);  -- wait 5 seconds
  second_read := (SELECT stock_quantity FROM Inventory WHERE product_id = 101);

  COMMIT;
  RETURN QUERY SELECT 101, first_read, second_read;
END;
$$;

In this code, what happens if the function is called while another transaction already holds an EXCLUSIVE lock on the row being selected?

CREATE OR REPLACE FUNCTION get_stock_now(p_id INT)
RETURNS INT
LANGUAGE plpgsql AS $$
DECLARE
  v_stock INT;
BEGIN
  SELECT stock_quantity INTO v_stock
    FROM Inventory
    WHERE product_id = p_id
    FOR SHARE;  -- `FOR SHARE` creates a shared lock
  RETURN v_stock;
END;
$$;

Section 14: Transactions and Concurrency Control