This section is a much more advanced exploration of subqueries, a powerful SQL feature for solving complex data retrieval problems. You will learn about different types of subqueries, the distinction between correlated and non-correlated subqueries, and essential operators used with them. Additionally, we will explore techniques like Common Table Expressions (CTEs) and Views to manage and simplify complex SQL queries.
§1 Review of Subqueries (Nested Queries) #
A subquery (also known as a nested query or inner query) is a SELECT statement embedded within another SQL query. It acts as a component of a larger query, allowing you to perform operations that might be difficult or impossible with a single, flat query.
The primary purpose of subqueries is to solve complex problems by breaking them down into smaller, more manageable parts. They enable you to use the result of one query as input for another, facilitating advanced filtering, calculations, and data manipulation.
For example, you might use a subquery to find the average order value and then use that average in an outer query to list all orders above that average.
§2 Types of Subqueries #
Subqueries can be categorized based on the number of values they return:
§2.1 Scalar Subqueries #
A scalar subquery returns a single value (one row, one column). Because they return a single value, they can be used in most places where a single expression is expected, such as in the SELECT list, WHERE clause, HAVING clause, or even as an argument to a function.
Syntax Example (in SELECT clause):
SELECT
column1,
(SELECT MAX(column_name) FROM another_table) AS max_value
FROM
your_table;
Scalar Subquery Example:
Find the average unit_price of all products and display it alongside each product’s name.
(this example uses ROUND(value,2) to avoid floating point rounding errors)
Note how the value of
average_priceis identical in all columns, a scalar only returns a single value.
§2.2 Row Subqueries #
A row subquery returns a single row but can return multiple columns. They are typically used in the WHERE or HAVING clauses, often with comparison operators (=, >, <, >=, <=, !=/<>) or with IN and NOT IN when comparing multiple columns.
Syntax Example:
SELECT column1, column2
FROM table1
WHERE (column1, column2) = (SELECT col_a, col_b FROM table2 WHERE condition);
The
tuplesyntax:(column1, column2) = (SELECT col_a, col_b FROM table2 WHERE condition);The expression
(column1, column2)is a tuple, and the row subquery must return exactly one row with the same number of columns so the comparison can succeed.
Row Subquery Example:
Find products that have the same unit_price and category_id as a specific product (e.g., product with id = 1).
§2.3 Table Subqueries #
A table subquery returns a table (multiple rows and multiple columns). These are highly versatile and can be used in the FROM clause (where they are often called derived tables), or with operators like IN, NOT IN, EXISTS, and NOT EXISTS.
Syntax Example (in FROM clause - Derived Table):
SELECT outer_col1, outer_col2
FROM (SELECT col_a, col_b FROM table1 WHERE condition) AS derived_table
WHERE derived_table.col_a > 10;
Table Subquery Example (Derived Table):
Find the total quantity of products ordered for each category.
Table subqueries allow you to create complex queries by creating intermediate tables. In the example above, try removing the SUM() and the GROUP BY clause:
SELECT c.name AS category_name, derived_orders.total_qty
FROM categories AS c
JOIN (
SELECT p.category_id, oi.qty AS total_qty
FROM products AS p
JOIN order_items AS oi ON p.id = oi.product_id
) AS derived_orders
ON c.id = derived_orders.category_id;
§3 Correlated vs. Non-Correlated Subqueries #
Understanding the execution flow of subqueries is important for performance and correct logic.
§3.1 Non-Correlated Subqueries #
A non-correlated subquery is a subquery that can be executed independently of the outer query. Its results are evaluated once, and then those results are passed to the outer query. The inner query does not reference any columns from the outer query.
Characteristics:
- Executes once.
- Results are cached and used by the outer query.
- Generally more efficient than correlated subqueries.
Non-Correlated Subquery Example:
List all products whose unit_price is greater than the average unit_price of all products.
§3.2 Correlated Subqueries #
A correlated subquery is a subquery that depends on the outer query for its values. It executes once for each row processed by the outer query. The inner query references one or more columns from the outer query.
Characteristics:
- Executes repeatedly, once for every row of the outer query.
- Can be less efficient for large datasets due to repeated execution.
- Often used with
EXISTS,NOT EXISTS, or for row-by-row comparisons.
Correlated Subquery Example:
Find all categories that have at least one product with a unit_price greater than $13.
In a correlated subquery, the value after
SELECTis effectively ignored, as the query is evaluating rows that simplyEXISTS, so any valid value can be placed afterSELECTin this query.SELECT 1 FROM products AS p WHERE p.category_id = c.id AND p.unit_price > 13This query is returning any row that belongs to the current category, where the
unit_priceis greater than $13, this effectively returns all categories that contain a product worth more than $13.
§4 Operators Used with Subqueries #
Several operators are commonly used to integrate subquery results into the outer query’s logic.
§4.1
IN / NOT IN:
#
- Used with row or table subqueries.
IN: Checks if a value exists within the set of values returned by the subquery.NOT IN: Checks if a value does not exist within the set of values returned by the subquery.
NOT INhas undefined behavior when encounteringNULLdata, useNOT EXISTSinstead ifNULLmay be encountered.
Example (IN):
Find customers who have placed orders.
§4.2
EXISTS / NOT EXISTS:
#
- Used with table subqueries, typically correlated.
EXISTS: ReturnsTRUEif the subquery returns any rows; otherwise,FALSE. It’s efficient because it stops processing as soon as it finds the first matching row.NOT EXISTS: ReturnsTRUEif the subquery returns no rows; otherwise,FALSE.
Example (EXISTS):
List categories that have at least one product.
§4.3
ANY / ALL:
#
- Used with scalar or row subqueries, often with comparison operators.
ANY(orSOME): ReturnsTRUEif the comparison isTRUEfor any value in the set returned by the subquery.> ANY: Greater than at least one value (i.e., greater than the minimum).< ANY: Less than at least one value (i.e., less than the maximum).
ALL: ReturnsTRUEif the comparison isTRUEfor all values in the set returned by the subquery.> ALL: Greater than every value (i.e., greater than the maximum).< ALL: Less than every value (i.e., less than the minimum).
Example (> ANY):
Find products with a unit_price greater than any unit_price in the ‘Electronics’ category.
SELECT name, unit_price
FROM products
WHERE unit_price > ANY (
SELECT p.unit_price FROM products AS p
JOIN categories AS c ON p.category_id = c.id
WHERE c.name = 'Electronics'
);
SQLite does not support the
ANYorALLoperators, this can be worked around by using theINoperator with a more specific subquery. Postgres, MySQL, and Oracle all support theANY/Alloperators.
§5 Techniques for Managing Complex Queries #
As queries grow in complexity, maintaining readability, reusability, and performance becomes challenging. SQL offers several techniques to manage this.
§5.1 Breaking Down Complex Problems #
The most fundamental technique is to decompose a large, complex problem into smaller, more manageable sub-problems. Each sub-problem can then be addressed with a simple query or subquery. This makes debugging easier and improves overall understanding.
Example: Find customers who have ordered products from at least 2 different categories
Let’s break this down:
- Identify what data we need: We need to connect customers to the categories of products they’ve ordered
- Break it into sub-problems:
- Find all products each customer has ordered
- Determine the categories of those products
- Count distinct categories per customer
- Filter for customers with 3+ categories
Build the solution incrementally:
- Find all products ordered by each customer
SELECT o.customer_id, oi.product_id FROM orders o JOIN order_items oi ON o.id = oi.order_id- Add category information
SELECT o.customer_id, p.category_id FROM orders o JOIN order_items oi ON o.id = oi.order_id JOIN products p ON oi.product_id = p.id- Add customer information
SELECT c.name, p.category_id FROM orders o JOIN order_items oi ON o.id = oi.order_id JOIN products p ON oi.product_id = p.id JOIN customers c ON c.id = o.customer_id;- Count distinct categories per customer
SELECT c.name, COUNT(DISTINCT p.category_id) as category_count FROM orders o JOIN order_items oi ON o.id = oi.order_id JOIN products p ON oi.product_id = p.id JOIN customers c ON c.id = o.customer_id GROUP BY c.name- Filter for customers with 2+ categories
SELECT name, category_count FROM ( SELECT c.name, COUNT(DISTINCT p.category_id) as category_count FROM orders o JOIN order_items oi ON o.id = oi.order_id JOIN products p ON oi.product_id = p.id JOIN customers c ON c.id = o.customer_id GROUP BY c.name ) AS customer_categories WHERE category_count >= 2Or alternatively using a CTE for better readability:
WITH customer_categories AS ( SELECT c.name, COUNT(DISTINCT p.category_id) as category_count FROM orders o JOIN order_items oi ON o.id = oi.order_id JOIN products p ON oi.product_id = p.id JOIN customers c ON c.id = o.customer_id GROUP BY c.name ) SELECT name, category_count FROM customer_categories WHERE category_count >= 2
Complex problems become manageable when broken into smaller pieces. Start simple, test each piece, then assemble your solution. The most elegant queries often emerge from patient, systematic construction
§5.2 Common Table Expressions (CTEs) #
Common Table Expressions (CTEs), introduced with the WITH clause, are temporary, named result sets that you can reference within a single SELECT, INSERT, UPDATE, or DELETE statement. They significantly improve query readability and can be used to break down complex, multi-step calculations.
A CTE can be referenced multiple times within the same query, which allows avoiding redundant code. They are necessary for recursive queries (e.g., hierarchical data).
Syntax:
WITH cte_name (column1, column2, ...) AS (
SELECT column1, column2, ...
FROM table_name
WHERE condition
)
SELECT *
FROM cte_name
WHERE condition;
CTE Example:
Calculate the total order value for each order and then find orders with a total value greater than $20.
§6 More Resources #
- https://mode.com/sql-tutorial/sql-sub-queries
- https://www.sqlrevisited.com/2024/08/correlated-vs-non-correlated-subquery.html
- https://learnsql.com/blog/common-table-expressions-complete-guide/
- https://www.w3schools.com/sql/sql_exists.asp
- https://www.w3schools.com/sql/sql_in.asp
§7 Questions #
Please number each response to directly correspond with the question. Provide clear, concise answers that fully address each question without unnecessary elaboration or essay-style writing. Provide only the SQL query unless otherwise specified.
- What is a subquery in SQL?
- What clauses can scalar subqueries be used with?
- True or False: A subquery can return multiple rows and multiple columns.
- Explain the primary difference between a correlated and a non-correlated subquery.
- Which operators require the subquery to be correlated?
- Which type of subquery is generally more efficient for large datasets: correlated or non-correlated?
- In the example
SELECT name, (SELECT MAX(unit_price) FROM products) FROM products;, how many times is the subquery executed?
- When would you use the
EXISTSoperator with a subquery?
- What does
EXISTS (SELECT 1 ...)return when the subquery finds at least one row?
- Why is
SELECT 1used inside anEXISTSsubquery?
- When would you choose
NOT EXISTSinstead ofNOT INwith a subquery (consider howNULLvalues are handled)?
- What is a Common Table Expression (CTE) and what keyword is used to define it?
- What is the primary benefit of using a CTE over a nested subquery?
- Provide the operator that must be used when comparing a tuple
(col1, col2)to a row subquery.
The remaining questions will use the database below, use the text box to test queries for the questions.
For these questions, I do not want the SQL outputs, only the SQL query.
You do not have to use subqueries to solve all of these, JOINs are acceptable.
Use the (view schema) link to view the list of tables and their structure.
- Display every product name and price together with the single highest unit price found in the products table.
- Return every product whose
supplier_id,category_idmatches exactly the product withsku‘WIDG-B’.
- Return the
nameandsupplier_idof any product whosesupplier_idequals that of the cheapest ‘Electronics’ item.
- List every product name together with its total
qtyordered, do not omit products with 0qtyordered.
- Show each product name, its
unit_price, and the average price of its category.
- List products whose unit_price is above the overall average price.
- List categories that contain at least one product priced above $10 using
EXISTS.
- Show the names of customers who have at least one order in any status.
- Display products that have never been ordered.
- List every product that belongs to a category, using
IN.
- Find customers that currently have zero orders (use
EXISTS/NOT EXISTS).
- Use a CTE named
OrderTotalsto compute the total value (qty*unit_price) for each order, then returnorder_idandtotal_valuefor orders worth more than 20.
- Create a CTE named
CntProdthat lists every category name and theCOUNT()of its products; then use it to show categories having fewer than 3 products.