Section 5: SQL Introduction

Query the Database

This section introduces the Structured Query Language (SQL), the standard language for managing and manipulating relational databases. We will explore the fundamental concepts of SQL and learn how to retrieve and filter data. This lecture focuses on the Data Manipulation Language (DML) aspects of SQL, which are essential for anyone working with data.

§1 What is SQL? #

SQL is the standard language for communicating with relational databases. Through concise, declarative statements it lets you define schemas, insert, update, delete, and query data with powerful filtering, joining, and aggregation logic. SQL works* across PostgreSQL, MySQL, SQL Server, Oracle, and more.

*While core SQL statements work across engines, each engine adds proprietary extensions and differs in detail.

Beyond data manipulation, SQL also secures data (GRANT, REVOKE) and keeps it consistent with transactions (BEGIN, COMMIT, ROLLBACK). Views, CTEs, window functions, and stored procedures turn it into a full analytics and orchestration layer, bridging raw tables to apps, dashboards, and analytics tools.

SQL is divided into several language elements:

CategoryPurposeExamples
Data Definition Language (DDL)Define database structure or schemaCREATE, ALTER, DROP
Data Manipulation Language (DML)Manage data within schema objectsSELECT, INSERT, UPDATE, DELETE
Data Control Language (DCL)Control access to the databaseGRANT, REVOKE
Transaction Control Language (TCL)Handle database transactionsCOMMIT, ROLLBACK

A typical SQL query lifecycle involves:

  1. Describe: Understand the schema and the relationships between tables.
  2. Retrieve: Get the data from the relevant tables.
  3. Shape: Format, filter, and aggregate the data.
  4. Return: Return the final result set.

§2 SQL Syntax #

Every SQL statement follows this pattern:

Examples:

SELECT first_name, last_name FROM employees WHERE dept = 'Sales';
INSERT INTO customers (id, name) VALUES (1, 'Ada Lovelace');
UPDATE products SET price = price * 1.05 WHERE category = 'Electronics';
DELETE FROM orders WHERE order_date < '2023-01-01';
CREATE TABLE logs (id SERIAL PRIMARY KEY, msg VARCHAR(255));
-- This is a SQL comment; SQL uses two consecutive dashes followed by a space to denote a comment

Every SQL statement begins with a keyword, followed by an identifier, then various clauses depending on the type of command being run. Lastly, all statements are completed with a semicolon (;).


SQL ignores whitespace, so a single space, many spaces, tabs, and newlines are all interpreted the same. Both of these statements are valid and equivalent:

SELECT * FROM myTable;
SELECT
  *
 FROM
     myTable;

Use indentation and whitespace to improve readability:

SELECT department,
       COUNT(*) AS emp_cnt,
       AVG(salary) AS avg_sal
FROM employees;

Here, whitespace is used to make the filter clause much more readable.

In SQL, keywords are not case-sensitive (unless the database is configured to enforce keyword casing). Identifiers are always case-sensitive.

These two statements are equivalent:

SELECT * FROM myTable;  -- preferred
select * from myTable;  -- equivalent, but less readable

It is considered best practice to write keywords in all caps for clarity.

§3 The SELECT Statement #

The SELECT statement is the most commonly used statement in SQL. It retrieves data from a database. The data returned is stored in a result table called the result-set. The SELECT statement is powerful and flexible, allowing you to retrieve, filter, sort, and group data.

§3.1 Basic SELECT Statement Example #

The example below is a working SQLite instance running in your browser. You can edit the SQL statement and get live updates. Try changing the SQL statement in the text box below to:
SELECT email FROM customers;

The (view schema) link above the text box takes you to a page for viewing the database schema. You can use this link to understand the structure of any database examples provided on this site.

§3.2 SELECT Statement Keywords #

These are the most common keywords used with a SELECT statement, in the order they are typically written:

§3.3 SELECT Statement Syntax #

The SQL SELECT statement follows a fixed logical order of clauses. Think of it as a pipeline: each clause feeds its result to the next one.

SELECT    <projection list>
FROM      <data source(s)>
WHERE     <row filter>
GROUP BY  <grouping cols>
HAVING    <group filter>
ORDER BY  <sort list>
LIMIT/OFFSET <result slice>;

Sample query using all SELECT statement keywords:

SELECT department,
       COUNT(*) AS emp_cnt,
       AVG(salary) AS avg_sal
FROM   employees
WHERE  hire_date >= '2020-01-01'
GROUP  BY department
HAVING AVG(salary) > 60000
ORDER  BY avg_sal DESC
LIMIT  5 OFFSET 10;

§3.3.1 SELECT FROM #

When querying items from a database, you use the statement: SELECT <columns> FROM <table_name>.

The <columns> can be a single column name, a comma-delimited list of columns, or a wildcard *.

Try changing the query below to these options:

The original SQL statement was SELECT * FROM customers;. The * is a wildcard meaning to select all columns in the table.

For real queries in an application, it is best practice to always specify explicit columns instead of using *. This improves readability and performance, as the database only retrieves the necessary data. It also avoids issues with database schema changes affecting the output, such as when additional columns are added.

You can try querying other tables in the database. Use the (view schema) link to see what other tables exist, for example: SELECT * FROM products;

§3.3.2 DISTINCT #

The DISTINCT keyword can be added after SELECT to remove all duplicate rows from the output set.

Try updating the query below to SELECT DISTINCT name,unit_price FROM products;. Notice how all the duplicate Widget A columns disappear.

Also notice how one Widget A column with a different price did not get removed. DISTINCT only removes completely identical rows (in this context, “row” refers only to the output).

§3.3.3 WHERE #

WHERE is the SQL clause that filters rows before they are returned, updated, or deleted. It keeps only the rows for which the search condition evaluates to TRUE. Rows that evaluate to FALSE or UNKNOWN (NULL) are discarded.

It supports the following comparison operators:

It also supports the following Boolean operators:

You can also add additional predicates such as LIKE (for pattern matching), IN (to specify multiple possible values), BETWEEN (for a range), NOT BETWEEN, and IS NULL or IS NOT NULL (to check for NULL values).

LIKE is a pattern-matching operator that filters character data using two wildcards:

IN tests whether a value equals any member of a list or sub-query result.


Live Example

Examples to try on the sample DB above:

Exact match:
-SELECT * FROM products WHERE sku = 'WIDG-A';

Negation:
-SELECT * FROM orders WHERE status <> 'CANCELLED';
-SELECT * FROM orders WHERE status != 'CANCELLED';

Range:
-SELECT * FROM orders WHERE order_date BETWEEN '2025-09-10' AND '2025-09-15';

Pattern:
-SELECT * FROM customers WHERE email LIKE '%@example.com';

Boolean:
-SELECT * FROM orders WHERE status IN ('PAID', 'SHIPPED');

Sub-query:
-SELECT * FROM products WHERE id IN (SELECT product_id FROM order_items);

Null-check:
-SELECT sku,name FROM products WHERE category_id IS NULL;

Remember you can view all contents of any table with SELECT * FROM <table_name>;

§3.3.4 ORDER BY #

ORDER BY is the SQL clause that turns an unordered set of rows into a cursor that is guaranteed to come back in a specific sequence. In other words, ORDER BY will sort the set returned by the SELECT statement.

In the example below, the outputs are sorted in ascending order (ASC). You can also change to sort in descending order (DESC).

The example above demonstrates sorting by multiple columns by adding comma-separated <column_name> DESC/ASC clauses.

If your output has NULL values, you can specify NULLS FIRST or NULLS LAST to control the sorting of NULL values.

§3.3.5 LIMIT #

When running queries or updates on a large database, such as a production environment, it is important to use a LIMIT clause to avoid accidentally modifying or querying more rows than intended. A good habit is to always include a LIMIT on your queries to avoid accidentally making massive queries to the database.

In the example below, we limit the query to a reasonable 5 entries. See what happens when you remove the LIMIT 5 from the statement below.

Complex queries that span multiple tables on a large database can easily cause performance issues if a LIMIT is not set. It is good practice to get into the habit of always adding a LIMIT clause to your SQL statements when working with large tables.

In worst-case scenarios, a complex query can accidentally cause an entire database to become unresponsive until the query completes, resulting in several seconds to even minutes of the database being unusable for anyone. LIMIT helps to minimize these kinds of impacts.


You can combine LIMIT with OFFSET to paginate the output. This keeps every individual query small and predictable while still allowing access to all data. You can try these queries on the database above.

-- Page 1
SELECT * FROM users
ORDER BY id
LIMIT 10 OFFSET 0;  -- get first 10 entries

-- Page 2
SELECT * FROM users
ORDER BY id
LIMIT 10 OFFSET 10;  -- get next 10 entries

Always pair OFFSET with an ORDER BY so the result set is stable across pages. If you skip ORDER BY, the same OFFSET value can return a different row each time, so your return value may skip rows, repeat rows, or miss data altogether.

The ORDER BY column also needs to be a stable unique key to avoid data inconsistency when data is simultaneously being added/removed.

Calculate the offset as (page_number – 1) × rows_per_page.

§4 Expressions #

You can use expressions in any SELECT statement, for example: unit_price * 0.5 to calculate a 50% off price. The trade-off between doing this in the database versus in the application code depends on the specific use case. The database is often more efficient for large datasets.

This does not affect the data in the database, the values are calculated on-the-fly when the query is returned.

You can view the original prices by removing the *.5 from the above query.

§5 Aliases #

SQL aliases are temporary nicknames you can give to columns or tables in a query.

Aliases are set using the AS keyword after any identifier, e.g. <old_identifier> AS <new_identifier>

Notice how this example renames name to product_name and unit_price to price_with_tax in the returned output set.

§6 More Resources #

Here are a few additional resources for the concepts covered in this lecture:

§7 Questions #

Always provide at least a brief explanation for your answers.
Please number each response to directly correspond with the question. Provide clear, concise answers that fully address each question without unnecessary elaboration or essay-style writing.

  1. Which SQL language category contains the statements SELECT, INSERT, UPDATE, and DELETE?
  1. List the six most common clauses of a SELECT statement in the order the database engine logically processes them.
  1. Why is it generally better to select specific columns instead of using *?
  1. What is the purpose of the LIMIT clause?
  1. True or false: DISTINCT removes rows that are identical only in the columns you list after SELECT.
  1. What wildcard characters does the LIKE operator provide, and what does each one match?
  1. Write both variants of the comparison operator that means “not equal” in SQL.
  1. When you paginate with LIMIT … OFFSET …, which additional clause must you include to keep the result set stable across pages, and why?
  1. What punctuation mark officially ends every SQL statement?
  1. Are SQL keywords case-sensitive in a default PostgreSQL/MySQL installation?
  1. The following query is intended to list new orders created after 2025-09-05, but it returns zero rows even though the data exists. Identify the error.
SELECT * FROM orders
where status = NEW 
AND order_date > '2025-09-05';

The remaining questions will use the database below, use the text box to test queries for the questions.
For these questions, I do not want the SQL outputs, only the SQL query.

Use the (view schema) link to view the list of tables and their structure.

  1. Write a query that returns the name and email of every customer whose email ends in @example.com in the customers table.
  1. Write a query that queries the products table, and produces a list that shows each product’s name and its unit price increased by 50%. Name the new price column new_price.
  1. Write a query that lists all information about orders that do not have a status of 'SHIPPED'.
  1. Write a query that lists every customer’s name and e-mail address. Return the columns with the friendly headings CustomerName and EmailAddress.
  1. Write a query that retrieves the sku, name, and unit price of every product that costs less than $10.00.
  1. Write a query that produces a list of unique product names together with their unit price (two columns).
  1. Write a query that shows every order that is neither ‘NEW’ nor ‘CANCELLED’. Display only the order id and status.
  1. Write a query that returns all customer orders placed between 2025-09-01 and 2025-09-10 (inclusive). Show the id, customer_id, and order_date.
  1. Write a query that returns products that do not have a category assigned. List the SKU and name.
  1. Write a query that produces a list of products ordered first by category_id (ascending), then within each category by unit price (descending). Show SKU, name, category_id, and unit_price.
  1. Write a query that returns the five most expensive products (use unit_price) that belong to the ‘Electronics’ category. Show SKU, name, and price.