This section introduces SQL indexes, an important tool for optimizing database performance. Understanding indexes is important for building fast and scalable applications.
§1 What is an Index? #
An index is a special lookup table that the database uses to speed up data retrieval. Think of an index like the index at the back of a book: instead of scanning the entire book (the table) to find a specific topic (a row of data), you can look up the topic in the index, which tells you exactly which page to go to.
Without an index the database has to perform a full table scan for many queries, meaning it must read through every single row in a table to find the data that matches your query’s WHERE clause. This can be intolerably slow for large tables with millions of rows.
An index is a data structure that stores the values of a specific column or a set of columns in a sorted order. Each entry in the index also stores a pointer to the corresponding row in the actual table. When you run a query with a WHERE clause on an indexed column, the database can use the index to quickly find the location of the data, which is faster as it avoids a full table scan.
§2
The CREATE INDEX Statement
#
The CREATE INDEX statement is a simple statement, it is used to create a new index on one or more columns of a table.
CREATE INDEX index_name
ON table_name (column1, column2, ...);
index_name: A unique name for the index.table_name: The name of the table to create the index on.(column1, column2, ...): The column(s) to be included in the index.
§2.1
CREATE INDEX Example
#
Let’s say we have a products table and we frequently query it to find products by their sku. Without an index, the database would have to scan the entire table every time.
First, let’s look at a query without an index. The database has to scan all rows.
-- No index on sku, so the database has to do a full table scan
SELECT * FROM products WHERE sku = 'WIDG-C';
Now, let’s create an index on the sku column.
CREATE INDEX idx_products_sku ON products (sku);
Once the index is created, repeating the same query lets the database find the rows where sku = 'WIDG-C' via idx_products_sku much faster than it could without that index.
-- the same query now performs much faster on large tables
SELECT * FROM products WHERE sku = 'WIDG-C';
The performance impact of an index is typically not noticeable unless the table has at least hundreds of thousands of rows. Once a database is large enough, an index can dramatically improve query performance, from seconds or even minutes to milliseconds.
§2.2 Unique Indexes #
A unique index ensures that the indexed column (or columns) contains only unique values. It serves two purposes: data integrity and performance. If you try to insert or update a row with a value that already exists in a unique index, the database will return an error.
CREATE UNIQUE INDEX index_name
ON table_name (column1, ...);
Primary keys automatically have a unique index created for them.
A UNIQUE constraint on a column is the standard SQL way to enforce uniqueness. Most DBMS implement this by creating a unique index behind the scenes.
For example, an ALTER TABLE ... ADD UNIQUE (col) and CREATE UNIQUE INDEX ON ... (col) often have the same result. The key difference is intent: a UNIQUE constraint is about enforcing a data rule, while CREATE UNIQUE INDEX is about creating a performance-enhancing index that also happens to enforce a rule.
If you attempt to create a unique index on a column that already contains duplicate values, the database will return an error, and the index creation will fail. You must first remove the duplicate data before the unique index can be successfully created.
§2.3 Composite Indexes #
A composite (or multi-column) index is an index created on more than one column of a table. The order of the columns in the index matters greatly. An index on (col_a, col_b) can be used efficiently for queries filtering on col_a alone, or on both col_a and col_b. However, it is generally not useful for queries that only filter on col_b.
Consider a query on a customers table:
SELECT * FROM customers WHERE last_name = 'Smith' AND first_name = 'John';
To optimize this query, you would create a composite index on (last_name, first_name):
CREATE INDEX idx_customers_name ON customers (last_name, first_name);
While it might be tempting to create many composite indexes to optimize every possible query, this has significant downsides. Each index consumes disk space and slows down write operations (
INSERT,UPDATE,DELETE).
The database must update every index each time data is modified. It’s a trade-off between read performance and write performance.
The goal is to create a small number of effective indexes that cover the most common query patterns.
§3 When to Use Indexes (and When Not To) #
Good candidates for indexing:
- Columns frequently used in
WHEREclauses. - Columns used in
JOINconditions (foreign keys are excellent candidates). - Columns frequently used in
ORDER BYclauses.
When to be cautious with indexes:
- Every time you
INSERT,UPDATE, orDELETEa row, the database must also update all the indexes on that table. Too many indexes can slow down write operations. - For very small tables, a full table scan is often faster than using an index because the overhead of reading the index is greater than the cost of scanning the table.
- Columns that have very few unique values (e.g., a
gendercolumn with ‘Male’, ‘Female’, ‘Other’) are generally poor candidates for indexing. The index won’t be very selective.
§4 Dropping Indexes #
If an index is no longer needed or is harming write performance, you can remove it using the DROP INDEX statement.
Syntax:
DROP INDEX index_name;
§5 More Resources #
- https://www.postgresql.org/docs/current/sql-createindex.html
- https://www.w3schools.com/sql/sql_create_index.asp
- https://www.geeksforgeeks.org/sql-indexes/
§6 Questions #
Please number each response to directly correspond with the question. Provide clear, concise answers that fully address each question without unnecessary elaboration or essay-style writing.
- What is the primary purpose of a SQL index, and what problem does it solve?
- Write the SQL statement to create an index named
idx_products_skuon theskucolumn of theproductstable.
- What is a potential downside of having many indexes on a table that is frequently updated with
INSERTorUPDATEstatements?
- You have a query that often filters by
last_nameand then byfirst_name. What would be an effective composite index to create, and why is the order of columns important?
- Is it useful to create an index on every table? why?