Section 2: Relational Foundations

§1 Relational Foundations #

This section talks about foundational concepts for understanding how data is structured and managed in most modern database systems. We will explore core components such as tables and keys, and learn how they connect to form useful data stores.

§2 relational database concepts #

Relational databases are based on the relational model, an idea from mathematics. It’s a way of seeing data organized in tables.

§2.1 Tables, rows, columns #

Tables (Relations): Think of a table like a spreadsheet. It holds a collection of related data. For example, a Students table would hold information about students.

+-----------+-----------+----------+
| StudentID | FirstName | LastName |
+-----------+-----------+----------+
| 14378     | Alice     | Smith    |
| 28371     | Bob       | Jones    |
+-----------+-----------+----------+

Rows (Tuples/Records): Each row in a table represents a single item or record. In a Students table, each row would be a different student.

+-----------+-----------+----------+
| StudentID | FirstName | LastName |
+-----------+-----------+----------+
| 14378     | Alice     | Smith    |  ⬅︎ One Row of data
| 28371     | Bob       | Jones    |
+-----------+-----------+----------+

Columns (Attributes/Fields): Columns define the properties of the items in the table. For a Students table, you might have columns like StudentID, FirstName, and LastName.

+-----------+-----------+----------+
| StudentID | FirstName | LastName |
+-----------+-----------+----------+
| 14378     | Alice     | Smith    |
| 28371     | Bob       | Jones    |
+-----------+-----------+----------+
                 ⬆︎ 
        One column of data

§2.2 Domains and data types #

A domain is the set of all possible legal values for a column, put another way: a domain is the set of all values that the column is allowed to hold, defined by the data type plus any additional constraints (NOT NULL, CHECK, FOREIGN KEY, etc.). This is enforced by data types.
For example, a StudentID column might have a data type of INTEGER, meaning only whole numbers are allowed. A FirstName column would be a TEXT or VARCHAR (variable character) type. Using the right data type is crucial for data integrity.

§2.2.1 A Note on `NULL` #

All data types accept the value NULL unless the column was explicitly set as NOT NULL. NULL represents the absence of value, and is different than an empty string or a 0. A Database always considers NULL to be an unknown value, SQL considers NULL = NULL to be a False statement, because unknown values cannot be compared.

§2.3 Schema vs. instance #

Schema: This is the blueprint or structure of the database. It defines the tables, the columns in those tables, the data types for each column, the keys, and the relationships between tables. The schema is designed once and doesn’t change often.

Schema changes are inherently high-risk: they usually require coordinated updates to every application that touches the database, careful data migration scripts, extensive regression testing, and often a maintenance window. A single mis-migration can corrupt data or break downstream systems, so schema alterations need to be planned, reviewed, and rehearsed far more rigorously than routine code changes.

Instance: is the live, self-contained execution environment that makes the database files accessible to clients.

In the simplest case an instance is a single process (e.g., one mysql or postgres server).
At the other extreme an instance can be a distributed cluster (shards, replicas, or a managed cloud service). What qualifies them as “one instance” is that clients connect through a single logical endpoint and treat the whole as one coherent database.
The instance is transient, start it and the database is online; stop it and the data remains inert on disk. Backups taken from one instance can be restored into any other, the instance and the data are distinct entities.

Simple Instance Architecture Diagram

graph TD Cl((Client)) subgraph B[Single Instance] direction LR EP[Database Endpoint] end subgraph C[Data Layer] DB[(Database File)] end Cl -->|connect| B EP --> DB

Complex Sharded Instance Architecture Diagram

graph TD Cl((Client)) subgraph B["Logical Instance"] EP[Database Endpoint] end subgraph C[Shard Layer] S1[(Shard 1)] S2[(Shard 2)] Sn[(Shard n)] end Cl -->|connect| B EP --> S1 EP --> S2 EP --> Sn

Notice how in both diagrams, the client only connects to a single endpoint, regardless of the underlying database architecture.

§3 Data Types #

Data types are essential for maintaining data integrity by defining the legal values a column can hold. While the specific names and features can vary between database systems, they generally fall into several common categories:

Numeric Types: For storing numbers. This includes integers (e.g., INTEGER and BIGINT for whole numbers) and exact decimal types (e.g., DECIMAL and NUMERIC for financial data) and floating-point types for approximate values (FLOAT, DOUBLE).
String (Text) Types: For storing text. Common types include CHAR (fixed-length strings), VARCHAR (variable-length strings), and TEXT (for long-form text).
Date and Time Types: For storing temporal information, such as DATE (for dates), TIME (for times), and TIMESTAMP (for date and time combined).
Boolean Types: For storing true/false values. Some systems implement this with a dedicated BOOLEAN type, while others simply use a number (like 0 for false and 1 for true).
Binary Types: For storing raw byte data, such as images, audio files, or other binary objects (BLOB, BINARY).

Choosing the correct data type is a balance between flexibility and performance. Here is a comparison of common data types across different database systems:

Data Type	Oracle	PostgreSQL	MySQL	SQLite
32-bit int	`INTEGER`	`INTEGER`	`INT`	`INTEGER`
64-bit int	`NUMBER(19)`	`BIGINT`	`BIGINT`	`INTEGER`
Variable text	`VARCHAR2(n)`	`VARCHAR(n)`	`VARCHAR(n)`	`TEXT`
Exact decimal	`NUMBER(p,s)`	`NUMERIC`	`DECIMAL`	`REAL`
Boolean	`NUMBER(1)`	`BOOLEAN`	`BOOLEAN`	`INTEGER` (0/1)
Date & time	`DATE`/`TIMESTAMP`	`TIMESTAMP`	`DATETIME`	`TEXT`/`NUMERIC`
Small decimal	`NUMBER(p,s)`	`NUMERIC`/`DECIMAL`	`DECIMAL(p,s)`	`REAL`
Big decimal	`NUMBER(p,s)`	`NUMERIC`/`DECIMAL`	`DECIMAL(p,s)`	`REAL`
Fixed-length string	`CHAR(n)`	`CHAR(n)`	`CHAR(n)`	`TEXT` (affinity)
Variable-length string	`VARCHAR2(n)`/`VARCHAR(n)`	`VARCHAR(n)`	`VARCHAR(n)`	`TEXT`
Large text	`CLOB`	`TEXT`	`LONGTEXT`	`TEXT`
Date only	`DATE`	`DATE`	`DATE`	`TEXT` (ISO8601)
Time only	N/A (`DATE` with time)	`TIME`	`TIME`	`TEXT` (ISO8601)
Date/time	`TIMESTAMP`	`TIMESTAMP`	`DATETIME`	`TEXT` (ISO8601)
Binary/blob	`BLOB`/`RAW`	`BYTEA`	`BLOB`	`BLOB`

Rule of thumb: choose the smallest type that safely holds every possible value.
e.g. prefer an INTEGER to a BIGINT when storing whole numbers that will never be larger than 2,147,483,647.

§4 Database Keys #

Keys are special columns used to enforce rules and relationships. They are fundamental to the relational model. There are seven major key types in relational databases, each enforces a different aspect of integrity or navigability.

Superkey: A set of one or more columns that, taken together, can uniquely identify a row in a table.
Candidate Key: A minimal superkey, no subset of its attributes still guarantees uniqueness. This means it’s a superkey, but if you removed any column from it, it would no longer be a unique identifier. A table can have multiple candidate keys.
Primary Key: The one candidate key that is chosen to be the main identifier for a table. It must be unique and cannot be null. It’s how you guarantee every row is uniquely identifiable.
Alternate Key: any remaining candidate key not selected as the primary key.
Foreign Key: A column (or set of columns) in one table that refers to the primary key of another table. This is how you create a link or relationship between tables. It enforces referential integrity, ensuring that a row in one table cannot point to a non-existent row in another.
Composite Key: A primary key that is made up of two or more columns. This is used when a single column isn’t enough to uniquely identify a row.
Surrogate Key: a system-generated, meaningless identifier introduced when no natural key is ideal. An AUTO_INCREMENT column is the most common form of this, where the database adds a unique incrementing integer to every row in the table.

Examples of each type of database key

Table: Department

dept_code CHAR(2)	dept_name VARCHAR(40)	dept_phone CHAR(10)
CS	Computer Science	5551234567
MA	Mathematics	5552345678
EN	Engineering	5553456789

Primary Key: dept_code
Superkeys: dept_code, dept_code & dept_name, dept_code & dept_phone
Candidate Keys: dept_code

Table: Employee

emp_id INT AUTO_INCREMENT	dept_code CHAR(2)	ssn CHAR(11)	first_name VARCHAR(40)	last_name VARCHAR(40)	birth_date DATE
1	CS	123-45-6789	Alice	Smith	1990-05-15
2	CS	234-56-7890	Bob	Johnson	1985-11-03
3	MA	345-67-8901	Carol	Lee	1992-08-21
4	EN	456-78-9012	David	Patel	1988-02-27
5	EN	567-89-0123	Eve	Chen	1991-12-09

Primary Key: emp_id
Surrogate Key: emp_id
Superkeys: emp_id, ssn, dept_code & ssn, …
Foreign Key: Department.dept_code
Alternate Key: ssn
Composite Candidate Key: dept_code & ssn

§5 Database Relationships #

Relationships define how tables are connected to each other, using foreign keys.

One-to-One (1:1): Each record in Table A can be linked to one, and only one, record in Table B, and vice-versa. This is not very common. Example: A Users table and a UserProfiles table where each user has exactly one profile.
erDiagram Users ||--|| UserProfiles : "has"
One-to-Many (1:N): A single record in Table A can be linked to many records in Table B, but each record in Table B can only be linked to one record in Table A. This is the most common relationship type. Example: One Author can have many Books.
erDiagram Authors ||--o{ Books : "writes"
Many-to-Many (N:M): A record in Table A can be linked to many records in Table B, and a record in Table B can also be linked to many records in Table A. This requires a third table, often called a “junction” or “linking” table, which contains foreign keys from both tables. Example: Students and Courses. A student can enroll in many courses, and a course can have many students.
erDiagram Students ||--o{ Enrollments : enrolls Courses ||--o{ Enrollments : hosts

§6 Questions #

Always provide at least a brief explanations for your answers.
Please number each response to directly correspond with the question. Provide clear, concise answers that fully address each question without unnecessary elaboration or essay-style writing.

What is the difference between a primary key and a foreign key?

Provide an example of a Many to Many relationship that might be found in a database containing banking data. what would the “junction” table be?

What is the difference between a candidate key and a superkey?

When is a surrogate key useful?

Explain the difference between a database schema and a database instance.

Why should you choose to use an INTEGER instead of a BIGINT when storing numbers such as zip codes.

Assume the column email VARCHAR(255) is declared as UNIQUE in a Users table. Does this make email a candidate key? Justify your answer.

A university database has the following tables:

Students:

StudentID	FirstName	LastName

Courses:

CourseID	Title

Enrollments:

StudentID	CourseID	Semester	Grade

List all the foreign keys in the Enrollments table and explain what each one enforces.