§1 Relational Foundations #
This section talks about foundational concepts for understanding how data is structured and managed in most modern database systems. We will explore core components such as tables and keys, and learn how they connect to form useful data stores.
§2 relational database concepts #
Relational databases are based on the relational model, an idea from mathematics. It’s a way of seeing data organized in tables.
§2.1 Tables, rows, columns #
- Tables (Relations): Think of a table like a spreadsheet. It holds a collection of related data. For example, a
Studentstable would hold information about students.
+-----------+-----------+----------+
| StudentID | FirstName | LastName |
+-----------+-----------+----------+
| 14378 | Alice | Smith |
| 28371 | Bob | Jones |
+-----------+-----------+----------+
- Rows (Tuples/Records): Each row in a table represents a single item or record. In a
Studentstable, each row would be a different student.
+-----------+-----------+----------+
| StudentID | FirstName | LastName |
+-----------+-----------+----------+
| 14378 | Alice | Smith | ⬅︎ One Row of data
| 28371 | Bob | Jones |
+-----------+-----------+----------+
- Columns (Attributes/Fields): Columns define the properties of the items in the table. For a
Studentstable, you might have columns likeStudentID,FirstName, andLastName.
+-----------+-----------+----------+
| StudentID | FirstName | LastName |
+-----------+-----------+----------+
| 14378 | Alice | Smith |
| 28371 | Bob | Jones |
+-----------+-----------+----------+
⬆︎
One column of data
§2.2 Domains and data types #
A domain is the set of all possible legal values for a column, put another way: a domain is the set of all values that the column is allowed to hold, defined by the data type plus any additional constraints (NOT NULL, CHECK, FOREIGN KEY, etc.). This is enforced by data types.
For example, a StudentID column might have a data type of INTEGER, meaning only whole numbers are allowed. A FirstName column would be a TEXT or VARCHAR (variable character) type. Using the right data type is crucial for data integrity.
§2.2.1
A Note on NULL
#
All data types accept the value NULL unless the column was explicitly set as NOT NULL. NULL represents the absence of value, and is different than an empty string or a 0. A Database always considers NULL to be an unknown value, SQL considers NULL = NULL to be a False statement, because unknown values cannot be compared.
§2.3 Schema vs. instance #
Schema: This is the blueprint or structure of the database. It defines the tables, the columns in those tables, the data types for each column, the keys, and the relationships between tables. The schema is designed once and doesn’t change often.
Schema changes are inherently high-risk: they usually require coordinated updates to every application that touches the database, careful data migration scripts, extensive regression testing, and often a maintenance window. A single mis-migration can corrupt data or break downstream systems, so schema alterations need to be planned, reviewed, and rehearsed far more rigorously than routine code changes.
Instance: is the live, self-contained execution environment that makes the database files accessible to clients.
- In the simplest case an instance is a single process (e.g., one mysql or postgres server).
- At the other extreme an instance can be a distributed cluster (shards, replicas, or a managed cloud service). What qualifies them as “one instance” is that clients connect through a single logical endpoint and treat the whole as one coherent database.
- The instance is transient, start it and the database is online; stop it and the data remains inert on disk. Backups taken from one instance can be restored into any other, the instance and the data are distinct entities.
Simple Instance Architecture Diagram
Complex Sharded Instance Architecture Diagram
Notice how in both diagrams, the client only connects to a single endpoint, regardless of the underlying database architecture.
§3 Data Types #
Data types are essential for maintaining data integrity by defining the legal values a column can hold. While the specific names and features can vary between database systems, they generally fall into several common categories:
- Numeric Types: For storing numbers. This includes integers (e.g.,
INTEGERandBIGINTfor whole numbers) and exact decimal types (e.g.,DECIMALandNUMERICfor financial data) and floating-point types for approximate values (FLOAT,DOUBLE). - String (Text) Types: For storing text. Common types include
CHAR(fixed-length strings),VARCHAR(variable-length strings), andTEXT(for long-form text). - Date and Time Types: For storing temporal information, such as
DATE(for dates),TIME(for times), andTIMESTAMP(for date and time combined). - Boolean Types: For storing true/false values. Some systems implement this with a dedicated
BOOLEANtype, while others simply use a number (like 0 for false and 1 for true). - Binary Types: For storing raw byte data, such as images, audio files, or other binary objects (
BLOB,BINARY).
Choosing the correct data type is a balance between flexibility and performance. Here is a comparison of common data types across different database systems:
| Data Type | Oracle | PostgreSQL | MySQL | SQLite |
|---|---|---|---|---|
| 32-bit int | INTEGER | INTEGER | INT | INTEGER |
| 64-bit int | NUMBER(19) | BIGINT | BIGINT | INTEGER |
| Variable text | VARCHAR2(n) | VARCHAR(n) | VARCHAR(n) | TEXT |
| Exact decimal | NUMBER(p,s) | NUMERIC | DECIMAL | REAL |
| Boolean | NUMBER(1) | BOOLEAN | BOOLEAN | INTEGER (0/1) |
| Date & time | DATE/TIMESTAMP | TIMESTAMP | DATETIME | TEXT/NUMERIC |
| Small decimal | NUMBER(p,s) | NUMERIC/DECIMAL | DECIMAL(p,s) | REAL |
| Big decimal | NUMBER(p,s) | NUMERIC/DECIMAL | DECIMAL(p,s) | REAL |
| Fixed-length string | CHAR(n) | CHAR(n) | CHAR(n) | TEXT (affinity) |
| Variable-length string | VARCHAR2(n)/VARCHAR(n) | VARCHAR(n) | VARCHAR(n) | TEXT |
| Large text | CLOB | TEXT | LONGTEXT | TEXT |
| Date only | DATE | DATE | DATE | TEXT (ISO8601) |
| Time only | N/A (DATE with time) | TIME | TIME | TEXT (ISO8601) |
| Date/time | TIMESTAMP | TIMESTAMP | DATETIME | TEXT (ISO8601) |
| Binary/blob | BLOB/RAW | BYTEA | BLOB | BLOB |
Rule of thumb: choose the smallest type that safely holds every possible value.
e.g. prefer anINTEGERto aBIGINTwhen storing whole numbers that will never be larger than2,147,483,647.
§4 Database Keys #
Keys are special columns used to enforce rules and relationships. They are fundamental to the relational model. There are seven major key types in relational databases, each enforces a different aspect of integrity or navigability.
- Superkey: A set of one or more columns that, taken together, can uniquely identify a row in a table.
- Candidate Key: A minimal superkey, no subset of its attributes still guarantees uniqueness. This means it’s a superkey, but if you removed any column from it, it would no longer be a unique identifier. A table can have multiple candidate keys.
- Primary Key: The one candidate key that is chosen to be the main identifier for a table. It must be unique and cannot be null. It’s how you guarantee every row is uniquely identifiable.
- Alternate Key: any remaining candidate key not selected as the primary key.
- Foreign Key: A column (or set of columns) in one table that refers to the primary key of another table. This is how you create a link or relationship between tables. It enforces referential integrity, ensuring that a row in one table cannot point to a non-existent row in another.
- Composite Key: A primary key that is made up of two or more columns. This is used when a single column isn’t enough to uniquely identify a row.
- Surrogate Key: a system-generated, meaningless identifier introduced when no natural key is ideal. An
AUTO_INCREMENTcolumn is the most common form of this, where the database adds a unique incrementing integer to every row in the table.
Examples of each type of database key
Table: Department
| dept_code CHAR(2) | dept_name VARCHAR(40) | dept_phone CHAR(10) |
|---|---|---|
| CS | Computer Science | 5551234567 |
| MA | Mathematics | 5552345678 |
| EN | Engineering | 5553456789 |
- Primary Key:
dept_code - Superkeys:
dept_code,dept_code & dept_name,dept_code & dept_phone - Candidate Keys:
dept_code
Table: Employee
| emp_id INT AUTO_INCREMENT | dept_code CHAR(2) | ssn CHAR(11) | first_name VARCHAR(40) | last_name VARCHAR(40) | birth_date DATE |
|---|---|---|---|---|---|
| 1 | CS | 123-45-6789 | Alice | Smith | 1990-05-15 |
| 2 | CS | 234-56-7890 | Bob | Johnson | 1985-11-03 |
| 3 | MA | 345-67-8901 | Carol | Lee | 1992-08-21 |
| 4 | EN | 456-78-9012 | David | Patel | 1988-02-27 |
| 5 | EN | 567-89-0123 | Eve | Chen | 1991-12-09 |
- Primary Key:
emp_id - Surrogate Key:
emp_id - Superkeys:
emp_id,ssn,dept_code & ssn, … - Foreign Key:
Department.dept_code - Alternate Key:
ssn - Composite Candidate Key:
dept_code & ssn
§5 Database Relationships #
Relationships define how tables are connected to each other, using foreign keys.
- One-to-One (1:1): Each record in Table A can be linked to one, and only one, record in Table B, and vice-versa. This is not very common. Example: A
Userstable and aUserProfilestable where each user has exactly one profile.erDiagram Users ||--|| UserProfiles : "has" - One-to-Many (1:N): A single record in Table A can be linked to many records in Table B, but each record in Table B can only be linked to one record in Table A. This is the most common relationship type. Example: One
Authorcan have manyBooks.erDiagram Authors ||--o{ Books : "writes" - Many-to-Many (N:M): A record in Table A can be linked to many records in Table B, and a record in Table B can also be linked to many records in Table A. This requires a third table, often called a “junction” or “linking” table, which contains foreign keys from both tables. Example:
StudentsandCourses. A student can enroll in many courses, and a course can have many students.erDiagram Students ||--o{ Enrollments : enrolls Courses ||--o{ Enrollments : hosts
§6 Questions #
Always provide at least a brief explanations for your answers.
Please number each response to directly correspond with the question. Provide clear, concise answers that fully address each question without unnecessary elaboration or essay-style writing.
- What is the difference between a primary key and a foreign key?
- Provide an example of a Many to Many relationship that might be found in a database containing banking data. what would the “junction” table be?
- What is the difference between a candidate key and a superkey?
- When is a surrogate key useful?
- Explain the difference between a database schema and a database instance.
- Why should you choose to use an
INTEGERinstead of aBIGINTwhen storing numbers such as zip codes.
- Assume the column
email VARCHAR(255)is declared asUNIQUEin aUserstable. Does this makeemaila candidate key? Justify your answer.
- A university database has the following tables:
Students:
| StudentID | FirstName | LastName |
|---|
Courses:
| CourseID | Title |
|---|
Enrollments:
| StudentID | CourseID | Semester | Grade |
|---|
List all the foreign keys in the Enrollments table and explain what each one enforces.