Section 1: Intro to Databases

§1 Introduction to Databases #

Data is essential to modern applications and websites. Every interaction, from liking a social media post, tracking live scores in a multiplayer game, or managing items in an online shopping cart, involves information that is stored and manipulated within a database system. These databases are sophisticated digital repositories designed for the efficient storage, retrieval, and manipulation of vast amounts of information. In this course, you will learn fundamental concepts of data storage and manipulation.

§2 What is a Database? #

A database is an organized collection of information. While the concept can encompass various forms, such as a contact book, a spreadsheet, or a physical filing cabinet, in the context of websites and computer applications, a database typically refers to a computer program. These programs are formally known as Database Management Systems (DBMS). Users interact with databases using a specialized programming language, the most common of which is Structured Query Language, or SQL for short.

A DBMS is designed to efficiently store, retrieve, manage, and update structured data, which is typically stored electronically in a computer system. Databases are essential for systematically managing large amounts of information.

§3 What are Uses for a Database? #

Databases are ubiquitous and are used in almost every industry and application where data needs to be managed. Common uses include:

Databases are used in almost every modern computer system. Your smartphone uses hundreds of small databases for all sorts of purposes. For example: every major messaging app uses a sqlite database on-device to manage all information related to your conversations.

You can find dozens of these databases on your computer right now by running the following in a command prompt/terminal:

Windows #

open command prompt (NOT powershell) and run:

cd C:\ & dir *.db /s /b

MacOS #

open terminal.app and run:

find . -name "*.db"

You should get output like this:

... dozens of files...
./Library/Messages/CollaborationNoticeCache/collaborationNotices.db
./Library/Messages/NickNameCache/nickNameKeyStore.db
./Library/Messages/NickNameCache/nicknameRecordsStore.db
./Library/Messages/NickNameCache/handleSharingPreferences.db
./Library/Messages/NickNameCache/unknownSenderRecordInfoStore.db
./Library/Messages/NickNameCache/archivedNicknamesKeyStore.db
./Library/Messages/NickNameCache/handledNicknamesKeyStore.db
./Library/Messages/NickNameCache/pendingNicknamesKeyStore.db
./Library/Messages/chat.db
./Library/Messages/LiteSegmentStore.db
./Library/Messages/prewarm.db
./Library/Messages/Sync/sync.db

§4 Why Use a Database Over Other Storage Types? #

While simple datasets can be stored in text files or spreadsheets, databases offer significant advantages over other approaches, especially as data volume and complexity grow:

This course primarily focuses on Relational Databases, though there are many specialized databases that excel at specific use-cases, we will briefly cover some examples of alternate databases as well.

§5 Analogy: Database vs. Filing Cabinet or Spreadsheet #

To understand the benefits of a database, let’s compare it to more traditional storage methods:

§5.1 Filing Cabinet #

Imagine a large office with many filing cabinets. Each cabinet might represent a different category of information (e.g., “Customer Records,” “Employee Files”). Inside each cabinet, there are folders, and inside each folder, there are individual documents.

A loose database equivalent of this would be:

§5.2 Spreadsheet (e.g., Microsoft Excel) #

A spreadsheet is like a single, large table where you can store data in rows and columns.

The database equivalent of this would be:

§5.3 Limitations of Filing Cabinets and Spreadsheets Compared to Databases: #

FeatureFiling CabinetSpreadsheetDatabase
Search/RetrievalManual, time-consuming, prone to human error.Limited search functions, slow for large datasets.Fast, complex queries.
Data IntegrityNo built-in validation; easy to misfile.No enforced rules; easy to enter incorrect data.Enforces data types, unique values, relationships, and validation rules.
ConcurrencyOnly one person can access a folder at a time.Difficult for multiple users to edit simultaneously.Multiple users can read/write concurrently without data corruption.*
SecurityPhysical lock; anyone with key has full access.File password; limited granular control.Granular permissions (who can read/write specific data).
ScalabilityLimited by physical space.Performance degrades rapidly with large data.Designed to handle vast amounts of data and users.
RelationshipsManual cross-referencing.Difficult to link data across multiple sheets.Defines and enforces relationships between different data sets.
RedundancyEasy to have duplicate documents.Easy to duplicate rows/columns.Provides tools to minimize redundancy.

* users of a SQL database do not actually read/write to the same row of data at the exact same time, the database enforces “concurrency” so that subsequent updates wait a short amount of time for the previous update to complete (typically just a few milliseconds) to ensure the data is always consistent.

There are many database management systems (DBMS) available, categorized by their underlying data model. Some popular examples:

§6.1 Relational Databases (SQL Databases) #

These are the most common type of database, storing data in tables with predefined schemas and relationships. Well suited for storing structured data.

There are also Cloud-Based SQL database services, which provide a managed database where the provider takes care of updates and management.

This is a non-exhaustive list; there are many, many more DBMS software available.

§6.2 NoSQL Databases (Non-Relational Databases) #

Document Databases: Store data together as a “Document” object, particularly well suited to scale across many servers and hold enormous datasets, at the trade-off of certain query capabilities.

Key-Value Stores: A simple and flexible structure that stores any data along with an identifier, or key. These are often extremely fast but serve a narrow use case.

Graph Databases: A specialized structure that focuses on relationships between data.

Column-Family Stores: Designed for distributed systems and high throughput.

§7 What are NoSQL Databases? #

NoSQL (Not Only SQL) databases are non-relational database management systems designed for specific use cases requiring flexible schemas and/or high scalability for large volumes of unstructured or semi-structured data. They emerged to address the limitations of traditional SQL databases when handling massive, unstructured data.

Key Characteristics of NoSQL Databases:

The choice between a relational and NoSQL database depends heavily on the specific requirements of the application, including data structure, scalability needs, consistency requirements, and query patterns. Many modern applications use a combination of both database types for different needs.

§8 Simple Database Architecture: How Applications Interact with the Database Server #

A typical database architecture involves an application interacting with a database server.

  1. Application (Client): The software users interact with (e.g., website, mobile app) that initiates data requests.
  2. Database Driver/Connector: A library that translates the application’s requests into a format the database server understands (e.g., SQL queries).
  3. Network: The medium over which requests are sent from the application to the database server. (Note: This step is not applicable for embedded databases like SQLite).
  4. Database Server: A dedicated service running the database management system (DBMS) software. It receives, processes, and executes requests, interacting with the physical data storage before sending results back to the application.
  5. Data Storage: The physical location (e.g., disk drives) where the database server manages and stores the actual data files.
graph LR A[Application e.g. Web App] --> B[Database Driver] B -.-> C{{Network}} C -.-> D[[Database Server e.g., MySQL]] D --> E[(Data Storage Physical Files)]

§9 Questions #

Always provide at least a brief explanations for your answers.
Please number each response to directly correspond with the question. Provide clear, concise answers that fully address each question without unnecessary elaboration or essay-style writing.

  1. What are some advantages of using a database instead of Excel spreadsheets?
  2. What is the most used database software (counted by number of deployed instances/installs)?
  1. Briefly describe an ideal application for each of the 5 database types: Relational Database (SQL), Document Database, Key-Value Store, Graph, and Column-Oriented.
  2. Which feature of a database becomes most important if 500 customer-service reps need to update the same product-inventory file at the same time?
  3. What built-in database feature prevents two different users from simultaneously overwriting the same customer’s phone number?
  4. Which of the five database categories would you use to track real-time chat-message counts for an online game, and why?
  5. If you needed to model highly connected social-network “friend” relationships efficiently, which of the five database types should you pick?
  6. A weather-monitoring system ingests millions of time-stamped sensor readings per hour. Which of the five database categories is designed for this workload?
  7. If you need to store a structured user contact list, including names, addresses, phone numbers, and other information. Which of the five database categories will be well-suited this workload?