Assignment 5: Importing and Exporting Data #
SQL databases support importing data from external sources, such as csv files, as well as exporting data for backup purposes, via a SQL dump. Importing external data is useful for migration purposes. Exporting data by dumping the sql database to a file is a simple and effective way to take a point-in-time backup of a database.
This assignment will walk you through how you can import data from a CSV file, and how you can export the contents of a database as a dump file.
§1 Importing Data From CSV #
We will start with a simple CSV file containing data you can import into a database, The most straightforward approach is to create a new database table to import the CSV data into.
Example data.csv:
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
3,Charlie,charlie@example.com
Launch psql and connect to the postgres database.
Before importing the CSV file, you need a table in your Postgres database that matches the structure of your CSV file. The column names and data types should correspond to your CSV data.
Let’s create a table named users that matches our example data.csv:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
id SERIAL PRIMARY KEY: Creates an auto-incrementing integer column as the primary key.name VARCHAR(100): Creates a string column for names.email VARCHAR(100): Creates a string column for emails.
Reminder: We covered most common SQL data types in Section 2. https://evrist-scc-classrooms.github.io/cisp350-site/week-2/section-2/#data-types
Now we are ready to import the csv file into our database. Postgres psql provides a \copy command to facilitate import operations from your machine to the database server.
\copy users(id, name, email)
FROM '/path/to/your/data.csv'
CSV HEADER;
The \copy command syntax is as follows:
\copy <TABLENAME>(<DATA1>, <DATA2>, ...)
DELIMETER ',' --set the delimiter, defaults to a comma ',' which is also the csv default (this can be omitted)
CSV HEADER; --instructs the copy command to read a csv file that contains a header line
After running the COPY or \copy command, you should see a message indicating the number of rows copied. To verify the import, query your table:
SELECT * FROM users;
You should see the data from your CSV file displayed in the psql output.
§2 Exporting Database Data With SQL Dump #
One of the easiest methods of backing up a database is to dump its contents to a file. A SQL dump is simply a file containing all SQL commands to re-create the database as it was. This file can use used to restore data to the same state as when the backup was taken.
To create a dump of a Postgres database, you use the pg_dump command, this is typically installed alongside the postgres cli tools pgsql.
The most basic usage of pg_dump is:
pg_dump your_database_name > database.sql
This will create a plain text file containing SQL commands to restore the database. This is simple, but typically you would want to use a compressed dump file for backup purposes, a compressed .dump file is easier to restore than a .sql file.
pg_dump -Fc your_database_name > database.dump
This will create a compressed file called database.dump on your machine, this file contains everything needed to restore the full contents of the database with pg_restore.
You can list the database names of a postgres instance with
\listinpsql, the Postgres container used for these assignments defaults to the databaseroot.
§3 Restoring a Database From a Dump #
Restoring a database from a dump file is as simple as one command.
In Postgres, restoring a database from a dump uses the pg_restore command.
pg_restore --clean -d existing_database_name path/to/your_dump_file.dump
- The
--cleanflag overwrites an existing database to ensure it exactly matches the contents of the dump. -drefers to the database to restore to, the default database in the Postgres container isroot
After running pg_restore, the database will contain all of the tables and data from the dump file.
Assignment Directions #
§4 1. Start the Database Instances #
- Extract the project archive to a new directory.
- Open a terminal window and navigate into that directory.
- Confirm you are in the correct place by running
ls:
$ ls
docker-compose.yaml workdir README.md
- Launch the database container:
docker-compose up
This command will start the PostgreSQL instances.
Logs will stream to your terminal; leave this window open while you work.
$ docker-compose up
[+] Running 2/2
Attaching to postgres-a5
postgres-a5 | The files belonging to this database system will be owned by user "postgres".
postgres-a5 | This user must also own the server process.
postgres-a5 |
postgres-a5 | The database cluster will be initialized with locale "en_US.utf8".
postgres-a5 | The default database encoding has accordingly been set to "UTF8".
postgres-a5 | The default text search configuration will be set to "english".
postgres-a5 |
postgres-a5 | Data page checksums are disabled.
postgres-a5 |
postgres-a5 | fixing permissions on existing directory /var/lib/postgresql/data ... ok
postgres-a5 | creating subdirectories ... ok
...
To shut down the databases after completing the assignment, press Ctrl + C (⌘ + C on MacOS) in this same terminal window.
If you accidentally close the window, simply reopen a terminal and navigate bask to the assignment directory, then run docker-compose up again to re-attach.
§5 2. Connect to the Database #
For this assignment you will use several different CLI tools, so you will connect to the database container using bash instead of launching straight into psql.
This time I created a directory that is shared between your computer and the container that runs the database, all files in the workdir directory are synced.
- run:
docker exec -it postgres-a5 bash - This will launch a linux bash shell on the container running the database.
- You will start in the
/workdirdirectory, if you typels, you will see thedata.csvfile, any files in this directory are synced to theworkdirfolder on your computer. - Complete the following tasks:
- Open
workdir/data.csvand understand the “schema” of the CSV file (the first “header” line is most important here) - Use
psqlto create a newpurchasestable with a schema that will work with the CSV file. (I expect to see Integers, Numbers, and Date types used.) - Use the
\copycommand to import/workdir/data.sqlinto your database. - Ensure the data imported successfully by using a
SELECTstatement. (typeexitonce to exitpsqlafter verifying the data is present) - Use
pg_dumpto dump your database contents:pg_dump root > backup.sql- The
backup.sqlwill be synced toworkdiron your computer, you will submit that file.
- The
- Open
- Type
exitto leave the container when you are done.
§6 3. Stop the Database Instance #
Do not shut down the container until you have finished the assignment.
If you accidentally stop the database, repeat the commands from section 1.
When ready:
- Return to the terminal running
docker-compose up, you will see the output logs from the database instance. - Press
Ctrl + Cto gracefully stop both containers. - Verify nothing is running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# empty output
§7 4. Submission #
Submit the backup.sql file generated by pg_dump on canvas.
If you have any questions or issues, please message or email the professor.