Assignment 4: Business Analytics

At this point, we have covered many of the basics of SQL queries, including how to begin to put together complex queries. This assignment will have you perform data analytics tasks against a simplified business database using the skills you have learned.

This database contains 5 tables with approximately 45,000 rows of data.

Scenario: You are performing analytics for “Cycle,” an e-bike rental company specializing in providing on-demand e-bikes to customers all over the city. (Comparable Real Companies: Bird, Lime)

Your task is to find answers for some business questions by querying their database.


§1 1. Start the Database Instances #

  1. Extract the project archive to a new directory.
  2. Open a terminal window and navigate into that directory.
  3. Confirm you are in the correct place by running ls:
$ ls
docker-compose.yaml   init   README.md
  1. Launch the database container:
docker-compose up

This command will start the PostgreSQL instances. Logs will stream to your terminal; leave this window open while you work.

$ docker-compose up
Attaching to postgres-a4
postgres-a4 | The files belonging to this database system will be owned by user "postgres".
postgres-a4 | This user must also own the server process.
postgres-a4 |
postgres-a4 | The database cluster will be initialized with locale "en_US.utf8".
postgres-a4 | The default database encoding has accordingly been set to "UTF8".
postgres-a4 | The default text search configuration will be set to "english".
postgres-a4 |
postgres-a4 | Data page checksums are disabled.
postgres-a4 |
postgres-a4 | fixing permissions on existing directory /var/lib/postgresql/data ... ok
postgres-a4 | creating subdirectories ... ok
postgres-a4 | selecting dynamic shared memory implementation ... posix
...

To shut down the databases after completing the assignment, press Ctrl + C (⌘ + C on MacOS) in this same terminal window.

If you accidentally close the window, simply reopen a terminal and navigate bask to the assignment directory, then run docker-compose up again to re-attach.


§2 2. Connect to the Database #


§3 3. Stop the Database Instance #

Do not shut down the container until you have finished the assignment.
If you accidentally stop the database, repeat the commands from section 1.

When ready:

  1. Return to the terminal running docker-compose up, you will see the output logs from the database instance.
  2. Press Ctrl + C to gracefully stop both containers.
  3. Verify nothing is running:
$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
# empty output

§4 4. Submission #

Create a plain-text submission on canvas with numbered sections for each question:

Answer each question with the requested information, followed by the SQL query/queries you used.

§5 5. Questions #

  1. Monthly Revenue Produce one row per calendar month (YYYY-MM) showing total revenue from rides rounded to nearest cents. (Use TO_CHAR(<DATE_VALUE_HERE>, 'YYYY-MM') for date formatting)
    Expected columns: month, revenue_dollars.

  2. Monthly Customer Signups
    Find the number of new customers who signed up each month in 2024.
    Columns: month, new_customers
    HINT: you can compare dates date_value >= DATE '2025-09-01'

  3. Top-5 Stations by Net Traffic
    Net traffic = (rides that started here) + (rides that ended here).
    List the five busiest stations, their city, and net traffic count.

  4. Average Ride Duration by Gender
    Return one row per gender with the average ride duration in minutes rounded to three decimals.

  5. Payment-Type Totals
    Provide the total revenue by payment type in dollars.
    Columns: payment_type, total_revenue.

  6. Payment-Type Market Share
    Give the percentage share of revenue by payment type.
    HINT: you can calculate the percent aggregate across the entire table with OVER ()
    e.g. ROUND(SUM(p.amount_cents) * 100.0 / SUM(SUM(p.amount_cents)) OVER (), 2)
    Columns: payment_type, share_pct.

  7. Station Utilization
    Find the top 5 most utilized stations
    Return columns: station_name, total_rides.

  8. Missing Revenue
    Find the difference between the ride cost and value paid by all customers (In dollars).

  9. Lost Revenue from “Voucher” Failures
    Assume that voucher payments fail 5% of the time and are refunded immediately.
    Estimate how much revenue (in dollars) Cycle-Rentals “lost” in the month of 2025-09 because of voucher failures.
    HINT: you can compare dates date_value >= DATE '2025-09-01'

  10. Next-Best-Station Recommendation
    For Station-1, find the station most frequently paired as the destination for rides starting at Station-1.
    Return: recommended_station_name, frequency.

  11. Long Rides
    Count the number of rides that lasted longer than 60 minutes.

  12. Most-Traveled customers
    Find the top 10 customers that have traveled the longest distance.
    Output: customer_id, first_name, last_name, lifetime_travel_distance
    You can calculate the distance (in miles) from two coordinate pairs with this formula:

2 * 3961 * ASIN(
  SQRT(
    POWER(SIN((RADIANS(s2.latitude - s1.latitude)) / 2), 2) +
    COS(RADIANS(s1.latitude)) * COS(RADIANS(s2.latitude)) *
    POWER(SIN((RADIANS(s2.longitude - s1.longitude)) / 2), 2)
  )
)