SQL database schema design principles

When designing your first schema for a relational database like SQLite there are no right or wrong decisions but there are consistent and inconsistent schemas. Here are eight schema design principles you can follow to avoid future headaches, especially if you are a beginner. And, if you are working with an existing database, you should probably ignore what I say here and follow the already established patterns for consistency's sake.

1. Naming convention

Use descriptive snake_case (lowercase letters, numbers, and underscores) over descriptive or abreviated PascalCase, camelCase, "Sentence case", or "Title Case" to make it easier to read and understand SQL queries and the meaning of data in tables.

customer_orders -- vs -- CustomerOrders
user_preference -- vs -- userpreferences

Don't use dots, hyphens, or spaces in names because they cause confusion. The primary use of dots is for identifying objects in code with patters like database.schema.table.column. Hyphens and spaces will force you to write lots of unnecessary quotes in your script.

SELECT user_preferences FROM orders
-- vs --
SELECT "user preferences" FROM orders
-- vs --
SELECT "user-preferences" FROM orders

2. Table names

Use plural over singular words for table names to avoid collusion with reserved keywords and to better describe that a table represents a collection of items rathar than a single item. For join tables, let all words be plural.

books           -- vs -- book
authors         -- vs -- author
books_authors   -- vs -- book_author

Many people advocate that you should use singular table names because it requires less typing and reads better when joining tables in SQL query, but somehow my brain favors plural table names as the more natural convention.

3. Column names

Use specific column names over generalized but ambiguous names, so people can tell what the contents of a column is about without depending on other values in a row. Also, no need to prefix column names with the name of the containing table.

event_count    -- vs -- item_type
name           -- vs -- venue_name
   
SELECT name, capacity, sum(event_count)
FROM venues
GROUP BY name, capacity
-- vs --
SELECT venue_name, venue_capacity, sum(item_value) AS event_count
FROM venues
WHERE item_type = 'Event Count'
GROUP BY venue_name, venue_capacity

4. Primary keys

Use integer auto-incrementing id column as primary key over multi-column primary keys and in addition to other identifiers like UUIDs to make it easier to reason about and perform certain analysis over the data much easier, like finding and deleting duplicate records. You can apply multi-column unique constrains, if needed.

DELETE FROM customer_orders
WHERE id NOT IN (SELECT min(id) FROM customer_orders GROUP BY name)

SQLite tip: Simply add an id INTEGER PRIMARY KEY column and the databse will automatically increment it as new records are added. See SQLite autoincrement docs

5. Foreign keys

Given a table customers with id column as the primary key, then foreign key columns refercing that table are best named customer_id. That pattern makes it easier to write and read SQL queries. Avoid using foreign key names that don’t obviously match up to a table.

SELECT * FROM customers
JOIN orders ON customers.id = orders.customer_id
JOIN events ON customer.id = events.customer_id
-- vs --
SELECT * FROM customers
JOIN orders ON customers.customer_id = orders.cid
JOIN events ON customers.customer_id events.customer

6. Dates and time

Store datetime values as text in ISO8601 format ("YYYY-MM-DD HH:MM:SS.SSS") over unix epoch or Julian day format in SQLite and other databases without a native data type for datetime values. That way you can read datetime values directly from the data table without having to do conversions.

2042-11-11 11:11:11.042 vs 2299317071 (Unix Epoch)

For applications requiring more than three digits millisecond precision, use the Julian day format.

Use single column instead of storing the year, month, and day in separate columns which complicates time series queries.

SELECT date(ordered_on)
-- vs --
SELECT date (order_year || '-' || order_month || '-' || order_day)

Keep it stupid simple when it comes to time zones and always store all datetime values in UTC to avoid constant time conversion headaches.

7. Table structure

Use tall tables with a smaller number of JSON columns over wide tables with many columns, many of which store JSON data and have sequential names (e.g. section1, section2, etc.). Restructuring wide tables into separate tables with unique names will make them easier and faster to query in the future.

SELECT 
    SUM(
        (CASE WHEN section1 IS NOT NULL
        THEN 1 ELSE 0 END) +
        (CASE WHEN section2 IS NOT NULL
        THEN 1 ELSE 0 END) +
        (CASE WHEN section3 IS NOT NULL
        THEN 1 ELSE 0 END) + 
        (CASE WHEN section4 IS NOT NULL
        THEN 1 ELSE 0 END)
    ) AS sections_count
FROM orders
WHERE id = 42
-- vs --
SELECT count(section)
FROM contracts
WHERE order_id = 42

Don't go to table normalization extremes, though. It is fine to store things like customer zip code and product country of origin in the orders table to avoid doing foreign key lookups for no real benefit.

8. Single source of truth

Be religious about having a single source of truth for a piece of data and dropping unused tables/columns (e.g. name_old) to avoid confusion and data analysis issues. If you need to duplicate data in a new table for metrics calculation or view agregation, indicate this with a suffix in the table name.

SELECT *
FROM vip_customers_view

© 2016-2024 Anton Hristov, all rights reserved.

Handcrafted using Gatsby, React, JavaScript, HTML5, and CSS3. Living on Azure Static Web Apps.