Database

Database Design: Good vs Bad Examples with Diagrams

Overview

This document provides practical examples of good and bad database design patterns, illustrated with MermaidJS diagrams. Each example shows the problem, explains why it's problematic, and demonstrates the correct approach.

1. Table Structure Design

❌ Bad Example: Denormalized Customer Table

erDiagram CUSTOMERS_BAD { int customer_id PK string first_name string last_name string email string phone string address1 string address2 string city string state string zip string country string order1_product string order1_date decimal order1_amount string order2_product string order2_date decimal order2_amount string order3_product string order3_date decimal order3_amount }

Problems:

Repeating groups (order1, order2, order3)
Wasted space when customers have fewer than 3 orders
Difficult to add more orders without schema changes
Data redundancy and update anomalies

✅ Good Example: Normalized Design

erDiagram CUSTOMERS { int customer_id PK string first_name string last_name string email string phone datetime created_at datetime updated_at } ADDRESSES { int address_id PK int customer_id FK string address_type string street_address string city string state string postal_code string country boolean is_primary } ORDERS { int order_id PK int customer_id FK datetime order_date decimal total_amount string status } ORDER_ITEMS { int order_item_id PK int order_id FK int product_id FK int quantity decimal unit_price decimal total_price } PRODUCTS { int product_id PK string product_name string description decimal price int stock_quantity } CUSTOMERS ||--o{ ADDRESSES : "has" CUSTOMERS ||--o{ ORDERS : "places" ORDERS ||--o{ ORDER_ITEMS : "contains" PRODUCTS ||--o{ ORDER_ITEMS : "included_in"

Benefits:

No data redundancy
Flexible number of orders per customer
Easy to add new attributes
Maintains data integrity
Supports complex queries efficiently

2. Relationship Design Patterns

❌ Bad Example: Poor Many-to-Many Implementation

erDiagram STUDENTS_BAD { int student_id PK string name string email string course1 string course2 string course3 string course4 string course5 } COURSES_BAD { int course_id PK string course_name string student1 string student2 string student3 string student4 string student5 }

Problems:

Limited number of courses per student
Limited number of students per course
Data redundancy
Difficult to query relationships
No additional relationship attributes (enrollment date, grade)

✅ Good Example: Proper Junction Table

erDiagram STUDENTS { int student_id PK string first_name string last_name string email date date_of_birth datetime created_at } COURSES { int course_id PK string course_code string course_name string description int credits int max_enrollment } ENROLLMENTS { int enrollment_id PK int student_id FK int course_id FK date enrollment_date string grade string status datetime created_at } INSTRUCTORS { int instructor_id PK string first_name string last_name string email string department } COURSE_INSTRUCTORS { int course_instructor_id PK int course_id FK int instructor_id FK string role datetime assigned_date } STUDENTS ||--o{ ENROLLMENTS : "enrolls_in" COURSES ||--o{ ENROLLMENTS : "has_enrollment" COURSES ||--o{ COURSE_INSTRUCTORS : "taught_by" INSTRUCTORS ||--o{ COURSE_INSTRUCTORS : "teaches"

Benefits:

Unlimited students per course and courses per student
Additional relationship attributes
Clean separation of concerns
Easy to track enrollment history
Supports complex reporting

3. Data Type and Constraint Issues

❌ Bad Example: Poor Data Types and Missing Constraints

-- Bad table design
CREATE TABLE users_bad (
    id TEXT,                    -- Should be INT with AUTO_INCREMENT
    username TEXT,              -- No length limit, no uniqueness
    email TEXT,                 -- No validation, no uniqueness
    age TEXT,                   -- Should be INT with constraints
    salary TEXT,                -- Should be DECIMAL
    is_active TEXT,             -- Should be BOOLEAN
    created_date TEXT,          -- Should be DATETIME
    phone TEXT,                 -- No format validation
    password TEXT               -- No encryption, poor security
);

✅ Good Example: Proper Data Types and Constraints

-- Good table design
CREATE TABLE users_good (
    id INT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(50) NOT NULL UNIQUE,
    email VARCHAR(254) NOT NULL UNIQUE,
    age INT CHECK (age >= 0 AND age <= 150),
    salary DECIMAL(10,2) CHECK (salary >= 0),
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP,
    phone VARCHAR(20),
    password_hash VARCHAR(255) NOT NULL,
    
    INDEX idx_username (username),
    INDEX idx_email (email),
    INDEX idx_created_at (created_at)
);

4. Hierarchical Data Anti-patterns

❌ Bad Example: Adjacency List with Poor Structure

erDiagram CATEGORIES_BAD { int id PK string name int parent_id string full_path int level }

Problems:

full_path and level are derived data (redundant)
Difficult to maintain consistency
Performance issues with deep hierarchies
Update anomalies when moving subtrees

✅ Good Example: Clean Adjacency List with Computed Paths

erDiagram CATEGORIES { int category_id PK string name string slug int parent_id FK string description boolean is_active int sort_order datetime created_at datetime updated_at } CATEGORY_PATHS { int path_id PK int ancestor_id FK int descendant_id FK int depth } CATEGORIES ||--o{ CATEGORIES : "parent_of" CATEGORIES ||--o{ CATEGORY_PATHS : "ancestor" CATEGORIES ||--o{ CATEGORY_PATHS : "descendant"

Benefits:

Clean base table without redundancy
Separate path table for efficient queries
Easy to find all ancestors/descendants
Maintains referential integrity

5. Indexing Strategies

❌ Bad Example: Poor Indexing

graph TD A[Table: orders
1M rows] --> B[No indexes except PK] B --> C[Query: WHERE customer_id = 123
Full table scan] C --> D[Performance: 2+ seconds] E[Query: WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
Full table scan] --> F[Performance: 3+ seconds] G[Query: ORDER BY order_date DESC
Expensive sort operation] --> H[Performance: 4+ seconds]

✅ Good Example: Strategic Indexing

graph TD A[Table: orders
1M rows] --> B[Well-designed indexes] B --> C[IX_customer_id
IX_order_date
IX_status_date
IX_customer_status] C --> D[Query: WHERE customer_id = 123
Index seek] D --> E[Performance: 10ms] C --> F[Query: WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
Index range scan] F --> G[Performance: 50ms] C --> H[Query: ORDER BY order_date DESC
Index order] H --> I[Performance: 20ms]

Index Strategy:

-- Strategic indexes
CREATE INDEX ix_orders_customer_id ON orders(customer_id);
CREATE INDEX ix_orders_order_date ON orders(order_date);
CREATE INDEX ix_orders_status_date ON orders(status, order_date);
CREATE INDEX ix_orders_customer_status ON orders(customer_id, status, order_date);

6. Security Anti-patterns

❌ Bad Example: Security Vulnerabilities

erDiagram USERS_INSECURE { int id PK string username string password_plain string ssn string credit_card string api_key string role string permissions_csv }

Problems:

Plain text passwords
Sensitive data unencrypted
Permissions stored as CSV string
No audit trail
Single role per user

✅ Good Example: Secure Design

erDiagram USERS_SECURE { int user_id PK string username string email string password_hash string salt datetime last_login boolean is_locked datetime created_at string created_by } SENSITIVE_DATA { int sensitive_id PK int user_id FK string encrypted_ssn string encrypted_cc_hash string encryption_key_id datetime created_at } ROLES { int role_id PK string role_name string description boolean is_active } PERMISSIONS { int permission_id PK string permission_name string resource string action string description } USER_ROLES { int user_role_id PK int user_id FK int role_id FK datetime assigned_at string assigned_by datetime expires_at } ROLE_PERMISSIONS { int role_permission_id PK int role_id FK int permission_id FK datetime granted_at string granted_by } AUDIT_LOG { int audit_id PK int user_id FK string action string table_name string old_values string new_values datetime created_at string ip_address } USERS_SECURE ||--o{ USER_ROLES : "has" USERS_SECURE ||--o{ SENSITIVE_DATA : "owns" USERS_SECURE ||--o{ AUDIT_LOG : "performs" ROLES ||--o{ USER_ROLES : "assigned_to" ROLES ||--o{ ROLE_PERMISSIONS : "has" PERMISSIONS ||--o{ ROLE_PERMISSIONS : "granted_via"

7. Performance Optimization Examples

❌ Bad Example: Inefficient Query Pattern

sequenceDiagram participant App as Application participant DB as Database Note over App,DB: N+1 Query Problem App->>DB: SELECT * FROM orders WHERE customer_id = ? loop For each order App->>DB: SELECT * FROM customers WHERE id = ? App->>DB: SELECT * FROM products WHERE id = ? end Note over App,DB: 1 + N + N queries instead of 1

✅ Good Example: Optimized Query Patterns

sequenceDiagram participant App as Application participant DB as Database Note over App,DB: Efficient JOIN Query App->>DB: SELECT o.*, c.name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.customer_id = ? Note over App,DB: Single optimized query

8. Temporal Data Handling

❌ Bad Example: Poor Audit Trail

erDiagram PRODUCTS_BAD { int id PK string name decimal price int quantity string last_updated_by datetime last_updated_at }

Problems:

No history of changes
Can't track who changed what when
Can't restore previous values
Limited audit capabilities

✅ Good Example: Comprehensive Audit System

erDiagram PRODUCTS { int product_id PK string name decimal current_price int current_quantity string status datetime created_at string created_by datetime updated_at string updated_by int version } PRODUCT_HISTORY { int history_id PK int product_id FK string name decimal price int quantity string status datetime valid_from datetime valid_to string changed_by string change_reason string operation_type } PRICE_HISTORY { int price_history_id PK int product_id FK decimal old_price decimal new_price datetime effective_date string reason string approved_by } PRODUCTS ||--o{ PRODUCT_HISTORY : "has_history" PRODUCTS ||--o{ PRICE_HISTORY : "price_changes"

9. Scalability Considerations

❌ Bad Example: Monolithic Design

erDiagram EVERYTHING { int id PK string user_name string user_email string product_name decimal product_price string order_info string payment_details string shipping_address string review_text int review_rating datetime created_at }

✅ Good Example: Microservice-Ready Design

erDiagram %% User Service USERS { int user_id PK string username string email datetime created_at } USER_PROFILES { int profile_id PK int user_id FK string first_name string last_name date date_of_birth } %% Product Service PRODUCTS { int product_id PK string name string description decimal price string category } INVENTORY { int inventory_id PK int product_id FK int quantity string location } %% Order Service ORDERS { int order_id PK int user_id FK decimal total_amount string status datetime created_at } ORDER_ITEMS { int item_id PK int order_id FK int product_id FK int quantity decimal unit_price } %% Review Service REVIEWS { int review_id PK int user_id FK int product_id FK int rating string review_text datetime created_at } USERS ||--o{ USER_PROFILES : "has" USERS ||--o{ ORDERS : "places" USERS ||--o{ REVIEWS : "writes" PRODUCTS ||--o{ INVENTORY : "tracked_in" PRODUCTS ||--o{ ORDER_ITEMS : "included_in" PRODUCTS ||--o{ REVIEWS : "reviewed_in" ORDERS ||--o{ ORDER_ITEMS : "contains"

10. Common Query Anti-patterns

❌ Bad Example: Inefficient Queries

-- Bad: SELECT *
SELECT * FROM products WHERE category = 'Electronics';

-- Bad: No WHERE clause
SELECT customer_name FROM customers ORDER BY created_at;

-- Bad: Functions in WHERE clause
SELECT * FROM orders WHERE YEAR(order_date) = 2023;

-- Bad: Correlated subquery
SELECT * FROM customers c 
WHERE (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) > 5;

✅ Good Example: Optimized Queries

-- Good: Select only needed columns
SELECT product_id, name, price FROM products WHERE category = 'Electronics';

-- Good: Use LIMIT and appropriate WHERE clause
SELECT customer_name FROM customers 
WHERE created_at >= '2023-01-01' 
ORDER BY created_at 
LIMIT 100;

-- Good: Use date ranges instead of functions
SELECT * FROM orders 
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';

-- Good: Use JOIN instead of correlated subquery
SELECT DISTINCT c.* 
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id
HAVING COUNT(o.id) > 5;

Summary

Key Takeaways

Normalize appropriately - Avoid redundancy but don't over-normalize
Use proper data types - Choose the right type for each column
Implement constraints - Ensure data integrity at the database level
Design strategic indexes - Balance query performance with maintenance overhead
Plan for security - Encrypt sensitive data and implement proper access controls
Think about scalability - Design for growth and potential service separation
Document relationships clearly - Use proper foreign keys and junction tables
Optimize queries - Write efficient SQL and avoid common anti-patterns

Design Checklist

[ ] All tables have primary keys
[ ] Foreign key relationships are properly defined
[ ] Data types are appropriate and constrained
[ ] Indexes support common query patterns
[ ] Sensitive data is encrypted
[ ] Audit trails are implemented where needed
[ ] Schema supports business rules
[ ] Design can scale with data growth
[ ] Queries are optimized for performance
[ ] Documentation is complete and current

By following these patterns and avoiding the anti-patterns shown above, you'll create databases that are maintainable, performant, and secure.