Database Design: Good vs Bad Examples with Diagrams
Overview
This document provides practical examples of good and bad database design patterns, illustrated with MermaidJS diagrams. Each example shows the problem, explains why it's problematic, and demonstrates the correct approach.
1. Table Structure Design
❌ Bad Example: Denormalized Customer Table
erDiagram
CUSTOMERS_BAD {
int customer_id PK
string first_name
string last_name
string email
string phone
string address1
string address2
string city
string state
string zip
string country
string order1_product
string order1_date
decimal order1_amount
string order2_product
string order2_date
decimal order2_amount
string order3_product
string order3_date
decimal order3_amount
}
Problems:
- Repeating groups (order1, order2, order3)
- Wasted space when customers have fewer than 3 orders
- Difficult to add more orders without schema changes
- Data redundancy and update anomalies
✅ Good Example: Normalized Design
erDiagram
CUSTOMERS {
int customer_id PK
string first_name
string last_name
string email
string phone
datetime created_at
datetime updated_at
}
ADDRESSES {
int address_id PK
int customer_id FK
string address_type
string street_address
string city
string state
string postal_code
string country
boolean is_primary
}
ORDERS {
int order_id PK
int customer_id FK
datetime order_date
decimal total_amount
string status
}
ORDER_ITEMS {
int order_item_id PK
int order_id FK
int product_id FK
int quantity
decimal unit_price
decimal total_price
}
PRODUCTS {
int product_id PK
string product_name
string description
decimal price
int stock_quantity
}
CUSTOMERS ||--o{ ADDRESSES : "has"
CUSTOMERS ||--o{ ORDERS : "places"
ORDERS ||--o{ ORDER_ITEMS : "contains"
PRODUCTS ||--o{ ORDER_ITEMS : "included_in"
Benefits:
- No data redundancy
- Flexible number of orders per customer
- Easy to add new attributes
- Maintains data integrity
- Supports complex queries efficiently
2. Relationship Design Patterns
❌ Bad Example: Poor Many-to-Many Implementation
erDiagram
STUDENTS_BAD {
int student_id PK
string name
string email
string course1
string course2
string course3
string course4
string course5
}
COURSES_BAD {
int course_id PK
string course_name
string student1
string student2
string student3
string student4
string student5
}
Problems:
- Limited number of courses per student
- Limited number of students per course
- Data redundancy
- Difficult to query relationships
- No additional relationship attributes (enrollment date, grade)
✅ Good Example: Proper Junction Table
erDiagram
STUDENTS {
int student_id PK
string first_name
string last_name
string email
date date_of_birth
datetime created_at
}
COURSES {
int course_id PK
string course_code
string course_name
string description
int credits
int max_enrollment
}
ENROLLMENTS {
int enrollment_id PK
int student_id FK
int course_id FK
date enrollment_date
string grade
string status
datetime created_at
}
INSTRUCTORS {
int instructor_id PK
string first_name
string last_name
string email
string department
}
COURSE_INSTRUCTORS {
int course_instructor_id PK
int course_id FK
int instructor_id FK
string role
datetime assigned_date
}
STUDENTS ||--o{ ENROLLMENTS : "enrolls_in"
COURSES ||--o{ ENROLLMENTS : "has_enrollment"
COURSES ||--o{ COURSE_INSTRUCTORS : "taught_by"
INSTRUCTORS ||--o{ COURSE_INSTRUCTORS : "teaches"
Benefits:
- Unlimited students per course and courses per student
- Additional relationship attributes
- Clean separation of concerns
- Easy to track enrollment history
- Supports complex reporting
3. Data Type and Constraint Issues
❌ Bad Example: Poor Data Types and Missing Constraints
-- Bad table design
CREATE TABLE users_bad (
id TEXT, -- Should be INT with AUTO_INCREMENT
username TEXT, -- No length limit, no uniqueness
email TEXT, -- No validation, no uniqueness
age TEXT, -- Should be INT with constraints
salary TEXT, -- Should be DECIMAL
is_active TEXT, -- Should be BOOLEAN
created_date TEXT, -- Should be DATETIME
phone TEXT, -- No format validation
password TEXT -- No encryption, poor security
);
✅ Good Example: Proper Data Types and Constraints
-- Good table design
CREATE TABLE users_good (
id INT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(50) NOT NULL UNIQUE,
email VARCHAR(254) NOT NULL UNIQUE,
age INT CHECK (age >= 0 AND age <= 150),
salary DECIMAL(10,2) CHECK (salary >= 0),
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP,
phone VARCHAR(20),
password_hash VARCHAR(255) NOT NULL,
INDEX idx_username (username),
INDEX idx_email (email),
INDEX idx_created_at (created_at)
);
4. Hierarchical Data Anti-patterns
❌ Bad Example: Adjacency List with Poor Structure
erDiagram
CATEGORIES_BAD {
int id PK
string name
int parent_id
string full_path
int level
}
Problems:
full_path
andlevel
are derived data (redundant)- Difficult to maintain consistency
- Performance issues with deep hierarchies
- Update anomalies when moving subtrees
✅ Good Example: Clean Adjacency List with Computed Paths
erDiagram
CATEGORIES {
int category_id PK
string name
string slug
int parent_id FK
string description
boolean is_active
int sort_order
datetime created_at
datetime updated_at
}
CATEGORY_PATHS {
int path_id PK
int ancestor_id FK
int descendant_id FK
int depth
}
CATEGORIES ||--o{ CATEGORIES : "parent_of"
CATEGORIES ||--o{ CATEGORY_PATHS : "ancestor"
CATEGORIES ||--o{ CATEGORY_PATHS : "descendant"
Benefits:
- Clean base table without redundancy
- Separate path table for efficient queries
- Easy to find all ancestors/descendants
- Maintains referential integrity
5. Indexing Strategies
❌ Bad Example: Poor Indexing
graph TD
A[Table: orders
1M rows] --> B[No indexes except PK] B --> C[Query: WHERE customer_id = 123
Full table scan] C --> D[Performance: 2+ seconds] E[Query: WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
Full table scan] --> F[Performance: 3+ seconds] G[Query: ORDER BY order_date DESC
Expensive sort operation] --> H[Performance: 4+ seconds]
1M rows] --> B[No indexes except PK] B --> C[Query: WHERE customer_id = 123
Full table scan] C --> D[Performance: 2+ seconds] E[Query: WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
Full table scan] --> F[Performance: 3+ seconds] G[Query: ORDER BY order_date DESC
Expensive sort operation] --> H[Performance: 4+ seconds]
✅ Good Example: Strategic Indexing
graph TD
A[Table: orders
1M rows] --> B[Well-designed indexes] B --> C[IX_customer_id
IX_order_date
IX_status_date
IX_customer_status] C --> D[Query: WHERE customer_id = 123
Index seek] D --> E[Performance: 10ms] C --> F[Query: WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
Index range scan] F --> G[Performance: 50ms] C --> H[Query: ORDER BY order_date DESC
Index order] H --> I[Performance: 20ms]
1M rows] --> B[Well-designed indexes] B --> C[IX_customer_id
IX_order_date
IX_status_date
IX_customer_status] C --> D[Query: WHERE customer_id = 123
Index seek] D --> E[Performance: 10ms] C --> F[Query: WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
Index range scan] F --> G[Performance: 50ms] C --> H[Query: ORDER BY order_date DESC
Index order] H --> I[Performance: 20ms]
Index Strategy:
-- Strategic indexes
CREATE INDEX ix_orders_customer_id ON orders(customer_id);
CREATE INDEX ix_orders_order_date ON orders(order_date);
CREATE INDEX ix_orders_status_date ON orders(status, order_date);
CREATE INDEX ix_orders_customer_status ON orders(customer_id, status, order_date);
6. Security Anti-patterns
❌ Bad Example: Security Vulnerabilities
erDiagram
USERS_INSECURE {
int id PK
string username
string password_plain
string ssn
string credit_card
string api_key
string role
string permissions_csv
}
Problems:
- Plain text passwords
- Sensitive data unencrypted
- Permissions stored as CSV string
- No audit trail
- Single role per user
✅ Good Example: Secure Design
erDiagram
USERS_SECURE {
int user_id PK
string username
string email
string password_hash
string salt
datetime last_login
boolean is_locked
datetime created_at
string created_by
}
SENSITIVE_DATA {
int sensitive_id PK
int user_id FK
string encrypted_ssn
string encrypted_cc_hash
string encryption_key_id
datetime created_at
}
ROLES {
int role_id PK
string role_name
string description
boolean is_active
}
PERMISSIONS {
int permission_id PK
string permission_name
string resource
string action
string description
}
USER_ROLES {
int user_role_id PK
int user_id FK
int role_id FK
datetime assigned_at
string assigned_by
datetime expires_at
}
ROLE_PERMISSIONS {
int role_permission_id PK
int role_id FK
int permission_id FK
datetime granted_at
string granted_by
}
AUDIT_LOG {
int audit_id PK
int user_id FK
string action
string table_name
string old_values
string new_values
datetime created_at
string ip_address
}
USERS_SECURE ||--o{ USER_ROLES : "has"
USERS_SECURE ||--o{ SENSITIVE_DATA : "owns"
USERS_SECURE ||--o{ AUDIT_LOG : "performs"
ROLES ||--o{ USER_ROLES : "assigned_to"
ROLES ||--o{ ROLE_PERMISSIONS : "has"
PERMISSIONS ||--o{ ROLE_PERMISSIONS : "granted_via"
7. Performance Optimization Examples
❌ Bad Example: Inefficient Query Pattern
sequenceDiagram
participant App as Application
participant DB as Database
Note over App,DB: N+1 Query Problem
App->>DB: SELECT * FROM orders WHERE customer_id = ?
loop For each order
App->>DB: SELECT * FROM customers WHERE id = ?
App->>DB: SELECT * FROM products WHERE id = ?
end
Note over App,DB: 1 + N + N queries instead of 1
✅ Good Example: Optimized Query Patterns
sequenceDiagram
participant App as Application
participant DB as Database
Note over App,DB: Efficient JOIN Query
App->>DB: SELECT o.*, c.name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.customer_id = ? Note over App,DB: Single optimized query
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.customer_id = ? Note over App,DB: Single optimized query
8. Temporal Data Handling
❌ Bad Example: Poor Audit Trail
erDiagram
PRODUCTS_BAD {
int id PK
string name
decimal price
int quantity
string last_updated_by
datetime last_updated_at
}
Problems:
- No history of changes
- Can't track who changed what when
- Can't restore previous values
- Limited audit capabilities
✅ Good Example: Comprehensive Audit System
erDiagram
PRODUCTS {
int product_id PK
string name
decimal current_price
int current_quantity
string status
datetime created_at
string created_by
datetime updated_at
string updated_by
int version
}
PRODUCT_HISTORY {
int history_id PK
int product_id FK
string name
decimal price
int quantity
string status
datetime valid_from
datetime valid_to
string changed_by
string change_reason
string operation_type
}
PRICE_HISTORY {
int price_history_id PK
int product_id FK
decimal old_price
decimal new_price
datetime effective_date
string reason
string approved_by
}
PRODUCTS ||--o{ PRODUCT_HISTORY : "has_history"
PRODUCTS ||--o{ PRICE_HISTORY : "price_changes"
9. Scalability Considerations
❌ Bad Example: Monolithic Design
erDiagram
EVERYTHING {
int id PK
string user_name
string user_email
string product_name
decimal product_price
string order_info
string payment_details
string shipping_address
string review_text
int review_rating
datetime created_at
}
✅ Good Example: Microservice-Ready Design
erDiagram
%% User Service
USERS {
int user_id PK
string username
string email
datetime created_at
}
USER_PROFILES {
int profile_id PK
int user_id FK
string first_name
string last_name
date date_of_birth
}
%% Product Service
PRODUCTS {
int product_id PK
string name
string description
decimal price
string category
}
INVENTORY {
int inventory_id PK
int product_id FK
int quantity
string location
}
%% Order Service
ORDERS {
int order_id PK
int user_id FK
decimal total_amount
string status
datetime created_at
}
ORDER_ITEMS {
int item_id PK
int order_id FK
int product_id FK
int quantity
decimal unit_price
}
%% Review Service
REVIEWS {
int review_id PK
int user_id FK
int product_id FK
int rating
string review_text
datetime created_at
}
USERS ||--o{ USER_PROFILES : "has"
USERS ||--o{ ORDERS : "places"
USERS ||--o{ REVIEWS : "writes"
PRODUCTS ||--o{ INVENTORY : "tracked_in"
PRODUCTS ||--o{ ORDER_ITEMS : "included_in"
PRODUCTS ||--o{ REVIEWS : "reviewed_in"
ORDERS ||--o{ ORDER_ITEMS : "contains"
10. Common Query Anti-patterns
❌ Bad Example: Inefficient Queries
-- Bad: SELECT *
SELECT * FROM products WHERE category = 'Electronics';
-- Bad: No WHERE clause
SELECT customer_name FROM customers ORDER BY created_at;
-- Bad: Functions in WHERE clause
SELECT * FROM orders WHERE YEAR(order_date) = 2023;
-- Bad: Correlated subquery
SELECT * FROM customers c
WHERE (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) > 5;
✅ Good Example: Optimized Queries
-- Good: Select only needed columns
SELECT product_id, name, price FROM products WHERE category = 'Electronics';
-- Good: Use LIMIT and appropriate WHERE clause
SELECT customer_name FROM customers
WHERE created_at >= '2023-01-01'
ORDER BY created_at
LIMIT 100;
-- Good: Use date ranges instead of functions
SELECT * FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
-- Good: Use JOIN instead of correlated subquery
SELECT DISTINCT c.*
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id
HAVING COUNT(o.id) > 5;
Summary
Key Takeaways
- Normalize appropriately - Avoid redundancy but don't over-normalize
- Use proper data types - Choose the right type for each column
- Implement constraints - Ensure data integrity at the database level
- Design strategic indexes - Balance query performance with maintenance overhead
- Plan for security - Encrypt sensitive data and implement proper access controls
- Think about scalability - Design for growth and potential service separation
- Document relationships clearly - Use proper foreign keys and junction tables
- Optimize queries - Write efficient SQL and avoid common anti-patterns
Design Checklist
- [ ] All tables have primary keys
- [ ] Foreign key relationships are properly defined
- [ ] Data types are appropriate and constrained
- [ ] Indexes support common query patterns
- [ ] Sensitive data is encrypted
- [ ] Audit trails are implemented where needed
- [ ] Schema supports business rules
- [ ] Design can scale with data growth
- [ ] Queries are optimized for performance
- [ ] Documentation is complete and current
By following these patterns and avoiding the anti-patterns shown above, you'll create databases that are maintainable, performant, and secure.