Data Structures and Algorithms Guide
Table of Contents
- Introduction
- Time and Space Complexity
- Basic Data Structures
- Advanced Data Structures
- Fundamental Algorithms
- Algorithm Design Paradigms
- Problem-Solving Strategies
- Common Algorithm Patterns
Introduction
Data structures and algorithms form the foundation of computer science and software engineering. Understanding these concepts is crucial for:
- Writing efficient code
- Solving complex problems
- Technical interviews
- System design
- Performance optimization
Data Structure: A way of organizing and storing data to enable efficient access and modification.
Algorithm: A step-by-step procedure for solving a problem or performing a computation.
What are Data Structures?
Data structures are specialized formats for organizing, processing, retrieving, and storing data. They provide a way to manage and organize data efficiently for specific use cases.
Common data structures include:
- Arrays: Contiguous memory locations for similar elements
- Linked Lists: Nodes containing data and pointers to next node
- Stacks: LIFO (Last-In First-Out) structure
- Queues: FIFO (First-In First-Out) structure
- Trees: Hierarchical structures (Binary Trees, AVL Trees, B-Trees)
- Graphs: Nodes connected by edges
- Hash Tables: Key-value pairs using hash functions
- Heaps: Complete binary tree with heap property
What are Algorithms?
Algorithms are step-by-step procedures for solving computational problems. They define a sequence of operations to be performed to achieve a specific task.
Common algorithm categories:
- Sorting: QuickSort, MergeSort, BubbleSort
- Searching: Binary Search, Linear Search
- Graph Algorithms: BFS, DFS, Dijkstra's
- Dynamic Programming: Break problems into subproblems
- Greedy Algorithms: Make locally optimal choices
- Divide and Conquer: Break problem into smaller subproblems
Time and Space Complexity
Big O Notation
Big O notation describes an upper bound on how an algorithm's running time (or space) grows as the input size grows.
| Notation | Name | Example |
|---|---|---|
| O(1) | Constant | Array access by index |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Linear search |
| O(n log n) | Linearithmic | Merge sort |
| O(n²) | Quadratic | Nested loops |
| O(2ⁿ) | Exponential | Recursive Fibonacci |
| O(n!) | Factorial | Traveling salesman (brute force) |
Space Complexity
- Auxiliary Space: Extra space used by algorithm (excluding input)
- Space Complexity: Total space used (including input)
Time Complexity Cheat Sheet
| Data Structure | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | O(n) |
| Linked List | O(n) | O(n) | O(1) | O(1) |
| Stack | O(n) | O(n) | O(1) | O(1) |
| Queue | O(n) | O(n) | O(1) | O(1) |
| Hash Table | N/A | O(1) | O(1) | O(1) |
| Binary Search Tree | O(log n) | O(log n) | O(log n) | O(log n) |
| Heap | N/A | O(n) | O(log n) | O(log n) |
Times shown are average case: hash table and BST operations degrade to O(n) in the worst case, and linked-list insertion/deletion is O(1) only when you already hold a reference to the node.
| Sorting Algorithm | Best | Average | Worst | Space |
|---|---|---|---|---|
| Bubble Sort | O(n) | O(n²) | O(n²) | O(1) |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) |
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) |
Basic Data Structures
1. Arrays
Description: Collection of elements stored in contiguous memory locations.
Operations:
- Access: O(1)
- Search: O(n)
- Insertion: O(n) (worst case)
- Deletion: O(n) (worst case)
Use Cases:
- When you need random access to elements
- Fixed-size collections
- Mathematical operations on sequences
Implementation Example:
# Fixed contents (Python lists are dynamic; shown here used as a static array)
arr = [1, 2, 3, 4, 5]
print(arr[2])             # O(1) access by index

# Dynamic array (resizable)
dynamic_arr = []
dynamic_arr.append(1)     # O(1) amortized
dynamic_arr.insert(0, 0)  # O(n): shifts every element right
dynamic_arr.pop(0)        # O(n): shifts every element left
2. Linked Lists
Singly Linked List
Description: Linear collection where each element points to the next.
Structure:
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next
Operations:
- Access: O(n)
- Search: O(n)
- Insertion: O(1) at head, O(n) at arbitrary position
- Deletion: O(1) if a node reference is available, O(n) otherwise (a short sketch of head insertion and search follows below)
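A minimal sketch of head insertion and search using the ListNode class above (insert_at_head and search are illustrative helper names, not library functions):

def insert_at_head(head, val):
    # New node points at the old head and becomes the new head: O(1)
    node = ListNode(val)
    node.next = head
    return node

def search(head, target):
    # Walk the list until the value is found or the list ends: O(n)
    current = head
    while current:
        if current.val == target:
            return current
        current = current.next
    return None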
Doubly Linked List
Each node has pointers to both next and previous nodes.
Circular Linked List
Last node points back to the first node.
Use Cases:
- Dynamic size requirements
- Frequent insertions/deletions
- Implementing other data structures (stacks, queues)
3. Stacks
Description: LIFO (Last In, First Out) data structure.
Operations:
- Push: O(1)
- Pop: O(1)
- Peek/Top: O(1)
- isEmpty: O(1)
Implementation:
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if not self.is_empty():
            return self.items.pop()

    def peek(self):
        if not self.is_empty():
            return self.items[-1]

    def is_empty(self):
        return len(self.items) == 0
Use Cases:
- Function call management
- Expression evaluation
- Undo operations
- Browser history
- Depth-First Search (DFS)
4. Queues
Description: FIFO (First In, First Out) data structure.
Operations:
- Enqueue: O(1)
- Dequeue: O(1)
- Front: O(1)
- isEmpty: O(1)
Types:
- Simple Queue: Basic FIFO
- Circular Queue: End connects to beginning
- Priority Queue: Elements have priorities
- Deque: Double-ended queue
Implementation:
from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()

    def enqueue(self, item):
        self.items.append(item)

    def dequeue(self):
        if not self.is_empty():
            return self.items.popleft()

    def front(self):
        if not self.is_empty():
            return self.items[0]

    def is_empty(self):
        return len(self.items) == 0
Use Cases:
- Process scheduling
- Breadth-First Search (BFS)
- Buffer for data streams
- Print job management
Advanced Data Structures
1. Trees
Binary Trees
Description: Each node has at most two children (left and right).
Types:
- Complete Binary Tree: All levels filled except possibly the last, which is filled from left to right
- Perfect Binary Tree: All internal nodes have two children and all leaves are at the same depth
- Balanced Binary Tree: For every node, the height difference between its subtrees is ≤ 1
Traversals (sketched in code after this list):
- Inorder: Left → Root → Right
- Preorder: Root → Left → Right
- Postorder: Left → Right → Root
- Level Order: Breadth-first traversal
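A sketch of these traversals, assuming nodes with val, left, and right attributes (as in the TreeNode class defined in the BST section below):

from collections import deque

def inorder(node, visit):
    # Left -> Root -> Right
    if node:
        inorder(node.left, visit)
        visit(node.val)
        inorder(node.right, visit)

def preorder(node, visit):
    # Root -> Left -> Right
    if node:
        visit(node.val)
        preorder(node.left, visit)
        preorder(node.right, visit)

def postorder(node, visit):
    # Left -> Right -> Root
    if node:
        postorder(node.left, visit)
        postorder(node.right, visit)
        visit(node.val)

def level_order(root, visit):
    # Breadth-first: visit nodes level by level using a queue
    queue = deque([root] if root else [])
    while queue:
        node = queue.popleft()
        visit(node.val)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)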
Binary Search Trees (BST)
Properties:
- Left subtree values < root value
- Right subtree values > root value
- Both subtrees are BSTs
Operations (Average/Worst case):
- Search: O(log n) / O(n)
- Insertion: O(log n) / O(n)
- Deletion: O(log n) / O(n)
Implementation:
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

class BST:
    def __init__(self):
        self.root = None

    def insert(self, val):
        self.root = self._insert_rec(self.root, val)

    def _insert_rec(self, node, val):
        if not node:
            return TreeNode(val)
        if val < node.val:
            node.left = self._insert_rec(node.left, val)
        else:
            # Values >= node.val (including duplicates) go right
            node.right = self._insert_rec(node.right, val)
        return node
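Search follows the same comparisons as insertion. A standalone sketch (bst_search is an illustrative helper; call it as bst_search(tree.root, val)):

def bst_search(node, val):
    # Iterative search: go left or right by comparison until found or None
    while node:
        if val == node.val:
            return node
        node = node.left if val < node.val else node.right
    return None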
Self-Balancing Trees
- AVL Trees: Height-balanced BST
- Red-Black Trees: Self-balancing with color properties
- Splay Trees: Self-adjusting BST
2. Heaps
Description: Complete binary tree with heap property.
Types:
- Max Heap: Parent ≥ children
- Min Heap: Parent ≤ children
Operations:
- Insert: O(log n)
- Extract Min/Max: O(log n)
- Peek Min/Max: O(1)
- Build Heap: O(n)
Implementation:
import heapq
# Min heap (default in Python)
min_heap = []
heapq.heappush(min_heap, 3)
heapq.heappush(min_heap, 1)
heapq.heappush(min_heap, 4)
min_val = heapq.heappop(min_heap) # Returns 1
# Max heap (negate values)
max_heap = []
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -1)
max_val = -heapq.heappop(max_heap) # Returns 3
Use Cases:
- Priority queues
- Dijkstra's algorithm
- Heap sort
- Finding the k largest/smallest elements (see the sketch after this list)
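For the k-largest/smallest use case, the standard library's heapq.nlargest and heapq.nsmallest can be used directly (the data below is illustrative):

import heapq

data = [5, 1, 8, 3, 9, 2]
print(heapq.nlargest(3, data))   # [9, 8, 5]
print(heapq.nsmallest(2, data))  # [1, 2]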
3. Hash Tables
Description: Data structure that maps keys to values using hash functions.
Components:
- Hash Function: Converts key to array index
- Collision Resolution: Handle multiple keys mapping to same index
Collision Resolution Methods:
- Chaining: Store multiple values in linked list
- Open Addressing: Find alternative slots
  - Linear probing
  - Quadratic probing
  - Double hashing
Operations (Average/Worst case):
- Search: O(1) / O(n)
- Insert: O(1) / O(n)
- Delete: O(1) / O(n)
Implementation:
class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # chaining: one bucket list per slot

    def _hash(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        index = self._hash(key)
        bucket = self.table[index]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update an existing key in place
                return
        bucket.append((key, value))

    def get(self, key):
        index = self._hash(key)
        bucket = self.table[index]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)
4. Graphs
Description: Collection of vertices (nodes) connected by edges.
Types:
- Directed vs Undirected
- Weighted vs Unweighted
- Cyclic vs Acyclic
Representations:
- Adjacency Matrix: 2D array
- Adjacency List: Array of lists
- Edge List: List of edges
Implementation:
# Adjacency List
class Graph:
    def __init__(self):
        self.graph = {}

    def add_vertex(self, vertex):
        if vertex not in self.graph:
            self.graph[vertex] = []

    def add_edge(self, vertex1, vertex2):
        self.graph[vertex1].append(vertex2)
        self.graph[vertex2].append(vertex1)  # for an undirected graph
Fundamental Algorithms
1. Searching Algorithms
Linear Search
Description: Check each element sequentially.
- Time Complexity: O(n)
- Space Complexity: O(1)
def linear_search(arr, target):
    for i, val in enumerate(arr):
        if val == target:
            return i
    return -1
Binary Search
Description: Repeatedly halve the search interval of a sorted array.
- Time Complexity: O(log n)
- Space Complexity: O(1) iterative, O(log n) recursive
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
2. Sorting Algorithms
Bubble Sort
- Time Complexity: O(n²)
- Space Complexity: O(1)
- Stable: Yes
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # early exit gives the O(n) best case on sorted input
            break
Quick Sort
- Time Complexity: O(n log n) average, O(n²) worst
- Space Complexity: O(log n) for the in-place version (the list-building sketch below uses O(n))
- Stable: No (for the usual in-place version)
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
Merge Sort
- Time Complexity: O(n log n)
- Space Complexity: O(n)
- Stable: Yes
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result
3. Graph Algorithms
Depth-First Search (DFS)
def dfs_recursive(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    print(start)
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs_recursive(graph, neighbor, visited)

def dfs_iterative(graph, start):
    visited = set()
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex not in visited:
            visited.add(vertex)
            print(vertex)
            stack.extend(neighbor for neighbor in graph[vertex]
                         if neighbor not in visited)
Breadth-First Search (BFS)
from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    visited.add(start)
    while queue:
        vertex = queue.popleft()
        print(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
Dijkstra's Algorithm (Shortest Path)
import heapq

def dijkstra(graph, start):
    # graph: {node: [(neighbor, weight), ...]} with non-negative weights
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    pq = [(0, start)]
    while pq:
        current_distance, current = heapq.heappop(pq)
        if current_distance > distances[current]:
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph[current]:
            distance = current_distance + weight
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))
    return distances
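A usage sketch with a small weighted graph in the adjacency-list format the function expects (the graph itself is invented for illustration):

graph = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 5)],
    'C': [('D', 1)],
    'D': [],
}
print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}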
Algorithm Design Paradigms
1. Divide and Conquer
Strategy: Break problem into smaller subproblems, solve recursively, combine results.
Examples:
- Merge Sort
- Quick Sort
- Binary Search
- Maximum Subarray Problem
Template:
def divide_and_conquer(problem):
    # Base case
    if problem_is_small(problem):
        return solve_directly(problem)
    # Divide
    subproblems = divide(problem)
    # Conquer
    results = [divide_and_conquer(subproblem) for subproblem in subproblems]
    # Combine
    return combine(results)
2. Dynamic Programming
Strategy: Break problem into overlapping subproblems, store results to avoid recomputation.
Types:
- Memoization: Top-down approach
- Tabulation: Bottom-up approach
Examples:
- Fibonacci Sequence
- Longest Common Subsequence
- 0/1 Knapsack Problem
- Coin Change Problem
Fibonacci Example:
# Memoization (Top-down)
def fib_memo(n, memo=None):
    if memo is None:  # avoid the shared mutable default-argument pitfall
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 2:
        return 1
    memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]

# Tabulation (Bottom-up)
def fib_tab(n):
    if n <= 2:
        return 1
    dp = [0] * (n + 1)
    dp[1] = dp[2] = 1
    for i in range(3, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
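The coin change problem listed above follows the same tabulation idea; a sketch (coin_change is an illustrative name):

def coin_change(coins, amount):
    # dp[a] = minimum number of coins needed to make amount a
    INF = float('inf')
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1
    return dp[amount] if dp[amount] != INF else -1

print(coin_change([1, 2, 5], 11))  # 3 (5 + 5 + 1)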
3. Greedy Algorithms
Strategy: Make the locally optimal choice at each step, in the hope that it leads to a globally optimal solution.
Examples:
- Activity Selection Problem
- Huffman Coding
- Fractional Knapsack
- Dijkstra's Algorithm
Activity Selection Example:
def activity_selection(activities):
    if not activities:
        return []
    # Sort by end time, then greedily take each activity that starts
    # after the last selected one ends
    activities.sort(key=lambda x: x[1])
    selected = [activities[0]]
    last_end_time = activities[0][1]
    for start, end in activities[1:]:
        if start >= last_end_time:
            selected.append((start, end))
            last_end_time = end
    return selected
4. Backtracking
Strategy: Build the solution incrementally, abandoning candidates that cannot lead to a valid solution.
Examples:
- N-Queens Problem
- Sudoku Solver
- Graph Coloring
- Subset Sum
N-Queens Example:
def solve_n_queens(n):
    # board[i] holds the column of the queen placed in row i
    def is_safe(board, row, col):
        # Check column
        for i in range(row):
            if board[i] == col:
                return False
        # Check diagonals
        for i in range(row):
            if abs(board[i] - col) == abs(i - row):
                return False
        return True

    def backtrack(board, row):
        if row == n:
            return [board[:]]
        solutions = []
        for col in range(n):
            if is_safe(board, row, col):
                board[row] = col
                solutions.extend(backtrack(board, row + 1))
        return solutions

    return backtrack([0] * n, 0)
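A quick usage check (n = 4 is known to have exactly two solutions):

print(len(solve_n_queens(4)))  # 2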
Problem-Solving Strategies
1. Two Pointers Technique
Use Cases:
- Sorted arrays
- Finding pairs with specific sum
- Palindrome checking
- Container with most water
def two_sum_sorted(arr, target):
    left, right = 0, len(arr) - 1
    while left < right:
        current_sum = arr[left] + arr[right]
        if current_sum == target:
            return [left, right]
        elif current_sum < target:
            left += 1
        else:
            right -= 1
    return [-1, -1]
2. Sliding Window
Use Cases:
- Subarray problems
- String problems
- Finding maximum/minimum in window
def max_sum_subarray(arr, k):
    if len(arr) < k:
        return -1
    # Sum of the first window
    window_sum = sum(arr[:k])
    max_sum = window_sum
    # Slide the window: drop the leftmost element, add the new one
    for i in range(k, len(arr)):
        window_sum = window_sum - arr[i - k] + arr[i]
        max_sum = max(max_sum, window_sum)
    return max_sum
3. Fast and Slow Pointers (Floyd's Algorithm)
Use Cases:
- Cycle detection in linked lists
- Finding middle of linked list
- Detecting loop start
def has_cycle(head):
    if not head or not head.next:
        return False
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            return True
    return False
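The same fast/slow idea finds the middle of a linked list: when fast reaches the end, slow is at the midpoint (a sketch using the ListNode class from earlier):

def find_middle(head):
    # fast moves two steps per iteration, slow moves one, so slow ends
    # at the middle (the second middle node for even-length lists)
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow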
4. Binary Search Variations
Use Cases:
- Finding insertion point
- Search in rotated array
- Finding peak element
def find_peak_element(nums):
    left, right = 0, len(nums) - 1
    while left < right:
        mid = (left + right) // 2
        if nums[mid] > nums[mid + 1]:
            right = mid      # a peak lies on the left side, including mid
        else:
            left = mid + 1   # a peak lies strictly to the right
    return left
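The rotated-array use case follows the same template: one half of the current range is always sorted, and that decides which side to keep (a sketch assuming distinct elements):

def search_rotated(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        if nums[left] <= nums[mid]:  # left half is sorted
            if nums[left] <= target < nums[mid]:
                right = mid - 1
            else:
                left = mid + 1
        else:                        # right half is sorted
            if nums[mid] < target <= nums[right]:
                left = mid + 1
            else:
                right = mid - 1
    return -1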
Common Algorithm Patterns
1. Tree Patterns
- Tree Traversal: DFS (preorder, inorder, postorder), BFS
- Tree Construction: From traversals
- Lowest Common Ancestor (see the sketch after this list)
- Path Sum Problems
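As a sketch of the LCA pattern for a plain binary tree (nodes with val, left, and right): the lowest common ancestor is the first node under which p and q end up in different subtrees.

def lowest_common_ancestor(root, p, q):
    # A None result or a match ends the search in this subtree
    if root is None or root is p or root is q:
        return root
    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)
    if left and right:
        return root  # p and q sit in different subtrees, so root is the LCA
    return left or right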
2. Graph Patterns
- Shortest Path: Dijkstra, Bellman-Ford, Floyd-Warshall
- Minimum Spanning Tree: Kruskal, Prim
- Topological Sort: DFS-based, Kahn's algorithm
- Connected Components: Union-Find (sketched after this list)
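A minimal Union-Find sketch (path halving plus union by rank), the usual tool for connected components and Kruskal's MST algorithm:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path halving: point nodes closer to the root as we walk up
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        # Union by rank: attach the shorter tree under the taller one
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already in the same component
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True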
3. String Patterns
- Pattern Matching: KMP, Rabin-Karp
- Palindrome Problems: Expand around center, Manacher's
- Anagram Problems: Sorting, frequency counting (see the sketch after this list)
- Subsequence Problems: DP approaches
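As an example of frequency counting, collections.Counter checks anagrams in O(n) (a minimal sketch):

from collections import Counter

def is_anagram(s, t):
    # Two strings are anagrams iff their character counts match
    return Counter(s) == Counter(t)

print(is_anagram("listen", "silent"))  # True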