Data Structures and Algorithms Guide

Table of Contents

  1. Introduction
  2. Time and Space Complexity
  3. Basic Data Structures
  4. Advanced Data Structures
  5. Fundamental Algorithms
  6. Algorithm Design Paradigms
  7. Problem-Solving Strategies
  8. Common Algorithm Patterns

Introduction

Data structures and algorithms form the foundation of computer science and software engineering. Understanding these concepts is crucial for:

  • Writing efficient code
  • Solving complex problems
  • Technical interviews
  • System design
  • Performance optimization

Data Structure: A way of organizing and storing data to enable efficient access and modification.

Algorithm: A step-by-step procedure for solving a problem or performing a computation.

What are Data Structures?

Data structures are specialized formats for organizing, processing, retrieving, and storing data. They provide a way to manage and organize data efficiently for specific use cases.

Common data structures include:

  • Arrays: Contiguous memory locations for similar elements
  • Linked Lists: Nodes containing data and pointers to next node
  • Stacks: LIFO (Last-In First-Out) structure
  • Queues: FIFO (First-In First-Out) structure
  • Trees: Hierarchical structures (Binary Trees, AVL Trees, B-Trees)
  • Graphs: Nodes connected by edges
  • Hash Tables: Key-value pairs using hash functions
  • Heaps: Complete binary tree with heap property

What are Algorithms?

Algorithms are step-by-step procedures for solving computational problems. They define a sequence of operations to be performed to achieve a specific task.

Common algorithm categories:

  • Sorting: QuickSort, MergeSort, BubbleSort
  • Searching: Binary Search, Linear Search
  • Graph Algorithms: BFS, DFS, Dijkstra's
  • Dynamic Programming: Break problems into subproblems
  • Greedy Algorithms: Make locally optimal choices
  • Divide and Conquer: Break problem into smaller subproblems

Time and Space Complexity

Big O Notation

Big O notation describes the upper bound of algorithm performance as input size grows.

Notation Name Example
O(1) Constant Array access by index
O(log n) Logarithmic Binary search
O(n) Linear Linear search
O(n log n) Linearithmic Merge sort
O(n²) Quadratic Nested loops
O(2^n) Exponential Recursive Fibonacci
O(n!) Factorial Traveling salesman (brute force)

Space Complexity

  • Auxiliary Space: Extra space used by algorithm (excluding input)
  • Space Complexity: Total space used (including input)

Time Complexity Cheat Sheet

Data Structure Access Search Insertion Deletion
Array O(1) O(n) O(n) O(n)
Linked List O(n) O(n) O(1) O(1)
Stack O(n) O(n) O(1) O(1)
Queue O(n) O(n) O(1) O(1)
Hash Table N/A O(1) O(1) O(1)
Binary Search Tree O(log n) O(log n) O(log n) O(log n)
Heap N/A O(n) O(log n) O(log n)
Sorting Algorithm Best Average Worst Space
Bubble Sort O(n) O(n²) O(n²) O(1)
Selection Sort O(n²) O(n²) O(n²) O(1)
Insertion Sort O(n) O(n²) O(n²) O(1)
Quick Sort O(n log n) O(n log n) O(n²) O(log n)
Merge Sort O(n log n) O(n log n) O(n log n) O(n)
Heap Sort O(n log n) O(n log n) O(n log n) O(1)

Basic Data Structures

1. Arrays

Description: Collection of elements stored in contiguous memory locations.

Operations:

  • Access: O(1)
  • Search: O(n)
  • Insertion: O(n) (worst case)
  • Deletion: O(n) (worst case)

Use Cases:

  • When you need random access to elements
  • Fixed-size collections
  • Mathematical operations on sequences

Implementation Example:

# Static array (fixed size)
arr = [1, 2, 3, 4, 5]

# Dynamic array (resizable)
dynamic_arr = []
dynamic_arr.append(1)  # O(1) amortized

2. Linked Lists

Singly Linked List

Description: Linear collection where each element points to the next.

Structure:

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

Operations:

  • Access: O(n)
  • Search: O(n)
  • Insertion: O(1) at head, O(n) at arbitrary position
  • Deletion: O(1) if node reference available, O(n) otherwise

Doubly Linked List

Each node has pointers to both next and previous nodes.

Circular Linked List

Last node points back to the first node.

Use Cases:

  • Dynamic size requirements
  • Frequent insertions/deletions
  • Implementing other data structures (stacks, queues)

3. Stacks

Description: LIFO (Last In, First Out) data structure.

Operations:

  • Push: O(1)
  • Pop: O(1)
  • Peek/Top: O(1)
  • isEmpty: O(1)

Implementation:

class Stack:
    def __init__(self):
        self.items = []
    
    def push(self, item):
        self.items.append(item)
    
    def pop(self):
        if not self.is_empty():
            return self.items.pop()
    
    def peek(self):
        if not self.is_empty():
            return self.items[-1]
    
    def is_empty(self):
        return len(self.items) == 0

Use Cases:

  • Function call management
  • Expression evaluation
  • Undo operations
  • Browser history
  • Depth-First Search (DFS)

4. Queues

Description: FIFO (First In, First Out) data structure.

Operations:

  • Enqueue: O(1)
  • Dequeue: O(1)
  • Front: O(1)
  • isEmpty: O(1)

Types:

  • Simple Queue: Basic FIFO
  • Circular Queue: End connects to beginning
  • Priority Queue: Elements have priorities
  • Deque: Double-ended queue

Implementation:

from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()
    
    def enqueue(self, item):
        self.items.append(item)
    
    def dequeue(self):
        if not self.is_empty():
            return self.items.popleft()
    
    def front(self):
        if not self.is_empty():
            return self.items[0]
    
    def is_empty(self):
        return len(self.items) == 0

Use Cases:

  • Process scheduling
  • Breadth-First Search (BFS)
  • Buffer for data streams
  • Print job management

Advanced Data Structures

1. Trees

Binary Trees

Description: Each node has at most two children (left and right).

Types:

  • Complete Binary Tree: All levels filled except possibly the last
  • Perfect Binary Tree: All internal nodes have two children
  • Balanced Binary Tree: Height difference between subtrees ≤ 1

Traversals:

  • Inorder: Left → Root → Right
  • Preorder: Root → Left → Right
  • Postorder: Left → Right → Root
  • Level Order: Breadth-first traversal

Binary Search Trees (BST)

Properties:

  • Left subtree values < root value
  • Right subtree values > root value
  • Both subtrees are BSTs

Operations (Average/Worst case):

  • Search: O(log n) / O(n)
  • Insertion: O(log n) / O(n)
  • Deletion: O(log n) / O(n)

Implementation:

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

class BST:
    def __init__(self):
        self.root = None
    
    def insert(self, val):
        self.root = self._insert_rec(self.root, val)
    
    def _insert_rec(self, node, val):
        if not node:
            return TreeNode(val)
        
        if val < node.val:
            node.left = self._insert_rec(node.left, val)
        else:
            node.right = self._insert_rec(node.right, val)
        
        return node

Self-Balancing Trees

  • AVL Trees: Height-balanced BST
  • Red-Black Trees: Self-balancing with color properties
  • Splay Trees: Self-adjusting BST

2. Heaps

Description: Complete binary tree with heap property.

Types:

  • Max Heap: Parent ≥ children
  • Min Heap: Parent ≤ children

Operations:

  • Insert: O(log n)
  • Extract Min/Max: O(log n)
  • Peek Min/Max: O(1)
  • Build Heap: O(n)

Implementation:

import heapq

# Min heap (default in Python)
min_heap = []
heapq.heappush(min_heap, 3)
heapq.heappush(min_heap, 1)
heapq.heappush(min_heap, 4)

min_val = heapq.heappop(min_heap)  # Returns 1

# Max heap (negate values)
max_heap = []
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -1)
max_val = -heapq.heappop(max_heap)  # Returns 3

Use Cases:

  • Priority queues
  • Dijkstra's algorithm
  • Heap sort
  • Finding k largest/smallest elements

3. Hash Tables

Description: Data structure that maps keys to values using hash functions.

Components:

  • Hash Function: Converts key to array index
  • Collision Resolution: Handle multiple keys mapping to same index

Collision Resolution Methods:

  1. Chaining: Store multiple values in linked list
  2. Open Addressing: Find alternative slots
    • Linear probing
    • Quadratic probing
    • Double hashing

Operations (Average/Worst case):

  • Search: O(1) / O(n)
  • Insert: O(1) / O(n)
  • Delete: O(1) / O(n)

Implementation:

class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]
    
    def _hash(self, key):
        return hash(key) % self.size
    
    def put(self, key, value):
        index = self._hash(key)
        bucket = self.table[index]
        
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        
        bucket.append((key, value))
    
    def get(self, key):
        index = self._hash(key)
        bucket = self.table[index]
        
        for k, v in bucket:
            if k == key:
                return v
        
        raise KeyError(key)

4. Graphs

Description: Collection of vertices (nodes) connected by edges.

Types:

  • Directed vs Undirected
  • Weighted vs Unweighted
  • Cyclic vs Acyclic

Representations:

  1. Adjacency Matrix: 2D array
  2. Adjacency List: Array of lists
  3. Edge List: List of edges

Implementation:

# Adjacency List
class Graph:
    def __init__(self):
        self.graph = {}
    
    def add_vertex(self, vertex):
        if vertex not in self.graph:
            self.graph[vertex] = []
    
    def add_edge(self, vertex1, vertex2):
        self.graph[vertex1].append(vertex2)
        self.graph[vertex2].append(vertex1)  # For undirected graph

Fundamental Algorithms

1. Searching Algorithms

Description: Check each element sequentially.

  • Time Complexity: O(n)
  • Space Complexity: O(1)
def linear_search(arr, target):
    for i, val in enumerate(arr):
        if val == target:
            return i
    return -1

Description: Divide and conquer on sorted array.

  • Time Complexity: O(log n)
  • Space Complexity: O(1) iterative, O(log n) recursive
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    
    return -1

2. Sorting Algorithms

Bubble Sort

  • Time Complexity: O(n²)
  • Space Complexity: O(1)
  • Stable: Yes
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]

Quick Sort

  • Time Complexity: O(n log n) average, O(n²) worst
  • Space Complexity: O(log n)
  • Stable: No
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    
    return quick_sort(left) + middle + quick_sort(right)

Merge Sort

  • Time Complexity: O(n log n)
  • Space Complexity: O(n)
  • Stable: Yes
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    
    result.extend(left[i:])
    result.extend(right[j:])
    return result

3. Graph Algorithms

Depth-First Search (DFS)

def dfs_recursive(graph, start, visited=None):
    if visited is None:
        visited = set()
    
    visited.add(start)
    print(start)
    
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs_recursive(graph, neighbor, visited)

def dfs_iterative(graph, start):
    visited = set()
    stack = [start]
    
    while stack:
        vertex = stack.pop()
        if vertex not in visited:
            visited.add(vertex)
            print(vertex)
            stack.extend(neighbor for neighbor in graph[vertex] 
                        if neighbor not in visited)

Breadth-First Search (BFS)

from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    visited.add(start)
    
    while queue:
        vertex = queue.popleft()
        print(vertex)
        
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

Dijkstra's Algorithm (Shortest Path)

import heapq

def dijkstra(graph, start):
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    pq = [(0, start)]
    
    while pq:
        current_distance, current = heapq.heappop(pq)
        
        if current_distance > distances[current]:
            continue
        
        for neighbor, weight in graph[current]:
            distance = current_distance + weight
            
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))
    
    return distances

Algorithm Design Paradigms

1. Divide and Conquer

Strategy: Break problem into smaller subproblems, solve recursively, combine results.

Examples:

  • Merge Sort
  • Quick Sort
  • Binary Search
  • Maximum Subarray Problem

Template:

def divide_and_conquer(problem):
    # Base case
    if problem_is_small(problem):
        return solve_directly(problem)
    
    # Divide
    subproblems = divide(problem)
    
    # Conquer
    results = [divide_and_conquer(subproblem) for subproblem in subproblems]
    
    # Combine
    return combine(results)

2. Dynamic Programming

Strategy: Break problem into overlapping subproblems, store results to avoid recomputation.

Types:

  • Memoization: Top-down approach
  • Tabulation: Bottom-up approach

Examples:

  • Fibonacci Sequence
  • Longest Common Subsequence
  • 0/1 Knapsack Problem
  • Coin Change Problem

Fibonacci Example:

# Memoization (Top-down)
def fib_memo(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 2:
        return 1
    
    memo[n] = fib_memo(n-1, memo) + fib_memo(n-2, memo)
    return memo[n]

# Tabulation (Bottom-up)
def fib_tab(n):
    if n <= 2:
        return 1
    
    dp = [0] * (n + 1)
    dp[1] = dp[2] = 1
    
    for i in range(3, n + 1):
        dp[i] = dp[i-1] + dp[i-2]
    
    return dp[n]

3. Greedy Algorithms

Strategy: Make locally optimal choice at each step.

Examples:

  • Activity Selection Problem
  • Huffman Coding
  • Fractional Knapsack
  • Dijkstra's Algorithm

Activity Selection Example:

def activity_selection(activities):
    # Sort by end time
    activities.sort(key=lambda x: x[1])
    
    selected = [activities[0]]
    last_end_time = activities[0][1]
    
    for start, end in activities[1:]:
        if start >= last_end_time:
            selected.append((start, end))
            last_end_time = end
    
    return selected

4. Backtracking

Strategy: Build solution incrementally, abandon candidates that cannot lead to valid solution.

Examples:

  • N-Queens Problem
  • Sudoku Solver
  • Graph Coloring
  • Subset Sum

N-Queens Example:

def solve_n_queens(n):
    def is_safe(board, row, col):
        # Check column
        for i in range(row):
            if board[i] == col:
                return False
        
        # Check diagonals
        for i in range(row):
            if abs(board[i] - col) == abs(i - row):
                return False
        
        return True
    
    def backtrack(board, row):
        if row == n:
            return [board[:]]
        
        solutions = []
        for col in range(n):
            if is_safe(board, row, col):
                board[row] = col
                solutions.extend(backtrack(board, row + 1))
        
        return solutions
    
    return backtrack([0] * n, 0)

Problem-Solving Strategies

1. Two Pointers Technique

Use Cases:

  • Sorted arrays
  • Finding pairs with specific sum
  • Palindrome checking
  • Container with most water
def two_sum_sorted(arr, target):
    left, right = 0, len(arr) - 1
    
    while left < right:
        current_sum = arr[left] + arr[right]
        if current_sum == target:
            return [left, right]
        elif current_sum < target:
            left += 1
        else:
            right -= 1
    
    return [-1, -1]

2. Sliding Window

Use Cases:

  • Subarray problems
  • String problems
  • Finding maximum/minimum in window
def max_sum_subarray(arr, k):
    if len(arr) < k:
        return -1
    
    # Calculate sum of first window
    window_sum = sum(arr[:k])
    max_sum = window_sum
    
    # Slide the window
    for i in range(k, len(arr)):
        window_sum = window_sum - arr[i-k] + arr[i]
        max_sum = max(max_sum, window_sum)
    
    return max_sum

3. Fast and Slow Pointers (Floyd's Algorithm)

Use Cases:

  • Cycle detection in linked lists
  • Finding middle of linked list
  • Detecting loop start
def has_cycle(head):
    if not head or not head.next:
        return False
    
    slow = fast = head
    
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        
        if slow == fast:
            return True
    
    return False

4. Binary Search Variations

Use Cases:

  • Finding insertion point
  • Search in rotated array
  • Finding peak element
def find_peak_element(nums):
    left, right = 0, len(nums) - 1
    
    while left < right:
        mid = (left + right) // 2
        
        if nums[mid] > nums[mid + 1]:
            right = mid
        else:
            left = mid + 1
    
    return left

Common Algorithm Patterns

1. Tree Patterns

  • Tree Traversal: DFS (preorder, inorder, postorder), BFS
  • Tree Construction: From traversals
  • Lowest Common Ancestor
  • Path Sum Problems

2. Graph Patterns

  • Shortest Path: Dijkstra, Bellman-Ford, Floyd-Warshall
  • Minimum Spanning Tree: Kruskal, Prim
  • Topological Sort: DFS-based, Kahn's algorithm
  • Connected Components: Union-Find

3. String Patterns

  • Pattern Matching: KMP, Rabin-Karp
  • Palindrome Problems: Expand around center, Manacher's
  • Anagram Problems: Sorting, frequency counting
  • Subsequence Problems: DP approaches