Regular Expressions (Regex) - Complete Guide

Introduction
Basic Syntax
Metacharacters
Character Classes
Quantifiers
Anchors
Groups and Capturing
Lookaheads and Lookbehinds
Flags/Modifiers
Common Patterns
Examples
Best Practices
Tools and Testing

Introduction

Regular expressions (regex or regexp) are powerful pattern-matching tools used to search, match, and manipulate text. They provide a concise and flexible way to identify strings of text based on specific patterns.

When to Use Regex

Text validation (emails, phone numbers, passwords)
Data extraction from logs or documents
Search and replace operations
Input sanitization
Parsing structured text

Basic Syntax

Literal Characters

Most characters in regex match themselves literally:

hello

Matches: "hello" in "hello world"

Case Sensitivity

By default, regex is case-sensitive:

Hello

Matches: "Hello" but not "hello"
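
As a minimal sketch of both rules in Python (the variable names are illustrative, not part of the guide), `re.search` finds the literal pattern and returns `None` when only the letter case differs:

```python
import re

text = "hello world"

# A literal pattern matches the same characters in the text.
literal_match = re.search(r"hello", text)
print(literal_match.group())  # hello

# Matching is case-sensitive by default, so "Hello" is not found.
case_match = re.search(r"Hello", text)
print(case_match)  # None
```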

Metacharacters

Metacharacters have special meanings in regex and need to be escaped with \ to match literally.

Character	Meaning	Example
`.`	Any character except newline	`a.c` matches "abc", "axc"
`^`	Start of string/line	`^hello` matches "hello" at start
`$`	End of string/line	`world$` matches "world" at end
`*`	Zero or more	`ab*` matches "a", "ab", "abbb"
`+`	One or more	`ab+` matches "ab", "abbb" but not "a"
`?`	Zero or one	`ab?` matches "a", "ab"
`\`	Escape character	`\.` matches literal "."
`|`	OR operator	`cat|dog` matches "cat" or "dog"
`[]`	Character class	`[abc]` matches "a", "b", or "c"
`()`	Grouping	`(ab)+` matches "ab", "abab"
`{}`	Quantifier	`a{2,4}` matches "aa", "aaa", "aaaa"
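
The table's examples can be checked with a short Python sketch. It assumes Python's `re` module (other engines may differ in details), and the `examples` list is only an illustrative helper:

```python
import re

# Each pair is (pattern, text); every pattern should match its whole text.
examples = [
    (r"a.c", "axc"),      # . matches any single character except newline
    (r"ab*", "abbb"),     # * allows zero or more "b"
    (r"ab+", "ab"),       # + requires at least one "b"
    (r"ab?", "a"),        # ? makes "b" optional
    (r"cat|dog", "dog"),  # | matches either alternative
    (r"(ab)+", "abab"),   # () groups "ab" so + repeats the pair
    (r"a{2,4}", "aaa"),   # {2,4} allows two to four "a"
    (r"\.", "."),         # \. matches a literal dot
]
results = [re.fullmatch(p, t) is not None for p, t in examples]
print(all(results))  # True

# Escaping changes the meaning: "a\.c" accepts only a real dot in the middle.
escaped_abc = re.fullmatch(r"a\.c", "abc") is not None
escaped_dot = re.fullmatch(r"a\.c", "a.c") is not None
print(escaped_abc, escaped_dot)  # False True
```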

Character Classes

Basic Character Classes

[abc]       # Matches 'a', 'b', or 'c'
[a-z]       # Matches any lowercase letter
[A-Z]       # Matches any uppercase letter
[0-9]       # Matches any digit
[a-zA-Z]    # Matches any letter
[a-zA-Z0-9] # Matches any alphanumeric character

Negated Character Classes

[^abc]      # Matches any character except 'a', 'b', or 'c'
[^0-9]      # Matches any non-digit character

Predefined Character Classes

Class	Equivalent	Description
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Any word character
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\s`	`[ \t\r\n\f\v]`	Any whitespace
`\S`	`[^ \t\r\n\f\v]`	Any non-whitespace
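
A small Python sketch of the shorthand classes follows; the sample string is made up for illustration. Note that the "Equivalent" column describes ASCII matching: for Python `str` patterns, `\d` and `\w` also accept non-ASCII digits and letters unless `re.ASCII` is passed.

```python
import re

sample = "Order 42\tshipped_on 2024-01-05\n"

digits = re.findall(r"\d+", sample)  # runs of digits
words = re.findall(r"\w+", sample)   # runs of letters, digits, underscore
spaces = re.findall(r"\s", sample)   # individual whitespace characters

print(digits)  # ['42', '2024', '01', '05']
print(words)   # ['Order', '42', 'shipped_on', '2024', '01', '05']
print(spaces)  # [' ', '\t', ' ', '\n']

# For str patterns, \d also matches non-ASCII digits unless re.ASCII is set.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None
print(unicode_digit, ascii_digit)  # True False
```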

Quantifiers

Basic Quantifiers

Quantifier	Meaning	Example
`*`	0 or more	`a*` matches "", "a", "aa", "aaa"
`+`	1 or more	`a+` matches "a", "aa", "aaa"
`?`	0 or 1	`a?` matches "", "a"

Specific Quantifiers

{n}         # Exactly n times
{n,}        # n or more times
{n,m}       # Between n and m times (inclusive)

Examples:

\d{3}       # Exactly 3 digits
\d{3,}      # 3 or more digits
\d{3,5}     # Between 3 and 5 digits
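
To see where each range starts and stops, here is a minimal Python sketch. `matches_fully` is a hypothetical helper that uses `re.fullmatch`, so the whole string must fit the quantifier:

```python
import re

def matches_fully(pattern, text):
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(matches_fully(r"\d{3}", "123"))       # True
print(matches_fully(r"\d{3}", "1234"))      # False: one digit too many
print(matches_fully(r"\d{3,}", "12345"))    # True: no upper limit
print(matches_fully(r"\d{3,5}", "12"))      # False: below the minimum
print(matches_fully(r"\d{3,5}", "12345"))   # True: the upper bound is inclusive
print(matches_fully(r"\d{3,5}", "123456"))  # False: above the maximum
```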

Greedy vs Non-Greedy

.*          # Greedy: matches as much as possible
.*?         # Non-greedy: matches as little as possible
.+?         # Non-greedy: one or more, but as few as possible
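
The difference is easiest to see on text with several possible end points. A minimal Python sketch, using a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the text, so there is one long match.
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first ">" it can, so each tag matches alone.
lazy = re.findall(r"<.*?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']
```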

Anchors

Position Anchors

Anchor	Meaning	Example
`^`	Start of string/line	`^Hello`
`$`	End of string/line	`world$`
`\b`	Word boundary	`\bword\b`
`\B`	Non-word boundary	`\Bword\B`

Examples

^Hello$     # Matches exactly "Hello"
\bcat\b     # Matches "cat" as a whole word
\d+$        # Matches digits at end of string
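
The three anchor examples can be tried in Python as follows; the input strings are illustrative:

```python
import re

# ^ and $ together force the pattern to cover the whole string.
exact = re.search(r"^Hello$", "Hello") is not None
not_exact = re.search(r"^Hello$", "Hello world") is not None
print(exact, not_exact)  # True False

# \b keeps "cat" from matching inside longer words.
whole_words = re.findall(r"\bcat\b", "cat catalog concat")
print(whole_words)  # ['cat']

# $ pins the digits to the end of the string.
trailing = re.search(r"\d+$", "invoice 2024")
print(trailing.group())  # 2024
```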

Groups and Capturing

Basic Groups

(abc)       # Capturing group
(?:abc)     # Non-capturing group

Backreferences

(hello) \1  # Matches "hello hello"
(\w+) \1    # Matches repeated words like "the the"
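
A short Python sketch of backreferences. One caveat it illustrates: without word boundaries, `(\w+) \1` can also match across the start of a longer word, so a `\b`-anchored variant (an addition for this example) is shown alongside it.

```python
import re

# \1 must repeat exactly what group 1 captured.
doubled = re.search(r"(hello) \1", "say hello hello")
print(doubled.group())  # hello hello

# Without boundaries, "the the" is found inside "the then".
loose = re.search(r"(\w+) \1", "the then")
print(loose.group())  # the the

# With \b on both sides, only complete repeated words match.
strict = re.search(r"\b(\w+) \1\b", "the then")
print(strict)  # None

# A backreference in the replacement collapses the duplicate.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", "the the cat")
print(fixed)  # the cat
```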

Named Groups

(?<name>\w+)        # Named group (.NET, Java, JavaScript, PCRE)
(?P<name>\w+)       # Named group (Python)
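
In Python, named groups can be read back by name instead of by position. A minimal sketch, with group names chosen for illustration:

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = date_pattern.match("2024-01-05")

print(match.group("year"))  # 2024
print(match.groupdict())    # {'year': '2024', 'month': '01', 'day': '05'}

# Named groups still have a position, so numeric access keeps working.
print(match.group(2))       # 01
```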

Lookaheads and Lookbehinds

Positive Lookahead

\d+(?=px)   # Matches digits followed by "px"

Negative Lookahead

\d+(?!px)   # Matches digits NOT followed by "px"; backtracking can still match part of a number (the "1" in "12px")

Positive Lookbehind

(?<=\$)\d+  # Matches digits preceded by "$"

Negative Lookbehind

(?<!\$)\d+  # Matches digits NOT preceded by "$"; can still match the tail of a number (the "0" in "$40")
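
The four lookarounds can be compared side by side in Python. The sketch below also shows that the negative forms may match part of a number unless digits are excluded as well; the stricter patterns are one possible fix, not the only one.

```python
import re

sizes = "12px 30em 45px"
prices = "cost $40, qty 3"

followed_by_px = re.findall(r"\d+(?=px)", sizes)
print(followed_by_px)  # ['12', '45']

# The engine backtracks from "12" to "1", which is not followed by "px".
naive_not_px = re.findall(r"\d+(?!px)", sizes)
print(naive_not_px)  # ['1', '30', '4']

# Also forbidding a following digit keeps the whole number together.
strict_not_px = re.findall(r"\d+(?!\d|px)", sizes)
print(strict_not_px)  # ['30']

after_dollar = re.findall(r"(?<=\$)\d+", prices)
print(after_dollar)  # ['40']

# The "0" in "$40" is preceded by "4", not "$", so it slips through.
naive_not_dollar = re.findall(r"(?<!\$)\d+", prices)
print(naive_not_dollar)  # ['0', '3']

strict_not_dollar = re.findall(r"(?<![$\d])\d+", prices)
print(strict_not_dollar)  # ['3']
```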

Flags/Modifiers

Flag	Meaning	Example
`i`	Case insensitive	`/hello/i` matches "Hello"
`g`	Global (find all matches)	`/cat/g` finds all "cat"
`m`	Multiline	`^` and `$` match at the start and end of each line
`s`	Dot matches newline	`.` includes `\n`
`x`	Extended (ignore whitespace)	Allows comments in regex
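
The `/pattern/flags` notation in the table is the JavaScript and PHP style. As a sketch of the same ideas in Python, flags are passed as constants from the `re` module, and there is no `g` flag because `re.findall` already returns every match:

```python
import re

plain = re.search(r"hello", "Hello") is not None
ignore_case = re.search(r"hello", "Hello", re.IGNORECASE) is not None
print(plain, ignore_case)  # False True

lines = "one two\nthree four"
first_only = re.findall(r"^\w+", lines)
each_line = re.findall(r"^\w+", lines, re.MULTILINE)
print(first_only)  # ['one']
print(each_line)   # ['one', 'three']

dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None
print(dot_default, dot_all)  # False True

# Equivalent of the "g" flag: findall collects all matches.
all_cats = re.findall(r"cat", "cat, cat, cat")
print(all_cats)  # ['cat', 'cat', 'cat']
```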

Common Patterns

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Phone Number (US)

^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
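
The three capturing groups in this pattern make it easy to normalize different spellings of the same number. A minimal Python sketch; `normalize_phone` is a hypothetical helper, and the dash-separated output format is an arbitrary choice:

```python
import re

PHONE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(text):
    """Return the number as 555-123-4567, or None if it does not match."""
    match = PHONE.match(text)
    return "-".join(match.groups()) if match else None

print(normalize_phone("(555) 123-4567"))  # 555-123-4567
print(normalize_phone("555.123.4567"))    # 555-123-4567
print(normalize_phone("5551234567"))      # 555-123-4567
print(normalize_phone("555-1234"))        # None
```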

URL

^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$

Password Strength

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

At least 8 characters
At least one lowercase letter
At least one uppercase letter
At least one digit
At least one special character
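
A quick Python check of the password pattern against a few made-up candidates. Note that "special character" here means one of `@$!%*?&` only, so a password using any other symbol is rejected:

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(candidate):
    return PASSWORD.match(candidate) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Passw0rd#"))   # False: "#" is not in the allowed set
print(is_strong("Pw0rd!"))      # False: shorter than 8 characters
```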

IP Address

^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
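
A Python sketch of the IPv4 pattern in use. `is_ipv4` is a hypothetical helper; it calls `fullmatch` because in Python `$` also matches just before a trailing newline, which is usually unwanted in validation.

```python
import re

IPV4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

def is_ipv4(text):
    # fullmatch rejects a trailing newline that "$" alone would tolerate.
    return IPV4.fullmatch(text) is not None

print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("256.1.1.1"))    # False: 256 is out of range
print(is_ipv4("1.2.3"))        # False: only three octets
print(is_ipv4("01.2.3.4"))     # True: leading zeros are accepted
print(is_ipv4("1.2.3.4\n"))    # False
```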

Date (MM/DD/YYYY)

^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}$
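
This pattern checks the shape of a date, not the calendar: it accepts 02/31/2024. One way to close that gap, sketched here in Python, is to follow the regex with `datetime.strptime`. `is_valid_date` is a hypothetical helper, and the `/` is left unescaped because Python does not require `\/`.

```python
import re
from datetime import datetime

DATE = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}$")

def is_valid_date(text):
    """Check the MM/DD/YYYY shape first, then that the day really exists."""
    if not DATE.match(text):
        return False
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return False
    return True

print(is_valid_date("02/29/2024"))  # True: 2024 is a leap year
print(is_valid_date("02/31/2024"))  # False: passes the regex, fails the calendar
print(is_valid_date("13/01/2024"))  # False: month out of range
print(is_valid_date("2/9/2024"))    # False: two-digit month and day required
```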

HTML Tags

<\/?[\w\s]*>|<.+[\W]>

Examples

Extract Information

# Extract email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# Extract phone numbers
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

# Extract hashtags
#\w+

# Extract URLs
https?:\/\/[^\s]+
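
The extraction patterns above are unanchored, so they work with `re.findall`. A minimal Python sketch on an invented sample text:

```python
import re

text = (
    "Contact ann@example.com or bob.smith@mail.example.org, "
    "call (555) 123-4567, see https://example.com/docs #regex #tips"
)

emails = re.findall(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", text)
phones = re.findall(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}", text)
hashtags = re.findall(r"#\w+", text)
urls = re.findall(r"https?://[^\s]+", text)

print(emails)    # ['ann@example.com', 'bob.smith@mail.example.org']
print(phones)    # ['(555) 123-4567']
print(hashtags)  # ['#regex', '#tips']
print(urls)      # ['https://example.com/docs']
```

Because the URL pattern stops only at whitespace, it would also pick up punctuation that directly follows a link, such as a trailing comma.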

Validation

# Validate credit card (basic)
^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$

# Validate time (24-hour)
^([01]?[0-9]|2[0-3]):[0-5][0-9]$

# Validate hex color
^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
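
A Python sketch that runs the three validators on a few invented inputs. The `check` helper is hypothetical, and the card pattern only checks the digit layout, not whether the number is a real card:

```python
import re

CARD = r"^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$"
TIME = r"^([01]?[0-9]|2[0-3]):[0-5][0-9]$"
HEX_COLOR = r"^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$"

def check(pattern, text):
    return re.match(pattern, text) is not None

print(check(CARD, "4111 1111 1111 1111"))  # True
print(check(CARD, "4111-1111-1111"))       # False: only 12 digits
print(check(TIME, "23:59"))                # True
print(check(TIME, "9:05"))                 # True: single-digit hour allowed
print(check(TIME, "24:00"))                # False: hours stop at 23
print(check(HEX_COLOR, "#1A2b3C"))         # True
print(check(HEX_COLOR, "#fff"))            # True
print(check(HEX_COLOR, "#ffff"))           # False: must be 3 or 6 hex digits
```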

Text Processing

# Remove extra whitespace
\s+

# Split on punctuation
[.!?]+

# Find repeated words
\b(\w+)\s+\1\b

# Match quoted strings
"[^"]*"

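These patterns are usually paired with a substitution or split function. A minimal Python sketch with `re.sub`, `re.split`, and `re.findall`; the sample strings are made up:

```python
import re

# Collapse every run of whitespace to one space.
tidy = re.sub(r"\s+", " ", "too    many\t\tspaces\n here").strip()
print(tidy)  # too many spaces here

# Split on sentence punctuation, then drop the empty trailing piece.
pieces = re.split(r"[.!?]+", "Wait... what?! Done.")
sentences = [p.strip() for p in pieces if p.strip()]
print(sentences)  # ['Wait', 'what', 'Done']

# With one capturing group, findall returns the captured word.
repeats = re.findall(r"\b(\w+)\s+\1\b", "this is is a test test")
print(repeats)  # ['is', 'test']

quoted = re.findall(r'"[^"]*"', 'say "hi" and "bye"')
print(quoted)  # ['"hi"', '"bye"']
```
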
Best Practices

1. Keep It Simple

Start with simple patterns and build complexity gradually
Break complex patterns into smaller, testable parts

2. Use Character Classes

# Good
[0-9]

# Less clear
(0|1|2|3|4|5|6|7|8|9)

3. Escape Special Characters

# To match literal dots (doubled as "\\." only inside string literals that need their own backslash escaped)
\.

# To match literal brackets
\[|\]
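
How many backslashes to type depends on the host language's string syntax. A Python sketch of the two spellings, plus `re.escape` for text that should be matched literally; the `needle` and `haystack` strings are invented:

```python
import re

# In a raw string, the regex \. is written as-is.
dots = re.findall(r"\.", "v1.2.3")
print(dots)  # ['.', '.']

# In a regular string literal the backslash is doubled, but the regex
# engine still receives a single backslash: both spellings are equal.
same_pattern = "\\." == r"\."
print(same_pattern)  # True

# re.escape escapes the metacharacters in arbitrary text.
needle = "1+1=2 (approx.)"
haystack = "we know 1+1=2 (approx.) holds"
raw_hit = re.search(needle, haystack)  # "+" and "()" act as operators
escaped_hit = re.search(re.escape(needle), haystack)
print(raw_hit)              # None
print(escaped_hit.group())  # 1+1=2 (approx.)
```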

4. Use Non-Greedy Quantifiers When Needed

# Greedy (may capture too much)
<.*>

# Non-greedy (better for HTML tags)
<.*?>

5. Optimize Performance

Use anchors (^, $) when possible
Avoid excessive backtracking
Use atomic groups for performance-critical applications, where the engine supports them
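
Often the simplest way to avoid excessive backtracking is to remove a nested quantifier. The Python sketch below compares a nested pattern with a flat one that accepts the same strings; the inputs are kept short on purpose, since the nested form can take exponentially longer as a failing input grows.

```python
import re

nested = re.compile(r"^(a+)+$")  # quantifier inside a quantified group
flat = re.compile(r"^a+$")       # same set of strings, one quantifier

samples = ["a", "aaaa", "aaaa!", "", "baaa"]
verdicts = [
    (nested.match(s) is not None, flat.match(s) is not None) for s in samples
]
print(verdicts)
# [(True, True), (True, True), (False, False), (False, False), (False, False)]

# Both patterns agree on every sample, so the flat one is a safe replacement.
agree = all(a == b for a, b in verdicts)
print(agree)  # True
```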

6. Comment Complex Patterns

# Using extended mode (x flag)
(?x)
^                    # Start of string
(?=.*[a-z])         # Must contain lowercase
(?=.*[A-Z])         # Must contain uppercase
(?=.*\d)            # Must contain digit
.{8,}               # At least 8 characters
$                   # End of string
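
In Python the same idea is available through `re.VERBOSE`. The sketch below compiles the commented pattern and a compact one-line version (added here for comparison) and confirms that they behave the same on a few invented candidates:

```python
import re

COMMENTED = re.compile(
    r"""
    ^              # Start of string
    (?=.*[a-z])    # Must contain lowercase
    (?=.*[A-Z])    # Must contain uppercase
    (?=.*\d)       # Must contain digit
    .{8,}          # At least 8 characters
    $              # End of string
    """,
    re.VERBOSE,
)
COMPACT = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$")

candidates = ["Abcdefg1", "abcdefg1", "Abc1"]
commented_results = [COMMENTED.match(c) is not None for c in candidates]
compact_results = [COMPACT.match(c) is not None for c in candidates]

print(commented_results)  # [True, False, False]
print(compact_results)    # [True, False, False]
```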

7. Test Thoroughly

Test with both matching and non-matching cases
Consider edge cases (empty strings, special characters)
Validate against real-world data

Tools and Testing

Online Regex Testers

Regex101 (regex101.com) - Comprehensive with explanations
RegExr (regexr.com) - Visual regex builder
RegexPal (regexpal.com) - Simple testing

Programming Language Integration

JavaScript

const pattern = /\d+/g;
const text = "abc 123 def 456";
const matches = text.match(pattern); // ["123", "456"]

Python

import re
pattern = r'\d+'
text = "abc 123 def 456"
matches = re.findall(pattern, text)  # ['123', '456']

PHP

$pattern = '/\d+/';
$text = "abc 123 def 456";
preg_match_all($pattern, $text, $matches); // $matches[0] is ["123", "456"]

Common Pitfalls

Catastrophic backtracking - Avoid nested quantifiers such as (a+)+, which can take exponential time on input that fails to match
Greediness - Use non-greedy quantifiers when appropriate
Case sensitivity - Remember to use case-insensitive flag when needed
Escaping - Escape every metacharacter that should match literally
Testing - Always test with edge cases

This guide provides a comprehensive foundation for working with regular expressions. Practice with real examples and gradually build complexity as you become more comfortable with the syntax and concepts.
