Unix Text Processing Tools Quickstart Guide

Basic File Viewing

head - View beginning of files

head file.txt          # Show first 10 lines
head -n 5 file.txt     # Show first 5 lines
head -c 20 file.txt    # Show first 20 bytes
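
GNU head also accepts a negative count, which is handy for trimming the end of a file (GNU coreutils only; BSD/macOS head lacks this):

head -n -5 file.txt    # Show all but the last 5 lines (GNU head only)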

tail - View end of files

tail file.txt          # Show last 10 lines
tail -n 15 file.txt    # Show last 15 lines
tail -f logfile.log    # Follow (watch) file in real-time
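
tail -f keeps reading the original file even after log rotation replaces it; the -F variant (GNU and BSD tail) re-opens the file by name:

tail -F logfile.log    # Follow by name, surviving log rotation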

File Contents Manipulation

cat - Concatenate and display files

cat file.txt           # Display entire file
cat -n file.txt        # Show line numbers
cat file1 file2 > combined.txt  # Combine files
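
Two related options worth knowing (both in GNU cat; BSD support varies):

cat -A file.txt        # Reveal tabs (^I), line ends ($), and other non-printing characters
cat -s file.txt        # Squeeze runs of blank lines down to one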

less/more - Pager programs

less largefile.log     # Scroll with arrows/page up-down (q to quit)
more largefile.log     # Basic pager (space for next page)

Key differences:

Feature             less                                        more
Navigation          Bidirectional (↑/↓, PgUp/PgDn)              Forward-only (spacebar)
Exit behavior       Stays open after reaching EOF               Auto-exits at EOF
Search              Regex search with /                         Basic search
File modification   Can follow growing files (example below)    Static view
Memory usage        More efficient with large files             Simpler implementation
Scrolling           Percentage and line numbers shown           Basic line count

When to use which:

  • Use less for most interactive viewing (modern default)
  • Use more for simple forward-only viewing
  • Both support q to quit and / to search, though less's regex search is more capable
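
As noted in the table, less can follow a growing file much like tail -f, using follow mode:

less +F access.log     # Follow mode; Ctrl-C stops following, q quits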

Text Processing

wc - Word count

wc file.txt            # Line, word, and byte counts
wc -l file.txt         # Count lines only
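
The other counters follow the same pattern, and wc is often fed from a pipe:

wc -w file.txt         # Count words only
wc -c file.txt         # Count bytes only
ls | wc -l             # Count entries in the current directory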

grep - Pattern searching

grep "error" log.txt           # Search for 'error'
grep -i "warning" log.txt      # Case-insensitive search
grep -v "debug" log.txt        # Invert match (exclude lines)
grep -r "pattern" directory/   # Recursive search
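
A few more grep options that come up constantly (-n, -c, and -E are POSIX; the context flag -A is a GNU/BSD extension):

grep -n "error" log.txt        # Show line numbers with matches
grep -c "error" log.txt        # Count matching lines
grep -E "warn|error" log.txt   # Extended regex (alternation)
grep -A 2 "panic" log.txt      # Also show 2 lines after each match (GNU/BSD)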

sort - Sort lines

sort file.txt          # Alphabetical sort
sort -n data.txt       # Numerical sort
sort -u file.txt       # Unique sort (remove duplicates)
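
sort can also key on a single field, which pairs naturally with delimited data; a minimal sketch using the same data.csv placeholder:

sort -t',' -k2,2n data.csv    # Numeric sort on the 2nd comma-separated column
sort -rn counts.txt           # Reverse numeric sort (largest values first)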

uniq - Report/omit repeated lines

uniq file.txt          # Remove adjacent duplicate lines
uniq -c file.txt       # Prefix each line with its repeat count
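
Because uniq only compares adjacent lines, unsorted input should go through sort first:

sort file.txt | uniq                  # True de-duplication of the whole file
sort file.txt | uniq -c | sort -rn    # Frequency table, most common first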

cut - Remove sections from lines

cut -d',' -f2 data.csv     # Extract second column using comma delimiter
cut -c1-5 file.txt         # Extract first 5 characters
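
With -f and no -d, cut splits on tabs by default; a classic colon-delimited example:

cut -d':' -f1 /etc/passwd     # List all usernames
cut -d',' -f2- data.csv       # From the 2nd comma field to end of line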

awk - Pattern scanning/processing

awk '{print $1}' file.txt       # Print first column
awk -F: '{print $3}' /etc/passwd  # Split on colon
awk 'NR > 5 && NR < 10' file.txt  # Show lines 6-9
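
awk can also filter on field values rather than line numbers (the field number and threshold below are illustrative):

awk '$3 > 100 {print $1, $3}' data.txt   # Fields 1 and 3 where field 3 exceeds 100
awk 'END {print NR}' file.txt            # Line count, same result as wc -l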

Combining Commands (Pipes)

# Common pipeline example:
grep "ERROR" log.txt | cut -d' ' -f3- | sort | uniq -c | head -n 20

# Breakdown:
# 1. Find lines with "ERROR"
# 2. Extract from 3rd field to end
# 3. Sort results
# 4. Count unique errors
# 5. Show top 20

Tips & Tricks

  1. Combine head/tail: head -n 20 file.txt | tail -n 5 shows lines 16-20
  2. Monitor growing file: tail -f access.log | grep "404"
  3. Count CSV rows: wc -l data.csv
  4. Find unique IPs in logs: cut -d' ' -f1 access.log | sort -u
  5. Sum numbers in column: awk '{sum+=$3} END {print sum}' data.txt
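
Putting tips 2 and 4 together, one sketch that ranks client IPs by request volume (assumes a common log format where the IP is the first space-separated field):

cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head -n 10   # Top 10 IPs by hit count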