awk Advanced Examples

awk is a text-processing language built around pattern-action rules. Here are some advanced use cases; some rely on gawk extensions, such as lshift() in example 9:

1. Advanced Field Processing

Input (grades.csv):

Alice,Math,85,A,Science
Bob,Math,92,A,Arts
Charlie,Math,78,B,Science

Command:

# Calculate the average score for students with grade A (the grade is field 4)
awk -F',' '$4 == "A" {sum+=$3; count++}
          END {if (count) printf "Average: %.2f\n", sum/count}' grades.csv

Output:

Average: 88.50
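
The same accumulate-then-divide pattern extends to one average per grade by keying the running sums on the grade column. The following is a minimal sketch, assuming the five-column layout of grades.csv shown above; the inline printf data stands in for the file, and the trailing sort is there only because for-in iteration order is unspecified.

```shell
# Per-grade averages; the sample rows mirror grades.csv above.
per_grade=$(printf '%s\n' \
    'Alice,Math,85,A,Science' \
    'Bob,Math,92,A,Arts' \
    'Charlie,Math,78,B,Science' |
    awk -F',' '
        { sum[$4] += $3; count[$4]++ }      # accumulate per grade (field 4)
        END {
            for (g in sum)
                printf "%s: %.2f\n", g, sum[g] / count[g]
        }' | sort)
echo "$per_grade"
```

With the three sample rows this should print "A: 88.50" followed by "B: 78.00".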

2. Pattern Ranges

Input (input.txt):

START
Configuration:
Version: 2.4.5
Timeout: 30
END
Logs:

Command:

# Print lines between START and END markers
awk '/START/,/END/' input.txt

Output:

START
Configuration:
Version: 2.4.5
Timeout: 30
END
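
A range pattern always includes both marker lines. When only the lines strictly between the markers are wanted, a flag variable is a common alternative. This is a sketch under the assumption that the markers sit alone on their lines; the variable name inside is arbitrary.

```shell
# Print only the lines strictly between the markers (markers excluded).
inner=$(printf '%s\n' START 'Configuration:' 'Version: 2.4.5' 'Timeout: 30' END 'Logs:' |
    awk '
        /^END$/   { inside = 0 }    # close the range before the print rule runs
        inside    { print }         # print lines while inside the range
        /^START$/ { inside = 1 }    # open the range after the marker line
    ')
echo "$inner"
```

Rule order matters here: clearing the flag first and setting it last is what keeps both marker lines out of the output.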

3. Associative Arrays

Input (file.txt):

The quick Brown fox jumps over the lazy dog. Brown dog!

Command:

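# Count word frequency (case-insensitive)
# IGNORECASE does not affect array subscripts, so fold case with tolower()
{
    gsub(/[^[:alnum:]_]/," ")  # Replace punctuation with spaces
    for(i=1;i<=NF;i++) {
        words[tolower($i)]++
    }
}
END {
    for(word in words)
        printf "%-7s %d\n", word, words[word]
}
awk -f word_frequency.awk file.txt | sort   # for-in order is unspecified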

Output:

brown   2
dog     2
fox     1
jumps   1
lazy    1
over    1
quick   1
the     2

4. Advanced Math Operations

Input (numbers.txt):

15
20
25
18

Command:

# Calculate the population standard deviation: sqrt(E[x^2] - mean^2)
{
    sum += $1
    sumsq += $1^2
    count++
}
END {
    mean = sum/count
    print "Std Dev:", sqrt(sumsq/count - mean^2)
}
awk -f std_dev.awk numbers.txt

Output:

Std Dev: 3.64005
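
The script above divides by the number of values, which gives the population standard deviation. A sample standard deviation divides the sum of squared deviations by n - 1 instead. The sketch below computes both for the same four numbers, using explicit printf precision so the output does not depend on OFMT.

```shell
# Population vs. sample standard deviation for the values in numbers.txt.
stats=$(printf '%s\n' 15 20 25 18 |
    awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            if (n < 2) exit 1                  # sample variance needs n >= 2
            mean = sum / n
            ss = sumsq - n * mean * mean       # sum of squared deviations
            printf "population: %.4f\n", sqrt(ss / n)
            printf "sample: %.4f\n", sqrt(ss / (n - 1))
        }')
echo "$stats"
```

For these inputs the sum of squared deviations is 53, so the two results are sqrt(53/4) and sqrt(53/3), about 3.6401 and 4.2032.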

5. Text Transformation

Input (employees.csv):

name,age,email
John Doe,32,[email protected]
Jane Smith,28,[email protected]

Command:

# Convert CSV to JSON
BEGIN { FS=","; print "[" }
NR>1 {
    if (NR>2) printf ",\n"   # Separator before every record except the first
    printf "  {\n"
    printf "    \"name\": \"%s\",\n", $1
    printf "    \"age\": %d,\n", $2
    printf "    \"email\": \"%s\"\n", $3
    printf "  }"
}
END { if (NR>1) printf "\n"; print "]" }

Output:

[
  {
    "name": "John Doe",
    "age": 32,
    "email": "[email protected]"
  },
  {
    "name": "Jane Smith",
    "age": 28,
    "email": "[email protected]"
  }
]

6. Multi-file Processing

file1.txt:

Apple
Banana
Orange
Grape

file2.txt:

Apple
Berry
Orange

Command:

# Compare two files line by line
{
    # Read the matching line of file2.txt into "other" so $0 is preserved
    if ((getline other < "file2.txt") > 0) {
        if ($0 != other)
            print "Difference at line", NR
    }
    else
        print "Extra line in file1:", $0
}
awk -f compare.awk file1.txt

Output:

Difference at line 2
Extra line in file1: Grape
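
A different and widely used way to process two files is the NR == FNR idiom: the condition is true only while the first file is being read, so that pass can load an array that later rules consult. Note that this is a set-membership test, not the positional comparison shown above. The sketch below lists lines of file1.txt that do not appear anywhere in file2.txt; the scratch directory is an assumption made to keep the example self-contained.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
dir=$(mktemp -d)
printf '%s\n' Apple Banana Orange Grape > "$dir/file1.txt"
printf '%s\n' Apple Berry Orange        > "$dir/file2.txt"

# First pass (NR == FNR) loads file2.txt; second pass filters file1.txt.
only_in_file1=$(awk '
    NR == FNR { seen[$0] = 1; next }   # remember every line of the first file
    !($0 in seen)                      # print second-file lines never seen
' "$dir/file2.txt" "$dir/file1.txt")
echo "$only_in_file1"
rm -r "$dir"
```

One caveat: if the first file is empty, NR == FNR also holds for the second file, so the idiom needs an extra guard in that case.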

7. Advanced String Functions

Input:

[email protected]
[email protected]

Command:

# Extract domain from email addresses
{
    # match() returns 0 when a line has no "@", so skip those lines
    if (match($0, /@[[:alnum:].-]+/))
        print substr($0, RSTART+1, RLENGTH-1)
}

Output:

domain.com
server-01.local

8. Custom Functions

Input (numbers.csv):

5
3
7

Command:

# Define and use custom function
function factorial(n) {
    return (n <= 1) ? 1 : n * factorial(n-1)
}
{ print $1 "! = " factorial($1) }
awk -f factorial.awk numbers.csv

Output:

5! = 120
3! = 6
7! = 5040

9. Bitwise Operations

Input:

192.168.1.1
10.0.0.255

Command:

# Convert IP address to integer (lshift() is a gawk extension)
{
    split($1, octets, ".")
    ip_int = lshift(octets[1],24) + lshift(octets[2],16) + \
             lshift(octets[3],8) + octets[4]
    print ip_int
}
echo "192.168.1.1" | awk -f ip2int.awk
echo "10.0.0.255" | awk -f ip2int.awk

Output:

3232235777
167772415
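
Because shifting left by 8 bits is the same as multiplying by 256, the conversion can also be written with plain arithmetic, which runs in awk implementations that lack lshift(). This is a minimal sketch; the shell function name ip_to_int is hypothetical, and printf "%.0f" is used because some awk versions print large integers in exponent notation.

```shell
# Portable IP-to-integer conversion using multiplication instead of lshift().
ip_to_int() {
    echo "$1" | awk -F'.' '{ printf "%.0f\n", (($1 * 256 + $2) * 256 + $3) * 256 + $4 }'
}

a=$(ip_to_int 192.168.1.1)
b=$(ip_to_int 10.0.0.255)
echo "$a $b"
```

The nested form is Horner's scheme for base 256: each step multiplies the running value by 256 and adds the next octet.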

10. Advanced Output Formatting

Input (sales.csv):

Alice Johnson,25000
Bob Chen,18450
Maria Gonzalez,36700

Command:

# Generate formatted report
# The ' flag groups thousands in gawk when the locale defines a
# thousands separator (for example en_US.UTF-8)
BEGIN {
    FS = ","
    printf "%-20s %10s %10s\n", "Name", "Sales", "Commission"
    print "------------------------------------------"
}
{
    comm = $2 * 0.15
    printf "%-20s $%'9.2f $%'9.2f\n", $1, $2, comm
}
awk -f sales_report.awk sales.csv

Output:

Name                      Sales Commission
------------------------------------------
Alice Johnson        $25,000.00 $ 3,750.00
Bob Chen             $18,450.00 $ 2,767.50
Maria Gonzalez       $36,700.00 $ 5,505.00
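
Thousands grouping through the printf ' flag depends on the awk implementation and the active locale. Where that cannot be relied on, the grouping can be done by hand. The sketch below uses a hypothetical helper named commas() and assumes non-negative amounts and a comma as the separator.

```shell
# Locale-independent thousands grouping for the sales report rows.
report=$(printf '%s\n' 'Alice Johnson,25000' 'Bob Chen,18450' 'Maria Gonzalez,36700' |
    awk -F',' '
        # commas(x): hypothetical helper; formats a non-negative number as 1,234.56
        function commas(x,    s, whole, out) {
            s = sprintf("%.2f", x)
            whole = substr(s, 1, length(s) - 3)    # digits before the decimal point
            out = substr(s, length(s) - 2)         # ".NN" suffix
            while (length(whole) > 3) {
                out = "," substr(whole, length(whole) - 2) out
                whole = substr(whole, 1, length(whole) - 3)
            }
            return whole out
        }
        { printf "%-20s $%9s $%9s\n", $1, commas($2), commas($2 * 0.15) }
    ')
echo "$report"
```

The helper peels three digits at a time off the right end of the integer part, so the loop runs once per separator.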

Key Advanced Features:

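  • Built-in functions: gsub() (POSIX); gensub(), asort(), mktime() (gawk extensions)
  • Two-way communication with system commands through the |& coprocess operator (gawk)
  • Bit manipulation functions: and(), or(), xor(), compl(), lshift(), rshift() (gawk)
  • Time and date processing: systime(), mktime(), strftime() (gawk)
  • Namespaces for user-defined functions and variables (gawk 5.0 and later)
  • TCP/IP networking through /inet special files (gawk extension)
  • Profiling and pretty-printing with --profile and --pretty-print (gawk)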

awk becomes particularly powerful when combined with shell scripting and other Unix tools through pipes. For efficiency, prefer awk's built-in string functions over external commands: calling system() inside a per-record rule starts a new process for every input line.
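
As an illustration of combining awk with other tools through pipes, the sketch below has awk do the counting and leaves ordering and truncation to sort and head. The sample sentence is the one from example 3. A plain bracket range is used instead of [:alnum:] on the assumption that some older awk builds lack POSIX character classes.

```shell
# Top three words by frequency: awk counts, sort orders, head truncates.
top=$(echo 'The quick Brown fox jumps over the lazy dog. Brown dog!' |
    awk '
        {
            gsub(/[^A-Za-z0-9_]/, " ")       # replace punctuation with spaces
            for (i = 1; i <= NF; i++)
                freq[tolower($i)]++          # case-insensitive key
        }
        END { for (w in freq) print freq[w], w }
    ' | sort -k1,1nr -k2,2 | head -n 3)
echo "$top"
```

Only one awk process and one sort process run regardless of input size, which is the efficiency point made above.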