awk Advanced Examples
awk is a powerful text-processing language. The examples below cover advanced use cases; a few rely on GNU awk (gawk) extensions, such as lshift() and the ' printf flag.
1. Advanced Field Processing
Input (grades.csv):
Alice,Math,85,A,Science
Bob,Math,92,A,Arts
Charlie,Math,78,B,Science
Command:
# Calculate the average score for students with grade A (the grade is field 4)
awk -F',' '$4 == "A" {sum+=$3; count++}
END {if (count) printf "Average: %.2f\n", sum/count}' grades.csv
Output:
Average: 88.50
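The same accumulate-and-report pattern extends to every grade at once by keying the running totals on the grade field. A minimal sketch, assuming the grades.csv layout shown above; avg_by_grade is a hypothetical helper name, and the result is piped through sort because the order of a for-in loop is unspecified:

```shell
# avg_by_grade is a hypothetical helper name, not part of awk.
avg_by_grade() {
  awk -F',' '
    { sum[$4] += $3; n[$4]++ }            # totals keyed by grade (field 4)
    END {
      for (g in sum)
        printf "%s: %.2f\n", g, sum[g] / n[g]
    }' "$@" | sort
}

printf 'Alice,Math,85,A,Science\nBob,Math,92,A,Arts\nCharlie,Math,78,B,Science\n' | avg_by_grade
```

For the three sample rows this prints "A: 88.50" followed by "B: 78.00".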
2. Pattern Ranges
Input (input.txt):
START
Configuration:
Version: 2.4.5
Timeout: 30
END
Logs:
Command:
# Print lines between START and END markers
awk '/START/,/END/' input.txt
Output:
START
Configuration:
Version: 2.4.5
Timeout: 30
END
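A range pattern always includes both marker lines. When only the lines strictly between the markers are wanted, a flag variable is a common alternative. A minimal sketch, assuming each marker sits alone on its line; between_markers is a hypothetical helper name:

```shell
# between_markers is a hypothetical helper name.
between_markers() {
  awk '
    /^END$/   { inside = 0 }   # clear the flag before the closing marker can print
    inside    { print }        # emit lines while the flag is set
    /^START$/ { inside = 1 }   # set the flag after the opening marker is skipped
  ' "$@"
}

printf 'START\nConfiguration:\nVersion: 2.4.5\nTimeout: 30\nEND\nLogs:\n' | between_markers
```

The order of the three rules matters: checking END first and START last is what keeps both markers out of the output.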
3. Associative Arrays
Input (file.txt):
The quick Brown fox jumps over the lazy dog. Brown dog!
Command:
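# Count word frequency (case-insensitive)
# IGNORECASE does not affect array subscripts, so fold case with tolower() instead
{
    gsub(/[^[:alnum:]_]/, " ")   # Replace punctuation with spaces
    for (i = 1; i <= NF; i++) {
        words[tolower($i)]++     # "Brown" and "brown" share one key
    }
}
END {
    for (word in words)          # for-in order is unspecified; sort afterwards
        print word, words[word]
}
awk -f word_frequency.awk file.txt | sort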
Output:
brown 2
dog 2
fox 1
jumps 1
lazy 1
over 1
quick 1
the 2
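To rank words by how often they occur, the counts can be printed first and handed to sort. A minimal sketch; word_freq is a hypothetical helper name, and the bracket expression is written as a-z0-9_ after lowercasing, which assumes ASCII input:

```shell
# word_freq is a hypothetical helper name.
word_freq() {
  awk '
    {
      $0 = tolower($0)              # fold case before the fields are counted
      gsub(/[^a-z0-9_]/, " ")       # turn punctuation into separators
      for (i = 1; i <= NF; i++) count[$i]++
    }
    END { for (w in count) print count[w], w }
  ' "$@" | sort -k1,1nr -k2,2       # highest count first, ties alphabetical
}

echo 'The quick Brown fox jumps over the lazy dog. Brown dog!' | word_freq
```

Putting the count in the first column keeps the sort keys simple: a numeric, reversed key on field 1 and a plain key on field 2.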
4. Advanced Math Operations
Input (numbers.txt):
15
20
25
18
Command:
# Calculate the population standard deviation (divides by N)
{
sum += $1
sumsq += $1^2
count++
}
END {
mean = sum/count
print "Std Dev:", sqrt(sumsq/count - mean^2)
}
awk -f std_dev.awk numbers.txt
Output:
Std Dev: 3.64005
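The script above divides by N, which gives the population standard deviation. When the numbers are a sample drawn from a larger population, the usual estimate divides by N - 1 instead. A minimal sketch under that assumption; sample_stddev is a hypothetical helper name:

```shell
# sample_stddev is a hypothetical helper name.
sample_stddev() {
  awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      if (n < 2) { print "need at least two values"; exit 1 }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)   # Bessel correction: divide by n - 1
      printf "Sample Std Dev: %.4f\n", sqrt(var)
    }' "$@"
}

printf '15\n20\n25\n18\n' | sample_stddev
```

For the four numbers in numbers.txt this prints "Sample Std Dev: 4.2032".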
5. Text Transformation
Input (employees.csv):
name,age,email
John Doe,32,[email protected]
Jane Smith,28,[email protected]
Command:
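# Convert CSV to JSON (assumes no quoted fields or embedded commas)
# Saved here as csv_to_json.awk; the file name is only illustrative
BEGIN { FS=","; print "[" }
NR>1 {
    if (NR > 2) printf ",\n"    # comma goes after the previous object, never after the last
    printf " {\n"
    printf " \"name\": \"%s\",\n", $1
    printf " \"age\": %d,\n", $2
    printf " \"email\": \"%s\"\n", $3
    printf " }"
}
END { if (NR > 1) printf "\n"; print "]" }
awk -f csv_to_json.awk employees.csv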
Output:
[
{
"name": "John Doe",
"age": 32,
"email": "[email protected]"
},
{
"name": "Jane Smith",
"age": 28,
"email": "[email protected]"
}
]
6. Multi-file Processing
Input (file1.txt):
Apple
Banana
Orange
Grape
Input (file2.txt):
Apple
Berry
Orange
Command:
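# Compare two files line by line
{
    # Read the matching line of file2.txt into "other" so $0 is left intact;
    # the parentheses are required because "<" binds more loosely than ">"
    if ((getline other < "file2.txt") > 0) {
        if ($0 != other)
            print "Difference at line", NR
    }
    else
        print "Extra line in file1:", $0
}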
awk -f compare.awk file1.txt
Output:
Difference at line 2
Extra line in file1: Grape
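The getline approach compares the files position by position. For set-style questions, such as which lines of file1.txt never appear anywhere in file2.txt, a common idiom passes both files as arguments and uses NR == FNR to recognize the first one. A minimal sketch; only_in_first is a hypothetical helper name, and the idiom assumes the file read first is not empty:

```shell
# usage: only_in_first FILE_A FILE_B  -> lines of FILE_A that are absent from FILE_B
# only_in_first is a hypothetical helper name.
only_in_first() {
  awk '
    NR == FNR { seen[$0] = 1; next }   # first argument (FILE_B): remember every line
    !($0 in seen)                      # second argument (FILE_A): print unseen lines
  ' "$2" "$1"
}
```

With the sample files, `only_in_first file1.txt file2.txt` prints Banana and Grape.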
7. Advanced String Functions
Input:
[email protected]
[email protected]
Command:
# Extract domain from email addresses
{
    if (match($0, /@[[:alnum:].-]+/))               # skip lines without an "@"
        print substr($0, RSTART+1, RLENGTH-1)       # drop the leading "@"
}
Output:
domain.com
server-01.local
8. Custom Functions
Input (numbers.csv):
5
3
7
Command:
# Define and use custom function
function factorial(n) {
return (n <= 1) ? 1 : n * factorial(n-1)
}
{ print $1 "! = " factorial($1) }
awk -f factorial.awk numbers.csv
Output:
5! = 120
3! = 6
7! = 5040
9. Bitwise Operations
Input:
192.168.1.1
10.0.0.255
Command:
# Convert IP address to integer (lshift() is a gawk extension)
{
split($1, octets, ".")
ip_int = lshift(octets[1],24) + lshift(octets[2],16) + \
lshift(octets[3],8) + octets[4]
print ip_int
}
echo "192.168.1.1" | awk -f ip2int.awk
echo "10.0.0.255" | awk -f ip2int.awk
Output:
3232235777
167772415
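Because lshift() is a gawk extension, the script above does not run under other awk implementations. The same value can be computed with plain multiplication, since shifting left by 8 bits equals multiplying by 256. A minimal sketch, assuming well-formed dotted-quad input; ip2int is a hypothetical helper name, and %.0f is used instead of %d as a precaution in case an awk build restricts %d to 32-bit integers:

```shell
# ip2int is a hypothetical helper name.
ip2int() {
  awk -F'.' '
    NF == 4 {
      # Horner form of o1*256^3 + o2*256^2 + o3*256 + o4
      printf "%.0f\n", (($1 * 256 + $2) * 256 + $3) * 256 + $4
    }
  ' "$@"
}

echo "192.168.1.1" | ip2int
echo "10.0.0.255" | ip2int
```

This prints 3232235777 and 167772415, matching the lshift() version.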
10. Advanced Output Formatting
Input (sales.csv):
Alice Johnson,25000
Bob Chen,18450
Maria Gonzalez,36700
Command:
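# Generate formatted report
# Note: the ' flag (thousands separator) is supported by gawk and needs a locale that defines one
BEGIN {
    FS = ","    # sales.csv is comma-separated
    printf "%-20s %10s %10s\n", "Name", "Sales", "Commission"
    print "--------------------------------------"
}
{
    comm = $2 * 0.15    # 15% commission
    printf "%-20s $%'9.2f $%'9.2f\n", $1, $2, comm
}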
awk -f sales_report.awk sales.csv
Output:
Name                      Sales Commission
--------------------------------------
Alice Johnson        $25,000.00 $ 3,750.00
Bob Chen             $18,450.00 $ 2,767.50
Maria Gonzalez       $36,700.00 $ 5,505.00
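The ' flag in printf is not available in every awk, and its effect depends on the locale. A small user-defined function can insert the separators portably. A minimal sketch, assuming non-negative amounts; commas and format_money are hypothetical names:

```shell
# format_money and commas are hypothetical names.
format_money() {
  awk '
    # Extra parameters after the gap act as local variables.
    function commas(x,    s, whole, frac, out) {
      s = sprintf("%.2f", x)
      whole = substr(s, 1, length(s) - 3)   # digits before the decimal point
      frac  = substr(s, length(s) - 2)      # the point and two decimals
      out = ""
      while (length(whole) > 3) {           # peel off three digits at a time
        out = "," substr(whole, length(whole) - 2) out
        whole = substr(whole, 1, length(whole) - 3)
      }
      return whole out frac
    }
    { print "$" commas($1) }
  ' "$@"
}

printf '25000\n3750\n1234567.891\n' | format_money
```

This prints $25,000.00, $3,750.00, and $1,234,567.89, one per line, regardless of locale.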
Key Advanced Features:
- Built-in functions: gsub(), gensub(), asort(), mktime()
- Two-way communication with system commands
- Bit manipulation functions
- Time and date processing
- User-defined functions and namespaces (namespaces require gawk 5.0 or later)
- TCP/IP networking (gawk extension)
- Profiling and pretty-printing
awk becomes particularly powerful when combined with shell scripting and other Unix tools through pipes. For maximum efficiency, use awk's built-in string functions and avoid calling external processes when possible.