Quick Guide to nawk - Examples, Field Separators, Arrays
Posted on Tuesday, January 10, 2006 at 11:56 PM by Malcolm
Here is a quick guide to nawk. I prefer nawk over awk because it offers more functionality. Most systems now have both programs installed.
To run nawk
- From command line : nawk 'program' inputfile1 inputfile2 …
- From a file : nawk -f programfile inputfile1 inputfile2 …
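As a small illustration of the two invocation styles, the sketch below runs the same one-line program first from the command line and then from a program file. The file names (sample.txt, first.awk) are made up for the example, and the script falls back to plain awk when nawk is not installed.

```shell
# Use nawk if it is installed, otherwise fall back to awk (assumption).
AWK=$(command -v nawk || command -v awk)

# Hypothetical input file with two space-separated fields per line.
dir=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$dir/sample.txt"

# 1. Program given directly on the command line.
from_cli=$("$AWK" '{print $1}' "$dir/sample.txt")

# 2. Same program stored in a file and loaded with -f.
echo '{print $1}' > "$dir/first.awk"
from_file=$("$AWK" -f "$dir/first.awk" "$dir/sample.txt")

echo "$from_cli"
echo "$from_file"
rm -r "$dir"
```

Both runs should print the first field of every line, so the two results are identical.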
Structure of nawk program
- A nawk program can consist of three sections: nawk 'BEGIN{…} {… BODY …} END{…}' inputfile
- Both 'BEGIN' and 'END' blocks are optional and are executed only once.
- The body is executed for each line in the input file.
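To make the three sections concrete, here is a minimal sketch that counts input lines. The three-line input is invented for the example, and plain awk is used when nawk is not available.

```shell
# Use nawk if it is installed, otherwise fall back to awk (assumption).
AWK=$(command -v nawk || command -v awk)

# BEGIN runs once before any input, the body runs for every line,
# and END runs once after the last line has been read.
report=$(printf 'a\nb\nc\n' | "$AWK" '
BEGIN { print "start" }
      { count++ }
END   { print "lines:", count }')

echo "$report"
```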
Field Separators
- The following example adds '=' as a field separator, in addition to the blank space separator : nawk 'BEGIN{FS = "[ =]+"}{print $2}' inputfile
- For example, if the input file contains the line "Total = 500", then the output will be 500.
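A minimal sketch of a regular-expression field separator, assuming a POSIX-style awk. The pattern `[ =]+` (any run of spaces and equals signs) is chosen for this illustration, and the second command shows that the same separator can be given with the -F option instead of a BEGIN block.

```shell
# Use nawk if it is installed, otherwise fall back to awk (assumption).
AWK=$(command -v nawk || command -v awk)

# Fields are split on any run of spaces and/or "=" characters.
value=$(echo "Total = 500" | "$AWK" 'BEGIN { FS = "[ =]+" } { print $2 }')
echo "$value"

# The same separator passed with -F; it also handles input without spaces.
value_f=$(echo "Total=500" | "$AWK" -F'[ =]+' '{ print $2 }')
echo "$value_f"
```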
Printing Environment Variables
- The following example prepends the current path to a list of filenames/directories:
ls -alg | nawk '{print ENVIRON["PWD"] "/" $8}'
- ENVIRON is an array of environment variables indexed by the individual variable name (without a leading "$").
- The variable FILENAME is a string that stores the name of the file nawk is currently reading.
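The sketch below shows ENVIRON and FILENAME in isolation. DEMO_DIR and notes.txt are made-up names, exported or created only so that the output is predictable; plain awk stands in when nawk is missing.

```shell
# Use nawk if it is installed, otherwise fall back to awk (assumption).
AWK=$(command -v nawk || command -v awk)

# DEMO_DIR is a hypothetical variable, exported so awk can see it.
DEMO_DIR=/tmp/demo
export DEMO_DIR

# ENVIRON is indexed by the variable name, without a leading "$".
paths=$(printf 'a.txt\nb.txt\n' | "$AWK" '{ print ENVIRON["DEMO_DIR"] "/" $1 }')
echo "$paths"

# FILENAME holds the name of the file currently being read.
dir=$(mktemp -d)
echo "one line" > "$dir/notes.txt"
shown=$(cd "$dir" && "$AWK" '{ print FILENAME ": " $0 }' notes.txt)
echo "$shown"
rm -r "$dir"
```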
Examples of usage
- To kill all the processes of the current user : kill -9 `ps -ef | grep $LOGNAME | nawk '{print $2}'`
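Because grep matches the user name anywhere on the line, one possible refinement is to compare the first field inside the awk program itself. The sketch below uses canned ps-style lines and a made-up user name (mal) so that it touches no real processes; it only prints the PIDs that would be selected.

```shell
# Use nawk if it is installed, otherwise fall back to awk (assumption).
AWK=$(command -v nawk || command -v awk)

# Canned "ps -ef"-style lines (invented) instead of real process data.
pids=$(printf '%s\n' \
  'UID   PID  PPID  C STIME TTY   TIME CMD' \
  'mal   101     1  0 10:00 ?     0:00 sleep 60' \
  'root  102     1  0 10:00 ?     0:00 su mal' \
  'mal   103   101  0 10:01 pts/0 0:00 vi notes' |
  "$AWK" -v user="mal" '$1 == user { print $2 }')

# Only print the PIDs here; a real script would pass them to kill.
echo "$pids"
```

The line owned by root is skipped even though it mentions the user name in its command column.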
Multi-dimensional array
- To use a 2D or multi-dimensional array, use a comma to separate the array indices: matrix[3, 5] = $(i+5)
Another example
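As a short sketch of the comma syntax, the program below fills and prints a small table. The 2 x 3 size and the cell values are arbitrary choices for the example. Internally awk joins the indices into a single string key using the built-in SUBSEP variable, which is why membership is tested with a parenthesised index list.

```shell
# Use nawk if it is installed, otherwise fall back to awk (assumption).
AWK=$(command -v nawk || command -v awk)

grid=$("$AWK" 'BEGIN {
    # Fill a 2 x 3 table; each cell holds row * 10 + column.
    for (r = 1; r <= 2; r++)
        for (c = 1; c <= 3; c++)
            matrix[r, c] = r * 10 + c

    # The key is really the string r SUBSEP c, so test with (r, c) in matrix.
    if ((2, 3) in matrix)
        print "matrix[2,3] =", matrix[2, 3]

    # Print the table row by row.
    for (r = 1; r <= 2; r++) {
        line = ""
        for (c = 1; c <= 3; c++)
            line = line (c > 1 ? " " : "") matrix[r, c]
        print line
    }
}')
echo "$grid"
```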
- The example below calculates the averages for 16 items from 10 sets of readings.
- Example of an input line the program is trying to match :
Total elapsed time is 560
BEGIN{
    printf("--------- Execution Time -----------\n");
    item=16;
    set=10;
}
{
    # all new variables are initialized to 0
    for(;j < set;j++)
        for(i=0;i < item; i++) {
            # skip input until the second word matches "elapsed"
            while($2 != "elapsed") getline;
            # notice the use of array without declaring its
            # dimension
            sum[i]+=$5;
            getline;
        }
    if(j==set){
        for(i=0;i < item;i++){
            # this and the next 2 lines are comments
            # you can use either print or printf for output
            # print sum[i]/set;
            printf("Set %d : %6.3f\n",i,sum[i]/set);
        }
        j++;
    }
}
END{
    printf("-------------- End --------------");
}
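For comparison, here is a scaled-down sketch of the same averaging idea written with a pattern rule instead of explicit getline calls. This is an alternative formulation, not the program above: it assumes the readings for each set arrive in the same item order, takes the item count from a -v variable, and derives the number of sets from how many matching lines were seen. The sample readings (2 sets of 3 items) are invented.

```shell
# Use nawk if it is installed, otherwise fall back to awk (assumption).
AWK=$(command -v nawk || command -v awk)

# Made-up readings: 2 sets of 3 items each, plus one line that must be ignored.
averages=$(printf '%s\n' \
  'Total elapsed time is 10' 'Total elapsed time is 20' 'Total elapsed time is 30' \
  'some unrelated line' \
  'Total elapsed time is 30' 'Total elapsed time is 40' 'Total elapsed time is 50' |
  "$AWK" -v item=3 '
    # Each matching line belongs to item number (n mod item).
    $2 == "elapsed" { sum[n % item] += $5; n++ }
    END {
        sets = n / item
        for (i = 0; i < item; i++)
            printf("Item %d : %6.3f\n", i, sum[i] / sets)
    }')
echo "$averages"
```

Letting awk's own record loop drive the matching avoids having to handle end of input by hand.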
Examples from the man page
- Write to the standard output all input lines for which field 3 is greater than 5:
$3 > 5
- Write every tenth line:
(NR % 10) == 0
- Write any line with a substring matching the regular expression:
/(G|D)(2[0-9][[:alpha:]]*)/
- Print any line with a substring containing a G or D, followed by a sequence of digits and characters:
/(G|D)([[:digit:][:alpha:]]*)/
- Write any line in which the second field contains a backslash:
$2 ~ /\\/
- Write any line in which the second field contains a backslash (alternate method). Note that backslash escapes are interpreted twice, once in lexical processing of the string and once in processing the regular expression.
$2 ~ "\\\\"
- Write the second to the last and the last field in each line, separating the fields by a colon:
{OFS=":";print $(NF-1), $NF}
- Write lines longer than 72 characters:
length($0) > 72
- Write the first two fields in opposite order separated by the OFS:
{ print $2, $1 }
- Same, with input fields separated by comma or space and tab characters, or both:
BEGIN { FS = ",[ \t]*|[ \t]+" }
{ print $2, $1 }
- Add up first column, print sum and average:
{s += $1 }
END{print "sum is ", s, " average is", s/NR}
- Write fields in reverse order, one per line (many lines out for each line in):
{ for (i = NF; i > 0; --i) print $i }
- Write all lines between occurrences of the strings "start" and "stop":
/start/, /stop/
- Write all lines whose first field is different from the previous one:
$1 != prev { print; prev = $1 }
- Simulate the echo command:
BEGIN { for (i = 1; i < ARGC; ++i) printf "%s%s", ARGV[i], i==ARGC-1?"\n":" " }
- Write the path prefixes contained in the PATH environment variable, one per line:
BEGIN{n = split (ENVIRON["PATH"], path, ":")
for (i = 1; i <= n; ++i) print path[i]}
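To see a few of these one-liners in action, the sketch below runs them against a tiny made-up data set (a key in field 1 and a number in field 3). The sum example is adapted to add up the third column, since that is where the numbers are in this sample.

```shell
# Use nawk if it is installed, otherwise fall back to awk (assumption).
AWK=$(command -v nawk || command -v awk)

# Invented sample input.
data='a x 3
a y 7
b z 8'

# Pattern only: print lines whose third field is greater than 5.
big=$(echo "$data" | "$AWK" '$3 > 5')

# Print a line only when its first field differs from the previous line.
firsts=$(echo "$data" | "$AWK" '$1 != prev { print; prev = $1 }')

# Last two fields joined by the output field separator.
tail2=$(echo "$data" | "$AWK" '{ OFS = ":"; print $(NF-1), $NF }')

# Sum and average of the third column.
stats=$(echo "$data" | "$AWK" '{ s += $3 } END { print "sum is", s, "average is", s/NR }')

printf '%s\n' "$big" "$firsts" "$tail2" "$stats"
```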
