Multiple files ...

... or how to apply different rules for different input files.

Common practice - NR and FNR

Perhaps one of the most common approaches is to use the built-in variables NR and FNR.

To illustrate this, we can create two files with some distinct data. You can run seq -f "File_1 - Line_%g" 1 3 and seq -f "File_2 - Line_%g" 1 3 and redirect the output to two separate files.

$ seq -f "File_1 - Line_%g" 1 3 
File_1 - Line_1
File_1 - Line_2
File_1 - Line_3

$ seq -f "File_2 - Line_%g" 1 3 
File_2 - Line_1
File_2 - Line_2
File_2 - Line_3

Or use process substitution (available in bash, zsh, and ksh) to run the example directly:

$ awk '{print "FNR: "FNR" NR: "NR"  => "$0}' <(seq -f "File_1 - Line_%g" 1 3) <(seq -f "File_2 - Line_%g" 1 3)
FNR: 1 NR: 1  => File_1 - Line_1
FNR: 2 NR: 2  => File_1 - Line_2
FNR: 3 NR: 3  => File_1 - Line_3
FNR: 1 NR: 4  => File_2 - Line_1
FNR: 2 NR: 5  => File_2 - Line_2
FNR: 3 NR: 6  => File_2 - Line_3

Note that FNR holds the line number within the current file and resets to 1 for each new file, while NR keeps incrementing - i.e. it counts every line read so far, regardless of the file.

It is rather common (in the awk community) to treat the first file differently, usually by testing NR==FNR, which is true only while the first file is being read (provided that file is not empty).

...
NR==FNR { commands } # runs only on the first file
NR!=FNR { commands } # runs on any but the first file
        { commands } # runs on all files
...
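
As an illustration of the pattern above, here is a minimal sketch that reads a list of IDs from the first file and then prints only the matching lines of the second file. The file names (ids.txt, table.txt), the sample data, and the array name wanted are made up for this example.

```shell
# Hypothetical sample data: a list of wanted IDs and a table to filter.
tmpdir=$(mktemp -d)
printf '%s\n' 2 3 > "$tmpdir/ids.txt"
printf '%s\n' '1 apple' '2 banana' '3 cherry' > "$tmpdir/table.txt"

result=$(awk '
  NR==FNR { wanted[$1]; next }   # first file: remember each ID, then skip the rules below
  $1 in wanted                   # other files: print lines whose first field was remembered
' "$tmpdir/ids.txt" "$tmpdir/table.txt")

echo "$result"
rm -r "$tmpdir"
```

The next statement matters here: without it, lines of the first file would also be tested by the second rule.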

FILENAME, ARGIND

  • FILENAME - The name of the current input file. When no data files are listed on the command line, awk reads from the standard input and FILENAME is set to "-". FILENAME changes each time a new file is read.
  • ARGIND - Every time gawk opens a new data file for processing, it sets ARGIND to the index in ARGV of the file name. When gawk is processing the input files, ‘FILENAME == ARGV[ARGIND]’ is always true.
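
To show FILENAME in use, here is a small sketch that counts the lines of each input file. It sticks to FILENAME and FNR so that it should also run on awks other than gawk. The file names (first.txt, second.txt) and the array names order and lines are assumptions made for this example.

```shell
# Hypothetical sample files with 3 and 2 lines.
tmpdir=$(mktemp -d)
printf '%s\n' a b c > "$tmpdir/first.txt"
printf '%s\n' d e > "$tmpdir/second.txt"

counts=$(cd "$tmpdir" && awk '
  FNR == 1 { order[++n] = FILENAME }   # a new file starts: remember its name
           { lines[FILENAME]++ }       # count the lines of the current file
  END      { for (i = 1; i <= n; i++) print order[i], lines[order[i]] }
' first.txt second.txt)

echo "$counts"
rm -r "$tmpdir"
```

The order array is there only to print the files in the order they were read, since for (name in lines) does not guarantee any particular order.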

Warning

ARGIND is a gawk extension. The awk shipped with OS X does not support it - you need to install gawk for this.
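
Where gawk is not available, a per-file index can be approximated by counting how often FNR returns to 1. This is only a sketch of a workaround, not a full replacement for ARGIND: the variable name file_idx is made up here, empty files are not counted, and the value is a running count of files rather than an index into ARGV.

```shell
# Hypothetical sample files.
tmpdir=$(mktemp -d)
printf '%s\n' a b > "$tmpdir/one.txt"
printf '%s\n' c > "$tmpdir/two.txt"

indexed=$(awk '
  FNR == 1 { file_idx++ }          # FNR is 1 on the first line of each non-empty file
           { print file_idx ": " $0 }
' "$tmpdir/one.txt" "$tmpdir/two.txt")

echo "$indexed"
rm -r "$tmpdir"
```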