2. Teasing with grep

The most common use of awk is selecting lines with regular expressions. The construction below searches each line for "pattern" and runs {action} on every line that matches.

$ awk ' /pattern/ {action} ' file1 file2 ... fileN
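As a minimal sketch of this construction, the snippet below prints every line that matches a regular expression. The file name fruits.txt and its contents are invented for the illustration and do not come from the original text.

```shell
# Hypothetical sample data, invented for this illustration.
printf '%s\n' 'apple 12 3' 'banana 7 9' 'cherry 15 5' > fruits.txt

# Run the action (print the whole line) only on lines matching /an/.
matched=$(awk '/an/ {print $0}' fruits.txt)
echo "$matched"   # prints: banana 7 9
```

Only "banana 7 9" contains the characters "an", so it is the only line the action runs on.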

What about matching the pattern against one particular field only? Here is an example of such a matching criterion, which targets only the first field (column).

$ awk ' $1 ~ /pattern/ {action} ' file1 file2 ... fileN
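The difference between matching the whole line and matching a single field is easiest to see side by side. This is a small sketch on made-up data (words.txt is a hypothetical file name).

```shell
# Hypothetical sample data: "ant" appears in different columns.
printf '%s\n' 'ant bee' 'bee ant' 'cat ant' > words.txt

# Whole-line match: "ant" may appear in any field.
any_field=$(awk '/ant/ {print $0}' words.txt)

# Field match: "ant" must appear in the first field.
first_field=$(awk '$1 ~ /ant/ {print $0}' words.txt)

echo "$any_field"     # all three lines
echo "---"
echo "$first_field"   # prints: ant bee
```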

The ! negates the result, so the action runs on the lines that do not match the pattern.

$ awk ' ! /pattern/ {action} ' file1 file2 ... fileN
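A short sketch of negation on invented data follows. It also uses the !~ operator, the negated counterpart of ~ for a single field. That operator is standard awk but is not mentioned in the text above, so treat its use here as an addition.

```shell
# Hypothetical sample data.
printf '%s\n' 'ant bee' 'bee ant' 'cat dog' > words.txt

# Lines that do not contain "ant" anywhere.
no_ant=$(awk '! /ant/ {print $0}' words.txt)

# Lines whose first field does not contain "ant".
first_not_ant=$(awk '$1 !~ /ant/ {print $0}' words.txt)

echo "$no_ant"          # prints: cat dog
echo "---"
echo "$first_not_ant"   # prints: bee ant, then cat dog
```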

One can also match the exact text of a field. The whole field must be equal to the text, and the comparison is case sensitive.

$ awk ' $1 == "text" {action} ' file1 file2 ... fileN
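To illustrate how an exact comparison differs from a regular expression match, here is a sketch on made-up data (exact.txt is a hypothetical file name).

```shell
# Hypothetical sample data.
printf '%s\n' 'text 1' 'textbook 2' 'Text 3' > exact.txt

# Exact comparison: the first field must be exactly "text".
exact=$(awk '$1 == "text" {print $0}' exact.txt)

# Regular expression: the first field only has to contain "text".
regex=$(awk '$1 ~ /text/ {print $0}' exact.txt)

echo "$exact"   # prints: text 1
echo "---"
echo "$regex"   # prints: text 1, then textbook 2
```

Neither command selects "Text 3", because both the comparison and the regular expression are case sensitive.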

Here is an example of arithmetic comparison on different fields. In this example the script will match lines where the value in the first field is larger than 10 AND the value in the second field is smaller than 7. && is the logical AND, || is the logical OR.

$ awk ' $1 > 10  &&  $2 < 7 {action} ' file1 file2 ... fileN

You can also do arithmetic on fields and compare fields with each other...

$ awk ' ($1+$2 > 10)  &&  ($3 < $4) {action} ' file1 file2 ... fileN
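The two numeric conditions above can be tried on a small table of numbers. The data below is invented purely to show which lines each condition selects.

```shell
# Hypothetical sample data: four numeric columns.
printf '%s\n' '12 5 1 2' '12 9 3 1' '4 5 1 2' '8 3 5 9' > values.txt

# First field larger than 10 AND second field smaller than 7.
simple=$(awk '$1 > 10 && $2 < 7 {print $0}' values.txt)

# Sum of the first two fields larger than 10 AND third field smaller than fourth.
combined=$(awk '($1+$2 > 10) && ($3 < $4) {print $0}' values.txt)

echo "$simple"     # prints: 12 5 1 2
echo "---"
echo "$combined"   # prints: 12 5 1 2, then 8 3 5 9
```

The line "8 3 5 9" fails the first test (8 is not larger than 10) but passes the second one, because 8+3 is larger than 10 and 5 is smaller than 9.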

There are two special patterns: BEGIN and END. Here is how they work.

$ awk ' BEGIN {action_B}  /pattern/ {action}   END {action_E}' file

The BEGIN block gives you the opportunity to run {action_B} before awk even starts reading the file. This is perfect for initializing variables or for gathering data from other files that the current script needs.

Then, awk will run {action} for each line that matches /pattern/.

At the END, after the last line has been read, awk will run {action_E}. This is perfect for printing the collected data.

Here is a summary of the awk workflow logic.

  $ awk ' BEGIN {action_B}     
         /pattern1/ {action1} 
         /pattern2/ {action2} 
         END {action_E}       
   ' file1 file2 ... fileN
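The workflow can be sketched with two counters. The file tally.txt, its contents, and the variable names ants and bees are all made up for this illustration.

```shell
# Hypothetical sample data; the third line matches both patterns.
printf '%s\n' 'ant 3' 'bee 5' 'ant bee 2' 'cat 7' > tally.txt

result=$(awk '
    BEGIN { ants = 0; bees = 0 }   # runs once, before any input is read
    /ant/ { ants++ }               # runs for every line containing "ant"
    /bee/ { bees++ }               # runs for every line containing "bee"
    END   { print "ant lines:", ants; print "bee lines:", bees }
' tally.txt)

echo "$result"
# prints:
# ant lines: 2
# bee lines: 2
```

Each input line is tested against every pattern in turn, so the line "ant bee 2" triggers both actions and is counted twice, once per counter.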

Exercises
  • Can you add the header "# metal | weight in ounces | date minted | country of origin | description" to the output for the coins older than 1986? Start with the shorter "# header" until you get it working.
  • What will happen if you do not provide a file as input to the above exercise?

And here is the teaser ;-). Perhaps I am not very familiar with grep, but here is a simple example that is rather tricky to do with grep. Take a file numbers.dat with the following content.

1 2 4 3 5 6
5 3 5 6 8 2
4 3 1 5 7 8
2 5 6 1 9 0
1 4 5 6 7 8

Remove all lines that contain both "2" and "3", in any order. In this file these are lines 1 and 2.

Possible way to do it:

$ grep -v "2.*3\|3.*2" numbers.dat

which will grep for the two possible combinations: "2" before "3" and the opposite case. This works, but it already shows the problem. What happens if we search for more than two matches? One needs to write out all the possible permutations...

Here is how it can be done with awk.

$ awk '!(/2/ && /3/) {print $0}' numbers.dat 

Let's decrypt the matching rule !(/2/ && /3/). The && stands for "and", so (/2/ && /3/) matches any line that contains both 2 and 3, in whatever order they appear. The ! then negates the logical result, so the rule selects the lines that do not contain both of them, and the action prints those lines.
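The awk rule grows by one term per extra pattern, with no permutations to write out. The sketch below recreates numbers.dat from the data shown above and then adds a third pattern, "4", chosen only for illustration.

```shell
# The sample data from the text above.
printf '%s\n' '1 2 4 3 5 6' '5 3 5 6 8 2' '4 3 1 5 7 8' \
              '2 5 6 1 9 0' '1 4 5 6 7 8' > numbers.dat

# Drop lines that contain both 2 and 3.
two=$(awk '!(/2/ && /3/) {print $0}' numbers.dat)

# Drop lines that contain 2, 3 and 4: just one more && term.
three=$(awk '!(/2/ && /3/ && /4/) {print $0}' numbers.dat)

echo "$two"     # lines 3, 4 and 5 of the file
echo "---"
echo "$three"   # lines 2, 3, 4 and 5 of the file
```

With three patterns only the first line is removed, since the second line contains no 4. Note that /2/ matches the character "2" anywhere on the line, so in a file with multi-digit numbers it would also match, for example, 12 or 20.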

Awk has somewhat more elaborate matching capabilities than grep, not to mention the ability to use arithmetic operations in the matching expressions.