Skip to content

Advanced data analysis ****

You are given a file with numbers on each row - 5 in this case.

data1

1 2 3 4 5
1 3 5 7 9
1 2 4 5 0
2 6 7 8 9

Then you are given 5 numbers (let's say "1, 3, 5, 6 and 7") and you want to find how many of these numbers are matching a number on each line - think like you are about to check your lottery tickets ;-)

The solution bellow is using an "associative arrays" trick to make it easier to loop over the reference numbers.

Possible solution

Not very elegant but illustrates nicely a convenient use of associated arrays as list - if ($i in n) :

awk 'BEGIN{n[1]=n[3]=n[5]=n[6]=n[7]=1} {count=0; for (i=1;i<=NF;i++){if ($i in n) count++} print count} ' data1

Can you improve the script so it could pick up the numbers from the first line, i.e. the winning numbers are on the first line?

data2

1 3 5 6 7
1 2 3 4 5
1 3 5 7 9
1 2 4 5 0
2 6 7 8 9
Possible solution
awk 'NR==1 {for (i=1;i<=NF;i++) n[$i]=1}    NR>1 {count=0; for (i=1;i<=NF;i++){if ($i in n) count++} print count} ' data2

a bit more readable in a script

#!/bin/awk -f
NR==1 {for (i=1; i<=NF; i++) n[$i]=1}    
NR>1   {count=0; for (i=1; i<=NF; i++) {if ($i in n) count++} print count}