Skip to content

04.Data manipulation **

You have 2 files containing results from two similar experiments.
You want to calculate the difference between the numbers in the second columns.

0.0   0.00
0.1   1.23
0.2   1.34
0.3   1.67
0.4   2.34
0.5   3.17
0.0   0.00
0.1   1.25
0.2   1.24
0.3   1.61
0.4   2.44
0.5   3.27

Try to use awk and any other tools to produce a file that has the results from both files and the difference between the second column in last column in the new file i.e.

0.0  0.00  0.00  0
0.1  1.23  1.25  -0.02
0.2  1.34  1.24  0.1
0.3  1.67  1.61  0.06
0.4  2.34  2.44  -0.1
0.5  3.17  3.27  -0.1
Possible solutions:
$ paste 1.dat 2.dat | awk '{print $1,$2,$4,$2-$4}'

Note: This will not work under OS X

$ awk 'ARGIND==1 {x1=$1;y1=$2; getline < ARGV[2]; printf("%g  %g  %g  %g  %g\n",x1,y1,$1,$2,y1-$2);}' 1.dat 2.dat

Note: This might work under OS X

$ awk 'FILENAME=="1.dat" {x1=$1;y1=$2; getline < ARGV[2]; printf("%g  %g  %g  %g  %g\n",x1,y1,$1,$2,y1-$2);}' 1.dat 2.dat

$ awk 'BEGIN{ while (getline < ARGV[1]) {x1=$1;y1=$2; getline < ARGV[2]; printf("%g  %g  %g  %g  %g\n",x1,y1,$1,$2,y1-$2);}}' 1.dat 2.dat
# contribution by Johan Zvrskovec 2020.08.28
awk 'NR==FNR{n1[$1]=$2;} NR!=FNR{print ($1,n1[$1],$2,$2-n1[$1])}' 1.dat 2.dat
gnuplot tips

Here is an example that visualizes the newly tabulated data.

$ gnuplot
plot "< paste 1.dat 2.dat | awk '{print $1,$2,$4,$2-$4}' " u 1:2 w l, "" u 1:3 w l, "" u 1:2:4  w labels
image1

Or even simpler, since gnuplot can make some calculations on the fly.

$ gnuplot
plot "< paste 1.dat 2.dat " u 1:2 w l, "" u 1:4 w l, "" u 1:2:($2-4)  w labels

Trimming data

Let's have some data file that that contains 60 lines with one number per line. We can think that this is a result of some measurement at every one minute. You can create a file with name 'rnd.dat' containing 60 lines with randomly generated numbers between 0 and 1 with the following awk script (redirect the output to a file).

$ awk 'BEGIN{ for(i=1;i<=60;i++) print rand() }'

0.924046
0.593909
0.306394
...
0.160044
0.438791
0.98955

Note that it is not coincidence that you have generated the same numbers as on this page. It becomes easier to check against the results in the suggested solutions. To randomize the random seed you need to add srand() in the script like this: awk 'BEGIN{ srand(); for(i=1;i<=60;i++) print rand() }'

Can you print/extract the values for every 5th minute/line?

Possible solution
awk 'NR%5 == 0 {print $0}' rnd.dat

0.740133
0.100887
0.426298
0.119752
0.399686
0.956569
0.984306
0.173531
0.84918
0.433507
0.884823
0.98955

Can you print the average for each 5 rows of the data?

Possible solution
awk '{s=s+$1} NR%5 == 0 {print $0/5;s=0}' rnd.dat

0.148027
0.0201774
0.0852596
0.0239504
0.0799372
0.191314
0.196861
0.0347062
0.169836
0.0867014
0.176965
0.19791