5. String manipulation
This is not exactly Awk-specific, but it is helpful to know how to manipulate string fields.
Quite often you have files whose names sort alphabetically rather than numerically, and it is a bit tricky to put them back in numeric order. Let's assume we have the following data in a file. Each line contains a filename with a common pattern and a number that we are interested in. Let's "extract" the varying number, i.e. 1, 10, 123, ...
H2O-1_1.res
H2O-1_10.res
H2O-1_123.res
H2O-1_21.res
H2O-1_44.res
H2O-1_5.res
H2O-1_7.res
One possible way is to remove the part of the name that does not change. (Bash can do this as well, and it is much easier when you work with the file names themselves rather than with lines in a file.)
$ awk '{ gsub(/H2O-1_|\.res/, "", $1); print $1 }' names.dat
This is my preferred way, since it is rather easy to understand and modify.
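For comparison, the Bash route mentioned above can be sketched with plain parameter expansion. This is only a minimal sketch: the function name extract_number is made up for illustration, and the prefix and suffix are assumed to be exactly H2O-1_ and .res as in the sample data.

```shell
# Hypothetical helper: strip the fixed prefix and suffix from one file name.
extract_number() {
    name=$1
    name=${name#H2O-1_}   # drop the leading "H2O-1_"
    name=${name%.res}     # drop the trailing ".res"
    printf '%s\n' "$name"
}

# Print the number part of a few sample names.
for f in H2O-1_1.res H2O-1_10.res H2O-1_123.res; do
    extract_number "$f"
done
```

In practice the loop would usually run over real files (for example a glob such as H2O-1_*.res) instead of a hard-coded list.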
Another way is to use backreferences, the way sed does. Note that gensub is a gawk extension, so this will not work in every awk.
$ awk '{ print gensub(/H2O-1_(.*)\.res/, "\\1", "g") }' names.dat
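If gawk is not available, a third option that should work in any awk is to let the field separator do the splitting. The sketch below assumes that the number always sits between the only underscore and the only dot in the name, which holds for the sample data but not for arbitrary file names. The sample lines are fed in with printf so that the snippet is self-contained.

```shell
# Split each line on "_" or "."; the number is then the second field.
numbers=$(printf '%s\n' H2O-1_1.res H2O-1_10.res H2O-1_123.res |
    awk -F'[_.]' '{ print $2 }')
printf '%s\n' "$numbers"
```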
How about fixing the annoyance with the alphabetic sort by replacing 1 with 001, 5 with 005, 10 with 010, etc.?
$ awk '{ gsub(/H2O-1_|\.res/, "", $1); printf("H2O-1_%03d.res\n", $1) }' names.dat
H2O-1_001.res
H2O-1_010.res
H2O-1_123.res
H2O-1_021.res
H2O-1_044.res
H2O-1_005.res
H2O-1_007.res
Now one can pipe the output to sort and get the names sorted. You can sort them in awk as well, but the overhead is not worth the effort.
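Put together, the whole pipeline might look like the sketch below. A few of the sample names are fed in with printf instead of reading names.dat, purely to keep the example self-contained.

```shell
# Zero-pad the number with awk, then let sort order the padded names.
sorted=$(printf '%s\n' H2O-1_1.res H2O-1_10.res H2O-1_123.res H2O-1_21.res H2O-1_5.res |
    awk '{ gsub(/H2O-1_|\.res/, "", $1); printf("H2O-1_%03d.res\n", $1) }' |
    sort)
printf '%s\n' "$sorted"
```

Because every number now has the same width, the plain alphabetic order produced by sort is also the numeric order.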
Have a look at the other string manipulation functions in awk.
The printf function is pretty much the same as in many other languages.
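As a small illustration, here are a few common format specifiers that behave as they do in C. The values are arbitrary and chosen only for this sketch.

```shell
# A few printf formats, run from a BEGIN block so no input file is needed.
formatted=$(awk 'BEGIN {
    printf("%03d\n", 7)          # zero-padded integer, width 3
    printf("%5.2f\n", 3.14159)   # fixed-point, width 5, 2 decimals
    printf("%-6s|\n", "abc")     # left-justified string, width 6
}')
printf '%s\n' "$formatted"
```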