Skip to content

5.String manipulation

This is not exactly Awk specific, but it is helpful to know how one can manipulate string fields.

Quite often, you have files with such names, that are alphabetically ordered and it is a bit tricky to put them in order again. Lets assume we have the following data in a file. The lines contain filenames that have common pattern and number that we are interested. Let's "extract" the alternating number i.e 1, 10, 123, ...

names.dat
H2O-1_1.res
H2O-1_10.res
H2O-1_123.res
H2O-1_21.res
H2O-1_44.res
H2O-1_5.res
H2O-1_7.res

One possible way, is by removing the pattern that remains unchanged. (BASH can do it as well, way easier with the file names than the lines in a file)

$ awk '{ gsub( "H2O-1_|.res", "" , $1); print $1 }' names.dat

This is my preferred way, since it is rather easy to understand and modify.

Another way, could be by using backreferences the way sed will do it.

$ awk '{ print gensub(/H2O-1_(.*).res/,"\\1","g")}' names.dat

How about we fix the annoyance with the alphabetic sort by replacing 1 with 001, 5 with 005, 10 -> 010, etc.

$ awk '{ gsub("H2O-1_|.res","",$1); printf("H2O-1_%03g.res\n", $1) }' names.dat

H2O-1_001.res
H2O-1_010.res
H2O-1_123.res
H2O-1_021.res
H2O-1_044.res
H2O-1_005.res
H2O-1_007.res

Now one can pipe to sort and get them sorted. You can sort them in awk as well but the OVERHEAD does not worth the effort.

Have a look on other string manipulation functions in awk.
printf function is pretty much the same as in many other languages.