Skip to main content

Awk

·1 min

awk is a language for text parsing and manipulation, it’s often used as a tokenizer in bash pipes but it can do a lot more of that

before starting i write down the most common use case of awk

some_command_that_prints_on_stdout | awk -F'[SEPARATOR]'  '{print $[FIELD]}'

Processing model #

awk starts by loading user defined functions than execute BEGIN block that process text one record at a time (default behavior is line filter) :

flowchart TD A[load functions] B[initial setup\n by running the BEGIN block] C[read line] D[process line] E[clean up\by running END block] A --> B --> C C --> D D --> C D --> E

Syntax #

Blocks are delimited by {} each line contains an isntruction, instructoins can be separated by ;

BEGIN{  }
{  }
END{  }

Regex filters and ~ operator #

lines can be regex parsed using a variable with a regular expression and then filter the input using the ~ operator

BEGIN{
filter="REGEX"
}


$0 ~ filter{
    # operation on matched records
}

Match function #

match regex element and put beckrefs in an array

    match($0, /.* — (.*) ft\. .*/, arr)
    print arr[1]

Oneliners #

  • print all token except first one
awk '{$1=""; print $0}'