

Comparison Expressions

Comparison expressions compare strings or numbers for relationships such as equality. They are written using relational operators, which are a superset of those in C. Here is a table of them:

x < y
True if x is less than y.
x <= y
True if x is less than or equal to y.
x > y
True if x is greater than y.
x >= y
True if x is greater than or equal to y.
x == y
True if x is equal to y.
x != y
True if x is not equal to y.
x ~ y
True if the string x matches the regexp denoted by y.
x !~ y
True if the string x does not match the regexp denoted by y.
subscript in array
True if the array `array' has an element with the subscript `subscript'.

Comparison expressions have the value 1 if true and 0 if false.
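
As a small illustration of the operators in the table, the following sketch stores a few comparison results in variables and prints them. The array name `seen' and the variable names are made up for this example; results are assigned before printing so that `>' can never be read as output redirection.

```shell
# Each comparison yields 1 (true) or 0 (false).
result=$(awk 'BEGIN {
    seen["apple"] = 1             # hypothetical array, used for the in test
    lt = (1 < 2)                  # 1
    ge = (1 >= 2)                 # 0
    ne = ("a" != "b")             # 1
    has = ("apple" in seen)       # 1
    missing = ("pear" in seen)    # 0
    print lt, ge, ne, has, missing
}')
echo "$result"
```

This should print `1 0 1 1 0'.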

The rules gawk uses for performing comparisons are based on those in draft 11.2 of the POSIX standard. The POSIX standard introduced the concept of a numeric string, which is simply a string that looks like a number, for example, " +2".

When performing a relational operation, gawk considers the type of an operand to be the type it received on its last assignment, rather than the type of its last use (see section Numeric and String Values). This type is unknown when the operand is from an "external" source: field variables, command line arguments, array elements resulting from a split operation, and the value of an ENVIRON element. In this case only, if the operand is a numeric string, then it is considered to be of both string type and numeric type.

If at least one operand of a comparison is of string type only, then a string comparison is performed, and any numeric operand is first converted to a string using the value of CONVFMT (see section Conversion of Strings and Numbers). If one operand of a comparison is numeric, and the other operand is either numeric or both numeric and string, then awk does a numeric comparison. If both operands have both types, then the comparison is numeric.

Strings are compared by comparing the first character of each, then the second character of each, and so on. Thus "10" is less than "9", because the character "1" sorts before the character "9". If one of two strings is a prefix of the other, the shorter string is less than the longer one. Thus "abc" is less than "abcd".
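
The rules above can be tried out with a short sketch such as the following. The variable names are illustrative only. String constants have string type only, while the fields of an input record that look like numbers are numeric strings.

```shell
# String constants are compared character by character.
strings=$(awk 'BEGIN {
    a = ("10" < "9")        # string comparison: "1" sorts before "9"
    b = ("abc" < "abcd")    # a prefix is less than the longer string
    c = (10 < 9)            # numeric comparison
    print a, b, c
}')
echo "$strings"

# Fields that look like numbers are numeric strings, so this is numeric.
fields=$(echo "10 9" | awk '{ r = ($1 < $2); print r }')
echo "$fields"
```

The first command should print `1 1 0' and the second `0', since 10 is not less than 9 when the two fields are compared as numbers.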

Here are some sample expressions, how awk compares them, and what the result of the comparison is.

1.5 <= 2.0
numeric comparison (true)
"abc" >= "xyz"
string comparison (false)
1.5 != " +2"
string comparison (true)
"1e2" < "3"
string comparison (true)
a = 2; b = "2"
a == b
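string comparison (true)

By contrast, when the same two values arrive as input fields, the command

echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }'

prints `false'. Both $1 and $2 are numeric strings and thus have both string and numeric types, which dictates a numeric comparison, and 1e2 (that is, 100) is not less than 3.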
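
Because the type of a variable is the type it received on its last assignment, a program can choose the kind of comparison it wants. The sketch below assumes the common awk idioms of concatenating the null string to obtain a string and adding zero to obtain a number; the shell variable names are hypothetical, and the results are those expected from gawk.

```shell
line="1e2 3"

# Fields as read: both are numeric strings, so 100 < 3 is tested.
as_fields=$(echo "$line" | awk '{ r = ($1 < $2) ? "true" : "false"; print r }')

# Concatenating "" assigns a pure string, so "1e2" < "3" is tested.
as_strings=$(echo "$line" | awk '{ a = $1 ""; b = $2 ""; r = (a < b) ? "true" : "false"; print r }')

# Adding 0 assigns a pure number, which again gives a numeric comparison.
as_numbers=$(echo "$line" | awk '{ a = $1 + 0; b = $2 + 0; r = (a < b) ? "true" : "false"; print r }')

echo "$as_fields $as_strings $as_numbers"
```

This should print `false true false'.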

The purpose of the comparison rules and the use of numeric strings is to attempt to produce the behavior that is "least surprising," while still "doing the right thing."

String comparisons and regular expression comparisons are very different. For example,

$1 == "foo"

has the value 1 (that is, it is true) if the first field of the current input record is precisely `foo'. By contrast,

$1 ~ /foo/

has the value 1 if the first field contains `foo', such as `foobar'.
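
The difference can be seen by running both tests over a few sample records. This is only a sketch; the input lines and the variable names `same' and `found' are invented for the example.

```shell
out=$(printf 'foo\nfoobar\nbar\n' | awk '{
    same = ($1 == "foo")    # 1 only when the field is exactly foo
    found = ($1 ~ /foo/)    # 1 when foo occurs anywhere in the field
    print $1, same, found
}')
echo "$out"
```

The output should be `foo 1 1', `foobar 0 1' and `bar 0 0', one per line.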

The right-hand operand of the `~' and `!~' operators may be either a constant regexp (/.../), or it may be an ordinary expression, in which case the value of the expression as a string is a dynamic regexp (see section How to Use Regular Expressions).
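
A dynamic regexp might be used as in the following sketch, where the variable `pat' (a hypothetical name) holds the regexp as an ordinary string.

```shell
out=$(printf 'foobar\nbarfoo\n' | awk 'BEGIN { pat = "^foo" }
{
    r = ($1 ~ pat)    # the string value of pat is used as the regexp
    print $1, r
}')
echo "$out"
```

This should print `foobar 1' and `barfoo 0', since only the first record begins with `foo'.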

In very recent implementations of awk, a constant regular expression in slashes by itself is also an expression. The regexp /regexp/ is an abbreviation for this comparison expression:

$0 ~ /regexp/
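
The equivalence can be checked with a sketch like this one, which runs the short and the long form over the same made-up input, and also assigns the value of a bare regexp to a variable (called `hit' here purely for illustration).

```shell
input=$(printf 'foo\nbar\nfood\n')

short=$(echo "$input" | awk '/foo/ { print }')
long=$(echo "$input" | awk '$0 ~ /foo/ { print }')

# Used as an expression, /foo/ is 1 or 0 for the current record.
values=$(echo "$input" | awk '{ hit = /foo/; print $0, hit }')

echo "$short"
echo "$long"
echo "$values"
```

Both of the first two commands should print `foo' and `food'; the third should print `foo 1', `bar 0' and `food 1'.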

In some contexts it may be necessary to write parentheses around the regexp to avoid confusing the awk parser. For example, (/x/ - /y/) > threshold is not allowed, but ((/x/) - (/y/)) > threshold parses properly.
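
A minimal sketch of the parenthesized form follows. It assumes an awk, such as gawk, that accepts a constant regexp as an expression; the variable `d' and the input lines are invented for the example.

```shell
out=$(printf 'x\ny\nxy\n' | awk '{
    d = ((/x/) - (/y/))    # each parenthesized regexp is 1 or 0
    print $0, d
}')
echo "$out"
```

This should print `x 1', `y -1' and `xy 0'.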

One special place where /foo/ is not an abbreviation for $0 ~ /foo/ is when it is the right-hand operand of `~' or `!~'! See section Constant Expressions, where this is discussed in more detail.
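
The following sketch shows why the distinction matters. Assigning /foo/ to a variable (here the hypothetical `x') stores the result of the match against $0, not the regexp itself, so using that variable with `~' later tests against the string "1" or "0".

```shell
out=$(echo "foo 1" | awk '{
    x = /foo/            # x is the result of $0 ~ /foo/, that is 1
    a = ($1 ~ x)         # tests "foo" against the regexp "1": 0
    b = ($2 ~ x)         # tests "1" against the regexp "1": 1
    c = ($1 ~ /foo/)     # here /foo/ really is the regexp: 1
    print x, a, b, c
}')
echo "$out"
```

This should print `1 0 1 1'.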

