The functions in this section look at or change the text of one or more strings.
index(in, find)
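This searches the string in for the first occurrence of the string
find, and returns the position in characters where that occurrence
begins in the string in. For example:
awk 'BEGIN { print index("peanut", "an") }'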
prints `3'. If find is not found, index returns 0.
(Remember that string indices in awk start at 1.)
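Because index returns 0 when find is absent, its result can also serve directly as a true/false test. The following is a minimal sketch; the sample string and the shell variable name index_out are only illustrative.

```shell
# index yields a 1-based position, or 0 when the substring is absent.
index_out=$(awk 'BEGIN {
    s = "peanut butter"
    print index(s, "nut")      # position where "nut" begins
    print index(s, "jelly")    # not present, so 0
    if (index(s, "butter"))    # any nonzero position counts as true
        print "has butter"
}')
printf '%s\n' "$index_out"
```

This prints `4', then `0', then `has butter'.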
length(string)
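This gives you the number of characters in string. If string is a
number, the length of the digit string representing that number is
returned. For example, length("abcde") is 5. By
contrast, length(15 * 35) works out to 3. How? Well, 15 * 35 =
525, and 525 is then converted to the string `"525"', which has
three characters.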
If no argument is supplied, length returns the length of $0.
In older versions of awk, you could call the length function
without any parentheses. Doing so is marked as "deprecated" in the
POSIX standard. This means that while you can do this in your
programs, it is a feature that can eventually be removed from a future
version of the standard. Therefore, for maximal portability of your
awk programs you should always supply the parentheses.
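The three cases described above (a string, a number, and no argument) can be tried side by side. This is a small sketch; the input line `hello world' and the variable name length_out are made up for the illustration.

```shell
# length of a string, of a number converted to a string, and of $0.
length_out=$(printf 'hello world\n' | awk '{
    print length("abcde")    # five characters
    print length(15 * 35)    # 525 becomes the string "525"
    print length()           # no argument: length of the current record
}')
printf '%s\n' "$length_out"
```

This prints `5', `3' and `11', one per line. Note that the parentheses are supplied even in the no-argument call.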
match(string, regexp)
The match function searches the string, string, for the
longest, leftmost substring matched by the regular expression,
regexp. It returns the character position, or index, of
where that substring begins (1, if it starts at the beginning of
string). If no match is found, it returns 0.
The match function sets the built-in variable RSTART to
the index. It also sets the built-in variable RLENGTH to the
length in characters of the matched substring. If no match is found,
RSTART is set to 0, and RLENGTH to -1.
For example:
awk '{
       if ($1 == "FIND")
         regex = $2
       else {
         where = match($0, regex)
         if (where)
           print "Match of", regex, "found at", where, "in", $0
       }
}'
This program looks for lines that match the regular expression stored in
the variable regex. This regular expression can be changed. If the
first word on a line is `FIND', regex is changed to be the
second word on that line. Therefore, given:
FIND fo*bar
My program was a foobar
But none of it would doobar
FIND Melvin
JF+KM
This line is property of The Reality Engineering Co.
This file created by Melvin.
awk prints:
Match of fo*bar found at 18 in My program was a foobar
Match of Melvin found at 22 in This file created by Melvin.
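RSTART and RLENGTH are often used together with substr (described later in this section) to pull out the text that matched. The following sketch assumes a made-up sample string; match_out is just an illustrative shell variable.

```shell
# After a successful match, RSTART and RLENGTH locate the matched text.
match_out=$(awk 'BEGIN {
    s = "error code 4042 returned"
    if (match(s, /[0-9]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    match(s, /xyz/)            # no match this time
    print RSTART, RLENGTH
}')
printf '%s\n' "$match_out"
```

This prints `12 4 4042', and then `0 -1' for the failed match.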
split(string, array, fieldsep)
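This divides string into pieces separated by fieldsep,
and stores the pieces in array. The first piece is stored in
array[1], the second piece in array[2], and so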
forth. The string value of the third argument, fieldsep, is
a regexp describing where to split string (much as FS can
be a regexp describing where to split input records). If
the fieldsep is omitted, the value of FS is used.
split returns the number of elements created.
The split function, then, splits strings into pieces in a
manner similar to the way input lines are split into fields. For example:
split("auto-da-fe", a, "-")
splits the string `auto-da-fe' into three fields using `-' as the
separator. It sets the contents of the array a as follows:
a[1] = "auto"
a[2] = "da"
a[3] = "fe"
The value returned by this call to split is 3.
As with input field-splitting, when the value of fieldsep is
" ", leading and trailing whitespace is ignored, and the elements
are separated by runs of whitespace.
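The difference between a " " separator and a regexp separator can be seen in a short sketch. The strings and the array names w and p are invented for the illustration, as is the shell variable split_out.

```shell
# A single-space separator ignores leading/trailing whitespace;
# any other string is treated as a regexp.
split_out=$(awk 'BEGIN {
    n = split("  alpha   beta gamma  ", w, " ")
    print n, w[1], w[3]
    n = split("2024-01:15", p, "[-:]")   # split on either - or :
    print n, p[1], p[2], p[3]
}')
printf '%s\n' "$split_out"
```

This prints `3 alpha gamma' and then `3 2024 01 15'.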
sprintf(format, expression1,...)
This returns (without printing) the string that printf would
have printed out with the same arguments
(see section Using printf Statements for Fancier Printing).
For example:
sprintf("pi = %.2f (approx.)", 22/7)
returns the string "pi = 3.14 (approx.)".
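Since sprintf returns its result instead of printing it, the formatted text can be stored in a variable and reused. A small sketch follows; the format string and the names label and sprintf_out are only illustrative.

```shell
# Build a formatted string first, then print it separately.
sprintf_out=$(awk 'BEGIN {
    label = sprintf("%-6s|%6.2f|%03d", "pi", 3.14159, 7)
    print label
}')
printf '%s\n' "$sprintf_out"
```

This prints `pi    |  3.14|007': a left-justified string padded to six columns, a number with two decimals in a six-column field, and a zero-padded integer.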
sub(regexp, replacement, target)
The sub function alters the value of target.
It searches this value, which should be a string, for the
leftmost substring matched by the regular expression, regexp,
extending this match as far as possible. Then the entire string is
changed by replacing the matched text with replacement.
The modified string becomes the new value of target.
This function is peculiar because target is not simply
used to compute a value, and not just any expression will do: it
must be a variable, field or array reference, so that sub can
store a modified value there. If this argument is omitted, then the
default is to use and alter $0.
For example:
str = "water, water, everywhere"
sub(/at/, "ith", str)
sets str to "wither, water, everywhere", by replacing the
leftmost, longest occurrence of `at' with `ith'.
The sub function returns the number of substitutions made (either
one or zero).
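The return value makes it easy to tell whether anything was changed. The sketch below reuses the `water' string from above; the variable names n and sub_out are only illustrative.

```shell
# sub returns 1 when it replaced something and 0 otherwise.
sub_out=$(awk 'BEGIN {
    str = "water, water, everywhere"
    n = sub(/at/, "ith", str)
    print n, str
    n = sub(/xyz/, "ith", str)   # no match, so str is left alone
    print n, str
}')
printf '%s\n' "$sub_out"
```

This prints `1 wither, water, everywhere' followed by `0 wither, water, everywhere'.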
If the special character `&' appears in replacement, it
stands for the precise substring that was matched by regexp. (If
the regexp can match more than one string, then this precise substring
may vary.) For example:
awk '{ sub(/candidate/, "& and his wife"); print }'
changes the first occurrence of `candidate' to `candidate
and his wife' on each input line.
Here is another example:
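awk 'BEGIN {
        str = "daabaaa"
        sub(/a+/, "c&c", str)
        print str
}'
prints `dcaacbaaa'. This shows how `&' can represent a non-constant
string, and also illustrates the "leftmost, longest" rule.
(Note that `a*' would not behave this way: it can match the empty
string before the `d', and that empty match is the leftmost one.)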
The effect of this special character (`&') can be turned off by putting a
backslash before it in the string. As usual, to insert one backslash in
the string, you must write two backslashes. Therefore, write `\\&'
in a string constant to include a literal `&' in the replacement.
For example, here is how to replace the first `|' on each line with
an `&':
awk '{ sub(/\|/, "\\&"); print }'
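To see the two meanings of `&' next to each other, here is a short sketch that applies both forms to the same made-up input line `a|b|c'. The names line and amp_out are only illustrative.

```shell
# Unescaped & inserts the matched text; \\& inserts a literal ampersand.
amp_out=$(printf 'a|b|c\n' | awk '{
    line = $0
    sub(/\|/, "[&]", line)   # & stands for the matched text
    print line
    sub(/\|/, "\\&")         # escaped, so a plain & is inserted into $0
    print
}')
printf '%s\n' "$amp_out"
```

This prints `a[|]b|c' and then `a&b|c'.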
Note: as mentioned above, the third argument to sub must
be an lvalue. Some versions of awk allow the third argument to
be an expression which is not an lvalue. In such a case, sub
would still search for the pattern and return 0 or 1, but the result of
the substitution (if any) would be thrown away because there is no place
to put it. Such versions of awk accept expressions like
this:
sub(/USA/, "United States", "the USA and Canada")
But that is considered erroneous in gawk.
gsub(regexp, replacement, target)
This is similar to the sub function, except gsub replaces
all of the longest, leftmost, nonoverlapping matching
substrings it can find. The `g' in gsub stands for
"global," which means replace everywhere. For example:
awk '{ gsub(/Britain/, "United Kingdom"); print }'
replaces all occurrences of the string `Britain' with `United
Kingdom' for all input records.
The gsub function returns the number of substitutions made. If
the variable to be searched and altered, target, is
omitted, then the entire input record, $0, is used.
As in sub, the characters `&' and `\' are special, and
the third argument must be an lvalue.
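The count returned by gsub is handy for counting occurrences as well as for replacing them. This is a minimal sketch with invented sample strings; s, t, n and gsub_out are illustrative names.

```shell
# gsub replaces every match and returns how many replacements it made.
gsub_out=$(awk 'BEGIN {
    s = "Britain and Britain again"
    n = gsub(/Britain/, "United Kingdom", s)
    print n ": " s
    t = "a,b,,c"
    n = gsub(/,/, ";", t)
    print n, t
}')
printf '%s\n' "$gsub_out"
```

This prints `2: United Kingdom and United Kingdom again' and then `3 a;b;;c'.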
substr(string, start, length)
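This returns a length-character-long substring of string,
starting at character number start. The first character of a
string is character number one. For example,
substr("washington", 5, 3) returns "ing".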
If length is not present, this function returns the whole suffix of
string that begins at character number start. For example,
substr("washington", 5) returns "ington". This is also
the case if length is greater than the number of characters remaining
in the string, counting from character number start.
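The cases above, plus a common idiom for taking the last few characters, can be sketched as follows. The variable names s and substr_out are only illustrative.

```shell
# Prefix, suffix, an over-long length, and the last three characters.
substr_out=$(awk 'BEGIN {
    s = "washington"
    print substr(s, 1, 4)
    print substr(s, 5)
    print substr(s, 8, 100)          # length runs past the end of s
    print substr(s, length(s) - 2)   # start two before the last character
}')
printf '%s\n' "$substr_out"
```

This prints `wash', `ington', `ton' and `ton', one per line.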
tolower(string)
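This returns a copy of string, with each upper-case character
in the string replaced with its corresponding lower-case character.
Nonalphabetic characters are left unchanged. For example,
tolower("MiXeD cAsE 123") returns "mixed case 123".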
toupper(string)
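This returns a copy of string, with each lower-case character
in the string replaced with its corresponding upper-case character.
Nonalphabetic characters are left unchanged. For example,
toupper("MiXeD cAsE 123") returns "MIXED CASE 123".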
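One common use of these functions is comparing text without regard to case. The sketch below counts input lines equal to `yes' in any mixture of cases; the three input lines and the names n and case_out are invented for the illustration.

```shell
# Normalize each record to lower case before comparing it.
case_out=$(printf 'Yes\nNO\nyEs\n' | awk '
    tolower($0) == "yes" { n++ }
    END { print n }
')
printf '%s\n' "$case_out"
```

This prints `2', since `Yes' and `yEs' both compare equal to `yes' once converted.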