A piggy bank of commands, fixes, succinct reviews, some mini articles and technical opinions from a (mostly) Perl developer.

Regex anchors in Perl

A lot of the time it seems that Perl programmers write a regex like

/^foo$/

(where “foo” is some arbitrary regex pattern)

but from the context it seems that the programmer intended the entire string to be searched to match the pattern “foo”. Nothing may come before the start of “foo”, and nothing may come after its end.

But of course the regex doesn’t do that.

The metacharacter ‘$’ matches not only at the end of the string to be searched but also just before a newline character at the end of that string. (Of course, when the ‘m’ flag is specified, ‘$’ behaves differently: it can also match just before any newline inside the string. But I’d like to concentrate on the behaviour without the ‘m’ flag for the time being.)

So the above pattern will match “foo” and “foo\n”.
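A quick way to see this for yourself is a short script that tries the pattern against a few strings. This is a minimal sketch using only core Perl. The test strings are arbitrary choices, and “foo” stands in for a real pattern as above.

```perl
use strict;
use warnings;

# Try the /^foo$/ pattern (no 'm' flag) against a few strings
for my $s ("foo", "foo\n", "foo\nbar") {
    # Make the newline visible when printing
    (my $shown = $s) =~ s/\n/\\n/g;
    my $result = $s =~ /^foo$/ ? "matches" : "does not match";
    print "\"$shown\" $result\n";
}
```

Both “foo” and “foo\n” are reported as matching. “foo\nbar” is not, because without the ‘m’ flag ‘$’ only takes account of a newline at the very end of the string.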

Is that what the programmer really wanted? I think in many cases not.

So how can we make the pattern match exactly at the end of the string to be searched?

The answer is to use the metacharacter ‘\z’. This matches exactly at the end of the string to be searched.

So to make a regex that matches the pattern “foo”, with the beginning of the pattern bound to the beginning of the string to be searched and the end of the pattern bound to the end of that string, we could write:

/^foo\z/
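To compare the two anchors side by side, the sketch below runs both ‘/^foo$/’ and ‘/^foo\z/’ against the same two strings. It uses only core Perl, and the strings are purely illustrative.

```perl
use strict;
use warnings;

# Compare '$' (end, or just before a final newline) with '\z' (end only)
for my $s ("foo", "foo\n") {
    (my $shown = $s) =~ s/\n/\\n/g;
    my $with_dollar = $s =~ /^foo$/  ? "match" : "no match";
    my $with_z      = $s =~ /^foo\z/ ? "match" : "no match";
    print "\"$shown\": /^foo\$/ => $with_dollar; /^foo\\z/ => $with_z\n";
}
```

For “foo\n” the first pattern reports a match and the second does not. That is the whole difference between the two anchors.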

=====
Here endeth the bit about doing the minimum to make the code correctly reflect the intention of the programmer. The rest is about style, personal preference, readability, etc.
=====

Some might say that using ‘^’ to match the beginning of the string to be searched and ‘\z’ at the end is a bit dicey, because the meaning of ‘^’ changes when the ‘m’ flag is used but the meaning of ‘\z’ doesn’t. It would be nice if there were a metacharacter that matched exactly at the beginning of the string to be searched, regardless of the ‘m’ flag. Fortunately there is: ‘\A’. Using that would give:

/\Afoo\z/
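The difference only shows up once the ‘m’ flag is added. In this minimal sketch (core Perl only; the string “bar\nfoo” is an arbitrary example), ‘^’ is allowed to match just after the embedded newline, while ‘\A’ still insists on the very start of the string.

```perl
use strict;
use warnings;

my $text = "bar\nfoo";

# With /m, '^' also matches just after any newline, so the "foo"
# on the second line satisfies the pattern
my $caret = $text =~ /^foo\z/m ? "match" : "no match";

# '\A' matches only at the start of the string, with or without /m
my $big_a = $text =~ /\Afoo\z/m ? "match" : "no match";

print "/^foo\\z/m:  $caret\n";
print "/\\Afoo\\z/m: $big_a\n";
```

The first line reports a match and the second does not, even though both regexes carry the ‘m’ flag.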

But because ‘\A’ ends with a letter, the regex can be hard to read when ‘\A’ is followed by a pattern that begins with a letter, as in ‘\Afoo’. So some might suggest using the ‘x’ flag, which makes Perl ignore whitespace inside the regex (unless it is escaped or inside a character class), so that the parts can be spaced out. That would give:

/ \A foo \z /x

However, I find a regex a bit hard to read when slash delimiters are combined with several backslash-prefixed metacharacters, so I prefer to use a different delimiter, giving something like:

m{ \A foo \z }x
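Because the ‘x’ flag and the choice of delimiter only change how the regex is written, all three spellings should behave identically. This sketch stores each one with ‘qr’ (Perl’s quote-regex operator, used here only so the patterns can be kept in a list) and checks them against a few arbitrary strings.

```perl
use strict;
use warnings;

# The same fully anchored pattern, written three ways
my @patterns = (
    qr/\Afoo\z/,
    qr/ \A foo \z /x,   # 'x' flag: unescaped whitespace outside character classes is ignored
    qr{ \A foo \z }x,   # the same again, with braces as delimiters
);

for my $s ("foo", "foo\n", "xfoo", "foo bar") {
    (my $shown = $s) =~ s/\n/\\n/g;
    my @results = map { $s =~ $_ ? "match" : "no match" } @patterns;
    print "\"$shown\": ", join(", ", @results), "\n";
}
```

Each line of output should show the same result three times. “foo” matches, and none of the other strings matches any of the patterns.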

- by Bill Blunn
See also http://perldoc.perl.org/perlre.html#Regular-Expressions