underscore is a "word char"

Tips on writing regular expressions for searching the post list

Postby bobkoure » Thu Nov 08, 2007 12:54 pm

Well, I did a dumb thing and assumed that '_' was a non-word character (and so not in \w but in \W).
Bzzzt - wrong!

Say I was trying to match variants of "yes mom" (a public domain book name I've just made up). I'd be trying to catch things like
yes mom
yes-mom
yes_mom
_yes_mom_
yes--mom
yesmom
and trying to not catch things like
yesterday my mom said

So... I'd been using \b (word boundary) and \W (not in \w) to build
\byes\W+mom\b

which turns out to work fine for everything except
yes_mom
_yes_mom_

because '_' is seen as a word char, which means it's not in \W, and \b doesn't find a "boundary" between a-z and '_'. (It also misses yesmom, since \W+ needs at least one separator character.)
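
For anyone who wants to see this for themselves, here is a minimal sketch in Python. It assumes Python's re module treats \w, \W and \b the same way the search engine here does (I believe that holds for '_', but treat it as an assumption). The names original, variants and original_hits are just made up for the demo.

```python
import re

# The first attempt from the post: word boundary, "yes",
# one or more non-word chars, "mom", word boundary.
original = re.compile(r"\byes\W+mom\b")

variants = ["yes mom", "yes-mom", "yes_mom", "_yes_mom_", "yes--mom", "yesmom"]

# True where the pattern matches somewhere in the string.
original_hits = {s: bool(original.search(s)) for s in variants}

# '_' is a word character: \w matches it, \W does not.
underscore_is_word = bool(re.fullmatch(r"\w", "_"))
underscore_is_nonword = bool(re.fullmatch(r"\W", "_"))

for s, hit in original_hits.items():
    print(f"{s!r:14} {'match' if hit else 'no match'}")
```

The space and hyphen variants match; the underscore variants and yesmom do not.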

so, I should have used
\b[\W_]*yes[\W_]*mom[\W_]*\b
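
A quick way to check the corrected pattern against the whole list, again sketched in Python on the assumption that its re module behaves like the engine used here. The names fixed, should_match, should_not_match and fixed_hits are made up for the demo.

```python
import re

# The corrected pattern: [\W_] means "any non-word char, or an underscore".
fixed = re.compile(r"\b[\W_]*yes[\W_]*mom[\W_]*\b")

should_match = ["yes mom", "yes-mom", "yes_mom", "_yes_mom_", "yes--mom", "yesmom"]
should_not_match = ["yesterday my mom said"]

# True where the pattern matches somewhere in the string.
fixed_hits = {s: bool(fixed.search(s)) for s in should_match + should_not_match}

for s, hit in fixed_hits.items():
    print(f"{s!r:26} {'match' if hit else 'no match'}")

# The outer \b still treats '_' as a word char, so a title glued to
# more word characters with underscores is not caught.
glued_hit = bool(fixed.search("yes_mom_2007"))
```

One thing to keep in mind: because the outer \b still counts '_' as a word char, something like yes_mom_2007 is not matched by this pattern.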

Sigh...
bobkoure
 
