special characters don't really understand

Tips on writing regular expressions for searching the post list

Postby PNPman » Thu Mar 20, 2008 7:04 pm

I'm trying to create a reject filter that rejects certain words, including any special character used as a spacer between them. For example:

mikeslideshow
mike.slide.show
mike_slide_show
mike_slideshow
etc

Will either of these filters work and what is the difference?

mike.?slide.?show or mike[.]?slide[.]?show
PNPman
Active Participant
Posts: 88
Joined: Fri May 18, 2001 9:29 pm

Registered Newsbin User since: 04/09/03

Postby Quade » Thu Mar 20, 2008 9:18 pm

"mike.*slide.*show" will catch all of those. It'll catch more than that, though, because .* matches any run of characters between the words.

How about trying that and seeing if it does what you want?
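One way to try it outside Newsbin is a short script. This is only a sketch using Python's re module as a stand-in; Newsbin's own regex engine is not described in this thread and may differ in details. The last sample name is made up to show the over-matching.

```python
import re

pattern = re.compile(r"mike.*slide.*show")

# The four names from the original post.
wanted = ["mikeslideshow", "mike.slide.show", "mike_slide_show", "mike_slideshow"]

# Hypothetical subject line that probably should NOT be rejected.
unwanted = "mike went down the slide before the show"

# search() looks for the pattern anywhere in the string.
matches_wanted = [pattern.search(name) is not None for name in wanted]
matches_unwanted = pattern.search(unwanted) is not None

print(matches_wanted)
print(matches_unwanted)
```

All four wanted names match, but so does the made-up one, since .* happily spans whole words and spaces.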
Quade
Eternal n00b
Posts: 45079
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Postby bobkoure » Thu Jul 10, 2008 10:05 am

If it's a single spacer char (or nothing), maybe use ? rather than *.
So...
mike.?slide.?show

Or, if you know what the spacer characters might be, put them in brackets rather than using the dot. So, if the spacer chars were - _ and space:
mike[-_ ]?slide[-_ ]?show
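To make the difference between the variants concrete (including the two from the original question), here is a small comparison. It is a sketch in Python's re module, which is an assumption; the name "mikeXslideshow" is invented just to show what the unrestricted dot lets through.

```python
import re

names = [
    "mikeslideshow",
    "mike.slide.show",
    "mike_slide_show",
    "mike_slideshow",
    "mikeXslideshow",  # hypothetical: a letter in the spacer position
]

patterns = {
    "any_char": r"mike.?slide.?show",          # . = any single character
    "dot_only": r"mike[.]?slide[.]?show",      # [.] = a literal dot only
    "listed":   r"mike[-_ ]?slide[-_ ]?show",  # only -, _ or space
}

# For each pattern, collect the names it matches.
results = {
    label: [name for name in names if re.search(rx, name)]
    for label, rx in patterns.items()
}

for label, matched in results.items():
    print(label, matched)
```

So .? accepts anything (or nothing) as the spacer, [.]? accepts only a real dot (or nothing), and a bracket list accepts only what you put in it. Note that [-_ ] as written does not include the dot, so mike.slide.show slips past it.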
bobkoure
 

Postby Kiltme » Mon Jul 28, 2008 7:50 pm

You're mixing regex and DOS wildcard filename matching, which is part of what's confusing.

mike.slide.show
is the regex equivalent of the DOS
mike?slide?show
file name search.

mike.+slide.+show
is the regex equivalent of the DOS
mike*slide*show
file name search.

mike[ \._-]slide[ \._-]show

is the filter for the specific separators space, dot, underscore, and hyphen.
The dot needs to be escaped with \ for it to be a dot and not a wildcard.

Note that none of these will match mikeslideshow (neither the regex nor the DOS wildcard filters).
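For the bracket filter specifically, the reason mikeslideshow is missed is that a bracket expression with no quantifier demands exactly one character. A sketch in Python's re module (an assumption, not Newsbin's engine) comparing it with a variant that adds ? after each bracket:

```python
import re

names = ["mikeslideshow", "mike.slide.show", "mike_slide_show", "mike_slideshow"]

# Exactly one separator required in each position.
required = re.compile(r"mike[ \._-]slide[ \._-]show")

# Same character list, but each separator is optional.
optional = re.compile(r"mike[ \._-]?slide[ \._-]?show")

required_hits = [n for n in names if required.search(n)]
optional_hits = [n for n in names if optional.search(n)]

print(required_hits)
print(optional_hits)
```

The first pattern only catches the names with a separator in both positions; the second catches all four.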
Kiltme
Seasoned User
Posts: 638
Joined: Mon Jan 05, 2004 2:02 am

Registered Newsbin User since: 01/05/04

Postby bobkoure » Mon Aug 18, 2008 10:12 am

Kiltme wrote:
mike.+slide.+show
is the regex equivalent of the DOS
mike*slide*show
file name search.

Actually, no. In DOS (or at least in the current iterations of what used to be the DOS command shell), the character '*' means "zero or more of any character", and '?' means "one of any character". So the regex equivalent of * is .* rather than .+
Open a command window and try it with dir. I think you'll find that dir f*oo.txt will indeed match foo.txt.
That may be something new-ish and not in the old DOS command shell. It seems to me that the actual DOS shell was incapable of using *o*.txt to find foo.txt: '*' meant "any number of any characters", but it would then "swallow" them all. I think that this is the way that many flavors of CP/M worked as well, but they varied a lot. Intel came out with a version that used bits of something like PL/1 as a shell language.
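If you don't have a command window handy, the same comparison can be approximated in Python. Big caveat: Python's fnmatch module implements Unix shell-style wildcards, not the DOS/Windows rules, so treat this only as a sketch of the general "* can match nothing" behaviour, not as proof of what dir does.

```python
import fnmatch
import re

name = "foo.txt"

# Shell-style wildcards: * matches zero or more characters, ? exactly one.
star_zero = fnmatch.fnmatchcase(name, "f*oo.txt")
star_both = fnmatch.fnmatchcase(name, "*o*.txt")
question = fnmatch.fnmatchcase(name, "f?oo.txt")

# Regex counterparts: .* allows zero characters, .+ needs at least one.
regex_star = re.fullmatch(r"f.*oo\.txt", name) is not None
regex_plus = re.fullmatch(r"f.+oo\.txt", name) is not None

print(star_zero, star_both, question)
print(regex_star, regex_plus)
```

Here the wildcard * behaves like regex .* (both match foo.txt), while .+ and the wildcard ? both fail because they insist on an extra character.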

The dot needs to be escaped with \ for it to be a dot and not a wildcard

Again, no: not inside brackets. The only characters that need escaping inside brackets are ']' (for obvious reasons) and '\' itself (a leading '^' and a '-' between two characters are also special there, depending on position). And I think in some flavors of regex you don't need to escape '\' unless it's directly to the left of one of the characters that, outside of brackets, would have needed escaping.
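A quick check of the bracket behaviour, again sketched with Python's re module as an assumption (other regex flavors can differ on the finer points of escaping):

```python
import re

# Inside brackets an unescaped dot is a literal dot, not a wildcard.
bare_dot_matches_dot = re.fullmatch(r"[.]", ".") is not None
bare_dot_matches_letter = re.fullmatch(r"[.]", "a") is not None

# Escaping it anyway is harmless and means the same thing.
escaped_dot_matches_dot = re.fullmatch(r"[\.]", ".") is not None
escaped_dot_matches_letter = re.fullmatch(r"[\.]", "a") is not None

# A hyphen is only special between two characters (a range);
# placed first or last in the brackets it is a literal hyphen.
range_matches_b = re.fullmatch(r"[a-c]", "b") is not None
leading_hyphen_matches_b = re.fullmatch(r"[-ac]", "b") is not None
leading_hyphen_matches_hyphen = re.fullmatch(r"[-ac]", "-") is not None

print(bare_dot_matches_dot, bare_dot_matches_letter)
print(escaped_dot_matches_dot, escaped_dot_matches_letter)
print(range_matches_b, leading_hyphen_matches_b, leading_hyphen_matches_hyphen)
```

So [.] and [\.] behave identically here, which is why both ways of writing the filter work.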

Note that none of these will match mikeslideshow (neither the regex or the dos wildcard filters).

Actually, any of the regex expressions using ? or * after the spacer will match that.
.? = zero or one character
.* = any number of characters, including zero
bobkoure
 

Postby Quade » Mon Aug 18, 2008 11:46 am

You will have to use brackets to escape explicit spaces in the next rev. Spaces are going to mean AND instead of a literal space. I use brackets over \ a lot. Since I program, backslashes mean more to me, so brackets keep things clearer.
Quade
Eternal n00b
Posts: 45079
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

