Regex feature request

Posted: Fri Oct 23, 2009 11:10 am
by optiman
Quade, can you add a regex feature to "exclude" results? I think it's the ^ operator, so that you return all search results that DON'T include the ^[text] string. That would eliminate the junk posts containing that string that litter the results. I don't want to create a filter for it, as it is usually a one-off situation.
Thanks

Posted: Fri Oct 23, 2009 11:36 am
by itimpi
As far as I know there is no regex operator to exclude (the ^ operator means anchor to start), which I have always thought was a shame. I am sure Quade is just using a standard RegEx engine - not something custom written for Newsbin.

Posted: Sat Oct 24, 2009 8:50 am
by ozzii
There is NO exclude for regex :cry:

Posted: Sat Oct 24, 2009 12:22 pm
by optiman
You're right, I must have misunderstood regex examples on the web. Too bad, I'd still love to see an "exclude string" capability to eliminate garbage postings.

Posted: Sun Oct 25, 2009 6:26 am
by ozzii
Add the word you want to exclude to the exclude filter!

Posted: Wed Oct 28, 2009 5:26 pm
by optiman
As I said in my first post, I don't want a permanent filter for occasional garbage posts.

Posted: Tue Nov 24, 2009 12:55 pm
by bobkoure
Why not use lookahead?
negative lookahead:
(?!foo) means not followed by "foo".
^(?!.*foo) means start of line not followed by any number of chars and then "foo"
positive lookahead:
(?=bar) means followed by "bar"
^(?=.*bar) means start of line followed by any number of chars and then "bar" .

Lookahead is a regex "zero width operator", which means (among other things) that you can stack 'em, so...

^(?!.*foo)(?=.*bar)
Will pick up any lines that have "bar" but not "foo".
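
For anyone who wants to try the stacked lookaheads outside Newsbin, here is a minimal sketch in Python. The subject lines are made up for illustration, and Python's re module is only a stand-in: I'm assuming Newsbin's regex engine handles lookahead the same way, which may not be the case.

```python
import re

# Hypothetical subject lines, invented for this example.
subjects = [
    "bar release",
    "foo and bar together",
    "only foo here",
    "nothing relevant",
]

# Stacked zero-width lookaheads anchored at the start of the line:
# (?!.*foo) rejects any line containing "foo",
# (?=.*bar) requires the line to contain "bar".
pattern = re.compile(r"^(?!.*foo)(?=.*bar)")

# Keep only the lines the pattern accepts.
kept = [s for s in subjects if pattern.search(s)]
print(kept)
```

Only "bar release" should survive: the two "foo" lines are rejected by the negative lookahead, and "nothing relevant" fails the positive one.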

Posted: Wed Nov 25, 2009 4:05 pm
by viking
bobkoure wrote: Why not use lookahead?
negative lookahead:
(?!foo) means not followed by "foo".
^(?!.*foo) means start of line not followed by any number of chars and then "foo"
positive lookahead:
(?=bar) means followed by "bar"
^(?=.*bar) means start of line followed by any number of chars and then "bar" .

Lookahead is a regex "zero width operator", which means (among other things) that you can stack 'em, so...

^(?!.*foo)(?=.*bar)
Will pick up any lines that have "bar" but not "foo".

I was looking for the same thing, with a twist:

I would like to exclude all posts with "foo" except those with "newfoo". Is that possible?
(The simple negative lookahead above, (?!foo), also excludes "newfoo")

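Posted: Fri Nov 27, 2009 12:18 pm
by bobkoure
You could try '\b', which is another "zero width" operator - means "word boundary"
\bfoo\b matches foo as a whole word ("foo", "a foo b", "foo-1") but not newfoo. It also won't match 1foo or 1234foo33, because digits count as word characters just like letters do.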

So... ^(?!.*\bfoo\b) is all posts not including the word foo.
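
Here is a quick way to check the word-boundary version, again as a rough sketch in Python with made-up subject lines. It assumes Newsbin's engine treats \b the way Python's re module does (letters, digits and underscore are all word characters).

```python
import re

# Hypothetical subject lines, invented for this example.
subjects = ["newfoo pack", "foo pack", "1foo pack", "plain pack"]

# Reject a line only when "foo" appears as a whole word.
pattern = re.compile(r"^(?!.*\bfoo\b)")

# "foo pack" is dropped; "newfoo" and "1foo" are kept because the
# character before "foo" is a word character, so there is no boundary.
kept = [s for s in subjects if pattern.search(s)]
print(kept)
```

So "newfoo" posts get through while plain "foo" posts are filtered out.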

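Posted: Sat Nov 28, 2009 8:54 pm
by viking
bobkoure wrote: You could try '\b', which is another "zero width" operator - means "word boundary"
\bfoo\b matches foo as a whole word ("foo", "a foo b", "foo-1") but not newfoo. It also won't match 1foo or 1234foo33, because digits count as word characters just like letters do.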

So... ^(?!.*\bfoo\b) is all posts not including the word foo.

Great. Thanks!!