Regular expressions


francis
25th Feb 2004, 09:28 pm
I don't know if this is going to help anyone or not, but regular expressions are insanely powerful and can be very useful when validating user input, amongst other things. They're the type of thing that gets used once in a while: infrequently enough that, unless you know them inside out, you'll end up scratching your head for a good while when you need one (as I just have been). I've just stumbled across regexlib.com (http://www.regexlib.com/), which has some excellent information.

There's also regularexpressions.info (http://regularexpressions.info/).

PHP has some excellent regex functions (I've just used preg_replace (http://uk2.php.net/manual/en/function.preg-replace.php) in a project). You'll need to read PHP's pattern syntax (http://uk2.php.net/manual/en/pcre.pattern.syntax.php) and pattern modifiers (http://uk2.php.net/manual/en/pcre.pattern.modifiers.php) before getting too involved.

Phil
28th Feb 2004, 01:37 pm
Agreed. There really isn't much you can't do to text with regexps. Doing it properly can get pretty dense though; I must confess all my regexping is usually a combination of trial and error and lifting existing examples.

"One could spend a lifetime studying regular expressions - and it would not have been a wasted life"

francis
29th Feb 2004, 11:08 am
I find that getting the correct regex can be a bit of a nightmare as well. Still, once it's there, it's a lifesaver. Here's an example using PHP's preg_match function:
$string = "123.45";

if (preg_match('/^\d+(\.\d{2})?$/', $string))
{
    echo "<p>correct currency entered</p>";
}
else
{
    echo "<p>invalid currency entered</p>";
}
It's looking for a correctly formatted currency value. It will accept, for example, 123 or 596.87. The decimal point is optional, but if it's present there have to be exactly two digits after it. It will reject 293.4, for example, as an invalid amount.
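For anyone who wants to poke at it, here is a minimal sketch that wraps the check in a function and runs a few values through it. The function name is_valid_currency and the sample values are made up for illustration, and the D modifier is my own addition: without it, $ also matches just before a trailing newline.

```php
<?php
// Hypothetical helper (not part of PHP): wraps the currency pattern.
function is_valid_currency(string $value): bool
{
    // Single quotes stop PHP from touching the backslashes or the $.
    // D (PCRE_DOLLAR_ENDONLY) makes $ match only at the very end of the
    // string, so "123\n" is rejected as well.
    return preg_match('/^\d+(\.\d{2})?$/D', $value) === 1;
}

// Sample inputs chosen for illustration only.
foreach (['123', '596.87', '293.4', '12.345', '$5.00'] as $sample) {
    echo $sample, ' => ', is_valid_currency($sample) ? 'valid' : 'invalid', "\n";
}
```

Only the first two samples should come out as valid; the others have the wrong number of decimal digits or contain a character that is neither a digit nor a dot.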

Dreamweaver has pretty good regex support in its find/replace system; search the help files for a decent document on it. Additionally, I've recently discovered the joy of being able to save find/replace terms in DW. After struggling to get a regex working, it's very nice to be able to save it to use again.

Phil
29th Feb 2004, 04:16 pm
How about this (nicked) for validating an email address from a form:

// eregi() was removed in PHP 7; preg_match() with the i modifier does the same case-insensitive job
if (preg_match('/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i', $_POST['email'] ?? ''))
{
    echo 'email address is fine';
}
else
{
    echo 'Please enter a valid e-mail address.';
}



- must have an @ in it
- must be something before the @
- whatever's after the @ must have a dot in it
- must be between two and four letters after the last dot
- must be only letters, digits, hyphens and dots (plus underscores before the @)
etc etc.
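A rough sketch of those rules in action, assuming the same pattern run through preg_match with the i modifier. The function name looks_like_email and the sample addresses are invented for the example. PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL is printed alongside purely for comparison; it is a different check, not the pattern above.

```php
<?php
// Hypothetical helper built around the pattern quoted above.
function looks_like_email(string $address): bool
{
    $pattern = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
    return preg_match($pattern, $address) === 1;
}

// Made-up sample addresses, one per rule.
$samples = [
    'abc@def.com',    // satisfies every rule
    'abc@def',        // no dot after the @
    '@def.com',       // nothing before the @
    'abc@def.c',      // only one letter after the last dot
    'abc@def.museum', // six letters after the last dot, over the {2,4} limit
];

foreach ($samples as $address) {
    $byPattern = looks_like_email($address) ? 'ok' : 'rejected';
    // filter_var() returns the address on success and false on failure.
    $byFilter = filter_var($address, FILTER_VALIDATE_EMAIL) !== false ? 'ok' : 'rejected';
    echo "$address: pattern=$byPattern, filter_var=$byFilter\n";
}
```

The last sample is the interesting one: the {2,4} limit turns away longer top-level domains even though such addresses can be perfectly real.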

francis
29th Feb 2004, 04:42 pm
Mmm, not sure about that. Have a look at regex lib's email patterns (http://www.regexlib.com/DisplayPatterns.aspx) - they're somewhat more complex as they cover IP addresses as well.

In the past I've used regexes to fix pages in DW by adding quotes to name/value pairs created by an MS app (eg: height=50 becomes height="50") using:

FIND: height=(\d+)
REPLACE: height="$1"
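The same find/replace can be sketched in PHP with preg_replace. The sample markup below is made up, and the \b word boundary and the height|width variant are my own additions rather than part of the Dreamweaver search above.

```php
<?php
// Made-up sample markup with unquoted attribute values.
$html = '<img src="logo.gif" height=50 width=120>';

// Same idea as the FIND/REPLACE pair: capture the digits, put them back in quotes.
// \b keeps the pattern from matching inside a longer attribute name.
$fixed = preg_replace('/\bheight=(\d+)/', 'height="$1"', $html);
echo $fixed, "\n";   // <img src="logo.gif" height="50" width=120>

// Variant handling both attributes: $1 is the name, $2 the number.
$both = preg_replace('/\b(height|width)=(\d+)/', '$1="$2"', $html);
echo $both, "\n";    // <img src="logo.gif" height="50" width="120">
```

Values that are already quoted are left alone, because a quote is not a digit and so (\d+) has nothing to match straight after the equals sign.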

Phil
29th Feb 2004, 05:24 pm
oh lord,
^((\"[^\"\f\n\r\t\v\b]+\")|([\w\!\#\$\%\&\'\*\+\-\~\/\^\`\|\{\}]+(\.[\w\!\#\$\%\&\'\*\+\-\~\/\^\`\|\{\}]+)*))@((\[(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))\])|(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))|((([A-Za-z0-9\-])+\.)+[A-Za-z\-]+))$

It's interesting as a proof of concept, but how many emails *really* have IP addresses as their domain?

Personally I try to avoid putting my real mail out there anyway. "abc@def.com" usually gets me through

Tom
29th Feb 2004, 06:36 pm
Have you two gone mad? Or are you talking in code?

francis
29th Feb 2004, 07:09 pm
Not mad, but talking pattern matching, which is what a regex does. Take my example from above:

^\d+(\.\d{2})?$

^ anchors the match to the beginning of the string
\d is any digit (0-9)
+ means 1 or more of the preceding item, so \d+ means one or more digits
\. is a dot (a . on its own means "any single character", so we have to escape it with a \ if we mean a literal dot)
{2} means exactly 2 of the preceding item, so \d{2} means 2 digits
? means that the preceding item is optional - it doesn't have to be there. Putting the \.\d{2} in brackets followed by a ? means that the entire bracketed section is optional
$ anchors the match to the end of the string

So, putting it all together the pattern is:

1 or more digits followed by an optional dot and 2 digits. If there is a dot then it must be followed by only 2 digits; not none, one, three or more. If there isn't a dot then that's okay, just a number will do.

And, as we haven't allowed anything but digits and a dot, anything else (letters, currency symbols, commas, dashes, spaces etc etc) will cause the validation to fail.
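If it helps, here is the same breakdown written into the pattern itself. This is only a sketch: it relies on PCRE's x modifier, which ignores whitespace and # comments inside the pattern, and the sample values are made up.

```php
<?php
// The currency pattern spread out with the x (extended) modifier.
$pattern = '/
    ^            # start of the string
    \d+          # one or more digits
    (            # start of the optional group
        \.       #   a literal dot
        \d{2}    #   exactly two digits
    )?           # end of the group; ? makes the whole group optional
    $            # end of the string
/x';

// Sample inputs chosen for illustration only.
foreach (['596.87', '123', '12.5', '12,50', 'abc'] as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        // $m[1] holds the bracketed part and is absent when it did not take part.
        $decimals = $m[1] ?? '(none)';
        echo "$sample matches, optional group: $decimals\n";
    } else {
        echo "$sample does not match\n";
    }
}
```

One thing to watch when commenting a pattern like this: the comments must not contain the delimiter (/ here) or a single quote, or the pattern or the PHP string will end early.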

Tom
1st Mar 2004, 06:59 am
Thank you: yet another language I would like to have time to learn. If you have spare time for coding, or if I have need for code in a language I don't know, we should both go to Rent A Coder (http://www.rentacoder.com/RentACoder/default.asp). For translation to and from spoken languages there is a similar facility at Translators Cafe (http://www.translatorscafe.com/cafe/Cookies.asp?Failed=Yes).

francis
1st Mar 2004, 07:15 am
There's really only one definitive reference book (http://www.amazon.co.uk/exec/obidos/ASIN/1565922573/ref=sr_aps_books_1_1/026-9393819-5757257) on regular expressions, although I've just seen that SAMS have a "teach yourself" book coming out. Not exactly sure that you can learn this subject in 10 minutes, but some of their books are pretty good.

francis
22nd Mar 2004, 10:56 pm
Just found this regular expression basics (http://evolt.org/article/rating/20/22700/index.html) article on Evolt, if anyone wants to get their feet wet. Poor nerd humour is also present:

What regex are you most likely to see at Christmas?

[^L]

Why couldn't Chris try out the regular expressions he created until he left home?

His mom wouldn't let him play with matches.

francis
3rd Apr 2004, 06:31 pm
I downloaded ActivePerl (http://www.activestate.com/Products/ActivePerl/) for Windows this afternoon and was having a mooch around the ActiveState site. I came across their regular expression area and found this (http://aspn.activestate.com/ASPN/Cookbook/Rx/Recipe/59864) - possibly the longest regex in the history of, well, things.

I have noticed that this forum will automatically turn typed URLs into links ( eg: http://www.example.com ) if you just plonk them in without using the HTTP button. Maybe that's how they do it.
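I have no idea how this forum actually does it, but a naive sketch of the idea in PHP might look like the following. The function name autolink and the URL pattern are my own guesses, and a real implementation would need to deal with trailing punctuation and other edge cases.

```php
<?php
// Hypothetical helper: turns bare http/https URLs into links.
function autolink(string $text): string
{
    // Escape the post first so any HTML typed by the user is shown, not run.
    $escaped = htmlspecialchars($text, ENT_QUOTES);

    // A URL is taken to be http:// or https:// followed by anything up to
    // the next whitespace or angle bracket. $0 is the whole match.
    return preg_replace('~\bhttps?://[^\s<>]+~i', '<a href="$0">$0</a>', $escaped);
}

echo autolink('eg: http://www.example.com ok'), "\n";
```

Using ~ as the delimiter saves having to escape the slashes in http://.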

francis
12th Sep 2005, 12:42 pm
Oooh, regular expressions in Word (http://office.microsoft.com/en-us/assistance/HA010873041033.aspx)! MS' regex syntax is strange to say the least - naturally it would have been impossible for them to use the standard Perl syntax... Still, I'm very glad they have it at all.

ps - I tried to use MS' "contact us with a suggestion" feature to suggest using Perl syntax, but you have to have a Microsoft Passport account to do that. Pffft.