Galahad

Perl Regular Expressions (Regex): Learn how to use regular expressions


I've searched the Tutorials section, but haven't found a RegEx tutorial, so I thought I'd add one, since it's very useful to know regular expressions if you're a programmer... If I overlooked a tutorial explaining RegEx, my bad, just erase this tutorial...

Ok, first off, regular expressions are a great feature of Perl, but they can also be found in other languages and environments, such as Linux shells, or PHP... I'm guessing most people would be interested in using regular expressions in PHP...

Let's start with a simple matching regular expression:

my $var = "This is some text here that we need to search.";
if ($var =~ m/th/) {
    print "I found a th\n";
}

In the example above, the regular expression will locate the 'th' in 'that', and not the one in 'This', because matching is case sensitive. If we wanted case insensitive matching, we would write this:

my $var = "This is some text here that we need to search.";
if ($var =~ m/th/i) {
    print "I found a th\n";
}

And now the match will be found at 'This'. Regular expressions always try to find a match as early in the text as possible, and then to make that match as long as possible.
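To see where a match actually lands, here is a small sketch. It uses Perl's built-in @- array, whose first element holds the offset where the last successful match started. The t.*e pattern is my own addition, only there to illustrate the "as long as possible" part.

```perl
use strict;
use warnings;

my $var = "This is some text here that we need to search.";

# Case sensitive: the first lowercase "th" is the one inside "that".
my $pos_cs = ($var =~ m/th/) ? $-[0] : -1;

# Case insensitive: the leftmost match is now the "Th" of "This".
my $pos_ci = ($var =~ m/th/i) ? $-[0] : -1;

# Greedy: the match starts at the first "t" and runs to the last "e".
my ($greedy) = $var =~ m/(t.*e)/;

print "case sensitive match starts at offset $pos_cs\n";
print "case insensitive match starts at offset $pos_ci\n";
print "greedy match: '$greedy'\n";
```

Offsets count from 0, so an offset of 0 means the match begins at the very first character.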

Summary of regular expression matching
m/search_text/ - Find search_text
m/^search_text/ - Match search_text, but only at the beginning of the text (or of each line, if you add the m modifier). The ^ operator does this.
m/search_text$/ - Match search_text, but only at the end of the text (a single trailing newline is allowed). The $ operator does this.
m/^search_text$/ - Match search_text, but only if it's the entire text
m/search_text/i - Match search_text, but case insensitive
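The summary above can be tried out with a short sketch like this one. The helper name anchor_hits and the sample strings are made up for the illustration; the four patterns are the ones from the summary, with Hello as the search text.

```perl
use strict;
use warnings;

# anchor_hits is a hypothetical helper: it reports which of the
# four summary patterns match the given text.
sub anchor_hits {
    my ($text) = @_;
    my @hits;
    push @hits, "anywhere"   if $text =~ m/Hello/;
    push @hits, "at start"   if $text =~ m/^Hello/;
    push @hits, "at end"     if $text =~ m/Hello$/;
    push @hits, "whole text" if $text =~ m/^Hello$/;
    return @hits;
}

for my $text ("Hello, world", "Say Hello", "Hello") {
    print "$text: ", join(", ", anchor_hits($text)), "\n";
}
```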

Of course, if regular expressions were this easy, there wouldn't be a need for a tutorial. Regular expressions are quite powerful, and can match just about anything in the text. For example, these wildcards can be used in regular expressions:
. - Match any character (except a newline, by default)
\w - Match a word character (alphanumeric characters and "_")
\W - Match a non-word character
\s - Match whitespace character
\S - Match non-whitespace character
\d - Match digit character
\D - Match non-digit character
\t - Match tab
\n - Match newline
\r - Match carriage return
\f - Match formfeed
\a - Match alarm character (bell, beep)
\e - Match escape
\045 - Octal character match; in this case, it's 45 octal (the "%" character)
\x{6fa} - Hexadecimal character match; in this case, it's 6FA hex. Without the braces, \x reads only two hex digits, so \x6f is 6F hex (the letter "o")
Also, combined with these wildcards, you can use repetition operators:
* - Match 0 or more times
+ - Match 1 or more times
? - Match 0 or 1 times
{n} - Match exactly n times
{n,} - Match at least n times
{n,m} - Match at least n, but not more than m times
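Here is a rough sketch of the wildcards and repetition operators working together. All the sample strings (the date, the spellings, the name = value line) are invented for the example and are not from any real data.

```perl
use strict;
use warnings;

# {n}: exactly four, two and two digits, separated by dashes.
my $date_re  = qr/^\d{4}-\d{2}-\d{2}$/;
my $date_ok  = ("2024-01-15" =~ $date_re) ? 1 : 0;
my $date_bad = ("24-1-15"    =~ $date_re) ? 1 : 0;

# ?: the "u" may appear zero or one time, but not twice.
my @spellings = grep { m/^colou?r$/ } ("color", "colour", "colouur");

# +: each run of one or more word characters counts as one word.
my $word_count = () = "one  two\tthree" =~ m/\w+/g;

# *: zero or more whitespace characters around the equals sign.
my $spaced  = ("name =   Galahad" =~ m/^\w+\s*=\s*\w+$/) ? 1 : 0;
my $compact = ("name=Galahad"     =~ m/^\w+\s*=\s*\w+$/) ? 1 : 0;

# Octal and hexadecimal escapes match single characters.
my $percent  = ("%" =~ m/^\045$/) ? 1 : 0;
my $letter_o = ("o" =~ m/^\x6f$/) ? 1 : 0;

print "dates: $date_ok $date_bad\n";
print "spellings: @spellings\n";
print "words: $word_count\n";
print "assignments: $spaced $compact\n";
print "escapes: $percent $letter_o\n";
```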

Ok, I'll add a few examples for this so far:
# This example will match a telephone number in the following format: +381 (21) 123-456 or +381 (21) 1234-567
$var =~ m/\+\d{1,3}\ \(\d{1,3}\)\ \d{3,4}-\d{3,4}/;

# This example will match "Hello, world", but not " Hello, world" or "hello, world", because the search is case sensitive, and requires the text to begin with Hello
$var =~ m/^Hello/;

# This line would match wherever a Galahad or galahad is found in the text; the search is case insensitive
$var =~ m/galahad/i;

Also, note how I escaped a space, a +, and brackets. The + and the brackets have a special meaning in regular expressions, and escaping them makes the regular expression treat them as plain text. Escaping the space is optional here; it only matters if you use the x modifier, which makes Perl ignore unescaped whitespace in the pattern. You escape a character with a backslash (\)... Slashes also have to be escaped when / is the delimiter of the expression.
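If you'd rather not escape by hand, Perl can do it for you. The sketch below assumes you have the literal text in a variable: \Q...\E (or the quotemeta function) escapes every special character in it, and a different delimiter such as m{...} saves you from escaping slashes. The sample strings and the file path are just illustrations.

```perl
use strict;
use warnings;

my $literal = "+381 (21)";
my $text    = "Call +381 (21) 123-456 today";

# quotemeta shows what the escaped version looks like.
my $escaped = quotemeta($literal);

# \Q...\E escapes the interpolated variable inside the pattern.
my $found = ($text =~ m/\Q$literal\E/) ? 1 : 0;

# With / as the delimiter, every slash in the pattern needs a backslash...
my $path        = "/usr/local/bin/perl";
my $with_slash  = ($path =~ m/^\/usr\/local\//) ? 1 : 0;

# ...while a different delimiter lets the slashes stand as they are.
my $with_braces = ($path =~ m{^/usr/local/}) ? 1 : 0;

print "escaped pattern: $escaped\n";
print "literal found: $found\n";
print "path checks: $with_slash $with_braces\n";
```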

Ok, we're half way there... Now we go on to character groups and character classes...
Character groups allow alternative phrases to be used. In the next example, it would be a match if we had a Susan, Marie, or Jennifer in the text:
$var =~ m/(Susan|Marie|Jennifer)/;
Character groups also allow for retrieval of the matched text, by placing it in the scalars $1, $2, ... But I will cover that a bit later.
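A quick sketch of the alternation above in action. The sample sentence is invented; $1 holds whichever name matched first, and with the g modifier in list context you get every name that was found.

```perl
use strict;
use warnings;

my $var = "Yesterday Marie met Jennifer.";

# $1 holds the alternative that matched; check the match succeeded first.
my $first;
if ($var =~ m/(Susan|Marie|Jennifer)/) {
    $first = $1;
}

# In list context with g, the match returns every captured name.
my @all = $var =~ m/(Susan|Marie|Jennifer)/g;

print "first name found: $first\n";
print "all names found: @all\n";
```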

Character classes allow for character ranges. For example, this short line would match if we have names starting with A through N:
$var =~ m/^[A-N]/;
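A character class always matches exactly one character, no matter how many characters are listed between the brackets. So the following will NOT match names from 'Ab' through 'Ne'; in fact Perl rejects it, because b-N is not a valid range:
$var =~ m/^[Ab-Ne]/;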
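Here is a small sketch of what does work with character classes. The list of names is made up. Note that a ^ right after the opening bracket negates the class, and that matching two characters takes two classes in a row.

```perl
use strict;
use warnings;

my @names = ("Alice", "Nina", "Oscar", "bob");

# One class matches one character: an uppercase letter from A to N.
my @a_to_n = grep { m/^[A-N]/ } @names;

# A leading ^ inside the brackets negates the class.
my @others = grep { m/^[^A-N]/ } @names;

# Several ranges can share one class: A-N in either case.
my @any_case = grep { m/^[A-Na-n]/ } @names;

# Two classes in a row match two characters.
my @two_chars = grep { m/^[A-N][a-k]/ } @names;

print "A-N: @a_to_n\n";
print "not A-N: @others\n";
print "A-N, any case: @any_case\n";
print "A-N then a-k: @two_chars\n";
```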

Character classes have a few rules of their own (for example, ^ and - mean something different inside the brackets), so test them carefully. When you need to choose between whole words rather than single characters, character groups are what you want. And now off to:

Selections AKA Parsing

Ok, we established that regular expressions are a mighty thing, but so far, they don't do anything spectacular. I mentioned character groups a bit earlier, and mentioned they can be used to retrieve selections. Here's how, and this is where regular expressions excel and get very useful.

Say we have a phone number +381 (21) 123-456. The country code is 381, and the area code is 21. And let's say we need all of these in separate variables. Here's what we would do:

my $phone = "+381 (21) 123-456";
$phone =~ /\+(.+)\ \((.+)\)\ (.+)/;
my $country = $1; # $country will contain 381
my $area = $2;    # $area will contain 21
my $num = $3;     # $num will contain 123-456

Pretty powerful, huh? This is probably the best thing about regular expressions...
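One thing to watch out for: $1, $2 and $3 are only meaningful if the match succeeded. Here is a slightly more defensive sketch of the same idea. It matches in list context, so the captures land directly in variables, and it uses the stricter digit pattern from the earlier telephone example instead of .+ (that swap is my choice for the sketch). The second input string is made up to show the failure case.

```perl
use strict;
use warnings;

# qr// stores the pattern so it can be reused.
my $phone_re = qr/^\+(\d{1,3}) \((\d{1,3})\) (\d{3,4}-\d{3,4})$/;

# In list context a successful match returns the captured groups.
my $phone = "+381 (21) 123-456";
my ($country, $area, $num) = $phone =~ $phone_re;

if (defined $country) {
    print "country=$country area=$area number=$num\n";
} else {
    print "'$phone' does not look like a phone number\n";
}

# A failed match returns an empty list, so nothing stale is picked up.
my @parts = "not a phone number" =~ $phone_re;
print "captures from bad input: ", scalar(@parts), "\n";
```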

And one more thing you can do with regular expressions is...

Substitutions:

These are quite simple to master:
my $var = "Xisto sucks";
$var =~ s/sucks/rules/; # $var now contains "Xisto rules"

Other things to note:
- If you want to make your search case insensitive, just add an i at the end of the regular expression, e.g. m/match/i
- If you want to change all instances of a word, add a g at the end of the regular expression, e.g. s/to_replace/replacer/g
- You can combine i and g, and have s/to_replace/replacer/gi, or s/blank//gi; the last one replaces all occurrences of blank with nothing ("")
- =~ means matches
- !~ means does not match
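The notes above can be checked with a sketch like the following. The sample sentence is invented. The (my $copy = $text) =~ s/.../.../ form copies the string first, so the original stays untouched, and s/// itself returns the number of replacements it made.

```perl
use strict;
use warnings;

my $text = "Blank lines, blank pages, BLANK looks.";

# Without modifiers, only the first case sensitive match is replaced.
(my $first_only = $text) =~ s/blank/full/;

# g replaces every match, i ignores case.
(my $all_ci = $text) =~ s/blank/full/gi;

# Replacing with nothing deletes the matches; s/// returns the count.
my $copy     = $text;
my $replaced = ($copy =~ s/blank//gi);

# !~ is true when the pattern does NOT match.
my $no_digits = ($text !~ m/\d/) ? 1 : 0;

print "$first_only\n";
print "$all_ci\n";
print "removed $replaced matches, leaving '$copy'\n";
print "text has no digits: $no_digits\n";
```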

And voila, you now have sufficient knowledge to make rather powerful regular expressions, and incorporate them in your PHP scripts, or Perl scripts, or wherever. I hope you found this tutorial useful. Also, don't hesitate to experiment with regular expressions, because that's the best way to learn something. And of course, don't hesitate to ask questions if any of this was unclear...


Awesome tutorial! I love regex! Best thing to use when you fetch remote content :) In my opinion it is a little harder in Perl than in PHP, but this tutorial makes it so clear and easy! Thanks!


You're very welcome... Perl can be scary by itself, I know I was frustrated with how it works... It's completely different from conventional programming languages, but then, it's the same... If you catch my drift... And regular expressions can be particularly scary and frustrating... I still haven't got the full hang of it, but every day I get to know Perl a little better... It's particularly useful to know Perl because here at Xisto we have full CGI support, so we can make Perl scripts that do complex tasks... I plan on a series of tutorials for Perl, from how to make a CGI script, to connecting to a MySQL database, to some other stuff... Just now, I'm working on a primitive junk mail filter... And thanks to RegEx, it's a breeze... It's not a smart filter, it doesn't have the ability to learn, but with a few good rules, whitelists and blacklists, it gets the job done much better and quicker than, for example, SpamAssassin... I suppose I could put it here, and make a tutorial out of it... There's an idea :) I always like to help beginners get the hang of something new, and help them avoid stuff that made me cry with frustration and anger :D
