Regular Expression For A Particular Url?

Mordent · November 6, 2009

Hi folks, me again with yet another JavaScript-y problem. Firstly, I have next to no experience with regular expressions, so apologies for any slow-wittedness on my part with this particular topic.

I'm trying to use regular expressions (in JavaScript, if it's at all relevant) to work out whether a particular string (the current page URL, not that it really matters) is of the following form:

http://(www. optional)example.com(anything else here)

I've done a good few searches of Google, but anything that it turns up is either for checking if a string is in a URL format (not for a particular site) or full of strange, spooky regex runes that mean nothing to me.

One other parameter for my expression that I should point out: if the URL is on a subdomain of example.com (so something like http://sub.example.com/) then I want it seen as invalid. I'm likely to be using the .search() method for comparison, if it makes any difference to the regular expression.

Anyone care to help out the regex newbie? Extra cookies for anyone who explains their expression, as regular expressions are something that I'd really like to learn a bit of. Thanks in advance!

Edited November 6, 2009 by Mordent

rob86 · November 6, 2009

I'm not very good with regex, but I need the practice, so I'm going to attempt this. Here's what I've got. It probably has a lot of flaws (heck, it might be complete garbage), but I tried, and it does work for some simple tests. Regex is really useful, but it gives me a headache trying to understand it.

\bhttp://(www\.)?([a-zA-Z_]\w+)?example\.com/?(.*)\b

Match http:// (literal string)
Optionally, match www. (a ? after something makes it optional)
Then, optionally match anything that starts with a-z, A-Z or _, followed by one or more letters, digits or underscores (\w+),
Then, match example.com and optionally include anything else after .com/
and make sure it's one "word/string" with no whitespace with \b

As I explained this, I realize it has errors. For one, it doesn't support funny characters like - or _ really. Two, at the end, it DOES support anything, so "%@#%@#%!$!@%$&*%$^@#%)()@!$*@$" or something equally non-english would be valid. Three, I'm not too sure about the \b, I tried a few different things and that seemed to work the best, but I'm not too confident \b is the best method depending on your uses. I'm not sure how it works in JS, you might have to take both them out. I know this isn't exactly what you want, but it's a start...
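To see how the expression above actually behaves in JavaScript, here is a small sketch. It assumes the pattern is written as a regex literal (so the slashes are escaped) and checked with .test(); the sample URLs and the name robPattern are made up for illustration.

```javascript
// rob86's pattern as a JavaScript regex literal (forward slashes escaped).
const robPattern = /\bhttp:\/\/(www\.)?([a-zA-Z_]\w+)?example\.com\/?(.*)\b/;

// Hypothetical sample URLs, chosen only to exercise the pattern.
const samples = [
  "http://example.com/",
  "http://www.example.com/page",
  "http://forums.example.com/",
  "http://notexample.com/",
];

// Print each URL next to whether the pattern finds a match in it.
for (const url of samples) {
  console.log(url, robPattern.test(url));
}
```

The optional middle group cannot cross a dot, so forums.example.com is rejected, but the same group lets a different domain such as notexample.com through, which is probably not what is wanted here.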

You really should check out http://www.regular-expressions.info/, it's a great little lesson on regex (which is surprisingly complicated), and download some kind of regex testing software. On Windows, there's one called RegexBuddy which is just excellent, but it costs money so I only tried it for a few minutes. It explains everything to you in plain English, like I did above, but much more clearly. I'm using a program called "Kiki" on Ubuntu to check my regex. Not as good as RegexBuddy, but it's free. There are probably other ones, or maybe even some online ones. Something to test your regular expressions is very useful.
Edited November 6, 2009 by rob86

jlhaslip · November 7, 2009

Why not simply look for 'http://example.com/' or 'http://www.example.com/' and ignore the ones with a subdomain completely? What I am suggesting is that there's no need to find the ones with a subdomain, so don't even search for them.

nooc9 · November 7, 2009

I guess you want something like this:

function urlIsOk(url) {
    // match() returns null when the pattern is not found
    return url.match(/^http:\/\/(www\.)?example\.com/i) != null;
}

Mordent · November 7, 2009

Oooh, lots of responses. That's what I like to see!

rob86 said:
\bhttp://(www\.)?([a-zA-Z_]\w+)?example\.com/?(.*)\b

Cheers for the detailed overview there, but I get the impression you misinterpreted my bit about subdomains (I don't want a subdomain URL to be valid). Still, thanks for the suggestions and I'll definitely be looking some of them up!

jlhaslip said:
Why not simply look for 'http://example.com/' or 'http://www.example.com/' and ignore the ones with a subdomain completely? What I am suggesting is that there's no need to find the ones with a subdomain, so don't even search for them.

Fair point, though I mentioned it for clarification in case anyone tried to incorporate it in to their example.

nooc9 said:
I guess you want something like this:
function urlIsOk(url) { return url.match(/^http:\/\/(www\.)?example\.com/i) != null; }

Looks pretty much the sort of thing I'm after, although what some of that syntax means is a mystery to me. Why the "\/\/"? I'm assuming that backslash is some sort of escape character? Also, nice find with the .match() method. I was simply planning on using .search() and checking if it returned greater than or equal to 0, but that's definitely more the sort of thing that I'm after.
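For comparison, here is a rough sketch of the options mentioned above for getting a yes/no answer out of a pattern: .search(), .match(), and also RegExp's .test(). The function names and sample URLs are hypothetical, and the pattern is the one from nooc9's reply with the dot in example.com escaped.

```javascript
// Pattern from nooc9's reply, with the dot in "example.com" escaped.
const pattern = /^http:\/\/(www\.)?example\.com/i;

// search() returns the index of the first match, or -1 if there is none.
function viaSearch(url) {
  return url.search(pattern) >= 0;
}

// match() returns an array describing the match, or null if there is none.
function viaMatch(url) {
  return url.match(pattern) !== null;
}

// test() lives on the regex itself and returns a boolean directly.
function viaTest(url) {
  return pattern.test(url);
}

// Hypothetical URLs: one that should pass and one on a subdomain.
for (const url of ["http://www.example.com/index.html", "http://forums.example.com/"]) {
  console.log(url, viaSearch(url), viaMatch(url), viaTest(url));
}
```

All three give the same answer for a given URL; .test() simply saves the comparison against -1 or null.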

Looks like I better go dig up some regexp tutorials and try to decipher some of these expressions! As always, you folks at Trap seem to have the answer I'm looking for, so cheers for that. I'll let you know what I come up with as a final expression when I do. Thanks to all of you!

nooc9 · November 7, 2009

Mordent said:
Why the "\/\/"? I'm assuming that backslash is some sort of escape character?

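Yes, indeed. I'm escaping the slashes because they have a special meaning here, i.e. in JavaScript you write a regex literal as /expression/flags, so a slash inside the expression has to be escaped or it would end the literal early.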
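A minimal sketch of where the escaping matters, assuming the expression is either written as a regex literal or passed to the RegExp constructor as a string. The variable names and sample URLs are made up for illustration.

```javascript
// Regex literal: "/" delimits the pattern, so literal slashes are escaped.
const fromLiteral = /^http:\/\/(www\.)?example\.com\//i;

// RegExp constructor: the pattern is an ordinary string, so "/" needs no
// escaping, but every regex backslash is doubled inside the string literal.
const fromConstructor = new RegExp("^http://(www\\.)?example\\.com/", "i");

// Hypothetical URLs to show that both forms behave the same way.
for (const url of ["http://example.com/", "http://forums.example.com/"]) {
  console.log(url, fromLiteral.test(url), fromConstructor.test(url));
}
```

An expression typed into a regex tester or built from a string is in the second situation, which is one way an unescaped slash can work without complaint.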

Mordent · November 7, 2009

Right, after a little bit of trial and error (as well as using various tutorial websites) I came up with the following expression:

^http://(www\.)?example\.com/.*$

Seems to work like a charm. It also hasn't got the crazy "\/\/" at the beginning, because as far as I could tell the forward-slash doesn't need escaping. It worked, so if I'm wrong then clearly it's right in some quirky way. Also, I'm not sure why you had a forward slash at the start of your expression and an i at the end, nooc9. As far as I can see they have no significance in regular expressions. Care to comment, or just a slip-up?

Any comments on my expression?

nooc9 · November 8, 2009

No, not a slip-up. It's just another way of constructing a regex, seen a lot in Perl, and it allows you to pass flags. In my expression I had the i flag (case-insensitive).

If you omit the $ at the end then you don't need ".*" either.
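The two points above can be checked with a quick sketch. It assumes the expression is built with the RegExp constructor from a string and that the input is a single-line string such as a URL (the dot in ".*" does not match line breaks). The variable names and sample URLs are hypothetical.

```javascript
// Mordent's expression, and the same expression with ".*$" dropped.
const withTail = new RegExp("^http://(www\\.)?example\\.com/.*$");
const withoutTail = new RegExp("^http://(www\\.)?example\\.com/");

// The shorter expression again, this time with the "i" flag passed in.
const ignoreCase = new RegExp("^http://(www\\.)?example\\.com/", "i");

// Hypothetical URLs: a deep link, a subdomain, and an upper-case variant.
const urls = [
  "http://example.com/a/b?c=d",
  "http://forums.example.com/",
  "HTTP://WWW.EXAMPLE.COM/",
];

for (const url of urls) {
  console.log(url, withTail.test(url), withoutTail.test(url), ignoreCase.test(url));
}
```

For a yes/no check the first two expressions agree on these inputs, because once the prefix has matched, whatever follows the slash cannot make the match fail. Only the third accepts the upper-case URL.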

Mordent · November 8, 2009

nooc9 said:
No, not a slip-up. It's just another way of constructing a regex, seen a lot in Perl, and it allows you to pass flags. In my expression I had the i flag (case-insensitive). If you omit the $ at the end then you don't need ".*" either.

Ah, gotcha. Yeah, I see how the $ thing isn't needed now, as what's after the last forward slash is pretty much irrelevant. Cheers for pointing that one out.
