Mordent, posted November 6, 2009 (edited)

Hi folks, me again with yet another JavaScript-y problem. Firstly, I have next to no experience with regular expressions, so apologies for any slow-wittedness on my part with this particular topic.

I'm trying to use regular expressions (in JavaScript, if it's at all relevant) to work out whether a particular string (the current page URL, not that it really matters) is of the following form:

http://(www. optional)example.com(anything else here)

I've done a good few searches on Google, but anything that turns up is either for checking whether a string is in a URL format (not for a particular site) or full of strange, spooky regex runes that mean nothing to me.

One other parameter for my expression that I should point out: if the URL is a subdomain of example.com (so http://forums.xisto.com/no_longer_exists/ or http://forums.xisto.com/no_longer_exists/) then I want it seen as invalid.

I'm likely to be using the .search() method for comparison, if it makes any difference to the regular expression.

Anyone care to help out the regex newbie? Extra cookies for anyone who explains their expression, as regular expressions are something that I'd really like to learn a bit of.

Thanks in advance!

Edited November 6, 2009 by Mordent
rob86, posted November 6, 2009 (edited)

Here's what I've got. It probably has a lot of flaws (heck, it might be complete garbage) but I tried. It does work for some simple tests. Regex is really useful, but it gives me a headache trying to understand it. I'm not very good with regex but I need the practice, so I'm going to attempt this.

\bhttp://(www\.)?([a-zA-Z_]\w+)?example\.com/?(.*)\b

- Match http:// (literal string).
- Optionally, match www. (a ? after something makes it optional).
- Then, optionally match anything that starts with a-z, A-Z or _, followed by one or more letters, digits or underscores (\w+).
- Then, match example.com and optionally include anything else after .com/
- And make sure it's one "word/string" with no whitespace with \b.

As I explained this, I realized it has errors. For one, it doesn't really support funny characters like -. Two, at the end, it DOES support anything, so "%@#%@#%!$!@%$&*%$^@#%)()@!$*@$" or something equally non-English would be valid. Three, I'm not too sure about the \b. I tried a few different things and that seemed to work the best, but I'm not too confident \b is the best method depending on your uses. I'm not sure how it works in JS; you might have to take both of them out.

I know this isn't exactly what you want, but it's a start... You really should check out http://www.regular-expressions.info/, it's a great little lesson on regex (which is surprisingly complicated), and download some kind of regex testing software. On Windows, there's one called RegexBuddy which is just excellent, but it costs money so I only tried it for a few minutes. It explains everything to you in plain English, like I did above, but much more clearly. I'm using a program called "Kiki" on Ubuntu to check my regex. Not as good as RegexBuddy, but it's free. There are probably other ones, or maybe even some online ones. Something to test your regular expressions with is very useful.
Edited November 6, 2009 by rob86
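For anyone following along, here is a rough sketch of how rob86's expression behaves when written as a JavaScript regex literal and used with .search(). The helper name and the test URLs (including myexample.com) are made up for illustration, and the forward slashes are escaped only because a literal is delimited by slashes.

```javascript
// rob86's expression as a JavaScript regex literal (slashes escaped for the literal).
const robPattern = /\bhttp:\/\/(www\.)?([a-zA-Z_]\w+)?example\.com\/?(.*)\b/;

// Hypothetical helper: true when .search() finds a match anywhere in the string.
function matchesRobPattern(url) {
  return url.search(robPattern) >= 0;
}

console.log(matchesRobPattern("http://www.example.com/page")); // true
console.log(matchesRobPattern("http://forums.example.com/"));  // false
console.log(matchesRobPattern("http://myexample.com/"));       // true
```

So the optional middle group lets a look-alike host such as myexample.com through, while a dotted subdomain is rejected only because \w cannot match the dot.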
jlhaslip, posted November 7, 2009

Why not simply look for 'http://example.com/' or 'http://www.example.com/' and ignore the ones with a subdomain completely? What I am suggesting is that there's no need to find the ones with a subdomain, so don't even search for them.
nooc9, posted November 7, 2009

I guess you want something like this:

function urlIsOk(url) {
    return url.match(/^http:\/\/(www\.)?example\.com/i) != null;
}
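A runnable sketch of the function above with a few sample URLs may help show what it accepts. The sample URLs are invented for illustration (the .test host in particular is hypothetical), and the dot in "example.com" is escaped here so that it only matches a literal dot.

```javascript
// Sketch of nooc9's check; "\." matches a literal dot, "i" ignores case.
function urlIsOk(url) {
  return url.match(/^http:\/\/(www\.)?example\.com/i) !== null;
}

// Made-up sample URLs, purely for illustration.
const samples = [
  "http://example.com/",              // true
  "http://www.example.com/some/page", // true
  "HTTP://WWW.EXAMPLE.COM/",          // true (case-insensitive)
  "http://forums.example.com/",       // false (subdomain)
  "http://example.com.evil.test/",    // true (nothing anchors what follows ".com")
];

for (const url of samples) {
  console.log(url, urlIsOk(url));
}
```

The last sample suggests that requiring a slash straight after ".com" might be worth considering if look-alike hosts matter.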
Mordent, posted November 7, 2009

Oooh, lots of responses. That's what I like to see!

rob86 said: Here's what I've got, it probably has a lot of flaws (heck, it might be complete garbage) but I tried. [...]
Cheers for the detailed overview there, but I get the impression you misinterpreted my bit about subdomains (I don't want a subdomain URL to be valid). Still, thanks for the suggestions and I'll definitely be looking some of them up!

jlhaslip said: Why not simply look for 'http://example.com/' or 'http://www.example.com/' and ignore the ones with a subdomain completely? No need finding the ones with a subdomain is what I am suggesting, so don't even search for those ones.

Fair point, though I mentioned it for clarification in case anyone tried to incorporate it into their example.

nooc9 said: I guess you want something like this: function urlIsOk(url) { return url.match(/^http:\/\/(www\.)?example\.com/i) != null; }

Looks pretty much like the sort of thing I'm after, although what some of that syntax means is a mystery to me. Why the "\/\/"? I'm assuming that backslash is some sort of escape character? Also, nice find with the .match() method. I was simply planning on using .search() and checking whether it returned greater than or equal to 0, but .match() is definitely more the sort of thing that I'm after.

Looks like I'd better go dig up some regexp tutorials and try to decipher some of these expressions! As always, you folks at Trap seem to have the answer I'm looking for, so cheers for that. I'll let you know what I come up with as a final expression when I do.

Thanks to all of you!
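To make the .search() versus .match() comparison concrete, here is a small sketch. The variable names and sample URLs are assumptions for illustration; the pattern is the one from nooc9's function with the dot escaped.

```javascript
const pattern = /^http:\/\/(www\.)?example\.com/i;
const good = "http://www.example.com/page";
const bad = "http://forums.example.com/page";

// .search() returns the index of the first match, or -1 when nothing matches.
console.log(good.search(pattern)); // 0
console.log(bad.search(pattern));  // -1

// .match() returns an array describing the match, or null when nothing matches.
console.log(good.match(pattern)[0]); // "http://www.example.com"
console.log(bad.match(pattern));     // null

// RegExp.prototype.test() gives a plain boolean, which is often all that is needed.
console.log(pattern.test(good)); // true
console.log(pattern.test(bad));  // false
```

Any of the three works for a yes/no check; the difference is only in what comes back.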
nooc9, posted November 7, 2009

Mordent said: Why the "\/\/"? I'm assuming that backslash is some sort of escape character?

Yes, indeed. I'm escaping the slashes because they have a special meaning here, i.e. you construct a regex literal by writing /expression/flags.
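As a sketch of the point about slashes, the same pattern can be written either as a regex literal or as a string handed to the RegExp constructor. The variable names and the trailing slash in the pattern are choices made here for illustration.

```javascript
// Regex literal: slashes delimit the pattern, so slashes inside it must be escaped.
const fromLiteral = /^http:\/\/(www\.)?example\.com\//;

// RegExp constructor: the pattern is a string, so slashes need no escaping,
// but each backslash must be doubled to survive string parsing.
const fromString = new RegExp("^http://(www\\.)?example\\.com/");

console.log(fromLiteral.test("http://www.example.com/"));   // true
console.log(fromString.test("http://www.example.com/"));    // true
console.log(fromLiteral.test("http://forums.example.com/")); // false
console.log(fromString.test("http://forums.example.com/"));  // false
```

Whether a slash needs a backslash therefore depends on how the expression is written down, not on the regex syntax itself.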
Mordent, posted November 7, 2009

Right, after a little bit of trial and error (as well as using various tutorial websites) I came up with the following expression:

^http://(www\.)?example\.com/.*$

Seems to work like a charm. It also hasn't got the crazy "\/\/" at the beginning, because as far as I could tell the forward slash doesn't need escaping. It worked, so if I'm wrong then clearly it's right in some quirky way.

Also, I'm not sure why you had a forward slash at the start of your expression and an i at the end, nooc9. As far as I can see they have no significance in regular expressions. Care to comment, or just a slip-up?

Any comments on my expression?
nooc9, posted November 8, 2009

No, not a slip-up. It's just another way of constructing a regex, seen a lot in Perl, and it allows you to pass flags. In my expression I had the i flag (case insensitive).

If you omit the $ at the end then you don't need ".*" either.
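A short sketch of what the i flag changes, and how the same flag is passed when using the RegExp constructor instead of the /expression/flags form. The variable names and the upper-case test URL are made up for illustration.

```javascript
const caseSensitive = /^http:\/\/(www\.)?example\.com/;
const caseInsensitive = /^http:\/\/(www\.)?example\.com/i;

// With the constructor, flags go in as a second string argument.
const viaConstructor = new RegExp("^http://(www\\.)?example\\.com", "i");

const shouty = "HTTP://WWW.EXAMPLE.COM/";
console.log(caseSensitive.test(shouty));   // false
console.log(caseInsensitive.test(shouty)); // true
console.log(viaConstructor.test(shouty));  // true
```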
Mordent, posted November 8, 2009

nooc9 said: No, not a slip up. Its just another regex constructor seen a lot in perl and it allows you to pass flags. In my expression I had the i flag (case insensitive). If you omit the $ at the end then you don't need ".*" either.

Ah, gotcha. Yeah, I see how the $ thing isn't needed now, as what's after the last forward slash is pretty much irrelevant. Cheers for pointing that one out.
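To check the point about the trailing ".*$", here is a sketch comparing the final expression with and without it. The variable names and sample URLs are invented for illustration, and the comparison assumes single-line strings such as URLs (the dot does not match line breaks).

```javascript
// The final expression, and the same thing with the trailing ".*$" dropped.
const withTail = /^http:\/\/(www\.)?example\.com\/.*$/;
const withoutTail = /^http:\/\/(www\.)?example\.com\//;

// Made-up sample URLs.
const urls = [
  "http://example.com/",            // true  true
  "http://www.example.com/a/b?c=d", // true  true
  "http://forums.example.com/",     // false false
  "http://example.com",             // false false (no slash after .com)
];

for (const url of urls) {
  console.log(url, withTail.test(url), withoutTail.test(url));
}
```

Both versions agree on every sample. Note that either way the pattern requires a slash after ".com", so a bare "http://example.com" is rejected.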