Need Help with Regular Expressions

Erdemir

Hi,

On my website, I don't want to allow links to other sites, with the exception of a few sites.

// This is the text which is sent by the guest
$variable='Some texts <a href="https://www.google.de/?gfe_rd=cr&ei=BwkjVKfAD8uH8QfckIGgCQ&gws_rd=ssl href="https://www.microsoft.com/de-de; ';
// The following line replaces all <a> HTML tags with the text "no links allowed".
$variable = preg_replace("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "no links allowed", $variable);
echo($variable);

I want to allow only a few sites: google.com, Xisto.com, yahoo.com, etc.

I want to disallow microsoft.com, hotmail.com and any other sites.

What are your suggestions?

What regular expression should I use? Or are there other approaches?


jlhaslip

If you have a short list of acceptable links which you will allow, maybe a switch/case structure would be easier? Based on an array of suitable values? Regular expressions can be very resource intensive on the server.


Erdemir


OK, here is my array of allowed links:
$allowedlinks = array ("google.com", "Xisto.com", "yahoo.com", "dmoz.org");
Now, how can we integrate switch/case with preg_replace or without preg_replace?

Thanks...

galexcd

I know I said I wasn't good at regular expressions, but I've been poking around with your problem for the past hour and found something that may work:

<a href=["']?((?!((http://(www\.)?wikipedia\.org)|(http://(www\.)?google\.com)|(http://(www\.)?Xisto\.com))).)+["']?>(.*?)</a>

This should match all links that aren't wikipedia.org, google.com and Xisto.com.
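
To use a pattern like this in PHP it needs delimiters and careful quoting. Below is a minimal sketch, not a tested solution. It assumes "~" as the delimiter, and it narrows the "." in the pattern to [^"'>] so that a match cannot run past the href value into a following link. The sample text is made up for illustration. Note that it only handles anchors whose only attribute is href and only the http:// prefix, as in the pattern above.

```php
<?php
// Allowed URL prefixes, taken from the pattern in the post above.
$allowed = '(?:http://(?:www\.)?wikipedia\.org'
         . '|http://(?:www\.)?google\.com'
         . '|http://(?:www\.)?Xisto\.com)';

// "~" is the delimiter, so the slashes in "http://" and "</a>" need no escaping.
// (?!...) is a negative lookahead: each character of the href value is consumed
// only if no allowed prefix starts at that position.
$pattern = '~<a href=["\']?(?:(?!' . $allowed . ')[^"\'>])+["\']?>(.*?)</a>~i';

// Hypothetical sample input, not taken from the thread.
$text = '<a href="http://www.google.com/">Google</a> and '
      . '<a href="http://www.microsoft.com/">Microsoft</a>';

$result = preg_replace($pattern, 'no links allowed', $text);
echo $result, "\n";
// Prints: <a href="http://www.google.com/">Google</a> and no links allowed
```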

Erdemir

Sorry, I couldn't use that directly in PHP, but your code was very helpful. I will try to use it in preg_replace(). Thanks.
Any more suggestions?

jlhaslip

A small note found on the php.net manual pages for the preg_match function:

Tip
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
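
In the spirit of that tip, the allowlist check itself can be done without any regular expression. The following is only a sketch: is_allowed_host() is a hypothetical helper name, and the rule that subdomains of an allowed domain are also accepted is an assumption, not something stated in the thread.

```php
<?php
// Allowlist from earlier in the thread.
$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

// Hypothetical helper: returns true when the URL's host is an allowed
// domain or a subdomain of one (for example www.google.com).
function is_allowed_host($url, $allowed)
{
    // parse_url() only reports a host when a scheme or "//" is present,
    // so a scheme is assumed for bare values such as "google.com/page".
    if (strpos($url, '//') === false) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    $host = strtolower($host);
    foreach ($allowed as $domain) {
        $domain = strtolower($domain);
        // Exact match, or the host ends with ".domain".
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

var_dump(is_allowed_host('http://www.google.com/subdirectory/page3.html', $allowedlinks)); // bool(true)
var_dump(is_allowed_host('http://www.microsoft.com/sub/win/page5.html', $allowedlinks));   // bool(false)
```

Replacing the rejected links inside a block of HTML would still need some way of locating the anchors, so this covers only the "is this URL allowed?" part.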



galexcd

Interesting find... However, I believe he wishes to replace the bad links with "No Links" or something similar. As you said previously, the regexp engine is pretty resource heavy on the server and takes a bit longer to process, so perhaps a non-regular-expression method would be the best way. I might see if I can whip one up for you. Until then, I think we still need to hear from one of our regular expression experts... *cough* rvalkass *cough*

truefusion

The following should get the results you desire:

<?php

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

function site_filter($matches)
{
    global $allowedlinks;
    if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?$/", $matches[1])) {
        return "Link not allowed.";
    } else {
        return $matches[0];
    }
}

$variable='Some texts <a href="https://www.google.de/?gfe_rd=cr&ei=BwkjVKfAD8uH8QfckIGgCQ&gws_rd=ssl href="https://www.microsoft.com/de-de; ';
$variable = preg_replace_callback("/<a href=[\"']([^<]+)[\"']>[^<]+<\/a>/", "site_filter", $variable);
echo $variable;
?>
Would have been easier to work with if each anchor element was on its own line.

Erdemir

Thank you very much, truefusion. I tried your code and it worked well. However, I also tried it with this variable:

$variable='Some texts <a href="google.com/subdirector/level2/page3.html href="microsoft.com/sub/win/page5.html; ';
With that input, the function rejected every link and wrote "Link not allowed." for each one. This means the code doesn't work if an allowed link contains subdirectories.

So I edited the preg_match() line in your code, replacing ?$/ with ?([^\/]+)/i

Original line:

if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?$/", $matches[1])){

Changed to:
if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?([^\/]+)/i", $matches[1])){
With this change, the code now works even when the link has a subdirectory.
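
One thing to watch with a pattern that is not anchored to the start of the URL: as far as I can tell, it accepts an allowed domain anywhere in the string, including inside the path of some other site. A possible stricter variation is sketched below. The rule it encodes (optional http or https scheme, optional subdomains, then an allowed domain followed by the end of the string or the start of a path, query or fragment) is an assumption about what should count as allowed, and the sample URLs are made up.

```php
<?php
$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

// Escape each domain so "." matches a literal dot, then join with "|".
$domains = implode('|', array_map(function ($d) {
    return preg_quote($d, '~');
}, $allowedlinks));

// Anchored with ^ so the allowed domain must be the host part of the URL,
// not just a substring that appears somewhere later.
$pattern = '~^(?:https?://)?(?:[a-z0-9-]+\.)*(?:' . $domains . ')(?:[/?#]|$)~i';

var_dump(preg_match($pattern, 'google.com/subdirectory/level2/page3.html')); // int(1)
var_dump(preg_match($pattern, 'microsoft.com/sub/win/page5.html'));          // int(0)
var_dump(preg_match($pattern, 'evil.example/google.com/x'));                 // int(0)
```

This checks a bare URL value, so it would have to be applied to the href value alone rather than to the whole matched anchor.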

 

 

//Edit: I have developed the code further. The latest version is below; it blocks the unwanted sites and I haven't found any mistakes yet:

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "wikipedia.org");

function site_filter($matches)
{
    global $allowedlinks;
    if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?([^\/]+)/i", $matches[2])) {
        return " Link not allowed. ";
    } else {
        return $matches[0];
    }
}

$variable='Some texts <a href="google.com/subdirectory/level2/page3.html&%2334; target="_blank">Google</a><br><a href="microsoft.com/sub/win/page5.html;<a href="http://subdomain.killwithme.com" target="_blank"><font color="Red"><u>Inside</u></font><font color="Blue"><u>tags are no problem now</u></font></a><br><a href="mailto:try@try.com" target="_blank">email</a>';
$variable = preg_replace_callback("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "site_filter", $variable);
echo $variable;

Thanks again TrueFusion, your code is very good. Good job...