Erdemir, posted August 16, 2008 (edited):

Hi,

On my website I don't want to allow visitors to post links to other sites, with a few exceptions.

    // This is the text sent by the guest
    $variable = 'Some texts <a href="http://www.google.com">Google</a> <a href="http://www.microsoft.com">Microsoft</a>';

    // The following line replaces every <a> tag with the text "no links allowed"
    $variable = preg_replace("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "no links allowed", $variable);
    echo($variable);

I want to allow only a few sites: google.com, Xisto.com, yahoo.com, etc. I want to disallow microsoft.com, hotmail.com and all other sites. What do you suggest? What regular expression should I use? Other ideas are welcome too.

jlhaslip, posted August 16, 2008:

If you have a short list of acceptable links, maybe a switch/case structure would be easier, based on an array of suitable values? Regular expressions can be very resource-intensive on the server.

Erdemir, posted August 16, 2008 (edited):

OK, here is my array of allowed links:

    $allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

Now, how can we integrate switch/case with preg_replace, or do it without preg_replace? Thanks.

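One possible way to check a single URL against that array without a regular expression is sketched below. A loop over the array plays the role of the switch/case. The helper name is_allowed_link() is made up for this example, and treating subdomains such as www.google.com as allowed is an assumption, not something stated in the thread.

```php
<?php
// Allow-list from the post above.
$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

// Hypothetical helper: true if the host of $url is an allowed domain
// or a subdomain of one (assumption: subdomains count as allowed).
function is_allowed_link($url, array $allowed)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        // No host found (e.g. "google.com/page" has no scheme):
        // retry with an assumed http:// prefix.
        $host = parse_url('http://' . $url, PHP_URL_HOST);
    }
    if (!is_string($host)) {
        return false;
    }
    $host = strtolower($host);
    foreach ($allowed as $domain) {
        $domain = strtolower($domain);
        // Exact match, or the host ends with ".domain".
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

var_dump(is_allowed_link('http://www.google.com/search?q=1', $allowedlinks)); // bool(true)
var_dump(is_allowed_link('http://www.microsoft.com', $allowedlinks));         // bool(false)
var_dump(is_allowed_link('http://google.com.evil.example/', $allowedlinks));  // bool(false)
```

Comparing the parsed host instead of searching the whole URL keeps a look-alike such as google.com.evil.example from passing. Finding the links inside the posted text is a separate step.
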
galexcd, posted August 16, 2008:

I know I said I wasn't good at regular expressions, but I've been poking around with your problem for the past hour and found something that may work:

    <a href=["']?((?!((http://(www\.)?wikipedia\.org)|(http://(www\.)?google\.com)|(http://(www\.)?Xisto\.com))).)+["']?>(.*?)</a>

This should match all links that aren't wikipedia.org, google.com or Xisto.com.

Erdemir, posted August 16, 2008:

Sorry, I couldn't use that directly in PHP, but your code was very helpful. I will try to use it in preg_replace(). Thanks. Any more suggestions?

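The pattern as posted has no delimiters, which preg_replace() requires. Below is a minimal sketch of the same negative-lookahead idea in a form PHP accepts. It is an adapted variant, not the pattern as posted: the `~` delimiters, the `i` and `s` flags, the required quote captured for a backreference, `https?`, and `[^"'>]*` in place of `.` (so one match cannot run on into the next link) are all assumptions made for this example.

```php
<?php
// Adapted variant of the negative-lookahead pattern (see the assumptions above).
// The lookahead rejects an href that starts with one of the allowed sites;
// everything else is matched and therefore replaced.
$pattern = '~<a\s+href=(["\'])'
         . '(?!https?://(?:www\.)?(?:wikipedia\.org|google\.com|Xisto\.com)(?:[/?#]|\1))'
         . '[^"\'>]*\1[^>]*>.*?</a>~is';

$text = 'See <a href="http://www.microsoft.com">MS</a> and '
      . '<a href="http://www.google.com/search">Google</a>.';

$result = preg_replace($pattern, 'no links allowed', $text);
echo $result;
// See no links allowed and <a href="http://www.google.com/search">Google</a>.
```

This variant only handles anchors whose first attribute is a quoted href, which matches the sample text in this thread.
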
jlhaslip, posted August 16, 2008:

A small note from the php.net manual page for the preg_match function:

Tip: Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.

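To illustrate the tip, a plain containment check could look like the sketch below, using stripos(), the case-insensitive variant of strpos(). The helper name contains_text() is made up for this example.

```php
<?php
// Hypothetical helper: true if $needle occurs anywhere in $haystack,
// ignoring case.
function contains_text($haystack, $needle)
{
    // stripos() returns an offset or false. Offset 0 is a valid hit,
    // so compare with !== false rather than testing truthiness.
    return stripos($haystack, $needle) !== false;
}

var_dump(contains_text('http://www.google.com/', 'google.com'));          // bool(true)
var_dump(contains_text('http://google.com.evil.example/', 'google.com')); // bool(true)
```

The second call shows the limit of a substring test for this problem: it also accepts a look-alike host, so on its own it is too loose for an allow-list of domains.
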
galexcd, posted August 16, 2008:

Interesting find. However, I believe he wants to replace the bad links with "No Links" or something similar. As you said earlier, the regexp engine is pretty resource-heavy on the server and takes a bit longer to process, so perhaps a non-regular-expression method would be best. I might see if I can whip one up for you; until then, I think we still need to hear from one of our regular expression experts... *cough* rvalkass *cough*

truefusion, posted August 17, 2008:

The following should get the results you desire:

    <?php
    $allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

    // Callback for preg_replace_callback(): $matches[0] is the whole <a> element,
    // $matches[1] is the href value.
    function site_filter($matches){
        global $allowedlinks;
        // Keep the link only if the href ends with one of the allowed domains.
        if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?$/", $matches[1])){
            return "Link not allowed.";
        } else {
            return $matches[0];
        }
    }

    $variable = 'Some texts <a href="http://www.google.com">Google</a> <a href="http://www.microsoft.com">Microsoft</a>';
    $variable = preg_replace_callback("/<a href=[\"']([^<]+)[\"']>[^<]+<\/a>/", "site_filter", $variable);
    echo $variable;
    ?>

It would have been easier to work with if each anchor element were on its own line.

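The callback approach above can be tightened in a few places. The sketch below is one possible variant, not the code as posted: it escapes the domains with preg_quote() so that "." matches only a literal dot, anchors the check to the host part of the href so that paths after the domain are accepted, and passes the list in as a parameter instead of using `global`. The function names build_allow_pattern() and filter_links() are made up for this example, and accepting subdomains is an assumption.

```php
<?php
$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

// Hypothetical helper: one anchored, case-insensitive pattern for the whole list.
function build_allow_pattern(array $domains)
{
    $quoted = array();
    foreach ($domains as $d) {
        $quoted[] = preg_quote($d, '~'); // escape dots and the delimiter
    }
    // Optional http(s) scheme, optional subdomains, an allowed domain,
    // optional port, then the end of the host part.
    return '~^(?:https?://)?(?:[a-z0-9-]+\.)*(?:' . implode('|', $quoted)
         . ')(?::\d+)?(?=[/?#]|$)~i';
}

// Hypothetical helper: replace every <a> element whose href is not allowed.
function filter_links($html, array $domains)
{
    $allow = build_allow_pattern($domains);
    return preg_replace_callback(
        // $m[2] is the quoted href value; nested tags inside the link are fine.
        '~<a\b[^>]*\bhref\s*=\s*(["\'])(.*?)\1[^>]*>.*?</a>~is',
        function ($m) use ($allow) {
            return preg_match($allow, trim($m[2])) ? $m[0] : 'Link not allowed.';
        },
        $html
    );
}

$html = '<a href="http://www.google.com/a/b.html" target="_blank">Google</a><br>'
      . '<a href="http://microsoft.com/x"><u>MS</u></a><br>'
      . '<a href="http://google.com.evil.example/">fake</a>';

$filtered = filter_links($html, $allowedlinks);
echo $filtered;
// <a href="http://www.google.com/a/b.html" target="_blank">Google</a><br>Link not allowed.<br>Link not allowed.
```

Because the check is an allow-list on the host, anything that is not an http(s) link to a listed domain, including mailto: links, is replaced.
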
Erdemir, posted August 17, 2008 (edited):

Thank you very much, truefusion. I tried your code and it worked well. However, I also tried it with this variable:

    $variable = 'Some texts <a href="http://google.com/subdirector/level2/page3.html">Google</a> <a href="http://microsoft.com/sub/win/page5.html">Microsoft</a>';

and the function disallowed every link, writing "Link not allowed." for each one. This means the code doesn't work if the link to an allowed domain contains subdirectories. So I edited the preg_match() line in your code, replacing ?$/ with ?([^\/]+)/i

Before:

    if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?$/", $matches[1])){

After:

    if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?([^\/]+)/i", $matches[1])){

With this change it works even when the link has a subdirectory.

Edit: I have developed the code further. The latest version is below; no mistakes detected yet, and it blocks unwanted sites:

    $allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "wikipedia.org");

    function site_filter($matches){
        global $allowedlinks;
        // $matches[2] is everything between the first http/www/mailto
        // in the tag and the closing </a>.
        if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?([^\/]+)/i", $matches[2])){
            return " Link not allowed. ";
        } else {
            return $matches[0];
        }
    }

    $variable = 'Some texts <a href="http://google.com/subdirectory/level2/page3.html" target="_blank">Google</a><br><a href="http://microsoft.com/sub/win/page5.html">Microsoft</a><br><a href="http://subdomain.killwithme.com" target="_blank"><font color="Red"><u>Inside</u></font><font color="Blue"><u>tags are no problem now</u></font></a><br><a href="mailto:try@try.com" target="_blank">email</a>';
    $variable = preg_replace_callback("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "site_filter", $variable);
    echo $variable;

Thanks again, truefusion, your code is very good. Good job.