
Need Help with Regular Expressions


Erdemir

Hi,

On my website, I don't want visitors to post links to other sites, except for a few allowed sites.

//This is the text which is sent by the guest
$variable='Some texts <a href="https://www.google.de/?gfe_rd=cr&ei=BwkjVKfAD8uH8QfckIGgCQ&gws_rd=ssl href="https://www.microsoft.com/de-de; ';
//And the following line replaces every <a> HTML tag with the text "no links allowed".
$variable = preg_replace("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "no links allowed", $variable);
echo($variable);

I want to allow only a few sites: google.com, Xisto.com, yahoo.com, etc.

I want to disallow microsoft.com, hotmail.com and any other sites.

What are your suggestions?

What regular expression should I use? Or do you have other suggestions?


jlhaslip

If you have a short list of acceptable links which you will allow, maybe a switch/case structure would be easier? Based on an array of suitable values? Regular expressions can be very resource-intensive on the server.
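A minimal PHP sketch of the switch/case idea, assuming the host name has already been taken out of the link (for example with PHP's parse_url()). The function name is_allowed_host() is hypothetical, and the domains are the ones from the first post.

```php
<?php
// Hypothetical helper: decide whether a host is on the allow-list
// using a switch statement instead of a regular expression.
function is_allowed_host(string $host): bool
{
    // Host names are case-insensitive, and "www." is treated as the same site.
    $host = strtolower($host);
    if (strncmp($host, 'www.', 4) === 0) {
        $host = substr($host, 4);
    }

    switch ($host) {
        case 'google.com':
        case 'xisto.com':
        case 'yahoo.com':
            return true;
        default:
            return false;
    }
}

// parse_url() returns null for the host when the URL has no scheme,
// so this sketch assumes absolute URLs such as "http://www.google.com/".
$host = parse_url('http://www.google.com/search', PHP_URL_HOST);
var_dump(is_allowed_host((string) $host));     // bool(true)
var_dump(is_allowed_host('www.microsoft.com')); // bool(false)
```

A switch keeps a handful of domains readable; for a longer list, in_array() against an array of allowed hosts does the same job.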


Erdemir

OK, here is my array of allowed links:
$allowedlinks = array ("google.com", "Xisto.com", "yahoo.com", "dmoz.org");
Now, how can I combine switch/case with preg_replace, or do this without preg_replace?

Thanks...

galexcd

I know I said I wasn't good at regular expressions, but I've been poking around with your problem for the past hour and found something that may work:

<a href=["']?((?!((http://(www\.)?wikipedia\.org)|(http://(www\.)?google\.com)|(http://(www\.)?Xisto\.com))).)+["']?>(.*?)</a>

This should match all links that don't point to wikipedia.org, google.com or Xisto.com.
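PHP's preg_* functions expect the pattern wrapped in delimiters, so it can't be pasted in exactly as written. A sketch of using it with preg_replace(), assuming "~" as the delimiter (the pattern already contains "/") and a made-up sample post:

```php
<?php
// galexcd's pattern, wrapped in "~" delimiters because the pattern itself
// contains "/" (in "http://"). The single-quoted PHP string keeps the
// backslashes intact; only the embedded single quote needs escaping.
$pattern = '~<a href=["\']?((?!((http://(www\.)?wikipedia\.org)|(http://(www\.)?google\.com)|(http://(www\.)?Xisto\.com))).)+["\']?>(.*?)</a>~';

$html = 'Go to <a href="http://www.google.com/">Google</a> or <a href="http://www.microsoft.com/">MS</a>.';

// Anchors whose href does not start with an allowed site are replaced.
$filtered = preg_replace($pattern, 'no links allowed', $html);
echo $filtered, "\n";
// Go to <a href="http://www.google.com/">Google</a> or no links allowed.
```

Because the "." inside the repeated group can also match ">" and "<", a blocked link that comes before an allowed one in the same post can make the match run past the first </a> and swallow the allowed link too, so it is worth testing against real posts.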

Erdemir

Sorry, I couldn't use that directly in PHP, but your code was very helpful. I will try to use it in preg_replace(). Thanks.
Any more suggestions?

jlhaslip

A small note found on the php.net manual pages for the preg_match function:

Tip
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
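To illustrate the tip, here is the same containment check on one href done both ways. The sample URLs are made up, and the last line shows why a plain substring test is too loose for an allow-list on its own.

```php
<?php
$href = 'http://www.google.com/search?q=php';

// Containment test with a plain string function, as the manual tip suggests.
// stripos() is the case-insensitive variant; it returns false when the
// needle is absent, so compare with !== false (position 0 is a valid hit).
$viaStrpos = stripos($href, 'google.com') !== false;

// The equivalent (slower) regular-expression test.
$viaRegex = preg_match('/google\.com/i', $href) === 1;

var_dump($viaStrpos, $viaRegex); // bool(true) bool(true)

// Caveat: a substring test only says the text appears somewhere,
// so a look-alike host passes as well.
var_dump(stripos('http://google.com.example.net/', 'google.com') !== false); // bool(true)
```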



galexcd

Interesting find... However, I believe he wants to replace the bad links with "No Links" or something similar. As you said earlier, the regexp engine is fairly resource-heavy on the server and takes a bit longer to process, so perhaps a non-regular-expression method would be best. I might see if I can whip one up for you; until then, I think we still need to hear from one of our regular expression experts... *cough* rvalkass *cough*

truefusion

The following should get the results you desire:

<?php
$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

function site_filter($matches)
{
    global $allowedlinks;

    if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?$/", $matches[1])){
        return "Link not allowed.";
    } else {
        return $matches[0];
    }
}

$variable='Some texts <a href="https://www.google.de/?gfe_rd=cr&ei=BwkjVKfAD8uH8QfckIGgCQ&gws_rd=ssl href="https://www.microsoft.com/de-de; ';
$variable = preg_replace_callback("/<a href=[\"']([^<]+)[\"']>[^<]+<\/a>/", "site_filter", $variable);
echo $variable;
?>
It would have been easier to work with if each anchor element were on its own line.

Erdemir

Thank you very much, TrueFusion. I tried your code and it worked well. However, I tried it with this variable:

$variable='Some texts <a href="google.com/subdirector/level2/page3.html href="microsoft.com/sub/win/page5.html; ';
and the function blocked every link, writing "Link not allowed." for each one. So the code doesn't work when a link to an allowed domain points to a subdirectory.

So I edited your code: on your 8th line (the preg_match line), I replaced ?$/ with ?([^\/]+)/i

I replaced this:

if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?$/", $matches[1])){

with this:
if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?([^\/]+)/i", $matches[1])){
With this change, it now works even when the link has a subdirectory.
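A quick way to see what each version of the check accepts is to run both patterns against a path and against a look-alike host. The sample URLs below are made up; the last part is a stricter variant (an assumption, not code from this thread) that checks only the host from parse_url(), escapes the dots with preg_quote() and anchors the pattern at both ends.

```php
<?php
$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");
$alternation = implode("|", $allowedlinks);

// truefusion's test: an allowed name must be at the very end of the href.
$original = "/(?:" . $alternation . ")\/?$/";
// The modified test: an allowed name followed by at least one more character.
$modified = "/(?:" . $alternation . ")\/?([^\/]+)/i";

$withPath  = "google.com/subdirectory/level2/page3.html";
$lookAlike = "http://google.com.example.net/";

var_dump(preg_match($original, $withPath));  // int(0): the path defeats the "$" anchor
var_dump(preg_match($modified, $withPath));  // int(1)
var_dump(preg_match($modified, $lookAlike)); // int(1): a look-alike host passes too

// Stricter variant (assumption, not from the thread): compare only the host,
// escape the dots with preg_quote() and anchor both ends, allowing an
// optional "www." prefix.
$quoted = array_map(function ($domain) {
    return preg_quote($domain, '/');
}, $allowedlinks);
$hostPattern = '/^(?:www\.)?(?:' . implode('|', $quoted) . ')$/i';

var_dump(preg_match($hostPattern, parse_url($lookAlike, PHP_URL_HOST))); // int(0)
var_dump(preg_match($hostPattern, parse_url("http://www.google.com/sub/page.html", PHP_URL_HOST))); // int(1)
```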

Edit: I have developed the code further, and here is the latest version. It blocks unwanted sites, and I haven't found any mistakes yet:

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "wikipedia.org");

function site_filter($matches)
{
    global $allowedlinks;

    if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?([^\/]+)/i", $matches[2])){
        return " Link not allowed. ";
    } else {
        return $matches[0];
    }
}

$variable='Some texts <a href="google.com/subdirectory/level2/page3.html" target="_blank">Google</a><br><a href="microsoft.com/sub/win/page5.html;<a href="http://subdomain.killwithme.com" target="_blank"><font color="Red"><u>Inside</u></font><font color="Blue"><u>tags are no problem now</u></font></a><br><a href="mailto:try@try.com" target="_blank">email</a>';
$variable = preg_replace_callback("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "site_filter", $variable);
echo $variable;
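One thing worth checking in the version above: with the U flag, (.*) captures everything after "http", "www" or "mailto" up to </a>, so $matches[2] holds the visible link text as well as the URL, and a link to another site whose text reads, say, "google.com search" would still pass. A sketch of an alternative that captures only the href and compares its host, assuming absolute http(s) URLs; host_is_allowed() is a hypothetical helper and the sample input is made up:

```php
<?php
$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "wikipedia.org");

// Hypothetical helper: true when $host is an allowed domain or a subdomain of
// one (e.g. "www.google.com" or "en.wikipedia.org"). Comparison is
// case-insensitive because host names are.
function host_is_allowed(string $host, array $allowed): bool
{
    $host = strtolower($host);
    foreach ($allowed as $domain) {
        $domain = strtolower($domain);
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

$variable = 'Some texts <a href="http://www.google.com/subdirectory/page3.html" target="_blank">Google</a>'
    . '<br><a href="http://subdomain.killwithme.com" target="_blank"><u>Bad</u></a>'
    . '<br><a href="mailto:try@try.com">email</a>';

// Capture the href value (group 1) separately from the link text, so that
// the allow-list check only ever looks at the URL.
$variable = preg_replace_callback(
    '~<a\s[^>]*href=["\']([^"\']*)["\'][^>]*>.*?</a>~si',
    function (array $m) use ($allowedlinks): string {
        // parse_url() gives no host for "mailto:" links or for scheme-less
        // hrefs such as "google.com/page", so those are blocked too.
        $host = parse_url($m[1], PHP_URL_HOST);
        if (is_string($host) && host_is_allowed($host, $allowedlinks)) {
            return $m[0];
        }
        return ' Link not allowed. ';
    },
    $variable
);

echo $variable, "\n";
```

Matching on the host rather than on the whole href also means "google.com.example.net" and "notgoogle.com" are rejected, while subdomains of an allowed site are kept.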

Thanks again, TrueFusion. Your code is very good. Good job...



Xisto.com offers Free Web Hosting to its Members for their participation in this Community. We moderate all content posted here but we cannot warrant full correctness of all content. While using this site, you agree to have read and accepted our terms of use, cookie and privacy policy. Copyright 2001-2019 by Xisto Corporation. All Rights Reserved.