xisto Community
Erdemir

Need Help About Regular Expressions

Hi,

On my website, I don't want to allow links to other sites, with the exception of a few sites.

// This is the text sent by the guest
$variable='Some texts <a href="https://www.google.de/?gfe_rd=cr&ei=BwkjVKfAD8uH8QfckIGgCQ&gws_rd=ssl href="https://www.microsoft.com/de-de; ';
// The following line replaces every <a> HTML tag with the text "no links allowed".
$variable = preg_replace("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "no links allowed", $variable);
echo($variable);

I want to allow only a few sites: google.com, Xisto.com, yahoo.com, etc.

I want to disallow microsoft.com, hotmail.com and all other sites.

What are your suggestions?

What regular expression should I use? Or do you have other ideas?

If you have a short list of acceptable links, maybe a switch/case structure, based on an array of suitable values, would be easier? Regular expressions can be very resource-intensive on the server.

OK, here is my array of allowed links:
$allowedlinks = array ("google.com", "Xisto.com", "yahoo.com", "dmoz.org");
Now, how can we combine switch/case with preg_replace, or do this without preg_replace?

Thanks...

I know I said I wasn't good at regular expressions, however I've been poking around with your problem for the past hour and found something that may work:

<a href=["']?((?!((http://(www\.)?wikipedia\.org)|(http://(www\.)?google\.com)|(http://(www\.)?Xisto\.com))).)+["']?>(.*?)</a>

This should match all links that aren't wikipedia.org, google.com and Xisto.com.
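For readers who want to try the pattern above in PHP, here is a minimal sketch of how it might be passed to preg_replace(). It rests on a few assumptions that are not in the post: the function name replace_disallowed_links() is made up for illustration, the pattern is wrapped in "~" delimiters so the slashes in the URLs need no escaping, the repeated http://(www\.)? prefix is factored out of the alternation, the "i" flag is added for case-insensitive matching, and the "." inside the lookahead group is narrowed to [^>] so a match cannot run past the end of the opening tag.

```php
<?php
// Hypothetical wrapper around the negative-lookahead pattern from the post.
function replace_disallowed_links(string $text): string
{
    // For every character of the href, the lookahead refuses to continue
    // if an allowed URL starts at that position, so allowed links never match.
    $pattern = '~<a href=["\']?(?:(?!http://(?:www\.)?(?:wikipedia\.org|google\.com|Xisto\.com))[^>])+["\']?>(.*?)</a>~i';

    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, 'no links allowed', $text) ?? $text;
}

$text = 'See <a href="http://www.google.com">Google</a> and '
      . '<a href="http://www.microsoft.com">Microsoft</a>.';

echo replace_disallowed_links($text), "\n";
// prints: See <a href="http://www.google.com">Google</a> and no links allowed.
```

Note that this kind of pattern keeps any link whose href contains an allowed URL anywhere, so it is a loose filter rather than a strict host check, and it only recognizes http:// links whose href is the first attribute.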

Sorry, I couldn't use that directly in PHP, but your code was very helpful. I will try to use it in preg_replace(). Thanks.
Any more suggestions?

A small note found on the php.net manual pages for the preg_match function:

Tip
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
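As an illustration of that tip, the containment test could be sketched with stripos(), the case-insensitive sibling of strpos(). The helper name mentions_allowed_domain() is hypothetical, and the domain list is the one posted earlier in the thread.

```php
<?php
// Hypothetical helper: true when $href contains any of the $allowed domains.
function mentions_allowed_domain(string $href, array $allowed): bool
{
    foreach ($allowed as $domain) {
        // stripos() returns the match offset, which can be 0,
        // so the result must be compared with false using !==.
        if (stripos($href, $domain) !== false) {
            return true;
        }
    }
    return false;
}

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

var_dump(mentions_allowed_domain("google.com/subdirectory/page.html", $allowedlinks)); // bool(true)
var_dump(mentions_allowed_domain("microsoft.com/sub/win/page5.html", $allowedlinks));  // bool(false)
```

A plain substring test is fast but loose: an address such as evil.com/?ref=google.com would also pass, so it is probably best treated as a first check rather than the whole filter.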


Interesting find... However, I believe he wants to replace the bad links with "No Links" or something similar. As you said previously, the regexp engine is pretty resource-heavy on the server and takes a bit longer to process, so perhaps a non-regular-expression method would be the best way. I might see if I can whip one up for you; until then, I think we still need to hear from one of our regular expression experts... *cough* rvalkass *cough*
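One way a non-regex check might look is to let parse_url() extract the host and compare it against the allowed list. This is only a sketch under some assumptions: the helper name is_allowed_host() is invented here, subdomains of an allowed domain are treated as allowed, and "http://" is prepended when the link has no scheme, because parse_url() does not report a host otherwise.

```php
<?php
// Hypothetical helper: true when $href points at an allowed domain
// or at one of its subdomains. No regular expressions are used.
function is_allowed_host(string $href, array $allowed): bool
{
    // parse_url() only reports a host when a scheme is present.
    if (strpos($href, '://') === false) {
        $href = 'http://' . $href;
    }

    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed or host-less URL: reject it
    }
    $host = strtolower($host);

    foreach ($allowed as $domain) {
        $domain = strtolower($domain);
        // Exact match, or the host ends with ".domain" (a subdomain).
        if ($host === $domain
            || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

var_dump(is_allowed_host("http://www.google.com/search", $allowedlinks)); // bool(true)
var_dump(is_allowed_host("http://www.microsoft.com", $allowedlinks));     // bool(false)
```

Comparing the parsed host rather than the whole string means that an address like google.com.evil.com is rejected, which a simple substring test would let through.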

The following should get the results you desire:

<?php

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

function site_filter($matches)
{
	global $allowedlinks;
	if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?$/", $matches[1])){
		return "Link not allowed.";
	} else {
		return $matches[0];
	}
}

$variable='Some texts <a href="https://www.google.de/?gfe_rd=cr&ei=BwkjVKfAD8uH8QfckIGgCQ&gws_rd=ssl href="https://www.microsoft.com/de-de; ';
$variable = preg_replace_callback("/<a href=[\"']([^<]+)[\"']>[^<]+<\/a>/", "site_filter", $variable);
echo $variable;
?>
Would have been easier to work with if each anchor element was on its own line.

Thank you very much, TrueFusion. I tried your code and it worked well. However, I also tried it with this variable:

$variable='Some texts <a href="google.com/subdirector/level2/page3.html href="microsoft.com/sub/win/page5.html; ';

and the function disallowed all of the links, writing "Link not allowed." in place of each one. This means the code doesn't work if the link to an allowed domain includes subdirectories.

So I edited your code: on the 8th line (the preg_match() call) I replaced ?$/ with ?([^\/]+)/i

Before:

if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?$/", $matches[1])){

After:

if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?([^\/]+)/i", $matches[1])){

With this change it now works even when the link has a subdirectory.
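One caveat with the modified check is that the domain may match anywhere in the address (for example in a path such as evil.com/google.com/x), and a bare domain with nothing after it no longer matches. A stricter variant could anchor the domain at the host position instead. The sketch below is an assumption-laden illustration rather than a drop-in fix: the helper name href_is_allowed() is invented, only http and https schemes are recognized, subdomains are accepted, and preg_quote() is used so the dots in the domain names are matched literally.

```php
<?php
// Hypothetical helper: true when the host part of $href is an allowed
// domain (or a subdomain of one), with or without a path after it.
function href_is_allowed(string $href, array $allowed): bool
{
    if (!$allowed) {
        return false; // nothing is allowed when the list is empty
    }

    // Escape regex metacharacters (and the "~" delimiter) in each domain.
    $domains = array_map(function ($d) {
        return preg_quote($d, '~');
    }, $allowed);

    // ^ optional scheme, optional subdomains, an allowed domain, and then
    // either the end of the string or a character that ends the host.
    $pattern = '~^(?:https?://)?(?:[a-z0-9-]+\.)*(?:'
             . implode('|', $domains)
             . ')(?=[/?#:]|$)~i';

    return preg_match($pattern, $href) === 1;
}

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "dmoz.org");

var_dump(href_is_allowed("google.com/subdirector/level2/page3.html", $allowedlinks)); // bool(true)
var_dump(href_is_allowed("microsoft.com/sub/win/page5.html", $allowedlinks));         // bool(false)
```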

 

 

Edit: I have developed the code further and the latest version is below. It blocks unwanted sites, and I haven't found any mistakes yet:

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "wikipedia.org");

function site_filter($matches)
{
	global $allowedlinks;
	// $matches[2] holds the text after the first http/www/mailto, up to the closing </a>
	if (!preg_match("/(?:".implode("|", $allowedlinks).")\/?([^\/]+)/i", $matches[2])){
		return " Link not allowed. ";
	} else {
		return $matches[0];
	}
}

$variable='Some texts <a href="google.com/subdirectory/level2/page3.html&%2334; target="_blank">Google</a><br><a href="microsoft.com/sub/win/page5.html;<a href="http://subdomain.killwithme.com" target="_blank"><font color="Red"><u>Inside</u></font><font color="Blue"><u>tags are no problem now</u></font></a><br><a href="mailto:try@try.com" target="_blank">email</a>';
// Every <a> element that contains http, www or mailto is passed to site_filter()
$variable = preg_replace_callback("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "site_filter", $variable);
echo $variable;
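A possible variation on this approach is to pass the allow-list check into the callback through a closure instead of a global variable, and to capture the href value explicitly. In the sketch below, filter_links() and $isListed are hypothetical names, the href pattern assumes a quoted attribute value, and the host check shown is only a simple stand-in that accepts a listed domain with or without a leading "www.".

```php
<?php
// Hypothetical helper: replaces every <a> element whose href fails $isAllowed.
function filter_links(string $html, callable $isAllowed): string
{
    // Group 2 captures the quoted href value; "s" lets "." span newlines,
    // "i" makes the match case-insensitive.
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>.*?</a>~is';

    $result = preg_replace_callback($pattern, function (array $m) use ($isAllowed) {
        return $isAllowed($m[2]) ? $m[0] : ' Link not allowed. ';
    }, $html);

    // On a PCRE error return an empty string rather than unfiltered input.
    return $result ?? '';
}

$allowedlinks = array("google.com", "Xisto.com", "yahoo.com", "wikipedia.org");

// Stand-in check: the host must equal a listed domain, optionally with "www.".
$isListed = function (string $href) use ($allowedlinks): bool {
    $host = strtolower((string) parse_url($href, PHP_URL_HOST));
    foreach ($allowedlinks as $domain) {
        $domain = strtolower($domain);
        if ($host === $domain || $host === 'www.' . $domain) {
            return true;
        }
    }
    return false;
};

$html = '<a href="http://www.google.com/a" target="_blank">Google</a><br>'
      . '<a href="http://www.microsoft.com"><u>Microsoft</u></a>';

echo filter_links($html, $isListed), "\n";
// prints: <a href="http://www.google.com/a" target="_blank">Google</a><br> Link not allowed. 
```

Keeping the check in a separate callable makes it easy to swap in a stricter or looser rule without touching the pattern that finds the links.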

Thanks again TrueFusion, your code is very good. Good job...