xisto Community
CarlowOlson

How To Use Regular Expressions To Block Websites?


Hello Friends,

 

Regular expressions use special characters, or metacharacters, to describe possible combinations of text in strings. That is all I know. I don't know how to use regular expressions to block a website. If anyone has any knowledge of this topic, please share it with me. I look forward to your reply.

 

Thanks in advance

Carlow Olson


Hello Friends,

 

Regular expressions use special characters, or metacharacters, to describe possible combinations of text in strings. For example, you could create a regular expression that matches all strings that begin with the letter "d" and end with the file extension ".avi". You can use regular expressions with a security appliance, intrusion prevention system or other filtering proxy to block incoming or outgoing website URLs according to conditions that you specify.
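As a rough illustration of the "d ... .avi" example, here is a minimal Python sketch. The pattern and the function name are my own assumptions for the demonstration; a filtering device may use a slightly different regex dialect.

```python
import re

# Assumed pattern: "d" at the start, anything in between,
# then a literal ".avi" at the very end of the string.
D_AVI = re.compile(r"^d.*\.avi$")

def starts_with_d_ends_with_avi(name: str) -> bool:
    """Return True when the whole name fits the pattern."""
    return D_AVI.match(name) is not None

print(starts_with_d_ends_with_avi("demo.avi"))      # True
print(starts_with_d_ends_with_avi("video.avi"))     # False, starts with "v"
print(starts_with_d_ends_with_avi("demo.avi.txt"))  # False, ends with ".txt"
```

The backslash before the dot matters: without it, "." would match any character, not just a literal dot.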

 

1. Determine which type of website or file URL you want to block. Some network administrators block EXE and BIN files to keep unwanted programs from being installed on machines in the network. You can also block every page from a site's domain, such as a social networking or file sharing site.

 

2. Build your regular expression(s) using metacharacters. Metacharacters give instructions on how to match the characters in the URL string. The "." stands in for any single character: "c.t" will match "cat", "cut" and "cot", as well as any string containing such a sequence, for example "acute".

 

The "|" operates as a logical "or", and parentheses group a substring apart from the rest of the regular expression. A "*" after a character or group means that zero or more repetitions of it will match, while a "?" means that zero or one occurrence of it will match.

 

Use square brackets to denote a set of acceptable characters; [a-z] matches any single lowercase letter. Outside square brackets, the "^" anchors the match to the start of a line.
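The metacharacters in this step can be tried out with a small Python sketch like the one below. The sample patterns and strings are made up for illustration, and Python's `re` module stands in for whatever regex engine your filtering device uses, so treat it as an approximation.

```python
import re

def found(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, text, expected result) for each metacharacter
CHECKS = [
    (r"c.t", "acute", True),                  # "." is any one character ("cut")
    (r"c.t", "ct", False),                    # "." needs exactly one character
    (r"(cat|dog)s", "dogs", True),            # "|" is "or", parentheses group
    (r"ab*c", "ac", True),                    # "*" allows zero "b"s ...
    (r"ab*c", "abbbc", True),                 # ... or many
    (r"colou?r", "color", True),              # "?" allows zero or one "u"
    (r"colou?r", "colouur", False),           # two "u"s is too many
    (r"[a-z]", "ABC", False),                 # set of lowercase letters only
    (r"^www", "www.example.com", True),       # "^" anchors to the start
    (r"^www", "ftp.www.example.com", False),  # "www" is not at the start
]

def run_checks() -> list:
    """Return one boolean per check: did the result equal the expectation?"""
    return [found(p, t) == expected for p, t, expected in CHECKS]

print(all(run_checks()))  # True
```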

 

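3. Create your final regular expression. The examples here assume that your device accepts a backslash as the escape character and matches the expression against the whole URL; check its documentation. The regular expression ".*youtube\.com" will block any YouTube website address; the backslash makes the "." match a literal dot instead of any character. The regular expression ".*\.([Dd][Oo][Cc]|[Xx][Ll][Ss]|[Pp][Pp][Tt])" will block any website address ending with ".doc", ".xls" or ".ppt" and block the download or opening of these files from a web browser. The regular expression ".*\.(bin|exe)" will block any Windows executables ending in ".bin" or ".exe". Note the parentheses: square brackets, as in "[bin|exe]", would match a single character from that set rather than a whole extension. Use these regular expressions as a blueprint to create any regular expressions you need.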
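Before loading expressions into a device, it can help to test them against a few sample URLs. The following is only a sketch in Python: the block list, the function name and the sample URLs are assumptions of mine, written with escaped dots and grouped alternatives, and the device's own regex dialect may differ.

```python
import re

# Hypothetical block list; "$" ties the extension to the end of the URL.
BLOCK_PATTERNS = [
    re.compile(r"youtube\.com", re.IGNORECASE),
    re.compile(r"\.(doc|xls|ppt)$", re.IGNORECASE),
    re.compile(r"\.(bin|exe)$", re.IGNORECASE),
]

def is_blocked(url: str) -> bool:
    """True if any block pattern matches somewhere in the URL."""
    return any(p.search(url) for p in BLOCK_PATTERNS)

print(is_blocked("http://www.youtube.com/watch"))   # True
print(is_blocked("http://example.com/report.XLS"))  # True
print(is_blocked("http://example.com/setup.exe"))   # True
print(is_blocked("http://example.com/index.html"))  # False

# Pitfall: square brackets define a set of single characters, so
# "[bin|exe]" means "one of b, i, n, |, e, x", not "bin or exe".
too_loose = re.compile(r".*.[bin|exe]")
print(bool(too_loose.search("http://example.com/index.html")))  # True
```

The last line shows why the bracket form is risky: it matches an ordinary HTML page, so a rule built on it would block far more than executables.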

 

4. Edit your settings to add a filtering rule for each regular expression. Procedures differ for each device or system but follow the same basic process. Cisco and H3C are two of the major manufacturers of intrusion prevention systems. To add the regular expression to a Cisco device, click "Configuration" from the menu bar of the software, click "Firewall," then "Objects," then "Regular Expressions." Click "Add" on the right side of the pop-up box, enter a name for the regular expression rule and then the expression itself. Click "OK."

 

For an H3C device, click "URL Filtering" in the navigation tree and click "URL Policies." Click "Add" and enter the name for the filtering rule. Click "User-Defined URL Rule" and click "Add." Select the "By regular expression" radio button under "Domain Name Filtering" or "URL Filtering." Enter the regular expression in the box and click "Apply." Click "Apply" again to save changes.

 

Check the manufacturer's guide for your system to determine the exact process because it is different for each model, although it will likely follow the basic process that Cisco and H3C use.

 

Thanks and Regards

Tony Mccallum

Edited by jlhaslip
add quote tags (see edit history)


Regular expressions are built into most programming languages today.

Perl did not invent regex, but it popularized the syntax that most programming languages now try to follow as closely as possible.

 

For blocking sites based on regex, you have two options:

 

1. Web Server level Blocking

 

In Apache (HTTP server), mod_rewrite can do this for you.

If you are using Apache, edit the .htaccess file in your website root (or www folder) and put the following directives in it.

If you have a virtual server and can edit the main configuration, you can put the same directives inside the httpd.conf file instead.


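Block traffic from a single referrer:

RewriteEngine on
# Uncomment the next line if rewriting fails because FollowSymLinks is off
# Options +FollowSymlinks
# The backslash makes "." a literal dot; [NC] ignores case
RewriteCond %{HTTP_REFERER} badsite\.com [NC]
# [F] answers the request with 403 Forbidden
RewriteRule .* - [F]

Block traffic from multiple referrers:

RewriteEngine on
# Options +FollowSymlinks
# [OR] joins this condition with the next one
RewriteCond %{HTTP_REFERER} badsite\.com [NC,OR]
RewriteCond %{HTTP_REFERER} anotherbadsite\.com [NC]
RewriteRule .* - [F]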
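To see what these conditions do without touching a live server, here is a small Python imitation. It is not how Apache works internally, just a sketch of the matching logic; the names `CONDITIONS` and `status_for` are made up for the example.

```python
import re

# One compiled pattern per RewriteCond; IGNORECASE plays the role of [NC].
CONDITIONS = [
    re.compile(r"badsite\.com", re.IGNORECASE),
    re.compile(r"anotherbadsite\.com", re.IGNORECASE),
]

def status_for(referer: str) -> int:
    """Return 403 (like the [F] flag) if any condition matches, else 200."""
    if any(c.search(referer) for c in CONDITIONS):
        return 403
    return 200

print(status_for("http://BadSite.com/page"))     # 403
print(status_for("http://anotherbadsite.com/"))  # 403
print(status_for("http://example.org/"))         # 200
print(status_for(""))                            # 200, no referrer sent
```

One thing the sketch makes visible: the patterns are unanchored, so "badsite\.com" on its own already matches "anotherbadsite.com", and it would equally match any other host that merely contains that text.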

 


2. Web Application level Block

This is resource intensive, so use it for low-traffic sites only.

Inside your program, you inspect the HTTP headers and decide what to do, e.g. check HTTP_REFERER and block.

In both of the above cases, some form of regex is involved in the matching.

In PHP, for example, putting the following code at the top of your index file will do the trick.

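<?php
// The Referer header is optional, so fall back to an empty string
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
// "\." matches a literal dot; the "i" flag ignores case
if (preg_match('/SITE_I_DO_NOT_LIKE\.com/i', $referrer)) {
    // you can show the visitor a message and end the story
    // exit('Sorry, you cannot see my website');

    // or send the visitor to another website or webpage
    header('Location: http://forums.xisto.com/no_longer_exists/');
    exit; // stop here so the rest of the page is not sent
}
// Otherwise, your website loads as normal
// rest of your website
// code goes here....
?>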
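The same application-level idea carries over to other languages. As a rough sketch only, here is how it might look as a plain Python WSGI application; the `BLOCKED` pattern, the `app` function and the response texts are placeholders I made up, not part of any framework.

```python
import re

# Placeholder pattern, same idea as the PHP check above.
BLOCKED = re.compile(r"SITE_I_DO_NOT_LIKE\.com", re.IGNORECASE)

def app(environ, start_response):
    """Minimal WSGI app: refuse visitors who arrive from a blocked referrer."""
    # WSGI exposes the Referer header as HTTP_REFERER; it may be missing.
    referer = environ.get("HTTP_REFERER", "")
    if BLOCKED.search(referer):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Sorry, you cannot see my website"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"rest of your website"]
```

To try it locally, you could serve `app` with `wsgiref.simple_server.make_server` from the standard library. Keep in mind that the Referer header is supplied by the visitor's browser and can be left out or faked, so this only discourages casual traffic.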

