xisto Community
BCD

How to Control Search Engine Indexing for a CMS-Based Site


How can I restrict which URLs search engines such as Google index on a site built with a content management system like Drupal?

On our site, Google is indexing links like these:

http://example.com/forum/unanswered?forum=23
http://example.com/forum/active?order=totalcount&sort=asc
http://example.com/user/login?destination=forum%2F18
This surprises me. Why is Google indexing links like these when every forum and page has a valid clean URL? The site runs on Drupal, and we use two modules for SEO-friendly URLs: pathauto and path redirect. The path alias module is doing its job, creating friendly URLs whenever new content is created, but those URLs still don't get indexed. In the cases where the clean URLs are indexed, they still appear far down in the search results. Does anyone have experience dealing with this kind of issue? Any help would be greatly appreciated.
Edited by BCD

I suggest using a robots.txt file to block or allow crawler access to parts of your site; you can also use meta tags to control robots such as Google's crawler. For Google, you can also submit a sitemap listing the links you want indexed. There are plenty of modules for different forums and CMSs that generate sitemaps automatically; just search for one. Google Sitemaps now goes by a different name, something like Google Webmaster Tools; I think it was merged into that.
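As a rough illustration of what a sitemap actually contains, here is a minimal Python sketch that builds one with the standard library's `xml.etree.ElementTree`, following the sitemaps.org protocol. It assumes you already have the list of clean URLs you want indexed; the example URL is hypothetical, reusing the thread's example.com placeholder. On a Drupal site a module would normally generate this file for you, so treat this only as a picture of the format.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a str) listing only the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&', as the protocol requires.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical clean URL; in practice the list comes from your CMS.
    print(build_sitemap(["http://example.com/forums/introductions"]))
```

Listing only the clean aliases is the point: the sitemap tells Google which URLs you consider canonical, even though it cannot by itself stop other URLs from being crawled.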

Edited by Quatrux

Thanks for the reply. I am trying to exclude certain URL types using robots.txt Disallow rules. Since many different types of URLs are being generated, the only way is to add each type manually as it is found.

But I can't figure out how to block URLs like "/forum/unanswered?forum=3". Since I cannot block the whole "forum" directory path, I need a rule that blocks only URLs starting with "unanswered?[something]". I checked the robots.txt documentation but could not find a way to do this. Are there any alternatives?
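Plain robots.txt rules are prefix matches, so a line such as `Disallow: /forum/unanswered` blocks `/forum/unanswered?forum=3` (and anything else whose path begins with `/forum/unanswered`) without blocking the rest of `/forum/`. One way to check rules before publishing them is Python's standard `urllib.robotparser`. The sketch below assumes the three URL types from the first post are the ones to block; the rule set is an example, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Example rules for the URL types reported in this thread. Each Disallow value
# is a path prefix: it blocks any URL whose path (plus query) starts with it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/unanswered
Disallow: /forum/active
Disallow: /user/login
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://example.com/forum/unanswered?forum=3",  # matches a Disallow prefix
        "http://example.com/forum/18",                   # matches no rule
    ):
        print(url, parser.can_fetch("Googlebot", url))
```

Because matching is by prefix, the same rule would also catch any other path that happens to start with `/forum/unanswered`, which is usually what you want for a listing page like this.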

I have also added the Drupal sitemap module, XMLsitemap. I will wait a few days to see the results.


If those links are still being indexed, there are two possibilities: either the old links are still scattered around your site (especially in the forums) and need replacing with new ones, or Google simply hasn't crawled your site recently and picked up the new links. Out of interest, what happens when one of the 'old' URLs is visited? Is the page served normally, or does the server send a permanent redirect (HTTP 301) header? Permanently redirecting to the new-style URL means members will only ever see the new URLs, and Google will index the new URLs rather than the old ones. It also avoids penalties for apparent duplicate content, and search engines will eventually drop the old URL.
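The permanent-redirect approach described above can be sketched as a tiny WSGI application using only the Python standard library. The legacy-to-alias mapping (`/node/42` to `/forums/introductions`) is a hypothetical example; on a Drupal site this job would normally fall to a module such as path redirect rather than custom code, so this only illustrates the HTTP behaviour.

```python
# Hypothetical map from old-style URLs to their clean pathauto aliases.
LEGACY_TO_ALIAS = {
    "/node/42": "/forums/introductions",
}


def app(environ, start_response):
    """WSGI app: 301-redirect known legacy paths, serve everything else."""
    path = environ.get("PATH_INFO", "/")
    target = LEGACY_TO_ALIAS.get(path)
    if target is not None:
        # A 301 marks the move as permanent, so crawlers should index the target.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [("Page: " + path).encode("utf-8")]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Try: curl -I http://localhost:8000/node/42
    make_server("", 8000, app).serve_forever()
```

A 301 rather than a 302 is the important choice here: a temporary redirect tells search engines to keep the old URL in their index.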



Of the links listed in my first post, the first two show a Drupal "Page not found" error, and the third shows an "Access denied" error, even when I am logged in. As you say, maybe they will be replaced by the friendly links the next time Google indexes the site.

 

I have also found URLs that work and use the friendly aliases generated by pathauto, such as:

/forums/introductions?order=title&sort=asc

Why does Google bother indexing URLs with the "?" and "=" query string included? Why doesn't it just index the proper URL, "/forums/introductions"? Is this a flaw, or just the way Drupal works? These things worry me because the site is still new, and once content starts rolling in I want the site to appear well in search results.

 

Update:

I found that the pathauto and path alias modules do not generate friendly URLs for listing and sorting URLs such as "unanswered", "active", and "order". These URLs are taking priority and appearing above the friendly URLs in search results.
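Google's crawler understands two extensions to plain prefix matching: `*` matches any run of characters, and a trailing `$` anchors the end of the URL. A rule such as `Disallow: /*?order=` should therefore cover the sort links under any pathauto alias. Python's standard `urllib.robotparser` only does plain prefix matching, so the sketch below hand-rolls a matcher, assuming Google's documented semantics (the longest matching rule wins, and Allow wins a tie). It is an illustration, not Google's implementation, and other crawlers may not support wildcards at all; the rule list is a hypothetical example.

```python
import re

# Hypothetical rules aimed at the sort/order query strings seen in this thread.
# As robots.txt lines these would read, for example:
#   User-agent: Googlebot
#   Disallow: /*?order=
SORT_RULES = [
    ("disallow", "/*?order="),
    ("disallow", "/*&order="),
    ("disallow", "/*?sort="),
    ("disallow", "/*&sort="),
]


def compile_rule(pattern):
    """Translate a Google-style robots.txt pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal. Without '$' the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` (path plus query string) may be crawled.

    The longest matching pattern wins; an Allow beats a Disallow of equal
    length; a path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow) for the winning rule so far
    for directive, pattern in rules:
        if pattern and compile_rule(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ("/forums/introductions?order=title&sort=asc",
                 "/forums/introductions"):
        print(path, is_allowed(SORT_RULES, path))
```

Because these rules key on the query string rather than the path, they leave the clean aliases themselves crawlable.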

Edited by BCD
