BCD Posted October 31, 2009 (edited)
How can one restrict the types of URLs that search engines like Google index for a content management system like Drupal? On our site, Google is indexing links like these:
http://example.com/forum/unanswered?forum=23
http://example.com/forum/active?order=totalcount&sort=asc
http://example.com/user/login?destination=forum%2F18
I am amazed by this. Why is Google indexing links like these when there are valid clean URLs for each forum and page? The site runs on Drupal. For SEO-friendly URLs, the pathauto and path redirect modules are being used. The path aliases module is doing its job fine, creating friendly URLs whenever new content is created, but those URLs still don't get indexed. In the cases where the clean URLs are indexed, they still appear far down in the search results. Does anyone have experience dealing with this kind of issue? Any help would be greatly appreciated.
Quatrux Posted October 31, 2009 (edited)
I suggest using a robots.txt file to restrict or allow access to parts of your site. You could also use meta tags to control robots like Google's.
For Google you can use a sitemap. Google offers Google Sitemaps, where you can list the links you want indexed. There are plenty of modules for different forums and CMSs that generate those sitemaps automatically; just google it.
Google Sitemaps might be called something different now, like Google developer tools or something like that. I think they merged it.
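To make the sitemap idea concrete, here is a minimal sketch in Python of what such a file contains. It assumes the sitemaps.org XML format; the `build_sitemap` helper and the two example URLs are hypothetical, and on a Drupal site a sitemap module would normally generate this for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing only the URLs we want indexed."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical clean URLs; the sorted/filtered variants are simply left out.
clean_urls = [
    "http://example.com/forums/introductions",
    "http://example.com/forums/general",
]
sitemap_xml = build_sitemap(clean_urls)
print(sitemap_xml)
```

A sitemap only tells the crawler which URLs you prefer; it does not by itself stop other URLs from being crawled.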
BCD Posted October 31, 2009
Thanks for the reply. I am trying to exclude certain URL types using the robots.txt Disallow method. Since a lot of different types of URLs are being generated, the only way is to manually add each type as and when it is found. But I am unable to figure out how to block URLs like "/forum/unanswered?forum=3". Since I cannot block the whole "forum" directory path, I need a rule that blocks only the URLs starting with "unanswered?[something]". I checked the documentation on robotstxt, but could not find a way to do so. Are there any alternatives?
I have now also added the sitemap module for Drupal, called XMLsitemap. I will wait a few days to see the results.
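One possible approach, sketched below: robots.txt rules match by path prefix, so `Disallow: /forum/unanswered` should cover `/forum/unanswered?forum=3` without blocking the rest of `/forum/`. The rule set here is an assumption built from the URLs mentioned in this thread, and the `is_crawlable` helper is hypothetical. It uses Python's standard `urllib.robotparser` to check the rules before deploying them.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: one prefix per unwanted URL type from this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/unanswered
Disallow: /forum/active
Disallow: /user/login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url):
    """Return True if a generic crawler may fetch the URL under these rules."""
    return parser.can_fetch("*", url)


for url in (
    "http://example.com/forum/unanswered?forum=3",
    "http://example.com/user/login?destination=forum%2F18",
    "http://example.com/forum/18",
):
    print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

Google also documents `*` wildcards in robots.txt patterns, but the standard-library parser used here does not understand them, so this check only covers plain prefix rules.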
rvalkass Posted October 31, 2009
If those links are still being indexed then there are two possibilities: either those links are still scattered around your site (especially the forums) and need replacing with the new ones, or Google just hasn't scanned your site recently and picked up the new links.
As a matter of interest, what happens when one of the 'old' URLs is visited? Is the page served normally, or is the browser sent a permanent redirection header? Sending a permanent redirection to the new-style URL means that all members will only see the new URLs, and Google will cache the new URLs rather than the old ones. It also eliminates penalties for apparent duplicate content, and the old URL will be ignored by the search engines.
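As a rough illustration of the permanent redirection described above, here is a sketch in Python (not Drupal's own PHP code). The `ALIASES` table and the `respond` function are hypothetical; on the actual site this job would belong to the redirect module.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from an old-style path to its friendly alias.
ALIASES = {
    "/forum/18": "/forums/introductions",
}


def respond(url):
    """Return (status, headers) for a request.

    Old-style paths get a 301 (Moved Permanently) pointing at the alias,
    so browsers and crawlers end up on the friendly URL only.
    """
    path = urlsplit(url).path
    target = ALIASES.get(path)
    if target is not None:
        return 301, {"Location": target}
    # Anything else is served normally.
    return 200, {}


print(respond("http://example.com/forum/18"))
print(respond("http://example.com/forums/introductions"))
```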
BCD Posted October 31, 2009 (edited)
http://example.com/forum/unanswered?forum=23
http://example.com/forum/active?order=totalcount&sort=asc
http://example.com/user/login?destination=forum%2F18
The first and second types of link show a Drupal "page not found" error, and the third an "access denied" error, even when authenticated. As you say, maybe they will be replaced by the friendly links on the next search indexing.
I have also found URLs that work and are friendly URLs generated by pathauto, like: /forums/introductions?order=title&sort=asc
Why should Google take the pains to index URLs with ? and = in them? Why wouldn't it just index the proper URL, like "/forums/introductions"? Is it a flaw, or is it the way Drupal works? I am worried about these things because the site is still new, and once the content starts to roll in I would want to see the site appearing beautifully in the search results.
Update: I found that the path auto module and aliases module do not generate friendly URLs for sorting URLs like "unanswered", "active" and "order". These types of URLs are taking priority and appearing above the friendly URLs.