rockarolla, posted February 13, 2008:
Hi, I have built my website so that all of its content is stored in a SQL database: when a page is requested, I fetch it from the database and then display it. My question is: can search engines "crawl" the content of my website? I would appreciate some information so that I can change the way my site works if necessary.
Sten, posted February 13, 2008:
If this is what you mean, yes. I gather you've stored everything in a database and use PHP to fetch part of it and display it, just like a CMS? PHP runs on the server: it does all the work there and sends the result to the user as normal HTML, which a search engine will index if it finds it. If you're not using search-engine-friendly URLs, though, it probably won't find the individual pages, since it only picks up the index.php page.
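A minimal Python sketch of the setup Sten describes, assuming a hypothetical `pages` table with `slug`, `title` and `body` columns (SQLite stands in for MySQL, and `make_db`/`render` are illustrative names, not part of any CMS). It maps a friendly URL such as `/about` to a database row and returns finished HTML, which is all a crawler ever sees:

```python
import html
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a hypothetical in-memory page store: one row per page, keyed by slug."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?)",
        [
            ("about", "About us", "We sell guitars."),
            ("contact", "Contact", "Write to us."),
        ],
    )
    return conn


def render(conn: sqlite3.Connection, path: str) -> tuple[int, str]:
    """Map a friendly URL such as /about to a DB row and return (status, HTML)."""
    slug = path.strip("/")
    row = conn.execute(
        "SELECT title, body FROM pages WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return 404, "<html><body><h1>Not found</h1></body></html>"
    # Escape DB content before embedding it in markup.
    title, body = (html.escape(v) for v in row)
    # The crawler only ever receives this finished HTML, never the SQL query.
    page = (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><p>{body}</p></body></html>"
    )
    return 200, page
```

Calling `render(conn, "/about")` returns status 200 with the About page's HTML, and an unknown slug returns 404. Because each page has its own URL, a crawler can link to and index it individually instead of seeing only one script.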
yordan, posted February 13, 2008:
By the way, pages that require a password cannot be crawled, because robots cannot guess your passwords. So pages you have protected, or scripts that require you to log in before entering, make their content unsearchable by robots.
rockarolla, posted February 14, 2008:
Quoting yordan: "By the way, pages that require a password cannot be crawled, because robots cannot guess your passwords. So pages you have protected, or scripts that require you to log in before entering, make their content unsearchable by robots."
That's my point. The database table has a user/password, so it might not be accessible to a bot, or it might be... I think they "crawl" a site by "virtually opening" it and then checking the content; at least that is my experience. Since I really need to know this, I'm doing some crawling around myself to become an expert, and I'll share what I find. Here is the advice given by Google: https://support.google.com/webmasters/?answer=40349
yordan, posted February 14, 2008:
Hi, RockaRolla. When I crawl for fun, the crawler simply opens the PHP scripts, so I get files named something.php downloaded to my PC, nothing more. For instance, with 4gallery photo galleries I don't get the pictures. However, I do have the PHP scripts, so I can re-create the website. But of course I cannot connect to the database (for instance, at asta it's your database, so even if I guessed the passwords I could not connect), so I cannot reach dynamic pages that are displayed using a particular user's rights, name and password.
Quatrux, posted February 14, 2008:
Well, if your pages are dynamic, it is a good idea to send the correct headers to the bot/crawler, and to the browser too. When Apache (or, as far as I know, any other HTTP server) sends a static HTML file, it sends the correct Content-Length and similar headers, but when a PHP file is generated/parsed, Apache can't really know its length in advance. This can be fixed with PHP output buffering and sending the header from PHP; I think Google is full of suggestions like this.
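To illustrate the buffering idea, here is a hedged Python sketch (in PHP the same idea would use `ob_start()` and `ob_get_length()`; `build_response` is a hypothetical helper, not a server API). The whole page is generated into a buffer first, so the exact body size is known before the headers go out. Note that `Content-Length` counts bytes, not characters:

```python
def build_response(chunks: list[str]) -> bytes:
    """Buffer generated output, then emit headers with an exact Content-Length."""
    # Join everything the page generator produced before sending anything.
    body = "".join(chunks).encode("utf-8")
    headers = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/html; charset=utf-8",
        # Length of the encoded body in bytes; non-ASCII characters count as more than one.
        f"Content-Length: {len(body)}",
    ]
    # A blank line (CRLF CRLF) separates the header block from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body
```

For example, `build_response(["<p>", "héllo", "</p>"])` reports a length of 13 bytes, because the "é" takes two bytes in UTF-8.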
Athlon1600, posted August 7, 2008:
That's what I thought too at first, but don't worry about it: every extension is indexable to some extent. Here is a useful tool: http://tools.seochat.com/tools/search-spider-simulator/ Enter your website's URL and you will see your page exactly as the Google search engine spider would see it.
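This is not how the seochat tool or Google actually work, but a rough Python approximation of a text-only spider view can be built with the standard `html.parser` module. The hypothetical `spider_view` function keeps visible text and `<a href>` links, skips `<script>`/`<style>` contents and, like most crawlers, does not submit forms:

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collect visible text and <a href> links, roughly as a text-only crawler would."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.links: list[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>, whose content is not page text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        # <form> and <input> are ignored: nothing is submitted.

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def spider_view(page: str) -> tuple[str, list[str]]:
    """Return (visible text, followable links) for an HTML page."""
    parser = SpiderView()
    parser.feed(page)
    parser.close()
    return " ".join(parser.text), parser.links
```

For a page with a heading, one link and a POST login form, the result is the heading and link text plus the single `/about` link; the form's `/login` action is never followed.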
FirefoxRocks, posted August 7, 2008:
My view is that pages requiring POST form data (user logins, not database logins) are not accessible to search engines, because robots cannot guess the input required to access the content (passwords, email addresses, etc.). If you have links to content output by PHP from a database, it should be searchable.
toby, posted August 7, 2008:
You can give Google a user/pass and sitemaps, but generally, what it can't find a link to, it won't index.
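As an illustration of toby's sitemap point, here is a sketch that writes a minimal sitemap in the sitemaps.org XML format from a list of database slugs (the `build_sitemap` helper and the `example.com` URLs are assumptions). A sitemap helps a search engine discover URLs, though it does not guarantee they will be indexed:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, slugs: list[str]) -> str:
    """Return a sitemaps.org XML document with one <url><loc> entry per page slug."""
    # Serialize the sitemap namespace as the default xmlns instead of an ns0: prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for slug in slugs:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = f"{base_url.rstrip('/')}/{slug}"
    return ET.tostring(urlset, encoding="unicode")
```

The slugs would typically come from the same table that serves the pages, so every database-backed page gets a `<loc>` entry even if few pages link to it.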
rnd-am, posted February 7, 2009:
Neither Google nor any other search engine will crawl (and hence index) pages that are not linked from other, previously crawled pages. So you can have an entire universe in your DB, but if there is no link to those particular generated pages, they are invisible to search engines. By the way, ordinary forums are frequently configured to let users who are not logged in, such as crawlers and indexing robots, browse their pages.
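rnd-am's point about link discovery can be modelled as a breadth-first search over a link graph. This is a simplified sketch, not a real crawler: the hypothetical `site` dictionary maps each URL to the URLs it links to, and `/orphan` exists in the database but is never linked:

```python
from collections import deque


def crawl(links: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first discovery: a page is found only if some found page links to it."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: /orphan exists in the database, but no page links to it.
site = {
    "/": ["/about", "/forum"],
    "/forum": ["/forum/topic-1"],
    "/about": [],
    "/forum/topic-1": ["/"],
    "/orphan": ["/"],
}
```

`crawl(site, "/")` finds `/`, `/about`, `/forum` and `/forum/topic-1`, but never `/orphan`, even though `/orphan` itself links back to the home page: outgoing links do not make a page discoverable, incoming ones do.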
Ahsaniqbalkmc, posted February 10, 2011:
Quoting yordan: "By the way, pages that require a password cannot be crawled, because robots cannot guess your passwords. So pages you have protected, or scripts that require you to log in before entering, make their content unsearchable by robots."
Websites built on a CMS like WordPress also have a password on them; I mean, the databases always have passwords, and yet they are crawled by search engine bots. I don't get the concept of password-protected pages not being crawled by search engines, because on the internet today almost everything is protected by a password. Can anyone help me understand the concept behind this? Thank you in advance.
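One way to see the distinction FirefoxRocks drew between database logins and user logins is this hedged Python sketch (SQLite stands in for MySQL; `DB_PASSWORD`, `connect_db` and `handle_request` are hypothetical). The database password is used by the server on every request, so neither visitors nor crawlers ever need it. Only pages that check the visitor's own login are hidden from a crawler, which has no session:

```python
import sqlite3

DB_PASSWORD = "s3cret"  # hypothetical; stays on the server and is never sent to visitors


def connect_db(password: str) -> sqlite3.Connection:
    """Stand-in for a MySQL connection that requires credentials."""
    if password != DB_PASSWORD:
        raise PermissionError("bad database password")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, members_only INTEGER)"
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(1, "Public post", 0), (2, "Members-only post", 1)],
    )
    return conn


def handle_request(post_id: int, logged_in: bool) -> tuple[int, str]:
    """Serve one post; logged_in reflects the visitor's session, not the DB login."""
    # The server, not the visitor, supplies the database password.
    conn = connect_db(DB_PASSWORD)
    row = conn.execute(
        "SELECT body, members_only FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    conn.close()
    if row is None:
        return 404, "Not found"
    body, members_only = row
    if members_only and not logged_in:
        # A crawler has no session cookie, so it gets this instead of the content
        # (real sites may use a 403 or a redirect to a login page instead).
        return 401, "Please log in"
    return 200, f"<p>{body}</p>"
```

A crawler calling the public post gets the full HTML, while the members-only post returns 401 unless the visitor is logged in, even though both pages come from the same password-protected database.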