xisto Community
rockarolla

Is A Php File Searchable?

Hi, I have built my web site so that it is stored entirely in an SQL database: whenever I need to load a page, I take it out of the database and then display it. My question is: is any search engine able to "crawl" into my web site content? I would appreciate some info so that I can change the way my site works.


Provided this is what you mean, yes. So I gather you've stored everything in a database and are using PHP to fetch part of it and show it, just like a CMS? PHP is server side: it does everything on the server and then sends the result to the user as normal HTML, which a search engine will index if it finds it. If you're not using search engine friendly URLs, though, it probably won't find your pages, since it only picks up the index.php page.
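To make the idea of a database-backed page with friendly URLs concrete, here is a minimal sketch in Python rather than PHP. The table name `pages`, the `/page/<slug>` URL scheme and the helper names `make_db` and `render` are all assumptions made up for illustration, not anything from the original poster's site. The point is only that the crawler receives plain HTML, including ordinary links to the other pages.

```python
import sqlite3
from html import escape


def make_db():
    """Create a throwaway in-memory database with two hypothetical pages."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO pages VALUES (?, ?, ?)",
        [
            ("home", "Home", "Welcome."),
            ("about", "About", "About this site."),
        ],
    )
    return db


def render(db, path):
    """Map a friendly path such as /page/about to a row and return (status, html)."""
    # Take the last path segment as the slug; "/" falls back to "home".
    slug = path.strip("/").split("/")[-1] or "home"
    row = db.execute(
        "SELECT title, body FROM pages WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return 404, "<h1>Not found</h1>"
    title, body = row
    # Emit a plain <a href> link to every page so a crawler can discover them.
    slugs = [s for (s,) in db.execute("SELECT slug FROM pages ORDER BY slug")]
    nav = " ".join(f'<a href="/page/{escape(s)}">{escape(s)}</a>' for s in slugs)
    html = (
        f"<html><head><title>{escape(title)}</title></head>"
        f"<body><p>{escape(body)}</p><nav>{nav}</nav></body></html>"
    )
    return 200, html
```

Calling `render(make_db(), "/page/about")` returns a 200 status and an HTML string. Nothing in that output reveals that the content came from a database, which is all a search engine ever sees.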


By the way, pages that need passwords cannot be crawled, because the robots cannot guess your passwords. So any pages you have protected, or scripts that require a login before entering, are not searchable by robots.


By the way, pages that need passwords cannot be crawled, because the robots cannot guess your passwords. So any pages you have protected, or scripts that require a login before entering, are not searchable by robots.


That's my point. The database table has a user/pass, so it might not be accessible to a bot, or it might be... I think they "crawl" a particular site by "virtually opening" it and then checking the content... at least this is my experience... As I really need to know this, I'm doing some crawling around too, to become an expert... ;) I'll see about sharing what I find...

Here is the advice given by Google:

https://support.google.com/webmasters/?answer=40349


Hi, RockaRolla. When I do crawling for fun, the crawler simply requests the PHP scripts, so I get files named something.php down to my PC (containing the HTML the server sent, not the PHP source), nothing more. For instance, in 4gallery photo galleries, I don't get the pictures. However, I do have the .php pages, so I can re-create the website. But of course I cannot connect to the database (for instance at asta it's your database, so even by guessing the passwords I could not connect), so I cannot reach dynamic pages which are displayed using a particular user's rights, name and password.


Well, if your pages are dynamic, it would be a good idea to send the right headers to the bot/crawler, and to the browser too. When Apache (or, as far as I know, any other HTTP server) sends a simple HTML file, it sends the right Content-Length and so on, but when a PHP file is generated/parsed, Apache can't really know its length in advance. It can be done with some PHP output buffering and by sending the header from PHP. I think Google is full of these kinds of suggestions. ;)
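The buffering idea can be sketched as follows: build the whole page first, then measure it in bytes and send that number as Content-Length. This is a Python illustration under assumptions, with `build_response` a hypothetical helper name. In PHP the same approach would presumably rely on `ob_start()`, `ob_get_length()` and `header()`. Note that HTTP/1.1 servers can also send dynamic output with chunked transfer encoding, so this is an optional refinement rather than a requirement for being crawled.

```python
def build_response(html_text):
    """Buffer the full page, then derive Content-Length from the encoded bytes."""
    # Encode first: the header counts bytes, not characters.
    body = html_text.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body
```

Measuring the encoded bytes matters as soon as the page contains non-ASCII characters, because one character can take more than one byte in UTF-8.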


My thought on this is that pages needing POST form data (user logins, not database logins) are not accessible to search engines. This is because robots cannot guess the input required to access the content (passwords, email addresses, etc.). If you have links to content output by PHP from a database, it should be searchable.
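The distinction between a user login and a database login can be illustrated with a small sketch. The paths, the `PAGES` table and the `handle_get` function below are hypothetical. The database credentials never appear here because, in a real site, they would live in the server-side script and the visitor (or crawler) would never be asked for them.

```python
# Hypothetical site content; in practice each body would come from the database,
# fetched by the server using credentials the visitor never sees.
PAGES = {
    "/articles/1": {"body": "Public article", "login_required": False},
    "/account": {"body": "Private account data", "login_required": True},
}


def handle_get(path, logged_in=False):
    """Return (status, body) for an anonymous or logged-in GET request."""
    page = PAGES.get(path)
    if page is None:
        return 404, ""
    # A crawler arrives without a session, so it is treated as not logged in.
    if page["login_required"] and not logged_in:
        return 401, "Please log in"
    return 200, page["body"]
```

An anonymous request such as `handle_get("/articles/1")` succeeds, while `handle_get("/account")` is refused. Only the second kind of page is hidden from a search engine.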


Neither Google nor any other search engine will crawl (and hence index) pages that are not linked from other, previously crawled pages. So you can have an entire universe in your DB, but if there is no link to those particular generated pages, then those pages are invisible to search engines. BTW, ordinary forums are frequently configured to allow browsing even by users who are not logged in, i.e. crawlers and indexing robots.
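Link-based discovery can be sketched with a toy crawler over an in-memory site. The `SITE` dictionary, its paths and the `crawl` function are assumptions invented for this example, and real search engine crawlers are far more elaborate. The sketch only shows that a page nothing links to is never reached.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical generated pages, keyed by URL path.
SITE = {
    "/": '<a href="/page/about">About</a>',
    "/page/about": '<a href="/">Home</a> <a href="/page/news">News</a>',
    "/page/news": "<p>No links here</p>",
    "/page/orphan": "<p>Nothing links to me</p>",
}


def crawl(site, start="/"):
    """Breadth-first crawl that only visits pages reachable through links."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in site:
            continue
        seen.add(url)
        parser = LinkCollector()
        parser.feed(site[url])
        queue.extend(parser.links)
    return seen
```

Here `crawl(SITE)` visits the home, about and news pages. `/page/orphan` exists in the site but is never found, because no crawled page links to it.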


By the way, pages that need passwords cannot be crawled, because the robots cannot guess your passwords. So any pages you have protected, or scripts that require a login before entering, are not searchable by robots.

Websites built on a CMS like WordPress also have passwords on them. I mean, the databases always have passwords on them, and yet they are crawled by search engine bots. I don't get the concept of password-protected pages not being crawled by search engines because, on the internet today, almost everything is protected by a password.
Can anyone help me understand the concept behind this?
Thank you in advance.

