xisto Community
marijnnn

Making Your Own Search Engine


Well, I haven't seen many topics about search engine technology in this forum, even though it's named that way :P But here we go:

For school, I had to make a search engine. I got lucky because it was in ASP.NET on a Windows machine, so I could just use the Indexing Service included in Windows. It has extra filters for PDF files, and it searches through DOC, XLS, ...

But let's say I wasn't that lucky and had to do it in PHP on a Linux server... How could one do it? I mean: a little search bar that returns whether a file contains the search words or not.

My idea was to open every file periodically and build a database, but that would use a lot of space. And what to do with database-driven sites that get their content from the DB? Having all your pages crawled every time a search is executed seems like a bad idea too...

I'm out of inspiration :P How would you do it? Got any good scripts?
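For illustration, here is a minimal sketch of the "open every file periodically and build a database" idea. It is written in Python rather than PHP, keeps the index in memory instead of a real database, and the names `tokenize`, `build_index` and `search`, as well as the list of file extensions, are assumptions made up for this example.

```python
import os
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return WORD_RE.findall(text.lower())


def build_index(root, extensions=(".txt", ".html", ".php")):
    """Walk `root` and map each word to the set of files containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Skip file types this sketch does not know how to read.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                # Store each word once per file: word -> {paths}.
                for word in set(tokenize(handle.read())):
                    index[word].add(path)
    return index


def search(index, query):
    """Return the sorted list of files that contain every word in `query`."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

The index only stores which file contains which word, not the full text, which keeps it smaller than a copy of every page. Rebuilding it from a scheduled job (rather than on every search) would match the "periodically" part of the idea; for a database-driven site, the same `tokenize` step could presumably be fed rows from the DB instead of files.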


Is it possible somehow by combining the "locate/slocate" command with grep? Might be worth a try. locate and slocate do something similar to the Indexing Service in Windows, at least for file names: the "updatedb" command builds a database of file and folder names, and locate searches it to help you find files. And grep, of course, matches patterns (regular expressions) against the contents of files. I don't see any reason why it can't be done, though I'm not sure how. I'm pretty new to PHP myself.
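A rough sketch of the grep half of this idea, again in Python for illustration. It assumes a `grep` binary that supports the `-r`, `-l`, `-I`, `-i` and `-F` options (GNU grep does) and that file names contain no newlines; the function names `grep_files` and `grep_search` are made up for this example. locate is left out because it only knows file names, not contents.

```python
import subprocess


def grep_files(word, root):
    """Return the set of files under `root` whose content contains `word`."""
    result = subprocess.run(
        # -r recurse, -l print only file names, -I skip binary files,
        # -i ignore case, -F treat the word as a fixed string, not a regex.
        ["grep", "-rlIiF", "-e", word, "--", root],
        capture_output=True,
        text=True,
    )
    # grep exits with 0 on a match, 1 on no match, and 2 or more on error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return set(result.stdout.splitlines())


def grep_search(query, root):
    """Return the sorted list of files that contain every word in `query`."""
    words = query.split()
    if not words:
        return []
    hits = grep_files(words[0], root)
    for word in words[1:]:
        hits &= grep_files(word, root)
    return sorted(hits)
```

This reads every file on every search, so it is only reasonable for a small site; passing the word as a separate argument (no shell involved) avoids the user's input being interpreted as a shell command.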


Silly me, should've thought of that myself. I'll look into how one can search in the database :P

I wonder if you can do it on a shared server. I think the database would contain documents of all hosted sites :s

Anyway, it's a nice idea, I'll have to look into it one day soon.





Yup.. the locate db should ideally contain a complete list of files of all the hosted sites, even though a normal user wouldn't be able to get the complete listing using locate. Maybe if you create a user with locate execution privileges and cut out the rest of the system privileges, it could be run on a shared hosting system. But then again, I don't know if any admin would be ready to do it. You'd be setting up an almost gaping security hole by opening the whole directory structure to the world.

 

Think you should be able to come up with some workaround though..
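One possible workaround, sketched in Python under the assumption of a POSIX system where each site has its own document root: whatever produces the list of matching files, drop every result that does not resolve to a path inside the searching site's own directory. The names `is_inside` and `filter_results` are hypothetical.

```python
import os


def is_inside(path, docroot):
    """True if `path` resolves to a location inside `docroot`."""
    # realpath collapses ".." segments and follows symlinks before comparing.
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(docroot)
    return os.path.commonpath([real_path, real_root]) == real_root


def filter_results(paths, docroot):
    """Keep only the results that live under the site's own document root."""
    return [p for p in paths if is_inside(p, docroot)]
```

This only hides other sites' files from the search results; it does not replace proper file permissions on the server.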


I agree, I wouldn't bother with it; some things are hard to pursue. If you mean this for a small internal search engine for a website, I am pretty sure there are pre-made ones, or you can use Google search, as long as Google has added you to its index. Other search engines like Yahoo may also give you that ability. :)


Yah, there are also other free search engines that you can get for your site, like:

http://www.freefind.com/

The catch is that they put ads on the results page.

Also, I think that if you want to design a search engine that indexes the world's websites, it's very tough because your database would be so large that the cost of running it would be very high.
