xisto Community
jglw22

Excerpt From Final Year Project On Searching The Web


Tell me if you think it makes good reading.

The History of Searching the Web

"The ultimate search engine would basically understand everything in the world, and it would always give you the right thing. And we're a long, long way from that." - Larry Page

In this section I will discuss the history of searching the Web up until the release of Google. This covers the technologies that preceded it, the reasons why the Web revolutionized search as a whole, and which search engines became popular and why.

Document Retrieval

Gerard Salton is often thought of as the father of modern search technology. His teams at Harvard and Cornell developed the SMART information retrieval system in the 1960s (the System for the Mechanical Analysis and Retrieval of Text, informally nicknamed "Salton's Magic Automatic Retriever of Text"). SMART included important concepts such as the vector space model, Inverse Document Frequency (IDF), Term Frequency (TF), term discrimination values, and relevancy feedback mechanisms. In this context a term can mean a single word or a phrase.

The vector space model is an algebraic model for representing text documents as vectors of terms. A document is represented by a vector in a number of dimensions. Each dimension corresponds to a separate term, and its value is non-zero when that term appears in the document. A relevancy ranking can then be calculated from the cosine of the angle between two document vectors. If the cosine is zero, the vectors are orthogonal: the documents share no terms and there is no match. If it is one, the vectors point in the same direction, meaning the two documents contain the same terms in the same proportions. Inverse Document Frequency and Term Frequency are weightings used to set the size of each component of the vectors in the model. Together these weights form a statistical measure of how important a word is to a document in a corpus (a collection of documents).
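The cosine comparison and the TF and IDF weights described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of SMART: the function names `tf_idf_vectors` and `cosine` are made up for this example, documents are split on whitespace, and the smoothed logarithmic IDF used here is just one common variant of the weighting.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} vector per document, weighted by TF * IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed weighting: relative term frequency times a smoothed log IDF,
        # so a term found in every document still keeps a small weight.
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_u * norm_v)
```

Calling `tf_idf_vectors` on a small list of strings and passing two of the resulting vectors to `cosine` gives a value close to one for documents with the same terms in the same proportions, and zero for documents with no terms in common.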
A term's importance increases in proportion to the number of times it appears in the document, but is offset by how frequently it appears across the corpus.

Term discrimination values are a similar weighting to TF and IDF, but tailored toward retrieval. This method weights relevance by looking at how dense a region of the vector space is for a given term. Salton proposed that the denser a given space is, the less efficient the retrieval process will be. It follows that finding specific and relevant data in a dense space is difficult, and that retrieval improves when searching in spaces where the term is relevant but its occurrence is sparse.

Finally, relevancy feedback is the simple idea that once a query has returned a result, the user may give explicit feedback on that result, and this can be stored as a factor to improve later searches for the same term.

Gopher Protocol

Some of the first attempts to make the data on the Internet more accessible by providing information on its location were built around Gopher. Gopher is a search and retrieval protocol used over the Internet to find distributed documents. It allowed server-based text files to be organized hierarchically and viewed easily using Gopher applications on remote client computers. Searching such resources was popularized by three systems known as Archie, Veronica and Jughead. Archie indexed the file listings of FTP archives rather than Gopher itself, while Veronica and Jughead searched the menus of Gopher servers. These tools relied on simple techniques such as matching regular expressions against file names. However, Gopher's growth in popularity began to lose steam as the World Wide Web grew rapidly in the mid 1990s.
The Hypertext Transfer Protocol (HTTP), together with Mosaic, the first widely popular graphical browser, released in 1993, could deliver functionality above and beyond Gopher and its applications.

As We May Think

"Our ineptitude in getting at the required data is largely caused by the artificiality of the systems of indexing. Having found one item, moreover, one has to emerge from the system and re-enter on a new path. The human mind does not work this way. It operates by association."

This point was made in 1945 by Vannevar Bush, a scientist urging his fellow scientists to work together to help build a body of knowledge for all mankind. This was perhaps the first time the underlying principle of hypertext was discussed in the limelight of the scientific community. The term hypertext itself was first used surprisingly early, in 1963, by Ted Nelson during his work on the eventually failed Xanadu project.

HTTP and WWW

The World Wide Web was created in 1989 by Sir Tim Berners-Lee while working at CERN in Geneva, Switzerland. Resources on the Web are retrieved by making an HTTP request from a client browser to a server. In its rawest form the request is made by typing http:// followed by the IP address of the server. However, as it is impractical for users to remember IP addresses, they usually type a URL containing a domain name, which is sent to the Domain Name System (DNS), a distributed Internet database that resolves the domain name to an IP address. The object returned is typically a file containing text written in a markup language, which the browser parses and renders as a viewable page. Most pages contain hyperlinks, which can be clicked to initiate a new HTTP request to any server on the Web.
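The steps from a typed URL to an HTTP request can be sketched as follows. This is a minimal illustration, not a working browser: `build_request` and the `FAKE_DNS` table are hypothetical names invented for this example, the table stands in for a real DNS lookup so that the sketch runs without a network, and `example.org` and `192.0.2.10` are reserved documentation values rather than a real server.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for DNS. A real client would ask a resolver
# (for example via socket.gethostbyname) instead of using a fixed table.
FAKE_DNS = {"example.org": "192.0.2.10"}


def build_request(url, resolve=FAKE_DNS.get):
    """Return (ip_address, raw_request_text) for a simple HTTP GET."""
    parts = urlsplit(url)
    host = parts.hostname
    # Step 1: resolve the domain name to an IP address.
    ip_address = resolve(host)
    if ip_address is None:
        raise LookupError(f"cannot resolve {host!r}")
    # Step 2: work out which resource on that server is being asked for.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Step 3: write the request text the client would send to that address.
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )
    return ip_address, request
```

Passing `"http://example.org/index.html"` to `build_request` returns the address from the table together with the text of a GET request for `/index.html`. A browser would then open a connection to that address, send the text, and parse the markup that comes back.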
This interlinking of pages carried an implicit indication of relevance that eventually proved to be the driving force behind the development of new search methodologies for finding information on the Internet.

Primitive Search Engines

With the innovation of hyperlinked documents came a new way to traverse the search space that is the World Wide Web. In June 1993, Matthew Gray introduced the World Wide Web Wanderer. This was a robot that made an HTTP request to a given starting point and then made a breadth-first traversal of the domains mined from that page. The process continued indefinitely in this manner, logging which servers were active. By the end of that year, three full-fledged robot-driven search engines had surfaced: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider. The first two used a simple linear search to gather URLs and associated header information, but did not implement any ranking scheme, whereas the RBSE spider did.

Directories

A directory is essentially a collection of human-organized favorites made available online for people to use. The first of these directories began to crop up in 1994, the most noteworthy being Yahoo!. What set Yahoo! apart from other directories was that each entry came with a human-compiled description. Once the popularity of Yahoo! exploded, it began to charge companies for inclusion. Four years later this concept was undermined by the launch of the Open Directory Project (also known as DMOZ), whose principle was that adding a website was free and that the directory itself could be downloaded for anyone to use.

WebCrawler

Brian Pinkerton of the University of Washington released WebCrawler on April 20th, 1994. It was the first crawler to index entire pages, and the first engine to provide full-text search. Following a couple of corporate takeovers over the last two decades, it has become a meta search engine.
A meta search engine works on the principle that the quality of results is proportional to the size of the index an engine maintains. Therefore, rather than keeping its own index, it queries all the other popular engines and collates the results.

Lycos

Lycos was the next major search development. It was designed at Carnegie Mellon University around July 1994 and was developed by Michael Mauldin. Lycos went public with a catalog of 54,000 documents. In addition to providing ranked relevance retrieval, Lycos provided prefix matching and word proximity bonuses. Lycos' main difference was the sheer size of its catalog. By August 1994, Lycos had identified 394,000 documents; by January 1995, the catalog had reached 1.5 million documents; and by November 1996, Lycos had indexed over 60 million documents, which was more than any other Web search engine at the time. Today Lycos has become a Web portal, that is, a web page designed to be used as a home page that serves as an access point to information in a variety of ways.

AltaVista

AltaVista, created by researchers at the Digital Equipment Corporation, debuted in December 1995 and brought many important features to the Web scene. At launch, the service had two innovations that set it ahead of the other search engines: a fast, multi-threaded crawler called Scooter, and an efficient back end running on advanced hardware. In addition, it had nearly unlimited bandwidth (for that time) and was the first to allow inbound link checking, natural language queries, and the addition or deletion of a user's own URL within 24 hours. AltaVista also provided numerous search tips and advanced search features.

Overture

Overture was originally released under the name GoTo in 1998 by Bill Gross. Overture is thought to be the pioneer of paid search. Gross is quoted as saying, 'I realized that the true value of the Internet was in its accountability. Performance guarantees had to be the model for paying for media.'
The main innovation, which is mirrored by the now giant Google, was pay-per-click advertising and sponsored results. While Overture was very successful, two major reasons prevented it from taking Google's market position. Firstly, Bill Gross decided not to grow the Overture brand name too far because he feared that would cost him distribution partnerships. When AOL selected Google as an ad partner, in spite of Google's massive brand strength, this became the nail in the coffin for Overture becoming a premier search ad platform. Secondly, the advertising was nowhere near as well targeted as Google's is today.

Privacy preserving queries on encrypted data

Hello, can you help me?

My project title is "Privacy preserving queries on encrypted data". I have not been able to proceed further. I only know how to convert plain text to cipher text, but my main aim is to encrypt an entire database using SQL Server. Please help me if you know about it.

Thank you.

-question by ravindran

 

Ideas about website oriented final year projects

This is my final semester and I have to do a project. I have an idea of doing our own project. There are four members in my team. We got a project from a company, but we don't know how to proceed with it. The project is a "smart card function" needed for that company's attendance system, and they asked us to do it with the kit. But our HOD advised us that it would be quite difficult and that we should look for another, so we are planning to do our own project. It might be a website. Can anyone help us by suggesting a problem or any ideas for website projects? I hope to hear back soon.

With regards,
Ezhil

-question by ezhilarasi

Catalog Management System

Hello,

I am now pursuing my final year of Engineering. I have done a mini project named Catalog Management System, a complete system used in big stores and department stores to control and maintain the activities happening in a big supermarket.

I have completely implemented the project, but I don't know how to prepare the documentation report. In particular, I don't know how to describe the proposed system, or what features I should list as future additions.

I hope that you will help me with this issue.

Thanking you,

With regards,

Mohammed Aleem

Proposal about Mechanical Engineering

Hello!

First of all, I am very pleased to join you. At present I want to write a proposal for a PhD degree in Mechanical Engineering, and it has been very difficult for me to come up with an idea for my proposal. If possible, can you help me write the proposal as soon as possible? Once again, thank you for your help.

-reply by Do Thanh Nho

Final year project

I am a computer science student and am currently facing a problem: I need to think of a final year project, and I still have no idea about the title of the project I will be doing in my final semester. Can anyone help me by sharing or suggesting an idea? Thank you.

-reply by Ng Cian Hau

