mamer — Posted January 27, 2013
I was impressed by the way Google suggests alternatives for my search keywords and always wondered how they do it. Now I have a general idea. Recently I came across a PHP function that checks the similarity of two strings based on what's called the Levenshtein distance algorithm. The Levenshtein distance algorithm examines the sequence of characters in your original string, compares it against a given string to calculate the gap in character sequence, and tells you how close or different the two strings are. One might ask, why on earth should I care about that? Well, if you're evaluating a user's input in a form, or getting an answer to a given question, and you would like to account for misspellings without rejecting the input entirely, you would definitely use this algorithm. For more information, see the Wikipedia article, and also see the function in the PHP manual.
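Here is a minimal sketch of how PHP's built-in levenshtein() function can be used; the misspelled input is just a made-up example:

<?php
// A minimal sketch using PHP's built-in levenshtein() function.
// The distance is the number of single-character insertions, deletions,
// or substitutions needed to turn one string into the other.
// The misspelled input below is only an illustrative example.

$input    = 'recieve';
$expected = 'receive';

$distance = levenshtein($input, $expected);
echo "Distance between '$input' and '$expected': $distance\n"; // prints 2

// A small distance relative to the word length suggests a probable typo.
if ($distance <= 2) {
    echo "Did you mean '$expected'?\n";
}
?>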
k_nitin_r — Posted January 27, 2013
Mamer, good to see more posts from you on the forum. While the Levenshtein algorithm is useful for suggesting corrections, I doubt Google uses it as-is to determine the nearest match. Give the Levenshtein algorithm a try and you will see that Google 'qualifies' (for lack of a better word) each of the words that are supposed to be a close match before finally suggesting the one closest to what you intended. I am guessing that they may be establishing context through geographical location, past searches, trending search terms, e-mails from Gmail, chat history from Google Talk, and possibly social networking information from Google+. The Levenshtein algorithm is definitely one of the simpler algorithms out there that one can implement, but more complex algorithms exist that are based on linguistics. Think along the lines that although we may drive Lexus or Mercedes cars, there are Rolls-Royce vehicles we would not even consider purchasing because of the price tag. It's one of those things... I once wrote a program to read the bitmap format, but when I thought about the MPEG format, everyone I asked said that MPEG decoding is not something you would write as an individual because of the effort involved in coding it - it is rather something you would buy as a library or use one of the free open-source alternatives.
mamer — Posted January 27, 2013
Thank you for the reply. I agree with you about what Google is using; it couldn't be just Levenshtein. I'm really interested in any algorithm that enables me to evaluate students' responses without rejecting them just because of typos. Do you think regular expressions are a good solution for that?
k_nitin_r — Posted February 3, 2013
Mamer, using regular expressions is actually not better than using Levenshtein because of the overhead involved; there is a lot more computation when using regular expressions. It is, however, possible to use regular expressions to validate things that have to be in a specific format, and then use the Levenshtein algorithm to match against the possible values if the format check fails. It has very specific uses, but if there's a scenario that warrants it, the combination of the two will serve the purpose well.
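A hypothetical sketch of that combination in PHP (the accepted answers and the distance threshold are just assumptions for illustration, not anything from the thread):

<?php
// Sketch of the combination described above: a regular expression checks
// for an exact, well-formed answer first, and only when that check fails
// do we fall back to levenshtein() to find the closest accepted value.

$accepted = ['photosynthesis', 'respiration', 'transpiration'];
$input    = strtolower(trim('fotosynthesis'));   // a student's misspelled answer

// Build a pattern that matches any accepted answer exactly.
$pattern = '/^(' . implode('|', array_map('preg_quote', $accepted)) . ')$/';

if (preg_match($pattern, $input)) {
    echo "Correct: $input\n";
} else {
    // The format check failed, so look for the nearest accepted value.
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($accepted as $candidate) {
        $distance = levenshtein($input, $candidate);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
        }
    }

    // Accept small distances as probable typos; reject anything else.
    if ($bestDistance <= 2) {
        echo "Accepted '$input' as a likely misspelling of '$best'.\n";
    } else {
        echo "No close match found for '$input'.\n";
    }
}
?>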