Jump to content
xisto Community
mamer

What Is Levenshtein

Recommended Posts

I was impressed by the way Google suggests alternatives for my search key words and always wondered how do they do that. Now I have a general idea how they do it.

 

Recently I came across that function in PHP code that checks the similarity of 2 strings based on what's calld Levenshtein Distance algorithm.

 

Well, the Levenshtein Distance algorithm will check the sequence of characters in your original string and compares it against a given string to calculate the gap in character sequence and tell you how close or different the two strings are.

 

One might ask, why on earth should I care about that. Well, if you're evaluating a user input in a form or getting an answer to a given question and you would like to count for misspelling without rejecting the input totally you would definitely use that algorithm.

 

For more information about that see the Wikipedia

 

Also see it in the PHP manual

Share this post


Link to post
Share on other sites

Mamer,Good to see more posts from you on the forum.While the Levenshtein algorithm is useful in suggesting corrections, I doubt Google uses the Levenshtein algorithm as-is for determining the nearest match. Give the Levenshtein algorithm a try, and you will see that Google 'qualifies' (for the lack of a better word) each of the words that are supposed to be a close match before finally suggesting the one that is close to what you intended. I am guessing that they may be establishing context through geographical location, past searches, trending search terms, e-mails from GMail, chat history from GoogleTalk, and possibly social networking information from Google+.The Levenshtein algorithm is definitely one of the simpler algorithms out there that one can implement, but more complex algorithms exist that are based on linguistics. Think along the lines that although we may drive Lexus or Mercedes cars, there exist Rolls Royce vehicles that we would not even bother considering for a purchase because of the price tag. It's one of those things... after I created a program to read the bitmap format, but when I thought about the MPEG format, everyone I asked would say, MPEG decoding is not something you would write as an individual because of the effort involved in coding it - it is rather something that you would buy as a library or use one of the free open source alternatives.

Share this post


Link to post
Share on other sites

Thank you for the reply.

 

I agree with you about what google is using. It couldn't be just Levenshtein.

 

I'm really interested in any algorithm that enables me to evaluate students' responces witout rejecting them just based on typo's.

 

Do you think Regular Expression is a good solution for that?

Share this post


Link to post
Share on other sites

Mamer,

 

Using regular expressions is actually not better than using Levenshtein because of the overhead involved. There is a lot more computation involved when using regular expressions. It is, however, possible to use regular expressions for validation of things that have to be in a specific format, and then use levenshtein's algorithm to match against possible values if the format is invalid. It has very specific uses, but if there's a scenario that warrants it, it's the combination of the two will serve the purpose well.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×
×
  • Create New...

Important Information

Terms of Use | Privacy Policy | Guidelines | We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.