xisto Community
vujsa

Automated Product Suggestion Script
Compare user lists and suggest related items based on pattern matching


I recently got an idea for a project, and one of the features I wanted it to have was an automated suggestion service. If anyone has been to Amazon, it would work much like their recommended product feature.

What I want to do is take several users' lists of whatever; for this example, I'll use web links, like those from a browser history. I would want to suggest links to a user based on links that are common in many other users' lists.

User A: Amazon, Ebay, Excite, Google, Yahoo, MySpace, Walmart
User B: Amazon, Ebay, Google, Yahoo, You Tube, MySpace, CVS
User C: Amazon, Excite, Google, Yahoo, You Tube, MySpace, Home Depot

The three users have similar lists and, as a result, they probably have similar website preferences. So the script would recognize the similarities (all have Amazon, Google, Yahoo, and MySpace), eliminate the uncommon items (Walmart, CVS, Home Depot), and then suggest to each user whatever common link they are missing:

User: Suggestion
A: You Tube
B: Excite
C: Ebay

This is a very simplified example, but the theory would be the same, except that there will likely be more than one common pattern in a user's list, like so:

User A: Amazon, Ebay, Excite, Google, Yahoo, MySpace, Walmart
User D: Google, Yahoo, Walmart, Applebees, Overstock, PayPal
User E: Ebay, Google, Yahoo, Walmart, PayPal, Circuit City, Best Buy

All share Google, Walmart, and Yahoo, so that is the pattern match. Ebay should be suggested to User D, since User A and User E share it as well. PayPal should be suggested to User A, since User D and User E have it in common. Best Buy, MySpace, and Overstock should not be considered, since each appears on only one list.

The script should be prepared to search hundreds or even thousands of user lists to find match patterns and make suggestions based on those matches. It should be able to provide suggestions based on the best matches possible. Ideally, the script would work with nearly any kind of list, such as books, links, songs, etc.

So the question: does anyone know where I can find a similar script, or have any suggestions on how to begin developing one?

I had considered using a percentage comparison instead of a true pattern matching method. User A's list is compared to every other list in the database using a query that returns the lists that are most like it, and the top 5 or 10 similar lists (based on how well they match percentage-wise) are kept. This would use the same technique search engines use when they list results by relevance and show how relevant each one is as a percentage value. Then eliminating all of the completely uncommon items from the lists, and then removing all of the completely common links, would give a list of items that might be suitable to suggest to each of the lists mentioned, as long as the suggestion wasn't already on that particular list.

So, given that this is the much easier method, does anyone have suggestions for this?

Thanks for any help.

vujsa
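
For illustration, here is a minimal Python sketch of the percentage-comparison idea from the post above. It is only one possible reading of it: the post does not say how the match percentage is computed, so the sketch assumes the share of common items out of all items on either list (the Jaccard index). The names `similarity`, `suggest`, `top_n`, and `min_votes` are hypothetical, and the lists are held in memory instead of being queried from a database.

```python
from collections import Counter

# Sample lists taken from the first example in the post.
LISTS = {
    "A": ["Amazon", "Ebay", "Excite", "Google", "Yahoo", "MySpace", "Walmart"],
    "B": ["Amazon", "Ebay", "Google", "Yahoo", "You Tube", "MySpace", "CVS"],
    "C": ["Amazon", "Excite", "Google", "Yahoo", "You Tube", "MySpace", "Home Depot"],
}


def similarity(a, b):
    """Match percentage: shared items divided by all items on either list.

    This particular formula (Jaccard index) is an assumption, not from the post.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def suggest(user, lists, top_n=5, min_votes=2):
    """Suggest items found on several of the lists most similar to `user`'s."""
    own = set(lists[user])
    # Rank every other user by match percentage and keep the best top_n.
    ranked = sorted(
        (other for other in lists if other != user),
        key=lambda other: similarity(own, lists[other]),
        reverse=True,
    )[:top_n]
    # Count, per item the user lacks, how many similar lists contain it.
    votes = Counter()
    for other in ranked:
        votes.update(set(lists[other]) - own)
    # Items on fewer than min_votes lists are the "completely uncommon" ones.
    return sorted(item for item, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    for user in LISTS:
        print(user, suggest(user, LISTS))
```

With the sample lists this suggests You Tube for A, Excite for B, and Ebay for C, matching the table in the post. Items already on the user's list are removed before counting, so the "completely common" links never become candidates.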


Hi vujsa,

The key here is, in addition to maintaining a list (in the database) of each user's (hereafter referred to as the "key") items (hereafter referred to as "values"), to maintain a separate "master list" of all values associated with all the keys, sorted by their occurrence. Whenever a suggestion needs to be made for a particular key, find the value with the highest occurrence in the master list that is not present in the key's list of values.

Note that this is global: all keys are considered regardless of how closely key X and key Y are related. To make the system more specific and tailored to each user, the system described above needs to be slightly modified and extended, as described in detail below. Be forewarned that, if improperly designed, this will take a lot of database space, since you are attempting to match every user against every user; that would be N^2 matches for N keys.

To reduce complexity, maintain a highest relation list that contains two fields: the key A, and the other key B most associated with A. You would need to provide an algorithm that gives a proper association value for two keys; a suggestion would be to take the number of matching values and subtract the number of non-matching values from it to give the associativity rating. This is very simple and fast to compute, but may not give the desired and most exact behavior, depending on your data.

Once we have a highest relation list, we use the master list as the arbitrator when we have to make a suggestion for key A. The system will suggest the value that has the highest occurrence in the master list, is a value of key B, and is not a value of key A. If there are no matches (i.e. keys A and B have the exact same values), the system may suggest a new value depending on an algorithm; a suggestion would be to take the value with the highest occurrence in the master list that is not a value of either key A or key B.

In cases where a key has changed values, we must recompute the highest relation list. This is less costly than having to compute the relation of arbitrary groups of users; while that can be done, it is best employed with a very fast back-end with good processing power, and even then, it would not scale well. The design proposed here is simple and effective, without requiring extreme amounts of power and storage.

What do you think?
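
The design above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a definitive implementation: "non-matching values" is read here as the values held by only one of the two keys, ties are broken alphabetically, and all function names (`build_master`, `association`, `build_relations`, `suggest`) are hypothetical. The sample data is the A/D/E example from the first post.

```python
from collections import Counter

# Sample lists from the second example in the first post.
LISTS = {
    "A": ["Amazon", "Ebay", "Excite", "Google", "Yahoo", "MySpace", "Walmart"],
    "D": ["Google", "Yahoo", "Walmart", "Applebees", "Overstock", "PayPal"],
    "E": ["Ebay", "Google", "Yahoo", "Walmart", "PayPal", "Circuit City", "Best Buy"],
}


def build_master(lists):
    """Master list: how many keys hold each value."""
    master = Counter()
    for values in lists.values():
        master.update(set(values))
    return master


def association(a, b):
    """Matching values minus non-matching values.

    Assumption: "non-matching" means values on exactly one of the two lists.
    """
    a, b = set(a), set(b)
    return len(a & b) - len(a ^ b)


def build_relations(lists):
    """Highest relation list: for each key A, the key B most associated with it."""
    relations = {}
    for key, values in lists.items():
        others = [k for k in lists if k != key]
        if others:
            relations[key] = max(others, key=lambda k: association(values, lists[k]))
    return relations


def suggest(key, lists, master, relations):
    """Most frequent value that key B has and key A lacks, with a global fallback."""
    own = set(lists[key])
    partner = set(lists[relations[key]]) if key in relations else set()
    candidates = partner - own
    if not candidates:
        # Fallback: the most frequent value held by neither A nor B.
        candidates = set(master) - own - partner
    if not candidates:
        return None
    # Highest occurrence first; alphabetical order breaks ties (an assumption).
    return min(candidates, key=lambda v: (-master[v], v))


if __name__ == "__main__":
    master = build_master(LISTS)
    relations = build_relations(LISTS)
    for key in LISTS:
        print(key, "->", relations[key], ":", suggest(key, LISTS, master, relations))
```

On this data the sketch pairs A with E and D with E, and suggests PayPal to A and Ebay to D, the same results the first post asks for. `build_relations` still compares every key with every other key once, so it is the part that would need recomputing (or caching in a table) when a key's values change.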

Edited by altimit


Thank you for your reply. I think that I have a method to use for this idea. Since all of the data will be in a MySQL database, I can utilize some of the built-in functions to do some of the pattern matching.

For example, if each user's record included a list (a text list, maybe comma separated), then running a FULLTEXT search of user A's list against the users database table would return the closest user matches to user A. A number of these "matching" records can then be used to make suggestions from. This, along with manual pattern suggestions, would offer the greatest probability of providing the user with suggestions they would be interested in. To make the system more complex, I think that providing the user with a wish list that could also be searched against would provide an even greater number of accurate product suggestions.

So what I thought, since I will rely on MySQL to do most of the work, is to cycle through each "matching" list and eliminate any items that also appear in the query list. That would leave only the items from the "matching" list that might be suggested. Then we store that list of possible suggestions and move on to the next "matching" list.

When done with each of the "matching" lists, we will have an array of suggested lists. We then eliminate any item in each list that is not found in any other list, which leaves us with a list of items that might be good suggestions.

The key would be to focus the suggestions based on how relevant they are. For example, if the number of matching lists returned is quite high, they can be ordered by how closely they match, and only the top 20 could be used. Then only add an item to the suggestions if it appears in 3 or more lists.

Unfortunately, I have so many projects right now that I can't start on the system. Not to mention, I still need to create a CMS to base future systems on. It seems like every time I think I have a chance to catch up, another freelance job comes up.

vujsa
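
The post-query filtering step described above could look roughly like the following Python sketch. It assumes the database search has already returned the matching records as comma-separated strings, ordered from most to least relevant; the query itself is not shown. The names `parse_list`, `filter_suggestions`, `max_lists`, and `min_lists` are hypothetical, and the defaults (top 20 lists, 3 or more occurrences) are taken from the post.

```python
from collections import Counter


def parse_list(text):
    """Split a comma-separated record into a set of trimmed item names."""
    return {item.strip() for item in text.split(",") if item.strip()}


def filter_suggestions(query_text, matching_texts, max_lists=20, min_lists=3):
    """Keep items that remain on at least `min_lists` of the top matching lists.

    `matching_texts` is assumed to be sorted by relevance already.
    """
    query = parse_list(query_text)
    # One leftover set per matching list: drop everything the query list has.
    leftovers = [parse_list(text) - query for text in matching_texts[:max_lists]]
    # Count in how many leftover sets each item still appears.
    counts = Counter()
    for leftover in leftovers:
        counts.update(leftover)
    # Most widely shared items first; alphabetical order breaks ties.
    return sorted(
        (item for item, n in counts.items() if n >= min_lists),
        key=lambda item: (-counts[item], item),
    )


if __name__ == "__main__":
    # Made-up records purely for demonstration.
    matches = [
        "Amazon, Google, Ebay, PayPal",
        "Google, Yahoo, Ebay, PayPal",
        "Amazon, Ebay, Walmart",
        "Yahoo, PayPal, Ebay",
    ]
    print(filter_suggestions("Amazon, Google, Yahoo", matches))
```

Lowering `min_lists` to 2 gives the looser rule from the post, where an item only has to show up on one other list to survive.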

