xisto Community
OpaQue

Auto-KeyWord Generating Script.

For better search engine optimization, I created an auto-keyword generating script. This is the procedure I followed:

1. Read the entire contents of the page.
2. Converted it into a string.
3. Removed the HTML tags and the whitespace characters.
4. Filtered out only valid words without special characters (only the hyphen is permitted).
5. Once all the filtering was done, removed general words like "the", "a", etc.
6. Converted the string into an array.
7. Calculated the frequency of the words using array_count_values(). [array_count_values() returns an array using the values of the input array as keys and their frequency in input as values.]
8. Selected the top 15 words, converted them into a string and printed them out.

They were embedded in bold characters automatically!

Now the **** thing that happened was that all the spelling mistakes, things like LOL and OK, and hundreds of different words started coming up in the top 20. Any suggestions on how I can make only valid, meaningful English words come out as keywords?
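
For anyone following along, the procedure in this post could be sketched roughly as below. This is not the original script: that one was PHP built around array_count_values(), and this sketch uses Python's collections.Counter as the closest equivalent. The names top_keywords, as_bold and STOP_WORDS are made up for illustration, and the stop-word list is a tiny placeholder.

```python
import re
from collections import Counter
from html import unescape

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on"}


def top_keywords(html, limit=15):
    """Return the `limit` most frequent non-stop words found in an HTML string."""
    # Drop script/style bodies first, then any remaining tags.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", text)
    text = unescape(text).lower()
    # Keep alphabetic words only; a hyphen is allowed between letters.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text)
    # Counter plays the role of array_count_values(): word -> frequency.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


def as_bold(keywords):
    """Wrap each keyword in <b> tags and join them into one string."""
    return ", ".join(f"<b>{w}</b>" for w in keywords)
```

Calling as_bold(top_keywords(page_html)) gives the bolded keyword string. Nothing in this pipeline knows whether a word is spelled correctly or meaningful, which is why things like LOL can rank highly.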

Hm, sounds difficult. I can only think of two ways right now. The first is to store all the "bad" words and get the engine to exclude them. The second is to have a dictionary and get the engine to validate words based on whether they can be found in it.
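
Both ideas come down to a membership test, so a minimal sketch could look like this. BLOCKLIST and DICTIONARY here are tiny made-up sets; in practice they would be loaded from a word-list file, and filter_candidates is just an illustrative name.

```python
# Hypothetical data: a real setup would load these from files.
BLOCKLIST = {"lol", "ok", "omg"}
DICTIONARY = {"keyword", "script", "server", "search", "engine"}


def filter_candidates(words, dictionary=None, blocklist=frozenset()):
    """Drop blocklisted words and, if a dictionary is given, words not in it."""
    kept = []
    for word in words:
        w = word.lower()
        if w in blocklist:
            continue  # approach 1: explicitly excluded "bad" word
        if dictionary is not None and w not in dictionary:
            continue  # approach 2: not a known dictionary word
        kept.append(w)
    return kept
```

With only the blocklist, a misspelling such as "diffrent" still gets through; with the dictionary it is rejected. Set lookups are constant time on average, so the per-word check is cheap once the list is in memory.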

Considering that most of the abbreviations people use are three letters long, and that few of the words people actually type into a search engine are three letters long, you could block all words of three letters or fewer. This is not the best idea, though, because people will want to search for things like "cat" or "dog." You can definitely block words of one or two letters, because those are only inconsequential words. You could also have a vowel/consonant checker that blocks a word if it contains no vowels, no consonants, three or more vowels in a row, or three or more consonants in a row. I may be able to help more if you explain what the purpose of displaying the top 15 searched-for words is; sometimes understanding the reasoning leads to interesting solutions.
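
As a rough sketch of the length and vowel/consonant checks described in this post (the function name and the min_length default are assumptions, and the input is assumed to be plain a-z letters, with "y" counted as a consonant):

```python
import re

VOWEL = "[aeiou]"
CONSONANT = "[b-df-hj-np-tv-z]"  # every letter a-z except a, e, i, o, u


def passes_heuristic(word, min_length=3):
    """Return True if `word` survives the length and vowel/consonant checks."""
    w = word.lower()
    if len(w) < min_length:
        return False  # one- and two-letter words are blocked
    if not re.search(VOWEL, w) or not re.search(CONSONANT, w):
        return False  # no vowels at all, or no consonants at all
    if re.search(VOWEL + "{3,}", w) or re.search(CONSONANT + "{3,}", w):
        return False  # three or more vowels or consonants in a row
    return True
```

Trying it on a few words shows the limits of the rule: "ok" and "brb" are blocked, but "lol" passes because it alternates consonant and vowel, while ordinary words such as "three", "string" and "queue" are rejected for their letter runs. The run checks would probably need loosening before real use.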

I don't use that script now. The main reason is that it puts a lot of load on the server. Removing words below 4 characters is a good idea. Comparing results with a dictionary is not a good idea; it puts load on the server again :-( Also, many times the word that is repeated most is not necessarily relevant to the topic. I now use the topic titles as page titles and also as headings. The title is repeated on the page to improve keyword density :)

I would recommend excluding all words under 4 characters. It's a good idea, but as you said, it puts a lot of load on the server. Maybe use a table: each post is parsed for keywords, which are then added to the table. If the word already exists, add to its count; if it doesn't, insert a new row. Then use the caching feature to cache the top words in the forum. That would mean the load would be minimal, as the query is added to a query already in use :)
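
A minimal sketch of the table idea, using Python's built-in sqlite3 module with an in-memory database standing in for the forum's real database. The table name keyword_counts, its columns, and the function names are all made up for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and make sure the (hypothetical) counts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_counts ("
        "word TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    return conn


def record_post(conn, words, min_length=4):
    """Count the words of one post: bump existing rows, insert new ones."""
    for word in words:
        w = word.lower()
        if len(w) < min_length:
            continue  # skip words under 4 characters
        cur = conn.execute(
            "UPDATE keyword_counts SET hits = hits + 1 WHERE word = ?", (w,)
        )
        if cur.rowcount == 0:  # word not in the table yet
            conn.execute(
                "INSERT INTO keyword_counts (word, hits) VALUES (?, 1)", (w,)
            )
    conn.commit()


def top_words(conn, limit=15):
    """Return the most frequent words; ties are broken alphabetically."""
    rows = conn.execute(
        "SELECT word FROM keyword_counts ORDER BY hits DESC, word ASC LIMIT ?",
        (limit,),
    )
    return [row[0] for row in rows]
```

The counting work happens once per post, when it is saved, instead of on every page view. The result of top_words() is the part that would be cached.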
