xisto Community
Pavarr

Operating On Google News: How to get/use Google News headlines


First of all, we have to store the Google News page in a variable (file() reads it into an array of lines):

$googlenews = file("http://news.google.com/news/en/us/world.html");
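
file() returns false when the download fails, and it can only open URLs when allow_url_fopen is enabled. Here is a minimal sketch of a guard around that call. The helper name fetch_lines is my own invention, not part of the original post:

```php
<?php
// Hypothetical helper (not from the original post): load a page or file as
// an array of lines, returning an empty array when it cannot be read.
function fetch_lines($source)
{
    // file() returns false on failure; @ hides the warning so we can handle it ourselves.
    $lines = @file($source);
    return ($lines === false) ? array() : $lines;
}
```

It could be used as $googlenews = fetch_lines("http://news.google.com/news/en/us/world.html"); followed by a check that count($googlenews) is greater than zero before parsing.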

Then we build an array containing each news item's link and title, its popularity rank on Google News, how old the item is, and the source where Google found it:

$popularity = 0; // array index
$news_array = array();
for ($i = 46; $i < count($googlenews); $i++) { // real news starts at line 46
    $all = explode("<font size=", $googlenews[$i]); // makes it easier to retrieve the headers
    for ($j = 0; $j < count($all); $j++) {
        $act = $all[$j]; // current chunk
        // a bit of cleaning up
        $act = str_replace("</tr>", "", $act);
        $act = str_replace("</td>", "", $act);
        $act = str_replace("</table>", "", $act);
        $act = str_replace("</b>", "", $act);
        $act = str_replace("</font>", "", $act);
        $act = str_replace("<nobr>", "", $act);
        $act = str_replace("&nbsp;", "", $act); // strip &nbsp; entities (removing plain spaces would break the "- " split below)
        $act = str_replace("<br>", "", $act);
        // enough cleaning
        if (stristr($act, "-1>") && stristr($act, "<font color=#6f6f6f>")) { // markers of _real_ news
            $where_time = str_replace("-1><font color=#6f6f6f><b>", "", $act); // source and time as one string
            $where_time = str_replace("</nobr>", "", $where_time); // more cleaning
            $where_time_arr = explode("- ", $where_time); // split into source and time
            $popularity++; // next array index
            $where = $where_time_arr[0];
            $time = $where_time_arr[1];
            // we now know where the news was found; let's get its title and link
            $news = explode('<td valign=top>', $all[$j-1]); // the previous chunk holds the headline
            $true_news = $news[1];
            $news_array[$popularity] = $where.'|'.$time.'|'.$true_news; // store the entry
        }
    }
}
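
To see the source/time split in isolation, here is a small sketch run on a single chunk. The sample string is made up to look like the markers the loop checks for; the real markup may differ. It also passes a limit of 2 to explode() and trims the results, which is my own addition:

```php
<?php
// Hypothetical sample chunk, shaped like the markers the loop looks for.
$act = "-1><font color=#6f6f6f><b>BBC News - 2 hours ago</nobr>";

// Strip the marker and the closing tag, leaving "source - time".
$where_time = str_replace("-1><font color=#6f6f6f><b>", "", $act);
$where_time = str_replace("</nobr>", "", $where_time);

// A limit of 2 keeps any later "- " inside the time part.
$where_time_arr = explode("- ", $where_time, 2);
$where = trim($where_time_arr[0]); // trim() drops the space left before the dash
$time  = trim($where_time_arr[1]);

echo "$where / $time\n";
```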

Now that we have an array of clean information, we can do virtually anything with it. For example:

foreach ($news_array as $value) {
    $values_arr = explode("|", $value);
    $where = $values_arr[0];
    $time = $values_arr[1];
    $news = $values_arr[2];
    echo "$news - found in $where $time<br/>";
}

We can also search it for keywords, or do something similar with it.
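
A keyword search could look roughly like this. It is only a sketch: the sample entries and the function name search_news are hypothetical, and it assumes each entry uses the "where|time|news" format built earlier:

```php
<?php
// Hypothetical sample data in the "where|time|news" format.
$news_array = array(
    1 => "BBC News|2 hours ago|Storm hits the coast",
    2 => "Reuters|5 hours ago|Markets rally after storm",
    3 => "CNN|1 hour ago|Election results announced",
);

// Return the entries whose headline contains $keyword (case-insensitive),
// keeping their original array indexes.
function search_news($news_array, $keyword)
{
    $matches = array();
    foreach ($news_array as $index => $value) {
        // Limit of 3 so a "|" inside the headline does not split it further.
        $values_arr = explode("|", $value, 3);
        if (count($values_arr) == 3 && stripos($values_arr[2], $keyword) !== false) {
            $matches[$index] = $value;
        }
    }
    return $matches;
}

foreach (search_news($news_array, "storm") as $value) {
    echo $value . "\n";
}
```

Only the headline part is searched here, so a keyword that happens to match a source name is ignored.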

I should add that I haven't yet found a simple, always-working way to separate the link from the news title: sometimes an unknown bug leaves residual characters from the link :/ But even without that, it's fully functional.
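
One possible way to pull the link and the title apart is a regular expression plus strip_tags(). This is only a sketch, not a tested fix for the bug above: the function name split_link_title is made up, and it assumes the headline chunk contains a single <a href=...>...</a> element, which may not match the real markup:

```php
<?php
// Hypothetical helper: split a headline chunk into its link and plain title.
// Returns null when no <a href=...> element is found.
function split_link_title($chunk)
{
    // Accepts quoted or unquoted href values; captures the URL and the anchor body.
    $pattern = '~<a\s[^>]*href=["\']?([^"\'\s>]+)["\']?[^>]*>(.*?)</a>~is';
    if (!preg_match($pattern, $chunk, $m)) {
        return null;
    }
    return array(
        'link'  => $m[1],
        'title' => trim(strip_tags($m[2])), // drop <b> and similar tags inside the title
    );
}

// Hypothetical sample chunk.
$parts = split_link_title('<a href="http://example.com/story">Storm hits the coast</a>');
if ($parts !== null) {
    echo $parts['title'] . " -> " . $parts['link'] . "\n";
}
```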
