xisto Community
derouge

Possible To Do? Reading a remote website's data ..


What I need to know is how to do the following. I'm completely confused: although I have a general feel for what I need to do, I have no clue how to do it.

 

I need to be able to take some data from one page, displayed on their site as an HTML table, and then output certain pieces of it to my site. I will disclaim here and now that what I intend to do does not violate any laws, as I'm not stealing a person's content. I am using this as an online counter, to see how long someone is online. I have also emailed the company to ask whether they would frown on what I'm doing, and they've said that as long as I credit where the data is from, it's fine.

 

So anyways, an example.

 

Site A has a table that looks like the following ..

| NAME | PLACE | COMMENT |

What I need to do is get that info (Name, Place, Comment) and store it in a database. I'll need the code to run every 20 minutes by itself, updating the database each time. The specific trouble I have is how to recognize where that info starts and ends, and then how to recognize multiple rows as separate entries. I really don't know where to start, even, on the coding side. So, if someone could just give me a little line to start with, that'd be appreciated greatly. If you feel like helping more, that's also appreciated! I am not expecting someone to write out lines of code here that I can copy and paste; I'd much rather be given some resources and a few hints so I can plug away myself.

 

If you need more specifics, just ask. :mellow:

 

Thanks in advance!


OK, so where exactly is your problem: in extracting the information from the remote site, or in parsing it to suit your needs? I assume it's extracting it from the remote site. You can do that with a PHP script, but someone has to run it to update the database, or you can write a local Python script that sits on the server and does it for you. If you are OK with the PHP script, you can look for info about PHP sockets at http://forums.xisto.com/no_longer_exists/ and if you need help with Python, just ask.


Now, in regards to Python .. err, yeah. Well, what's that!?! xD I suppose I'd need to do a bit of research into that, as well?



To give you a kick start,

Python is a relatively new programming language (a scripting language).

When I say relatively, I mean relative to C++ and the like. :mellow:

 

It's quite powerful and can be used for a lot of "quick tool creation" tasks,

which could take tons of time in C/C++.

 

The reason I mentioned it is that if you have physical access to the web server, you could write a Python script to update your database whenever and however you want.

 

I don't know of any good Python tutorials, but the interpreter is at:

https://www.python.org/


I'm not a PHP pro, but as far as I know you need a cron job to run a repetitive task every 20 minutes, and on some hosts cron jobs cost money. But you'd better not rely on me. My intuition tells me that you would be better off with another language. As for the reading part, you can get the page with file_get_contents, and if it is a static page (except for the table data) it will be easy to extract the data using the various string functions. Good luck, Ruben


Hi, I will give you some ideas on how to get and examine the results in PHP.

 

<?php
// Fetch the page as a string (read only). For example Google.
$url = file_get_contents('https://www.google.de/?gfe_rd=cr&ei=BwkjVKfAD8uH8QfckIGgCQ&gws_rd=ssl');
if ($url === false) {
    die('Could not read the remote page');
}
echo $url;

// Split the page content into pieces, using the " " string as the delimiter
// (you could also split on "div", "td" or whatever else).
$urlparts = explode(" ", $url);

// Now you have an array containing all the page content, divided at the " " strings.
// Each piece is available as $urlparts[n], where n runs from 0 to the number of pieces minus 1.
// For example, to see how many values are stored in the array $urlparts:
$counter = count($urlparts);
echo $counter;

// Now you want to find some string so you can get the one after it, for example "images".
// You'd use a loop to check each element; the loop below only prints the index values.
// First I echo some HTML to introduce the output.
echo "<br>";
echo "<br>";
echo 'Those are the "$chaincounter" values during the loop:';
echo "<br>";

// Now the loop. Note "=" (assignment), not "==", and "<" so it stops at the last index.
$chaincounter = 0;
while ($chaincounter < $counter) {
    echo $chaincounter;
    echo " ";
    $chaincounter++;
}

// Hehe! Well, I only printed the values to your screen, but you can do whatever you want with the array.
?>

 

At this point the code is finished.

 

You'll have to analyze the strings.

Check these functions:

 

count_chars

preg_match (ereg was removed in PHP 7)

implode

join

ltrim

similar_text

preg_split or explode (split was removed in PHP 7)

str_replace

str_word_count

strchr

strip_tags

stristr

strlen

 

As you see, PHP contains lots of functions. These are string functions; you'll have to check the array functions too.

 

Find the string that identifies the content you want and use it to locate the cells whose content you want to extract.
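
To make that step a bit more concrete, here is a minimal sketch of pulling the cells out of each table row with preg_match_all and strip_tags. The function name parse_table_rows and the sample HTML are made up for illustration, and it assumes the remote table uses plain <tr>/<td> markup without nested tables, so treat it as a starting point rather than a finished parser.

```php
<?php
// Hypothetical helper: turn each <tr> of an HTML fragment into an array of cell texts.
function parse_table_rows(string $html): array
{
    $rows = [];
    // Grab the inside of every <tr>...</tr>; "s" lets "." span newlines, "i" ignores case.
    preg_match_all('/<tr\b[^>]*>(.*?)<\/tr>/si', $html, $rowMatches);
    foreach ($rowMatches[1] as $rowHtml) {
        // Grab the inside of every <td>...</td> within this row.
        preg_match_all('/<td\b[^>]*>(.*?)<\/td>/si', $rowHtml, $cellMatches);
        if (count($cellMatches[1]) === 0) {
            continue; // skip header rows (<th>) and empty rows
        }
        $cells = [];
        foreach ($cellMatches[1] as $cellHtml) {
            // Drop any inner tags, decode entities such as &amp;, and trim whitespace.
            $cells[] = trim(html_entity_decode(strip_tags($cellHtml)));
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Made-up sample in the NAME | PLACE | COMMENT shape from the question.
$sample = '<table><tr><th>NAME</th><th>PLACE</th><th>COMMENT</th></tr>'
        . '<tr><td>Alice</td><td>Lobby</td><td>Online &amp; idle</td></tr>'
        . '<tr><td>Bob</td><td><b>Arena</b></td><td></td></tr></table>';

foreach (parse_table_rows($sample) as [$name, $place, $comment]) {
    echo "$name | $place | $comment\n";
}
```

Each row comes back as its own array, which covers the "multiple rows as separate entries" part of the question.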

 

As you've seen, you can write HTML using the "echo" function. This means that in a loop, you can "echo" tables too. For example:

 

I have a variable called $results that contains the values of a database field. I will put it in a table using a variable called $value.

 

....

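echo "<table border='1'>\n";
// mysqli_fetch_assoc() replaces mysql_fetch_assoc(), which was removed in PHP 7;
// $results is assumed to be a mysqli result object.
while ($rows = mysqli_fetch_assoc($results)) {
    echo "<tr>\n";
    foreach ($rows as $value) {
        echo "<td>\n";
        echo $value;
        echo "</td>\n";
    }
    echo "</tr>\n";
}
echo "</table>\n";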

......

 

I don't know how much PHP you know, but it isn't a difficult language to write.

 

About the other question.

 

In order to get real-time updates you should run your own server (however, maybe there are internet services that could call your page, I don't know).

 

Otherwise, your pages will be updated every time a visitor runs the code.

 

Besides, you should store the data in a database like MySQL.
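
As a rough idea of the database side, here is a sketch using PDO with a prepared statement. The table name entries, its columns, and the helper save_rows are assumptions for illustration only. The demo uses an in-memory SQLite database (so it needs the pdo_sqlite extension) purely so that it runs without a server; with MySQL you would change the connection line.

```php
<?php
// Hypothetical helper: insert or update one row per name in a table called "entries".
// The table and column names are assumptions for illustration.
function save_rows(PDO $db, array $rows): int
{
    // REPLACE INTO overwrites an existing row with the same primary key
    // (supported by both MySQL and SQLite).
    $stmt = $db->prepare('REPLACE INTO entries (name, place, comment) VALUES (?, ?, ?)');
    $saved = 0;
    foreach ($rows as [$name, $place, $comment]) {
        $stmt->execute([$name, $place, $comment]);
        $saved++;
    }
    return $saved;
}

// Demo connection: in-memory SQLite. For MySQL the DSN would look something like
// 'mysql:host=localhost;dbname=mydb', plus a user name and password.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE entries (name VARCHAR(64) PRIMARY KEY, place VARCHAR(64), comment TEXT)');

save_rows($db, [['Alice', 'Lobby', 'hi'], ['Bob', 'Arena', '']]);
save_rows($db, [['Alice', 'Arena', 'moved']]); // a later run updates Alice instead of duplicating her

echo $db->query('SELECT COUNT(*) FROM entries')->fetchColumn(), "\n";
```

Using the name as the primary key is just one choice; if names are not unique on the remote page you would need a different key.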

 

On the other hand, I don't know how to create a script capable of updating itself, and it's time to go to dinner now. 21:55, oops! Maybe in JavaScript, but JavaScript and users of Windows XP SP2 have trouble sometimes.

 

OK, hope you get that page. I had a similar idea and still have to develop an application like yours. But first I must finish updating my webpage and an online store. Oops, no time for now. Work must be done too.

 

Cya :)


Okay, here we go. Read this first and try to understand.
http://forums.xisto.com/topic/83250-topic/?findpost=

As you see, you'll probably need some help with the regular expressions used to extract your specific data.

scherzi's system will work as long as the target page never changes format. His system takes all of the words from the web page and builds an array from them. Then, using the magic of PHP string functions, you'd have to rebuild the data you want word by word. It is a good idea but may be more confusing for most people, and it requires some advanced string function use.

The easiest way that I know of is in the link above. Here is a general overview of what that does.

Reads the target file.
Searches that file for a very specific trigger.
Uses the trigger as the starting point of the dynamic data you want to extract.
Places the extracted data into its own variable.
Stops extracting data at the second specific trigger.

Now this would need to repeat for each item you wanted to extract.

If all of your information was in the same area, it would be best to extract the entire area first then pull the individual data out from that.
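
The trigger idea above can be sketched with nothing more than strpos and substr. The helper name extract_between and the sample page are made up for illustration, and the triggers assume the target page happens to have identifiable markers like these.

```php
<?php
// Hypothetical helper: return the text between two trigger strings,
// or null if either trigger is missing.
function extract_between(string $haystack, string $startTrigger, string $endTrigger): ?string
{
    $start = strpos($haystack, $startTrigger);
    if ($start === false) {
        return null;
    }
    $start += strlen($startTrigger);               // skip past the first trigger
    $end = strpos($haystack, $endTrigger, $start); // find the second trigger after it
    if ($end === false) {
        return null;
    }
    return substr($haystack, $start, $end - $start);
}

// Made-up page content standing in for the remote site.
$page = '<html><body><h1>Who is online</h1>'
      . '<table id="online"><tr><td id="name1">Alice</td><td id="location1">Lobby</td></tr></table>'
      . '</body></html>';

// First narrow things down to the whole area, then pull the individual items out of it.
$area = extract_between($page, '<table id="online">', '</table>') ?? '';
$name = extract_between($area, '<td id="name1">', '</td>');
$location = extract_between($area, '<td id="location1">', '</td>');

echo "$name is in $location\n";
```

Returning null when a trigger is missing makes it easy to notice when the target page changes its layout.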

That's what I would do for the data extraction. Here is how I would optimize it.
If the data is not 100% real time, there is no reason to read the target page every single time your website loads. If you do it that way and you start getting a lot of traffic, the owner of the target page may get irritated by the load on his server and ask you to stop leeching. As a result, you probably want to add a buffer to the script, where the extracted data is stored on your server and can be quickly accessed and output.
Basically, create a new file on your system to hold the data. When you actually go to the target page and extract the data, write all of that data to your storage file in PHP form:
my_data.txt

$data1 = "Whatever the data is";
$data2 = "Even more data to store";
$data3 = "Yet more data to store";

Once you have the data stored locally, you can use it instead of accessing the remote website over and over. You can decide whether to use the stored data or the remote data based on the age of the local data file. All you do is use PHP's filemtime() function to determine the exact time that the file was last modified and compare that to the current time. If the difference between the two is greater than your preset buffer time, read from the remote page; otherwise read from the local data file.

This will prevent you from having to try and use a cronjob to update your data. Also, the data will only be updated if it is requested, so you won't have a script constantly updating the data when it isn't needed.
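
Here is a minimal sketch of that buffer check, assuming the fetched page is stored as raw text rather than as PHP variables (a small variation on the my_data.txt idea). The helper name cached_fetch, the file name, and the URL in the usage comment are placeholders.

```php
<?php
// Hypothetical helper: return the cached data if the cache file is younger than
// $maxAge seconds; otherwise call $fetch() for fresh data and rewrite the cache file.
function cached_fetch(string $cacheFile, int $maxAge, callable $fetch): string
{
    clearstatcache(true, $cacheFile); // filemtime() results are cached within a request
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile); // still fresh: use the local copy
    }
    $fresh = $fetch();
    if (!is_string($fresh)) {
        throw new RuntimeException('Fetching fresh data failed');
    }
    file_put_contents($cacheFile, $fresh, LOCK_EX); // LOCK_EX avoids two visitors writing at once
    return $fresh;
}

// Usage sketch: the URL is a placeholder, and 20 minutes matches the interval in the question.
// $html = cached_fetch(__DIR__ . '/my_data.txt', 20 * 60, function () {
//     return file_get_contents('http://example.com/online.html');
// });
```

Because the remote page is only read when a visitor arrives and the local copy is too old, the target server sees at most one request per buffer period.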

If you still need further assistance, please include the specific data you are trying to extract and the address of the page it is located on. I highly doubt anyone here is going to steal your idea, but the information would make finding a solution for your problem a lot easier. We'll need to see where your triggers are located. Here is what we are looking for:
<tr><td id="name1">First name you want</td><td id="location1">First location you want</td><td id="comment1">First comment you want</td></tr>
Now we probably won't get lucky enough to have each table cell identified for us but that would be helpful.

Hope this helps out. :)

vujsa

