Saturday, December 8, 2007

Openframeworks : Eating data at Eyebeam

I had the pleasure of joining a workshop at Eyebeam hosted by Zach Lieberman and the openframeworks team.

The theme of the event was Eating data, and the main idea behind it was to access data published online by bypassing the usual APIs and capturing it directly from the HTML source.

The process involved Firebug, a Firefox extension, and XPath, which together let you analyze the page structure and discover where the interesting data sits.

For example, let's try to capture the latest CNN headlines from cnn.com:

Go to cnn.com.

Enable Firebug and start inspecting the HTML document by moving your mouse over the part of the page that holds the data you are interested in; in our case the latest news items are contained in a div with the class name cnnT2s.

Now open the Firebug console, type $x("//a") and press Enter; you should get a list of all the a tags contained in the document.

Thanks to this function it is possible to ask questions of the document using XPath syntax.

So let's access the div that contains the latest news:

$x("//div[@class='cnnT2s']")

And all the a tags it contains:

$x("//div[@class='cnnT2s']//a")

You should get a list of all the a elements that belong to the latest news.
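To get a feel for what those two queries select without opening a browser, here is a minimal sketch in Python using only the standard library. The page snippet is made up for illustration (it is not the real cnn.com markup; only the class name cnnT2s comes from the example above), and ElementTree supports just a subset of XPath, so the expressions start with a dot to make them relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; NOT the real cnn.com source.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="nav"><a href="/home">Home</a></div>
    <div class="cnnT2s">
      <ul>
        <li><a href="/story-one">First headline</a></li>
        <li><a href="/story-two">Second headline</a></li>
      </ul>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE_PAGE)

# Rough equivalent of $x("//a"): every a element in the document.
all_links = root.findall(".//a")

# Rough equivalent of $x("//div[@class='cnnT2s']//a"):
# only the a elements nested somewhere inside the "latest news" div.
latest_links = root.findall(".//div[@class='cnnT2s']//a")

# Keep the link text and the target of each headline.
headlines = [(a.text, a.get("href")) for a in latest_links]

print(len(all_links), "links in total")
for text, href in headlines:
    print(text, "->", href)
```

The navigation link is matched by the first query but not by the second, which is exactly the narrowing you see in the Firebug console.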

Firebug is a useful tool for analyzing the page, but it cannot extract and manipulate the data. Once you have worked out which XPath query returns the data you want, it is time to use another application to extract and work with it, something like openframeworks!

---

Thanks to ofScraper it is possible to get the data from the HTML page into an openframeworks application and use it however you want. The flow of operations is the following:

Create a connection
Request the html page
Convert the page source into a string
Convert the string into an XML node
Analyze the node using XPath
Get the results as a std vector
Use the vector as you want!
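
The same flow can be sketched outside openframeworks. The snippet below is not ofScraper's API (its function names are not shown in this post); it is a rough Python illustration of the seven steps using only the standard library. The names fetch_page and extract_links and the sample markup are made up for the example, and it assumes the page is well-formed XML, which real-world HTML often is not.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_page(url):
    """Steps 1-3: open a connection, request the page, decode it to a string."""
    with urllib.request.urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(page_source, xpath):
    """Steps 4-6: parse the string into a node, query it, return a list."""
    root = ET.fromstring(page_source)   # step 4: string -> XML node
    nodes = root.findall(xpath)         # step 5: analyze the node with XPath
    # Step 6: collect the results in a plain list (the std vector of the flow).
    return [((node.text or "").strip(), node.get("href", "")) for node in nodes]


# Hypothetical stand-in for a downloaded page, so the sketch runs offline.
# With a live, well-formed page you would call fetch_page(url) instead.
page_source = (
    "<html><body>"
    "<div class='cnnT2s'>"
    "<a href='/story-one'>First headline</a>"
    "<a href='/story-two'> Second headline </a>"
    "</div>"
    "<div class='footer'><a href='/about'>About</a></div>"
    "</body></html>"
)

results = extract_links(page_source, ".//div[@class='cnnT2s']//a")

# Step 7: use the list however you want; here we simply print it.
for text, href in results:
    print(text, "->", href)
```

Keeping the download separate from the parsing and querying means you can test the XPath expression against a saved copy of the page before pointing it at the live site.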


The OF version with scraping enabled is available here for Mac and PC;
in the app folder there is an example application to look at.

I really enjoyed the workshop: Zach is a great teacher, the OF team is nice, and we had a great time at Eyebeam!
