Sun, 30 Nov 2008

RDFa for the Debian Package Tracking System?

I was happy to read Zack's post about adding machine-readable metadata for the Package Tracking System. Not only can one query it via SOAP; he also provides XPath recipes for screen-scraping data out of the web pages.

He writes:

Well, on top of [SOAP] I've implemented something along the lines microformats, that just make a clever use of ingredients already available in XHTML like classes and unique identifiers.

This is awesome. In fact, as you can see when he links to the SOAP backend, the SOAP interface is implemented using that XPath screen-scraping!
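
To make the screen-scraping idea concrete, here is a minimal sketch using Python's standard library. The XHTML fragment, the id "latest_version" and the class "maintainer" are invented for illustration; they are not the real PTS markup, and Zack's actual recipes may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment. The id and class names are made up for this
# sketch and are NOT taken from the real Package Tracking System pages.
PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <span id="latest_version">1.0</span>
    <ul>
      <li class="maintainer">Example Maintainer</li>
    </ul>
  </body>
</html>
"""

# XHTML elements live in this namespace, so the XPath needs a prefix for it.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def scrape(page):
    """Pull values out of the page by unique id and by class."""
    root = ET.fromstring(page)
    version = root.find(".//x:span[@id='latest_version']", NS)
    maintainers = root.findall(".//x:li[@class='maintainer']", NS)
    return {
        "version": version.text,
        "maintainers": [m.text for m in maintainers],
    }


result = scrape(PAGE)
print(result)
```

The weakness of this approach is that the meaning of each id and class lives only in the recipe: anyone consuming the page has to know in advance which names to look for.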

What I think would be even more awesome would be to present the data to a machine user of the web page as RDFa, "RDF in attributes." RDF (short for Resource Description Framework) is a standard for metadata statements. Although early versions of RSS, the web site syndication format, were built on RDF, RDF itself is general-purpose and is not tied to syndication.

A sample few RDF statements might be:

<http://www.asheesh.org> has license <http://creativecommons.org/licenses/by-sa/3.0/>
<http://packages.debian.org/alpine> has maintainer <http://qa.debian.org/developer.php?login=asheesh@asheesh.org>
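
Each statement is a (subject, predicate, object) triple. As a rough sketch, the two statements above could be modeled in Python like this. The license predicate uses the Creative Commons namespace; the maintainer predicate is a hypothetical URI under example.org, since no official Debian vocabulary is assumed here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# "has license", written as a term in the Creative Commons namespace.
CC_LICENSE = "http://creativecommons.org/ns#license"
# "has maintainer": a HYPOTHETICAL term, invented for this sketch.
MAINTAINER = "http://example.org/debian/terms#maintainer"

statements = [
    Triple(
        "http://www.asheesh.org",
        CC_LICENSE,
        "http://creativecommons.org/licenses/by-sa/3.0/",
    ),
    Triple(
        "http://packages.debian.org/alpine",
        MAINTAINER,
        "http://qa.debian.org/developer.php?login=asheesh@asheesh.org",
    ),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [
        t.obj
        for t in triples
        if t.subject == subject and t.predicate == predicate
    ]


alpine_maintainers = objects(
    statements, "http://packages.debian.org/alpine", MAINTAINER
)
print(alpine_maintainers)
```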

RDF generally uses URIs to represent information (though you can still use literal values like numbers where appropriate). This allows different users to create namespaced terminology. That way, when Debian defines what "maintainer" means, Fedora can choose whether to use the Debian term for "maintainer."

If they do, then Fedora people and Debian people could use the same query (on a different set of data) to answer the same question. And if they choose to use a different term, the two data sets can co-exist; the namespacing prevents any conflict.
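
Here is a small sketch of that idea, with triples as plain tuples. Every example.org URI below (the two vocabulary terms and the Fedora package and people pages) is hypothetical; only the shape of the data matters.

```python
# HYPOTHETICAL vocabulary terms, one per project namespace.
DEBIAN_MAINTAINER = "http://example.org/debian/terms#maintainer"
FEDORA_OWNER = "http://example.org/fedora/terms#owner"

debian_data = [
    (
        "http://packages.debian.org/alpine",
        DEBIAN_MAINTAINER,
        "http://qa.debian.org/developer.php?login=asheesh@asheesh.org",
    ),
]

# Fedora reuses the Debian term for one statement and uses its own term
# for another. The distinct URIs keep the two from colliding.
fedora_data = [
    (
        "http://example.org/fedora/pkg/alpine",
        DEBIAN_MAINTAINER,
        "http://example.org/fedora/people/alice",
    ),
    (
        "http://example.org/fedora/pkg/alpine",
        FEDORA_OWNER,
        "http://example.org/fedora/people/bob",
    ),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def maintainers_of(triples):
    """The same query, runnable against either project's data."""
    return {s: o for (s, p, o) in match(triples, predicate=DEBIAN_MAINTAINER)}


print(maintainers_of(debian_data))
print(maintainers_of(fedora_data))
```

The query never mentions which project the data came from. It only names the term it is looking for, and the Fedora-specific "owner" statement sits alongside without interfering.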

As for the term URI: URIs ("Uniform Resource Identifiers") are just like URLs, except that instead of names of locations, they are just identifiers. So it's true that every URL is a URI, but you aren't necessarily intended to be able to wget every URI; they're just names.

Ben Adida and Mark Birbeck wrote a fantastic RDFa primer that explains the concepts and implementation, peppering it with diagrams where they might help. The key is that using RDFa gives you the ability to automatically interoperate with the world of RDF-aware tools, including query and reasoning systems, and it is architected in a way that anyone can add RDFa data to any page without possibly stepping on the toes of other extra-metadata technologies. (Microformats don't have most of these benefits.) Ben and Michael Hausenblas at W3C also wrote a document listing some further use cases for machine-readable web pages.
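
To give a feel for what "RDF in attributes" means, here is a deliberately tiny sketch that reads triples out of a marked-up fragment using only Python's standard library. It is not a conforming RDFa processor: it understands only the about, rel and href attributes plus xmlns: prefixes, and it assumes every element is explicitly closed. The pts: vocabulary and the base URL are hypothetical; a real deployment would want a proper RDFa parser.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The "pts" namespace is invented for this sketch.
FRAGMENT = """\
<div xmlns:cc="http://creativecommons.org/ns#"
     xmlns:pts="http://example.org/debian/terms#">
  <p about="http://www.asheesh.org">
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a>
  </p>
  <p about="http://packages.debian.org/alpine">
    Maintainer:
    <a rel="pts:maintainer"
       href="http://qa.debian.org/developer.php?login=asheesh@asheesh.org">asheesh</a>
  </p>
</div>
"""


class TinyRDFa(HTMLParser):
    """Collect (subject, predicate, object) triples from a small RDFa subset."""

    def __init__(self, base):
        super().__init__()
        self.prefixes = {}
        self.subjects = [base]  # stack; the last entry is the current subject
        self.triples = []

    def expand(self, curie):
        """Turn 'prefix:term' into a full URI when the prefix is declared."""
        prefix, sep, local = curie.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return curie

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        # 'about' sets a new subject; otherwise inherit from the parent.
        subject = attrs.get("about", self.subjects[-1])
        self.subjects.append(subject)
        if "rel" in attrs and "href" in attrs:
            self.triples.append(
                (subject, self.expand(attrs["rel"]), attrs["href"])
            )

    def handle_endtag(self, tag):
        if len(self.subjects) > 1:
            self.subjects.pop()


parser = TinyRDFa(base="http://example.org/page")  # hypothetical base URL
parser.feed(FRAGMENT)
triples = parser.triples
for triple in triples:
    print(triple)
```

Unlike the class-and-id approach, the page itself says which vocabulary each term comes from, so a generic tool can extract the statements without a page-specific recipe.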

When I have some spare time, I'd be happy to help. But first I hope to make Zack and others aware that there is a standard for machine-readable metadata, designed with use cases like ours in mind!


Asheeshworld Notes you will like
