Sun, 30 Nov 2008

As for single people,

"I don't know, try eating chocolate cake," he said.

-- Pastor Young (source).

[/scribble/rhetoric] permanent link

Scrape the Web: Strategies for programming websites that don't expect it

I just received this email:

From: Greg Lindstrom
To: Tutorial List

Hello,

On behalf of the PyCon Tutorial Selection Committee, I'd like to inform you that you have been selected to present at least 1 tutorial at PyCon 2009 in Chicago. We had 50 proposals for the 32 available slots and many good proposals had to be rejected.

It means:

I get to stand in front of people I don't know in Chicago and talk about web scraping!

(Unless the talk is canceled because no one signs up.)

So I guess it means I'll be going to PyCon 2009! It's in Chicago, from March 25 to April 2. If you'll be there, too, drop me a line!

[/note/debian] permanent link

RDFa for the Debian Package Tracking System?

I was happy to read Zack's post about adding machine-readable metadata for the Package Tracking System. Not only can one query it via SOAP, he also provides XPath recipes for how to screen-scrape data out of the web pages.

He writes:

Well, on top of [SOAP] I've implemented something along the lines microformats, that just make a clever use of ingredients already available in XHTML like classes and unique identifiers.

This is awesome. In fact, as you can see when he links to the SOAP backend, the SOAP interface is implemented using that XPath screen-scraping!
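To give a feel for what that kind of XPath screen-scraping looks like, here is a minimal sketch using only Python's standard library. The XHTML fragment, the id `latest_version` and the class `maintainer` are made up for illustration; they are not the real PTS markup, and this is not Zack's actual recipe.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment. The id and class names are assumptions
# chosen for this example, not the markup the real PTS pages use.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <span id="latest_version">1.2-3</span>
    <span class="maintainer">Jane Doe</span>
    <span class="uploader">John Roe</span>
  </body>
</html>"""

# XHTML elements live in a namespace, so the XPath needs a prefix for it.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def scrape(page):
    """Pull two fields out of the page by unique id and by class."""
    root = ET.fromstring(page)
    version = root.find(".//x:span[@id='latest_version']", NS)
    maintainer = root.find(".//x:span[@class='maintainer']", NS)
    return {"version": version.text, "maintainer": maintainer.text}


result = scrape(PAGE)
print(result)
```

ElementTree only understands a small subset of XPath, but attribute tests like these are enough when the page author has been kind enough to label things with ids and classes.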

What I think would be even more awesome would be to present the data to a machine user of the web page as RDFa, "RDF in attributes." RDF (short for Resource Description Framework) is a standard for metadata statements. Although some early versions of RSS, the web site syndication format, were built on RDF, in general RDF has nothing to do with syndication.

A few sample RDF statements might be:

RDF generally uses URIs to represent information (though you can still use literal values like numbers where appropriate). This allows different users to create namespaced terminology. That way, when Debian defines what "maintainer" means, Fedora can choose whether to use the Debian term meaning "maintainer."

If they do, then Fedora people and Debian people could use the same query (on a different set of data) to answer the same question. And if they choose to use a different term, the two data sets can co-exist; the namespacing prevents any conflict.
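The namespacing idea can be sketched with plain Python tuples standing in for RDF statements (subject, predicate, object). Every URI and package name below is invented for the example; neither Debian nor Fedora publishes these terms as far as I know.

```python
# Made-up namespaces and data, purely for illustration.
DEBIAN = "http://example.org/debian/terms#"
FEDORA = "http://example.org/fedora/terms#"

debian_data = [
    ("http://example.org/debian/pkg/hello", DEBIAN + "maintainer", "Alice"),
    ("http://example.org/debian/pkg/hello", DEBIAN + "version", "2.2-2"),
]

# Case 1: Fedora reuses the Debian term for "maintainer".
fedora_shared = [
    ("http://example.org/fedora/pkg/hello", DEBIAN + "maintainer", "Bob"),
]

# Case 2: Fedora defines its own term instead.
fedora_own = [
    ("http://example.org/fedora/pkg/hello", FEDORA + "maintainer", "Bob"),
]


def objects(triples, predicate):
    """Return the object of every statement that uses the given predicate."""
    return [o for (s, p, o) in triples if p == predicate]


# The same query works on both data sets when the term is shared.
same_query = DEBIAN + "maintainer"
debian_result = objects(debian_data, same_query)
fedora_result = objects(fedora_shared, same_query)

# With separate terms the data sets can be merged without any clash:
# each query only sees the statements made in its own vocabulary.
merged = debian_data + fedora_own
merged_debian = objects(merged, DEBIAN + "maintainer")
merged_fedora = objects(merged, FEDORA + "maintainer")
```

A real RDF store and query language do far more than this list comprehension, but the point about terms being full URIs rather than bare words is the same.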

As for the term URI: URIs ("Uniform Resource Identifiers") are just like URLs, except that instead of names of locations, they are just identifiers. So it's true that every URL is a URI, but you aren't necessarily intended to be able to wget every URI; they're just names.

Ben Adida and Mark Birbeck wrote a fantastic RDFa primer that explains the concepts and implementation, peppering it with diagrams where they might help. The key is that using RDFa gives you the ability to automatically interoperate with the world of RDF-aware tools, including query and reasoning systems, and it is architected so that anyone can add RDFa data to any page without stepping on the toes of other extra-metadata technologies. (Microformats don't have most of these benefits.) Ben and Michael Hausenblas at W3C also wrote a document listing some further use cases for machine-readable web pages.
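As a rough illustration of "RDF in attributes," here is a toy extractor that reads `about` and `property` attributes out of a fragment and turns them into statements. It handles only a tiny slice of RDFa and is nowhere near a conforming processor; the page fragment and the package URI are invented, and the prefix table is hard-coded as an assumption rather than read from the markup.

```python
import xml.etree.ElementTree as ET

# Assumed prefix table. A real RDFa processor reads prefixes from the page.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical fragment, not taken from any real package page.
PAGE = """<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/pkg/hello">
  <span property="dc:title">hello</span>
  <span property="dc:creator">Jane Doe</span>
</div>"""


def expand(curie):
    """Turn a prefixed name like dc:title into a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def triples(element, subject=None):
    """Walk the tree; 'about' sets the subject, 'property' emits a statement."""
    subject = element.get("about", subject)
    found = []
    prop = element.get("property")
    if prop is not None and subject is not None:
        # Prefer an explicit content attribute, else use the element text.
        value = element.get("content", element.text)
        found.append((subject, expand(prop), value))
    for child in element:
        found.extend(triples(child, subject))
    return found


result = triples(ET.fromstring(PAGE))
for statement in result:
    print(statement)
```

The nice part is that the human-readable text ("hello", "Jane Doe") and the machine-readable statement are the same bytes in the page, so they cannot drift apart.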

When I have some spare time, I'd be happy to help. But first I hope to make Zack and others aware that there is a standard for machine-readable metadata, designed with use cases like ours in mind!

[/note/debian] permanent link

Tue, 18 Nov 2008

Obama's digital writing

The New York Times says this about Obama:

"His messages to advisers and friends, they say, are generally crisp, properly spelled and free of symbols or emoticons."

There's hope for me still!

[/note/me] permanent link

Sat, 01 Nov 2008

Mouse cursors

John Goerzen was surprised by a mouse pointer change. His mouse changed from X.org's classic black mouse pointer to the new GNOME translucent set. Upset, he wrote:

I noticed that my beloved standard X11 cursors had been replaced by some ugly antialiased white cursor theme. I felt as if XP had inched closer to taking over my machine.

Windows users seem to place similar importance on that clicky thing. A recent PC Magazine article writes, "Few things are more important in Windows than the mouse pointers." Dave Taylor discussed mouse pointers once, showing this picture of Windows XP's mouse pointers:

Windows XP's mouse pointers, then, don't look like the ones John Goerzen got. They look like bent versions of the normal X11 pointers with inverted colors. Windows Vista's mouse cursors do look like GNOME's (via a BlogIsEverything post):

For this reason, Windows Vista feels like a cheap knock-off of GNOME to me whenever I use it.


[/note/debian] permanent link

Two Girls, One Cupid

<elver> If lesbians hook up on okcupid.com, do we call that situation 2girls1cupid?
<Chris_B> hahaha. ok elver you actulally got me to laugh
<Chris_B> I dont dislike you today

[/scribble/hash-joiito] permanent link