
Tue, 22 Dec 2009

PyCon 2010: "Scrape the Web," and a poster session

"Scrape the Web," my PyCon tutorial on web scraping, is back this year! Plus, I'll be leading a conversation on how to get involved with Free Software from my poster at the poster session.

This year's Python conference takes place February 19-21 in Atlanta, Georgia, USA.

Poster session

This is the first year PyCon is holding a poster session. My poster is on open source and Free Software for the Python community, focusing on how you can get involved.

It's a plenary session. This means that for 90 minutes, a dozen of us presenters will be standing in front of our posters, hoping PyCon attendees will talk to us. Everyone at PyCon will be milling about, since there will be no talks during the poster session. So stop by!

Web scraping tutorial

I had lots of fun last year talking to a packed room about programming the web. The World Wide Web is the world's most widely used distributed computing system; if you're only using it from a web browser, you're missing out. The session is a PyCon tutorial: a paid three-hour course (with refreshments) in a classroom setting. Judging by what last year's attendees said afterward at lunch, they enjoyed themselves too!

From Python, there's a host of choices for pulling information from the web, and a few choices for pushing data back (usually through forms). Here are some topics we'll cover:
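To make the pull/push distinction concrete, here is a minimal sketch using only the Python 3 standard library. It is an illustration, not material from the tutorial: the sample HTML, the example.com URLs, and the helper names `extract_links` and `build_form_post` are all made up for this example. The "pull" half extracts links from a page; the "push" half prepares a form submission without actually sending it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Pull: return absolute URLs for all links found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def build_form_post(action_url, fields):
    """Push: build (but do not send) a POST request with URL-encoded fields."""
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical page content standing in for a real download.
SAMPLE_HTML = """
<html><body>
  <a href="/talks/1">First talk</a>
  <a href="http://example.org/about">About</a>
  <a name="no-href">Anchor without a link</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML, "http://example.com/schedule"))
    request = build_form_post("http://example.com/search", {"q": "web scraping"})
    print(request.get_method(), request.full_url, request.data)
```

To go over the network, the page text would come from `urllib.request.urlopen(url).read()` and the prepared request would be passed to `urlopen` as well. Third-party parsers are often more forgiving of messy real-world HTML than the standard library one used here.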

I think the most exciting part is the discussion of getting around anti-scraping countermeasures. This is where the rubber hits the road. We'll:

Last year's version is online as a video. If you missed it last year, register for PyCon and sign up for my tutorial, "Scrape the Web." You're likely to learn a lot, and I'm always happy to answer questions during and afterward.

Brian Gershon, one of last year's attendees, put it best:

Why use an API, when you can just grab it off the page? :)
