
Tue, 22 Dec 2009

PyCon 2010: "Scrape the Web," and a poster session

"Scrape the Web," my PyCon tutorial on web scraping, is back this year! Plus, I'll be leading a conversation on how to get involved with Free Software from my poster at the poster session.

This year's Python conference takes place February 19-21 in Atlanta, Georgia, USA.

Poster session

This year is the first year PyCon is holding a poster session. My poster is on open source and Free Software for the Python community, focusing on how you can get involved.

It's a plenary session. This means that, for 90 minutes, there will be a dozen of us presenters standing in front of our posters hoping PyCon attendees will talk to us. Everyone at PyCon will be milling about, since there will be no talks during the poster session. So stop by!

Web scraping tutorial

I had lots of fun last year talking to a packed room about programming the web. The World-Wide Web is the world's most widely-used distributed computing system; if you're only using it from a web browser, you're missing out. It's a tutorial, which is a paid three-hour course (with refreshments) in a classroom setting. Based on what last year's attendees said afterward at lunch, it seemed the attendees enjoyed themselves too!

From Python, there's a host of choices for pulling information from the web, and a few choices for pushing data back (usually through forms). Here are some topics we'll cover:

I think the most exciting part is the discussion of getting around anti-scraping countermeasures. This is where the rubber hits the road. We'll:

Last year's version is online as a video. If you missed it last year, register for PyCon and sign up for my tutorial, "Scrape the Web." You're likely to learn a lot, and I'm always happy to answer questions during and afterward.

Brian Gershon, one of last year's attendees, explained best:

Why use an API, when you can just grab it off the page? :)
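In the spirit of that quote, here is a minimal sketch of grabbing data straight off a page using only Python's standard library. The `LinkCollector` class and the `SAMPLE_HTML` snippet are made up for illustration; a real scraper would fetch the page over the network and would probably want a more forgiving parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content, standing in for a downloaded page.
SAMPLE_HTML = """<html><body>
<a href="/note/preso">Talks</a>
<a href="http://pycon.org/">PyCon</a>
<p>No link here</p>
</body></html>"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)
```

The same pattern extends to other tags: override `handle_starttag` (or `handle_data`) and keep whatever the page happens to expose.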

[/note/preso] permanent link and comments

Sun, 20 Dec 2009

Anti-depressants and personality shift

Lisa pointed me to a Science News article discussing Paxil, a medicine prescribed for depression. The important bit:

“We propose that modern antidepressants work partly by correcting the long-term personality risk factors for depression,” Tang says.

The article explains that, even after "accounting for the extent to which each treatment diminished standard measures of depression," taking Paxil makes you less neurotic and less introverted.

Recently we learned that the placebo effect is getting stronger. What this research makes me wonder is, If we helped these people adjust these personality traits via e.g. cognitive therapy, and then gave them placebo, would they have the same high success in defeating depression as the Paxil takers?

Please understand that I do believe the lived experience of depressed people is terrible. I don't mean to diminish their suffering. I'm wondering here about ways we can help people be happier.

[/note/drugs] permanent link and comments

Fri, 18 Dec 2009

Diversity in Free Software: South Asians as an example

As someone born in India, I sometimes look around and wonder, Where are the Indians (and other South Asians) in Free Software?

(I don't mean to exclude South Asians from other countries, so I will lump us together. I believe that we are more similar than we are different, although I know more about India than about the rest of South Asia.)

There is no shortage of Indians performing information technology jobs in the United States. The same is true in academia; the Computing Research Association uses National Science Foundation data to show about 15% of computer science bachelor's degrees are awarded to "Asians or Pacific Islanders." These are not precise numbers targeted at South Asians in particular, but they confirm a general feeling that plenty of technologists in the United States are from that part of the world.

South Asia is quite a populous region, coming in at over one billion people. It, too, has plenty of technology workers. So much FLOSS conversation happens in English, and India is well-suited to handle this; English is an "official language". Indian academia reports that there are 350 million English users and about 90 million English speakers.

So let's visually compare the Debian developers map for South Asia (over one billion people) and that of New Zealand, a country of four million.

India:

New Zealand:

These two countries have about the same number of Debian developers (at least, who have marked their location in the Debian LDAP database). About four.

South Asians comprise about one sixth of the world's population. There are about one thousand Debian developers; we represent at best 1% of that. These numbers are comparable to the under-representation of women in Free Software, especially when you compare the figure to South Asians' over-representation in the rest of information technology.
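To make the size of that gap concrete, here is the back-of-the-envelope arithmetic as a short Python sketch. It uses only the rough figures above (one sixth of the world, about one thousand developers, at best 1%), and it assumes, purely for comparison, that developers would otherwise be drawn in proportion to population.

```python
# Rough figures taken from the paragraph above.
world_share = 1 / 6          # South Asians as a share of world population
debian_developers = 1000     # approximate number of Debian developers
observed_share = 0.01        # "at best 1%" of Debian developers

# Headcount if developers mirrored world population (an assumption).
expected = world_share * debian_developers
# Headcount implied by the "at best 1%" figure.
observed = observed_share * debian_developers
# How many times larger the proportional figure is.
ratio = expected / observed

print(round(expected), round(observed), round(ratio, 1))
```

Proportional representation is a crude baseline, but it shows the order of magnitude of the gap.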

That makes me sad.

Take a look at the Debian developer map again. You'll see that Debian is certainly not an Americans-only project, or even an English-speakers-only project. South America has a respectable dotting of developers, and Western and Central Europe are packed.

I have strong feelings about Free Software. It emerges from an ethos of personal empowerment, and with open source it has become a dominant force in computing. Yet there are plenty of sharp people -- at least women and South Asians -- who, somehow, become culturally excluded from participating.

Why care about diversity?

Consider the diversity of contributors we already have. Some contribute to Free Software because of particular business needs, such as what caused Avi Kivity to write KVM, the new leader in Linux-based virtualization. Everaldo's art background gave us the "Crystal" icon set that set the standard for sharp-looking icons on the Free Desktop for years. Josh Coalson knew about compressing sound, and his Free Lossless Audio Codec is now the standard in high quality audio.

We already have a great deal of diversity. We should be celebrating!

Back in 2001, FLAC's users were celebrating. In that year, I decided to ditch proprietary operating systems because I felt I could achieve all my computing needs in the Free world. A happy user of FLAC myself, I lurked on the mailing list as I watched grateful people thank Josh for the great software he wrote.

Different contributions will excite different sorts of users. The more different people we have improving FLOSS, the more happy users we can make. Happy users of FLOSS are Free users. Happy users can become contributors, putting forth code, documentation, translations, and word-of-mouth marketing.

The first reason to improve diversity in FLOSS is to better suit our users' needs. The more diversity we have in our contributors, the more chance we have of tickling our users in the ways that please them the most. I wish to see an end to software that restricts users' freedom, so I want to see us build the tools that users want.

One thing that pleases me is when I see other people contributing who seem similar to me. When I went to Debconf, I was thrilled to be surrounded by people who cared about software freedom and technical excellence. I had even more fun being social, chatting about rainforests, mutual friends, websites, and music. I might have had the most fun playing the card game Mao.

A second reason, then, to improve diversity in FLOSS is to increase contributor retention by increasing joy. Mao was an example of a cultural bond I happened to share with a handful of Debianites. The more diversity we have, the more frequent these sorts of coincidences will be.

The final, most obvious, reason to reach out to groups of people who do not typically contribute is that we can increase our numbers. That by itself is so valuable. Ubuntu sees 100 new bugs per week, even after the bug squad's efforts. If we can do a better job of recruiting new contributors, the raw numbers give us more strength in creating and maintaining world-class software as well as letting the world know about it.

Changing the balance

I believe that there are plenty of South Asians quite capable of contributing to FLOSS. I believe the same of women. I believe the same of men.

Back to the topic at hand. Why do the South Asians vanish when we look at Free Software, not tech in general?

There are plenty of reasons I can dream up, based on my experience with Indians.

It's tough for FLOSS advocates to work directly on these distant issues. But I think we can focus on some problems we can help solve. Crucially, awareness of Free Software spreads best by social circles. I learned about Linux from a friend at a summer camp. I'll repeat that:

So if you want to spread that awareness, try to be a bridge.

If you meet someone from an unusual background for open source who needs support or mentorship, try to help. That is an investment in the diversity and growth of Free Software. Those people can now unlock more "open source minorities."

What success looks like

Google Summer of Code helps some new contributors get started and provides that mentorship. Rachel McCreary was invited to the SciPy conference after a successful summer. Her father left a comment explaining how her sisters participated in FLOSS via Google's Highly Open Participation (GHOP) Contest:

Rachel was inspired and motivated by BOTH of her little sisters, each completing six GHOP tasks (if memory serves).
GHOP and GSOC has been a game-changer for these girls. Rachel's younger sister is applying to schools such as MIT with an interest in a science major. The youngest daughter now has a Caltech poster on her wall with the intent to eventually attend.
Their proud Dad

Soon, these stories will be commonplace. Until then, we have work to do.

(I'm still researching these topics. If you can help me find any sort of data to help me learn more about diversity in FLOSS, even if it seems like I wouldn't like it, leave a comment.)

[/note/debian] permanent link and comments

Tue, 15 Dec 2009

Two questions

A few weeks ago, I was listening to an R.E.M. album. All I knew about the song I was hearing was that R.E.M. recorded it, and I liked hearing it.

Raffi's ears perked up. He asked, "Is this R.E.M. covering the Velvet Underground?"

I asked, "Is that true? I didn't know that."

Now Raffi knows what I knew, and I know what he knew.



[/note/communication] permanent link and comments

Thu, 10 Dec 2009

OpenHatch tracking bite-size bugs

Cross-posted to asheesh.org from the OpenHatch blog. (OpenHatch is my current project.)

"How do I get involved in free and open source software?"
"How do I encourage people to join my project?"
Gregory Wilson, a CS professor at UToronto, cites some recent successes at answering these questions with respect to students: "Google Summer of Code and UCOSP have both shown that it's easier for students to get into open source projects if there’s a pile of tiny tickets for them to start with." We believe that these "bite-size" bug lists can benefit all sorts of new contributors, student or not. What you might not know is that more than a hundred projects have these small tasks tagged and waiting for you in their bug trackers.
To make it easier to find these, the OpenHatch volunteer opportunity finder allows you to browse, in one place, nearly 1000 bite-size bugs.
Do you know another project with bite-size opportunities we should index? You can get your project involved. Check out the list of bug trackers we index. If you contribute to a project, go to your bug tracker right now and label a few as bite-size. Then add your bug tracker to our index.
I want to especially thank GNOME for the ongoing GNOME Love effort that is our inspiration and the source of hundreds of these bite-size opportunities.
Happy hacking!

If this sort of thing is interesting to you, take a look at OpenHatch and subscribe to our blog or @openhatchery on Identi.ca or Twitter.

[/note/debian] permanent link and comments

Wed, 09 Dec 2009

A big machine that nothing can stop

A couple of weeks ago, Lucas Nussbaum wrote about his experience at the Ubuntu Developer Summit. Two things stuck out at me:

In the summer of 2004, my laptop (an iBook G4) ran Debian GNU/Linux on PowerPC. I tried out Ubuntu that autumn, and was very impressed. It's been five years since then. My desktop still runs Debian, and I'm proud to be a "two-distro" community member.

At the same time, this juggernaut nature raises concern. When Ubuntu releases ship with significant flaws, I quietly sigh and wonder, Are we doing good service to the people coming to GNU/Linux for the first time and seeing Ubuntu? This bugginess has bitten me a few times (even for upgrades between one release and the next), and it pushed a friend of mine to switch a lot of his computing to DragonflyBSD.

Lucas points out that Ubuntu's done a fantastic job of becoming visible. A look at Google Trends shows that people search for "Ubuntu" about as much as they search for "Linux" at all.

Richard Stallman complains when people refer to GNU/Linux as just "Linux." But today, the most popular name for an operating system based on GNU is probably "Ubuntu." For Stallman, having his hard work associated with someone else's "Linux" project must be frustrating. As a Debian contributor, it would be easy to succumb to the same feeling with regard to Ubuntu.

Lucas is well aware of that. He asks,

Debian does have users! I was feeling a bit disheartened about my maintenance work on alpine lately, when out of the blue I received an email from a fellow at MIT asking me if I would accept some help with maintenance from him. Just seeing the note was a relief; it's very nice to be reminded that I have users to take care of.

Debian is not as popular as Ubuntu on the desktop (or laptop), it's true. But technically sharp people do frequently still choose it. For example, the FreeNAS project recently announced a switch from FreeBSD to Debian (and a new name as coreNAS).

But we could end up a "package supermarket". To stave this off, we must do good releases and make sure people know about them. Doing all that work takes us all time. We must recruit new contributors, help them find things to work on, and make sure they feel welcome. And sometimes-sloppy maintainers like me ought to fix our packages, and get help from others where necessary.

I promise to work toward that.

[/note/debian] permanent link and comments

Fri, 04 Dec 2009

Behind glass

"To an engineer of the day, using valuable computer time to simply enter text in an interactive text editor was unthinkable. It was the hackers who understood that interacting with the computer, rather than treating it as a hands-off behind-glass computation resource, was the way of the future."

-- "The Culture of the Programmer".

[/scribble/code] permanent link and comments

Sun, 15 Nov 2009

Choosing something to work on in Free Software is hard

Debian is primarily organized into mostly one-person "projects," namely packaging a piece of software. But in the past few days on Planet Debian, there has been discussion about how existing packages in Debian need more help. Tim Retout wrote:

There's a huge role for non-DDs to play in getting fixes into Debian, but as far as I remember, the emphasis of the mentors documentation is on packaging rather than bug fixing.

I think this is a serious problem across Free Software. Open source contributors are grouped into projects. If you're burning out on what you're working on lately, or you are looking to become a new contributor, it's hard to know what projects need your help. As someone who knows Python and Debian packaging, it would be nice to be able to search Debian for Python-language packages that need help.

I also think that Tim hit on a really important social point later in his post:

It seems to me that the mentor relationship works better when DDs get to know particular people

Tim's right that personal relationships make a big difference. I got serious about my NM application when I asked someone I already knew to sponsor my packages.

But not everyone can be so lucky as to have a Debian Developer as a friend. And many projects aren't as friendly as Debian; there's no official mentorship available.

I think that we need to start paying a lot more attention to personal relationships all across Free Software.

(I'm working on a web tool in this space. Join me in #openhatch on irc.freenode.net to discuss it, or just stay tuned until I write it up tomorrow.)

[/note/debian] permanent link and comments

I travel (like a food truck)

I end up traveling quite a bit. Sometimes I forget to tell friends in the places I'm going that I'm going to be there. This results in a tragedy: we end up missing each other!

So now, when I'm going to be traveling, I'm going to write updates that you can easily follow. Here are some reasons you might want to subscribe:

To subscribe, here are your choices:

(And thanks to Chris for suggesting I do this, and helping me give it a name.)

Parker wondered how I'm keeping these all up to date. Here's what I do:

[/note/debian] permanent link and comments

Tue, 27 Oct 2009

Will the last to leave kindly turn out the light? / geociti.es

Today is Monday, October 26. Someone at Yahoo will go home tonight and, on the way out, turn off geocities.com.

Update: Tue, 27 Oct 2009 15:17:50 -0400 Geocities is finally offline. Pages say:

Sorry, the GeoCities web site you were trying to reach is no longer available.

To commemorate it, I bought geociti.es. I intend to do what I can to keep the Geocities pages on the web. I am part of the Archive Team, an independent group of amateur archivists racing to rescue the web from destruction at its own hand.

In late 1994, Geocities began offering free web hosting as Beverly Hills Internet. A decade ago, Yahoo bought Geocities. In December 1998, one-third of all web users visited the website. As recently as March 2009, 11.5 million unique visitors arrived there. Today, according to Alexa Site Info, Geocities ranks somewhere between the New York Times and the Washington Post in pageviews. And today, Geocities.com shouts:

GEOCITIES IS CLOSING ON OCTOBER 26, 2009.

Tomorrow, Geocities' website will be closed for good if Yahoo sticks to that promise.

The amount the Archive Team has downloaded is around one terabyte. That's all we seem to be able to reach; many pages were deleted months ago when the archiving effort began. The archiving is continuing as I write this.

Think of it. Fifteen years of history, memories for millions of people, the birth of a generation on the web. More personal embarrassment than all the POG games put together. It fits on an $80 piece of storage equipment -- at least, that's what we managed to find before Yahoo erases it all.

Initially, when I met Jason Scott of the Archive Team, he told me he wanted to download Geocities and share it by mailing hard drives around. I told him I wanted to hoist it back on the Web. He came around, and we and the rest of the Archive Team have put Geocities back online.

Geociti.es is not the greatest website in the world, no. This is just a tribute.

P.S. Major thanks to John Joseph Bachir for the paperwork assist.

[/note/debian] permanent link and comments

Thu, 22 Oct 2009

Craigslist is not responsible for users' posting of erotic ads (finally)

Seven months ago, Thomas Dart (Sheriff of Cook County) sued Craigslist, alleging that Craigslist is liable for illegal sex ads posted by its users. In particular, he claims that Craigslist's "adult services" section is by definition synonymous with illegal activity. For that reason, Dart argues, Craigslist should be liable for illegal posts by its users.

To quote Tuesday's ruling by the District Court for the Northern District of Illinois:

Plaintiff is simply wrong

The ethics of erotic services are, euphemistically, very interesting. At worst (a situation exacerbated by the underground nature of sex workers), powerful people pressure or coerce others into serving as their workers. Sex trafficking is a serious problem that ruins lives, and it is heartbreaking to even think about it. A quick search of the web finds plenty of news stories involving law enforcement finding sex trafficking operations using Craigslist to sell their victims.

These are serious crimes, and they ought to be investigated.

Craigslist often stands in the middle of this; in the case of one New York City-based prostitution ring (not necessarily involved in sex trafficking), Craigslist was the "sole vehicle through which the company operated". When Thomas Dart sues Craigslist, one can sympathize with a desire to make it harder to operate these prostitution rings. When I first heard about this case, I was torn; I do not relish the forced exploitation of people.

A quick look through Philadelphia's "adult services" section finds a lot of women looking to offer men interactive pornography through web cams. That Craigslist is a venue for this does not seem particularly dangerous; the same services can be found by searching the web. Prostitution is (obviously) outlawed by the Craigslist terms of service, though clearly it can be found there.

To return to the story of Thomas Dart, judges must answer legal questions, not necessarily ethical ones. Craigslist is not a person; it is an automatic web site. Congress wrote a law in 1996 that clarifies who is publishing these advertisements: the users of Craigslist, not the service itself. They are responsible for their actions. (You can read more at the Electronic Frontier Foundation's Guide to Section 230 Legal Protections.) This definition seems sensible; Craigslist is like a physical bulletin board with clear rules telling users what to do. As quoted by the EFF, the court wrote:

Intermediaries are not culpable for "aiding and abetting" their customers who misuse their service to commit unlawful acts.

This is important; illegal ads for the same illegal services could just as easily have been posted in a bulletin board supposedly about "Computers and Software". Such a post would have been abuse, just like illegal posts in Craigslist's adult services section. This realization helped me come to my personal conclusion about this case: Craigslist, like a physical bulletin board, ought not to be liable for the actions of its users. Like a cafe, it is a meeting place. One could stamp out all the overt bulletin boards and meeting places in the world and create huge harm to our ability to find people we want to hang out with or do business with and yet still have a world with illegal prostitution.

I was relieved to see that the final ruling in this case preserves our ability to have meeting-places that do not screen the people who enter them for the chance that they will commit crimes while visiting. I was further relieved to discover that a 1996 portion of the much-maligned Communications Decency Act codified this years before I started worrying about it.

P.S. Humorously, in the CNN.com article linked above, the writer makes the word "pornographic" link directly to http://topics.cnn.com/topics/Internet/.

[/note/law] permanent link and comments

Sun, 11 Oct 2009

Microsoft loses all Sidekick users' data. Lesson: Make backups.

Some tens or hundreds of thousands of T-Mobile USA customers probably just lost the contacts, photos, and notes "on their phone" forever. Those data are primarily stored on servers run by Danger, a subsidiary of Microsoft; rather, they were, until data loss destroyed them. Many customers' phones do not have a complete copy of their data; some have no data at all.

T-Mobile texted the affected customers with a link to this message:

Personal information stored on your device - such as contacts, calendar entries, to-do lists or photos - that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger.

Long story short: No backups.

The story is a sad one, told many times

Microsoft chose to make no backups, reports Sidekick news website Hiptop3.com. It seems the team at Microsoft in charge of user data storage allowed an upgrade to the storage system with no backup in place. When the upgrade failed, the data was toast.

This sad story of "no backups" is something that sysadmins -- myself included -- find ourselves in from time to time. Toward the end of 2006, the "Leafycaust" (disk failure with no backups at leafyhost.com) destroyed lots of data for Students for Free Culture. I had my own catastrophe in mid-2007 when two disks connected in RAID-1 (which is not a backup system) failed within two days of each other. (I now run nightly off-site backups for all the systems I maintain.)

People who rely on computers ought to make sure the people responsible are doing those backups, and that the backups are actually usable. Students for Free Culture was a Leafyhost customer; T-Mobile is a Microsoft customer. It's somewhat humorous that Microsoft and Leafyhost provide the same level of assurance.

"im about to cry!"

In the Hiptop3.com article above, users tell their own sad stories and give advice.

Invest the time. It's good advice.

Everyone: Please make backups. Don't trust any one thing, whether your IT staff, your personal computer, or your paid service provider, to keep your data safe. (For more technical readers: Don't rely purely on your DBMS, your filesystem, or one particular location. You know the drill; now practice it.)
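For those technical readers, one small piece of "practicing it" is checking that a backup copy actually matches the original. The sketch below compares SHA-256 checksums using only Python's standard library. The helper names (`sha256_of`, `backup_matches`) and the file names are hypothetical, and a matching checksum only shows the copy is intact; it is not a substitute for a real restore test.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """True only if the backup exists and has the same contents."""
    backup = Path(backup)
    return backup.is_file() and sha256_of(original) == sha256_of(backup)


# Demonstration with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "contacts.db"
    backup = Path(tmp) / "contacts.db.bak"
    original.write_bytes(b"alice,555-0100\n")
    shutil.copyfile(original, backup)
    print(backup_matches(original, backup))   # checked right after copying

    backup.write_bytes(b"")                    # simulate a damaged backup
    print(backup_matches(original, backup))
```

Running a check like this on a schedule, against an off-site copy, is one way to notice a broken backup before you need it.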

I'll conclude with a quote. This one is about the Leafycaust, but it could easily be about Microsoft/Danger:

Essentially, these notjobs burn through several thousand dollars of people’s money for the service of deleting all their critical (and personal) data.

[/note/debian] permanent link and comments

Thu, 24 Sep 2009

Award for the best clickable button in a mobile app

I just saw a screenshot of one of my favorite Android apps - at least, as far as user interface design goes.

A simple interface, and a single button that creates a Blue Screen of Death. My compliments to the chef.

(Found via the XDA-Developers forum: http://forum.xda-developers.com/showthread.php?t=563891. It appears to be a UI-improved version of someone else's vulnerability tester called BSODroid.)

[/note/software] permanent link and comments

"Surrounded"

Spontaneous pneumothorax, pneumomediastinum, and pneumopericardium in a 16-year-old drug-abusing motorcyclist surrounded by a pack of coyotes.

Read more on Pubmed.

[/note/people] permanent link and comments

Wed, 12 Aug 2009

Easy bugs in Python

Today, I was looking into how easy it might be to contribute fixes to Python itself. I found one really interesting one:

"htmllib quote parse error within a <script>"

So I followed the link! But the bug renders as blank.



[/note/debian] permanent link and comments

Wed, 22 Jul 2009

I'm a Debian Developer!

Wow. As of Wednesday morning, I'm a Debian Developer.

It really makes me proud. To me, Debian has always stood out for its unique attention to quality, breadth, software freedom, and community involvement. Now I'm officially a part of that team.

When Debian lenny, the most recent release, shipped, I was also very happy about my contributions. Packages I maintain were shipped with it, and I spent some time fixing release-critical bugs.

All this work on Debian also makes me humbled. I want to send my deepest thanks to everyone in Debian who makes it happen — Developers, Debian Maintainers, other package contributors, users who file sharp bug reports, companies who sponsor ongoing improvements to Debian, translators, web page maintainers, Debian Developers who volunteer for positions to make sure the organization works as well as it does, the Debconf local teams, and everyone else who contributes in another way. Even those outside Debian help make it great; feedback and competition in the long run always improve a project.

Right now, I want to personally thank the key people who have guided me through the New Maintainer process. In order of appearance:

People are what make Debian great. Everyone involved across the New Maintainer process deserves thanks, too! These are just the people who stuck out in my memory of my own experience.

Now I will stand on my laurels and help make the next great Debian release!

[/note/debian] permanent link and comments

Sat, 06 Jun 2009

MUST HAVE EXTENSIVE PYTHON

"MUST HAVE EXTENSIVE PYTHON"

"MUST HAVE EXTENSIVE DJANGO"

-- A JOB LISTING.

[/note/debian] permanent link and comments

Sat, 16 May 2009

What is open source (and Free Software) missing? / Moving to Atlanta

Tim and my mother are both neonatologists at the Golisano Children's Hospital inside the University of Rochester. Earlier today, they had this conversation:

My mother: My son Asheesh is moving to Atlanta.
Tim: Is it for a girl?
My mother: I don't really know what the kids are doing as far as girls, but no....

In fact, I'm moving to Atlanta because a venture capital firm there funded me, Nelson Pavlosky, and my friend Raphael Krut-Landau to start a company to improve interactions in the open source / free software world. We get enough money to live in Atlanta from May 18 to August 6, and after that, we have to seek more funding.

This has led to a series of ironies. The first is that I am working on a startup. The second is that I left San Francisco to do it.

But I have already moved out of San Francisco, and I have left my job at Creative Commons. (Feel free to get in touch with me (outside my website's comments) about filling my shoes there.) Thanks, Nathan and Mike, for giving me the chance to contribute to CC, an organization and project that I have always had a great passion for.

For a while, I may seem vague about the project I am about to undertake; it's because I still want to nail down some details between the three of us. When Nelson, Raphael, and I arrive in person, we're going to kick into gear.

I've been chatting with a few of you over the past few months about ideas, and I do want to especially thank Karl Fogel and Mako Hill for helping the three of us think through what could be done.

Some questions for readers:

Feel free to email me (asheesh at asheesh.org) if you'd rather not comment publicly. I have a few ideas of my own, and I hope to be tossing them up for everyone to bat at soon!

P.S. Noisebridge, I will miss you!

[/note/debian] permanent link and comments

Sun, 26 Apr 2009

Comments

What if there were comments on asheesh.org?

Discuss.

[/note/sysop] permanent link and comments

Sat, 25 Apr 2009

from "Emerging Infectious Diseases", a Centers for Disease Control publication

"There are also cocks, which are extraordinary size, and have their crests not red as elsewhere, or at least in our country, but have the flower-like coronals...." more

[/scribble] permanent link and comments

Explainer: "Why do some URLs have www in them, and what difference does it make?"

Katy (who I know from the CC internship in 2006) asked me this question recently:

Why do different pages show up depending on whether there's a www or not in the URL?

To understand, I have to explain how a browser gets a web page from the Internet. When a browser is asked to load a URL like http://www.asheesh.org/scribble/enlightened-but-confused.html, it breaks it apart into components.

HTTP, the "scheme", tells the browser what protocol (or network language) to speak when it requests the page from the server.

The domain name is where things get interesting. This alone tells the browser who to ask for the page. The browser looks up www.asheesh.org in the domain name system, an Internet phone book service that converts names to numbers (so-called "IP addresses"). Once it knows the IP address for that name, it connects to it and prepares to speak HTTP.

The browser connects to that IP address, and asks (in the network language of HTTP):
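The exact text varies from browser to browser, but a minimal sketch of such a request can be built in Python. The `build_request` helper, the choice of HTTP/1.1, and the `Connection: close` header are assumptions for illustration; real browsers send many more headers.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Build a bare-bones HTTP GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the root page
    lines = [
        f"GET {path} HTTP/1.1",       # what we want: the path
        f"Host: {parts.hostname}",    # which site we think we are talking to
        "Connection: close",
        "",                           # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


request_text = build_request(
    "http://www.asheesh.org/scribble/enlightened-but-confused.html"
)
print(request_text)
```

Note that the domain name shows up twice in the process: once to find the server's IP address, and again in the `Host` line so the server knows which site is being asked for.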

So now, let's think about how http://www.asheesh.org/ and http://google.com/ differ: Their scheme is the same, and their path is the same. But the domain name is different.

The same is true for http://asheesh.org/ and http://www.asheesh.org/. You get the same content because, as luck has it, the administrator for asheesh.org is the same as the administrator for www.asheesh.org, and I decided to make them work the same way.

For some websites, if you add the www component, you do get different contents back: for example, http://cs.rochester.edu/ does not load, whereas http://www.cs.rochester.edu/ does.
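To make the comparison concrete, here is a small Python sketch that reports which components of two URLs differ. The helper name `differing_components` is an invention for this post, and it only looks at the three components discussed above.

```python
from urllib.parse import urlsplit


def differing_components(url_a, url_b):
    """Return the names of the URL components that differ between two URLs."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    names = ("scheme", "hostname", "path")
    return [n for n in names if (getattr(a, n) or "") != (getattr(b, n) or "")]


if __name__ == "__main__":
    print(differing_components("http://www.asheesh.org/", "http://google.com/"))
    print(differing_components("http://asheesh.org/", "http://www.asheesh.org/"))
```

Both pairs differ only in the domain name, so as far as the browser is concerned, each pair names two separate places to ask.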

So the final answer to Katy's question: You're lucky you ever get the same page for two URLs that are different, even if just by "www".

[/note/software] permanent link and comments

Fri, 27 Mar 2009

Pycon09: "Scrape the Web" is over

For my attendees, and anyone else following along at home:

Thanks for coming! I had a great time at the talk, and I already wrote a little bit about how much fun I had. I wanted to be sure to conclude the tutorial with some next steps for you all.

The presentation

Some crucial links:

The demos that didn't work

There were two demos that were less smooth than they ought to have been.

For WordPress Hash Cash, there's the lovely simple code I wrote that uses python-spidermonkey to post comments to a blog with HashCash. One such blog is online at http://pycon09.asheesh.org/hashcash/ ; please try it! (The reason it didn't work was some heavy load on my server during the talk.)

For Selenium RC, you can see the sample code in examples/seleniumrc/. There is a README in that directory that explains clearly how to run that code. (It didn't work for the same reason.)

The future

I'm available for questions, both hands-on at PyCon and by email after the weekend. Just email me!


[/note/preso] permanent link and comments

"Scrape the web" at PyCon: lots of fun!

Thursday morning at 9 am, I gave my scheduled tutorial at PyCon: Scrape the Web: Strategies for programming websites that don't expect it.

For those of you who attended, thank you! You made it loads of fun. The tutorial was supposedly full at 30 people, but in fact we had at least five more; at the halfway break, staff added another table to the room so that those of you standing in the back could sit down!

Because I was so behind on so many things from travel, I stayed up all night before the talk. This is actually fun for me, as we saw at Debconf last year. So I arrived at the talk energized and with my examples fleshed-out (for the most part).

There were a few ways I knew things were going well.

Early on in the talk, Nathan Yergler arrived and saw that we were scraping information from the CC lunch mainstay Mehfil Indian, nicknamed "Curry in a hurry." This caused a ricochet of smiles between me at the front and Nathan at the back; I hope that helped the mood for others, too!

Throughout the talk, the audience looked happy, and they felt comfortable enough to stop me and ask questions. Knowing that the audience feels comfortable participating is crucial for me. Participation and questioning are part of learning; they are also the best way for me to know how to tailor what topics I cover to the people in the room.

After the talk, some attendees handed in evaluation forms. One man asked me what he should do with his. "I've been putting them in this box face-down," I explained.

He suggested, "This one you ought to see face-up!"

About five people came up after the talk and asked me specific questions. One was a young lady who attended my preview talk at Baypiggies in January, which was great to see.

The same number came up to me at the end and thanked me for a good talk, which was very rewarding. One asked how often I give talks at conferences. I mentioned my OSCON talk with Nathan, and wondered to myself what other conference sessions I had led. He urged me, "You really ought to make speaking part of your career. You're a great speaker."

I followed a couple of attendees to lunch; one pointed out the room had been Twittering madly during the talk. A search of Twitter shows a lot of positive comments. (He also pointed out that someone else is "paulproteus" on Twitter.)

Basically, everybody loves me. Yay!

By the end of lunch, I was fading from the lack of sleep. I took a six-hour nap, and I woke up after all the official PyCon proceedings were over. I read an email from Greg Lindstrom, organizer of the Tutorials series of talks at PyCon. His email began:

It's Thursday night and I wanted to tell you how happy I am with the tutorials over the past two days. I haven't looked at the survey results yet -- give me a couple weeks on that; I'll share the results with you -- but the comments I heard were overwhelmingly positive. My favorite was overhearing someone ask "how did that kid in 'scrape the web' learn all of that?".

I giggled about this as I walked to dinner with Nathan.

What a great start to PyCon!

[/note/preso] permanent link and comments

Thu, 26 Mar 2009

FSF Award for Projects of Social Benefit

Last weekend, I attended the Free Software Foundation's LibrePlanet 2009 conference.

The first day was a full day of talks from Free Software luminaries including Jeremy Allison of Samba and Evan Prodromou of identi.ca. During the talks, the conference IRC chat room was brimming with conversation; between talks, so were the hallways.

The day concluded in an award ceremony. We joked around on IRC:

<paulproteus> Man, I probably didn't get EITHER award.
<gmaxwell> paulproteus, cause I got both! ha!

Richard Stallman happily presented the Award for the Advancement of Free Software to Wietse Venema for writing the Postfix mail server. Then he went on to announce the Award for Projects of Social Benefit, awarded...

"...to Creative Commons."

Mike Linksvayer kept sitting at his laptop.

"Shouldn't we go get that?" I asked him.

"Yeah," he answered, not moving from his computer.

"Should I come with you?" I asked.

"Yes," he said crisply.

And up we went.

Richard handed Mike the award, and I stood next to Mike as Richard explained to the audience that he wished Creative Commons would talk more about freedom. As Mike accepted the award from the lectern, I did my best to not grin like an absolute idiot. I managed to look somewhat serious in the photo as Mike cropped it; maybe that's the effect of the shadows.

Laroia, Linksvayer, RMS

Asheesh Laroia and Mike Linksvayer of Creative Commons accept the 2008 Free Software Foundation Award for Project of Social Benefit from Richard Stallman. Detail of photo by Matt Hins / CC BY-SA. (Cropped image and this caption by Mike.)

I was immensely pleased. Creative Commons and Free Software, as organizations and as movements, are about lifting unneeded or immoral burdens copyright law levies on people who want to remix, improve, and share. These movements tie together as Free Culture, and they have been a huge part of my life. Moreover, Free Software was the first empowerment movement I could concretely understand.

"Happy hacking!" said Stallman to us as we walked off stage.

[/note/free-culture] permanent link and comments

Tue, 17 Mar 2009

We provide

WE PROVIDE - Rain or Shine – Maximum 10 children for 2 hours

[/scribble/companies] permanent link and comments

Sun, 15 Mar 2009

Asheesh changes the topic

Jon and I were discussing the security of passpack, a password storage website.

<paulproteus> This is slightly interesting but I have other, more pressing, less interesting things to think about.
<signalvsnoise> if you have an addon that is supposed to get fields
<signalvsnoise> ...
<signalvsnoise> like pies?
<paulproteus> man, pie.
<signalvsnoise> and how you want one right now
<signalvsnoise> I want pie too

[/scribble/people] permanent link and comments

Thu, 12 Mar 2009

Today at ETech: Baobab Health Partnership

(I mostly live-blogged this; excuse the messiness.)

Mike McKay gave a fantastic presentation about his work at Baobab Health Partnership. To quote his blurb:

Malawi, Africa, has a population of 14 million. One million are HIV positive and there are just 280 doctors in the country.

Baobab Health Partnership took i-openers, added a PIC to create a touch-screen, hacked on Power over Ethernet, and configured them to be used as data entry and analysis workstations for HIV clinics in Malawi.

The i-opener was an "Internet appliance" sold as a simple web terminal: given a monthly subscription for the i-opener service, you could have an inexpensive, trouble-free web browsing experience. As it happened, the community swiftly repurposed the devices. According to one i-opener hacker, "What follows is discussion and photographs of a most righteous hack, turning the nearly-free Netpliance i-opener web appliance into a full featured pseudolaptop / electronic photo album." Slashdot saw a lot of excitement around these in 2000 and 2001, but they lacked Ethernet or wifi, so they fell out of favor.

I wanted to give you that background so I can explain that the Baobab Health terminal is the coolest thing I have ever heard of anyone doing with an i-opener. This puts my kitchen recipe terminal plan to shame. Check out the patient registration video that shows the terminals in use.

The software stack is all open source: They have an Ubuntu server running MySQL and Ruby on Rails, and the user interface is just an AJAX web application running in full-screen mode. It was developed by Malawian developers who, says McKay, "were trained in VB. Now they hang out on IRC and flame people on mailing lists. They're part of the Internet."

As they bought a few from eBay for about thirty dollars, they noticed that one person in Nebraska seemed to keep listing them. It turned out that he had been stockpiling them until he figured out something great to do with them. In the end, the Nebraskan gave them two thousand i-openers. (The twentieth century rises again.)

As for impact, McKay pointed out that decision support makes a huge difference in healthcare, as Pronovost found at Hopkins years ago with checklists. He also talked about how data validation dramatically changed the clinics' ability to record useful data; most of the previous paper records were found to be useless as they began to import them into the computerized system. Through comparisons to other people of similar age and gender, they can encourage correct weight measurement data entry.

Danny O'Brien pointed out that this does not degrade well in the case of equipment or network failure. I believe Danny was hinting at the question of what happens when the entire system fails (backups? data loss?); McKay understood it to refer to what happens during temporary failures. He explained that with their generators, they've had very good reliability. When the computers fail, many clinics take paper records instead, though some simply lock the doors and stop seeing patients. That clinics choose the latter approach has provided strong motivation for their programmers to fix reliability issues!

McKay went on an aside to talk about having his appendix removed when he was in the States. "All of a sudden, I was exposed to the American health care system, and I was shocked at how broken it is. There's no computers anywhere! The only computers around are at billing."

To get to electronic medical records in the US, he says, "Maybe we just need a disruptive piece of technology."

As for how to spread the program to other countries or circumstances, McKay explained, "One of our problems is the next generation of hardware. I'm not going to send this hacked i-opener to a country where there aren't technicians who know what to do with them."

The program was started by Gerry Douglas in 2000. Douglas is now based in Pittsburgh, though he spends four to five months of the year in Malawi. Also, the data collected by the project inform policy discussions at a quarterly HIV Forum in Malawi. A man from the NIH suggested that they work on sharing that data and doing more research.

The presentation was fantastic; the work is brilliant; and the man is friendly and thoughtful.


[/note/event] permanent link and comments

Wikipedia is full (of blobs)

Database error
From Wikipedia, the free encyclopedia
A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "ExternalStoreDB::store". MySQL returned error
"1114: The table 'blobs' is full (10.0.2.161)".

-- Magnus Manske, March 9 2009 12:41:42.

[/scribble/wikimedia] permanent link and comments

Tue, 10 Mar 2009

Photo galleries

I prefer to host my own Internet services. Here is a quick summary of what I have found in the past two hours of my life:

It's 2009. Am I just going to use Flickr and call it a day?

I selected these by looking through web photo gallery programs on freshmeat.net. I'd like a dynamic one so I can accept comments.

It does seem there are some decent galleries for WordPress, in particular NextGEN Gallery.

What do others in the autonomous world do? Am I supposed to write my own?

(Waiting for Gallery3 is probably a good path forward, failing that.)

[/note/debian] permanent link and comments

Sat, 07 Mar 2009

That LJ post about Watchmen - Yeah, it sucked

My friend Lisa pointed me to Andrei's review of Watchmen.

The review begins with surprises. In the edited version that is online now, we learn watching the movie caused "the destruction of all of [Andrei's] hopes and dreams." That's a sizable investment for a movie based on a comic book. Furthermore, he writes on his LiveJournal post that he felt "punished with awkward sex and slow-mo violence." One would think that a LJer would be pleased with both of those!

Andrei asks, "Is producing copious amounts of blood the only way to generate a dark mood?" This must be a rhetorical question; any LJer worth his salt knows that changing the theme and typing in all lowercase can do that, too. (In fairness, Lisa points out that this is difficult to achieve in a movie.)

Lisa mentions that a previous version of the post called people who like the movie "sheep." It's good that Andrei does not call them goats, lest he raise the ire of Frank the Goat, LiveJournal's beloved mascot. Presumably Frank's global fanbase would have risen up in anger, too. (Frank the Goat has previously written about his ambivalence about not being a sheep.)

There are many things in this review I would not have said. Despite the temptation, I would especially not have shared this utterance with my new friend Quinn, the first person I know to have seen the movie: "Comedian bad, but torn and damaged individual."

The story of Andrei's post is a case study in dramatic irony. He writes of the voice acting, "We got the same crap Bale did in Batman where he thinks making your voice gravelly gives it more gravitas. No, it doesn't." It seems that Andrei thinks calling your friends sheep gives your voice more gravitas. As his friends teach him in the comment section: No, it doesn't.

At least Andrei's user picture (unmodified from original) does elicit the appropriate sense of gravitas.


[/note/people] permanent link and comments

Image problem on post

To: Twisty.Faster at G mail
Subject: Image problem on post, "Field notes from the Equine Behavioral Studies Dept."

Twisty, your most recent post was lovely as always!

I would like to lodge one complaint, however: the image that you reference at the top, whose URL seems to be http://blog.iblamethepatriarchy.com/wp-content/uploads/2009/03/stella.jpg , appears instead as an "Error 404 - not found" page.

"Your query has produced bupkis," your website tells me. I was hoping for an image, and while I have many queries to pose, I intended to keep my questions to myself until I could fully understand the post through visual aid.

Love and admiration as always,

-- Asheesh.

--
There is an old time toast which is golden for its beauty.
"When you ascend the hill of prosperity may you not meet a friend."
-- Mark Twain


[/note/letter] permanent link and comments

Wed, 04 Mar 2009

Segmentation grace

Ladies and gentlemen, it seems that Google has finally done it: they shipped Multics for 386 (and compatible).

Noisebridge, this evening: I descended the stairs after teaching my introduction to programming class and found Geoff Schmidt in our San Francisco hacker space.

I sat next to Geoff and overheard a lovely conversation with Mike Kan. Geoff told the sad tale of a beautiful, efficient, and dead operating system: Multics. Multics was written in the 1960s as a fast operating system for many users to share an expensive mainframe. It posed design problems never seriously tackled before, and after a decade, it had a practically perfect implementation for each of them. To achieve speed, it expected help from the hardware: registers on the processor split memory into different "segments," creating safe zones for each program to run in. The segment system was powerful and secure enough that a running process could execute code from another without the operating system kernel getting involved. The result was a fantasy come true: protection rings creating legendary security, the flexibility of a multi-user, multi-tasking operating system, and all this at hardly any performance overhead.

(Naturally, such perfection is the result of years of work at MIT. So is Geoff.)

This perfection came at a price: while Multics was uniquely well-designed and well-integrated, it expected specific support from the hardware it ran on.

Geoff's expression deflated, and he pointed out that another operating system arrived on the scene: UNIX. Anyone could study the UNIX source code, and it ran on whatever you gave it. UNIX (a joke on the name Multics) eventually won by being worse: it was slow, unreliable, and worst of all: incorrect. But anyone could read it and make it work on the computer he happened to have. By the mid-1970s, UNIX's dominance over Multics was clear.

Geoff skipped forward a decade to the 1980s. Intel had wanted to build a CPU that could be used as a modern computer, and users had shown that the puny memory protection system offered by the 286 wasn't adequate. The chip designers went back to the drawing board, and they brought back features that Multics invented: segmentation registers and protection rings. When shown these powerful, complex features, today's operating systems mostly ignore these Multics tricks and do the least work possible to build a UNIX-like flat memory model.

Mike interrupted and howled about how he can't buy a "real Macintosh" anymore; Apple's computers were once based on a simple architecture, but now they are built with the same complex Intel CPUs everyone else uses.

But I am writing this because of the release of Google Native Client, a browser extension that allows your computer to securely run machine code written by untrusted people on the Internet. How can it achieve this fantasy? The Wikipedia article summarizes the native client research paper:

Native Client is notable for its novel sandboxing technique which makes use of the x86 architecture's rarely-used segmentation facility.

[/note/debian] permanent link and comments

On daughters of political candidates

"Yes. For some reason, I read that entire paragraph. And the other ones!"


[/scribble/time] permanent link and comments

Sun, 01 Mar 2009

Tax cuts: expire

When I see that MoveOn has put this in my email box:

Subject: 10 Things You've Gotta Know About Obama's Plan

I procrastinate reading it, and I prepare for the worst.

Finally, later in the day, I manage the strength to open it. The list is not the doom and compromise I expected but instead ten lovely things.

The article cites the New York Times and declares, "[Obama's budget] lets the Bush tax cuts for the wealthiest Americans expire."

With grace, President Obama lets the worst of the Bush cuts expire; they're just a thousand memories.

[/note/politics/hope-watch] permanent link and comments

Ringtone!

I just did something that is way more exciting than it is technically novel: make my own ringtone that plays when people call my phone.

If you see me tapping desks or laptops or counter-tops rhythmically, usually I'm thinking of the guiro that starts out the R.E.M. song, "Electrolite." So I opened up the audio file I have for that song in Audacity, selected just the quiet part at the start, trimmed the silence, boosted it up 25 dB, and transferred it to my phone. (If you like, you can listen to that audio file.)
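For the curious, here is the arithmetic behind that 25 dB boost as a small Python sketch. It assumes the usual amplitude convention for decibels (20 times the base-10 logarithm of the ratio) and samples stored as floats between -1.0 and 1.0; the function names are made up for this post, and this is not a description of Audacity's own code.

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to a multiplier for sample amplitudes."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Scale float samples by the gain, clipping anything that leaves [-1.0, 1.0]."""
    ratio = db_to_amplitude_ratio(gain_db)
    return [max(-1.0, min(1.0, s * ratio)) for s in samples]


if __name__ == "__main__":
    print(round(db_to_amplitude_ratio(25), 2))  # roughly 17.78
    print(apply_gain([0.01, -0.5], 25))
```

In other words, boosting by 25 dB multiplies every sample by a bit less than eighteen, which is why it only works on a quiet passage: anything louder than about 0.056 of full scale would clip.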

Once it was on my phone, the rather-neat Ringdroid app took over and helped me set it as a system ringtone.

Yay!

Twentieth century, go and sleep....

[/note/projects] permanent link and comments

Sun, 22 Feb 2009

A typical request for help

Seen on #freeculture.

<matt-> Oh crap.
<matt-> All of the WordPress plugins disappeared.
<matt-> paulproteus: ^
<matt-> Crap.  Now I made it worse.
<aphid> ..they reappeared?
<aphid> as chupacabras?


[/note/debian] permanent link and comments

Sat, 21 Feb 2009

Anything

<paulproteus> I should blog my favorite story involving Baltimore and crack.
<quinn_norton> you should blog anything involving crack

[/scribble/people] permanent link and comments

Sun, 15 Feb 2009

Ani in Noisebridge (at twenty five to five)

Ani: "Can I have my cock back?"

a pause.

Ani: "The one you're holding."

a pause.

Steen: "You threw it up here; it disappeared. We don't know where it went."


[/note/people] permanent link and comments

Sat, 14 Feb 2009

Embedded Controllers and eternal power

"While running on eternal power the EC does not attempt to enter a low power mode."

-- Richard A. Smith.

[/scribble/olpc] permanent link and comments

Mon, 02 Feb 2009

Apologies

"English is also not my first language. Sorry if my grammar melted your eyes or something."

-- RedK on Slashdot.

[/scribble/people] permanent link and comments

Sat, 31 Jan 2009

Hope Watch begins

I attended Obama's inauguration ceremony in DC a bit over a week ago. It was inspirational and moved me to tears.

I'm awestruck in the same way to read (a few days late) from Larry Lessig that "Rick Boucher is taking over the Energy and Commerce Committee's Subcommittee on Communications, Technology and the Internet (renamed Telecommunications Subcommittee)".

In 1998, Congress passed a bill that made circumventing any copy protection scheme illegal; since then, even if you just want to play a DVD on a system where an officially-sanctioned DVD decoder has not been written, you are breaking the law. Rick Boucher introduced a bill to the House named the Digital Media Consumers' Rights Act (DMCRA) that reverses the most egregious of the pains that anti-circumvention law brought. He introduced it first in 2003 and again in 2005. I read these actions as, "I'm going to introduce this bill that is overwhelmingly reasonable. You all can watch as this corrupt institution drops it on the floor."

I know that Rick Boucher being the chair of this committee does not mean that a re-introduced DMCRA will immediately become law. But I believe in gestures, and this one feels like a personal message from this new Democratic Party under President Obama to me that the issues I believe are important have a chance of being addressed.

So here we start Hope Watch. Somehow, here in reality, actions by Congress touch me with hope that we might make this country and world a dramatically better place. If something else touches me, I'll try to make a note.

What an overwhelming feeling that is.

[/note/politics/hope-watch] permanent link and comments

Fri, 30 Jan 2009

Sense of humor

Even if you don’t appreciate my vaguely demented sense of humor (yes, yes, I’m only kidding about auctioning off Obama’s email), protecting your--our-- First Amendment rights is no joke.

-- California First Amendment Coalition, Jan 30 2009.

[/scribble/donate] permanent link and comments

Common sense

"I've always been a big proponent of using common sense, but it seems like this no longer applies."

-- Chad on Foundation-l.

[/scribble/people] permanent link and comments

Tue, 27 Jan 2009

Wikipedia on one page

"I am working on a project to host wiktionary on one web page and wikipedia on another."

-- Stephen Dunn.


[/scribble] permanent link and comments

Nineteenth Century Smileys

Rebekah sent me a link to a New York Times blog post investigating a possible smiley from 1862.

The evidence that it is a smiley rests primarily on the fact that every single typeset character, including the ";)" about which one wonders, had to be typeset by hand. Would someone really make a mistake like this?


[/note] permanent link and comments

Mon, 12 Jan 2009

Baypiggies, January 8: Scrape the web

I gave a presentation at the Thursday, January 8, Baypiggies meeting that was something of a preview of my scraping tutorial that's coming up at PyCon 2009. (Baypiggies is the Bay Area Python Interest Group.)

If you want to take a look at what I presented, here is what I have. Note that you can grab all of these in bulk by doing:

$ svn checkout http://svn.asheesh.org/svn/public/20082009/scraping-preso/

The presentation itself:

The curry examples:

The actually-working example, Cepstral's weather reading-aloud tool:

Code snippets that might be useful:

Patches welcome! These were quickly half-baked on a truck ride provided by Jim Stockford. (-: I'll be revisiting them later as I prepare more for my PyCon talk.

[/note/preso] permanent link and comments

Wed, 07 Jan 2009

Trisk

I now know why Albert Lee uses the nickname Trisk. I have learned his secret identity.

Steve Jobs died in a car wreck in 1988. The current "Steve Jobs" is San Jose session musician, Roland Trisk. Trisk, who often doubled for Steve Jobs before his death in sales meetings and conferences, had plastic surgery in order closely resemble Jobs. There are hints everywhere-in the enclosure of the Mac LCII, the first NeXT CUBE, even Pixar's first full-length film, Toy Story. Wake up people! The truth is out there!

Thank you, Gizzmonic on Slashdot.

[/note/people] permanent link and comments

Sun, 04 Jan 2009

Actually for the

"actually for the Internet telephony imaginary laptop card"

[/scribble] permanent link and comments