Thu, 21 Aug 2008

dd, dd_rescue, and ddrescue

The short answer: "Use GNU ddrescue. GNU stands for Quality."

dd is a classic UNIX utility to read from and write to files (often devices). Typically, one uses it to copy a hard disk to a file, or to restore a hard drive by copying a backup image onto it.

One hits a problem when the hard disk has errors. In this case, dd abruptly stops working in the middle, reporting an "Input/output error." But when the hard disk has errors, usually what you want is to get an image of all the blocks on the hard disk that are readable - not just the first few before the first error!

(Note for the pedantic: Yes, I know about dd conv=noerror,sync. Those options are so easy to misuse (mostly by forgetting one of the two) that they're worth avoiding.)

Two tools are available for this particular purpose. Confusingly, one is called ddrescue, and the other is called dd_rescue.

Around 2001, Kurt Garloff wrote dd_rescue. It does what dd does if you pass it some options, but it comes with instructions on how to use it to recover data from drives, such as by running it multiple times or backwards. A wrapper script called dd_rhelp automates that process.

When you're running dd_rescue on an obscure OS like Mac OS X 10.3 because you dropped your laptop in Uganda and the Linux partition grew bad blocks and you still want your data, you will find that dd_rhelp is written as a complicated shell script that relies on GNU versions of core system utilities. OS X provides non-GNU versions, and you will waste hours fiddling with compiling those utilities just so you can run some dumb shell script.

In the summer of 2004, the same summer as I dropped my laptop, Antonio Diaz Diaz wrote "ddrescue," a stand-alone C++ tool that does the same things as dd_rhelp, but more sanely and therefore more efficiently. It became an official GNU project. GNU ddrescue, like dd_rhelp, can keep a log file to let itself gracefully pick up after interruptions.

When your hard disk fails, you should turn to your backups. But if you need a tool like these, just remember: "GNU ddrescue."

$ sudo apt-get install gddrescue
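
For the curious, here is a minimal sketch of the usual two-pass rescue, wrapped in a shell function. The function name rescue_image, the DDRESCUE override variable, and the file names are made up for illustration; the -n, -d, and -r flags are the ones documented in the GNU ddrescue manual. The third argument is the log file (newer releases call it a mapfile) that lets a later run pick up where an earlier one stopped.

```shell
#!/bin/sh
# rescue_image SOURCE IMAGE LOGFILE
# Hypothetical helper: image SOURCE into IMAGE in two passes, sharing one log file.
rescue_image() {
    src=$1
    img=$2
    log=$3
    # Pass 1: copy everything that reads easily; -n skips the slow scraping phase.
    "${DDRESCUE:-ddrescue}" -n "$src" "$img" "$log" || return 1
    # Pass 2: go back for the bad areas, using direct disc access (-d)
    # and up to three retry passes (-r3). The log file tells ddrescue
    # which areas were already recovered in pass 1.
    "${DDRESCUE:-ddrescue}" -d -r3 "$src" "$img" "$log"
}
```

Something like rescue_image /dev/sdb laptop.img laptop.log, run as root with the failing disk unmounted, would be a reasonable starting point. Write the image and the log to a different, healthy disk.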

[/note/sysop] permanent link and comments

Lamers

Kragen Sitaker and his wife Beatrice were very gracious in hosting me and my brother for a week in Buenos Aires.

I was looking for something on Kragen's website and found a ten-year-old discussion of how to find security problems in software. In it, he writes:

Body text last updated 1998-07-22. Recently has become the most popular page of mine, presumably because a bunch of lamers want to learn how to break into things. [...]
I wouldn't be surprised if calling 100-200 people a day `lamers' results in electronic attacks on me or my machine (kragen.dnaco.net.) All I can say is that people who do this would thereby demonstrate their lamosity.

Lamers, you say? Nelson took this picture of me a few years back. Look at the thumbs-up from the driver!

(Photo available for re-use under Creative Commons Attribution-ShareAlike 2.0.)

Note: Mako addressed this topic earlier this year, and then again more recently.

[/note/ba-2008] permanent link and comments

Tue, 19 Aug 2008

For Timo art with me

Chris Wakelin asks Timo of Dovecot to change his English usage:

I've been meaning to tell you that should be "Yeah" for an informal version of "Yes", otherwise it's a very archaic form of "Yes" or "Indeed" as in "Yea, though I walk in the valley of the shadow of death"!

Stewart Dean points out:

But Timo walks through the valley of the shadow for us all.....so maybe he's entitled.....

Psalm 23:4

[/scribble/code] permanent link and comments

Sun, 17 Aug 2008

Fake Out in Buenos Aires

"Falso," he said.

I accepted the 100 peso (US$30) note back. The only places we had gotten 100 peso notes were ATMs.

I found a different one with a good watermark and handed it to him. (This happened a bit over a week ago.)

[/note/ba-2008] permanent link and comments

Fri, 15 Aug 2008

Hello Planet Debian

I have a face on Planet Debian!

(Thanks to John Wright for setting it up for me!)

[/note/debian] permanent link and comments

Wed, 13 Aug 2008

Sending mail from a laptop

I often find myself on what I would call "hostile" networks: They allow only very limited Internet access, like by blocking port 25 so I can't connect to my mail server. Or maybe you're never on a filtered network, but your home ISP won't let you send mail out through it when you're not at home, and you want to send email directly from your laptop anyway.

Just do what I do! Let me explain.

Summary

Justification

Implementation in Three Steps

Step 1: ssh tunnel

This is the hardest part. To make things simple, I create a dedicated user on each end.
On the remote server (server)
[me@laptop] $ ssh me@server
[me@server] $ sudo adduser tunnelendpoint
[me@server] $ sudo su - tunnelendpoint
[tunnelendpoint@server] $ mkdir .ssh
On the local machine (laptop)
[me@laptop] $ sudo adduser tunnelclient
[me@laptop] $ sudo su - tunnelclient
[tunnelclient@laptop] $ ssh-keygen -t rsa # leave the passphrase empty
[tunnelclient@laptop] $ cat .ssh/id_rsa.pub | ssh tunnelendpoint@server 'mkdir -p .ssh ; chmod 0700 .ssh ; cat >> .ssh/authorized_keys'
On the remote server
[me@server] $ sudo su - tunnelendpoint
[tunnelendpoint@server] $ nano -w .ssh/authorized_keys
You'll see a key that starts with "ssh-rsa" (we generated an RSA key above). Before that, add this string and leave a space before "ssh-rsa":
command="nc localhost 25",no-X11-forwarding,no-agent-forwarding,no-port-forwarding

(Note: "nc" is in the netcat package.)
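
If you'd rather not hand-edit the file, the same change can be scripted. This is only a sketch: restrict_key is a made-up name, and it assumes the key line begins with "ssh-rsa", as it will for a key made with ssh-keygen -t rsa. Lines that already carry options are left alone, so running it twice is harmless.

```shell
#!/bin/sh
# restrict_key FILE
# Hypothetical helper: prefix every bare ssh-rsa key line in FILE with the
# forced command and the no-forwarding options, separated by one space.
restrict_key() {
    file=$1
    opts='command="nc localhost 25",no-X11-forwarding,no-agent-forwarding,no-port-forwarding'
    tmp=$(mktemp) || return 1
    # Only lines that start with "ssh-rsa " match; already-prefixed lines do not.
    if sed "s|^ssh-rsa |$opts ssh-rsa |" "$file" > "$tmp"; then
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

On the server, as tunnelendpoint, that would be run as restrict_key .ssh/authorized_keys.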

On the local machine (laptop)
[tunnelclient@laptop] $ ssh tunnelendpoint@server
220 rose.makesad.us ESMTP Postfix (Debian/GNU): "every tragedy is a beauty that has passed"

Hooray! If you see a reply like mine that starts with "220", then all is well.

You're done with the hard part. Now the easy parts.

Step 2: inetd

[me@laptop] $ sudo aptitude install openbsd-inetd

Now edit /etc/inetd.conf to have this line:

127.0.0.1:125 stream  tcp     nowait  tunnelclient    /usr/bin/ssh    -q -T tunnelendpoint@server

Now restart the inetd (sudo /etc/init.d/openbsd-inetd restart) and test it:

[me@laptop] $ telnet localhost 125 
220 rose.makesad.us ESMTP Postfix (Debian/GNU): "every tragedy is a beauty that has passed"

Step 3: Postfix (optional)

This is my favorite part, but it's only necessary if you plan to send email when you're not connected to the Internet.

Just install Postfix, and add this to /etc/postfix/main.cf:

relayhost = 127.0.0.1:125

Restart Postfix and you should be set. Try sending some mail!

Closing

I was inspired by a Debian Administration post, except I had my own ideas about the best way to do it. I still like my way best.

One problem with the above approach is that it requires root on "server". It would be possible to do the ssh tunnel thing without a separate "tunnelendpoint" account, by adding that key to your regular account's authorized_keys instead.

[/note/sysop] permanent link and comments

Tue, 12 Aug 2008

Geocoding location

Writes Aldon Hynes:

A random thought to muck up the works... What about people posting locations from virtual worlds?

Steve's head explodes.


[/scribble/code] permanent link and comments

Sat, 09 Aug 2008

Finding duplicate files

Every once in a while, I know one file is duplicated in many places. This happens, for example, when I have imported photos from my camera into a photo management program and also stored a copy of them somewhere else. Sometimes I have downloaded files twice from the web.

Detecting duplicate files is not hard - you just compare the file contents. The problem is that with large files, and a large number of files, it can take a long time if you compare every file to every other file.

Because I needed to do this for a few gigabytes of photos, and everything I found was either something I didn't trust or too slow, I wrote my own. Once you detect duplicate files, you generally want to either delete all but one, or to "merge them" via hardlinks so that all the files exist, but they share storage space on disk.
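
To make the hardlink idea concrete, here is a minimal sketch for a single pair of files. It is not the merge_dups code linked below; merge_if_dup is a name invented for illustration, and it assumes both files live on the same filesystem (hard links can't cross filesystems).

```shell
#!/bin/sh
# merge_if_dup KEEP OTHER
# Hypothetical helper: if OTHER has exactly the same contents as KEEP,
# replace OTHER with a hard link to KEEP so they share storage.
# Returns non-zero (and changes nothing) if the contents differ.
merge_if_dup() {
    keep=$1
    other=$2
    # Already the same inode: nothing to do.
    [ "$keep" -ef "$other" ] && return 0
    # cmp -s prints nothing and exits 0 only for byte-for-byte identical files.
    cmp -s -- "$keep" "$other" || return 1
    # -f removes OTHER before creating the link in its place.
    ln -f -- "$keep" "$other"
}
```

Because the file names are passed as quoted arguments, names with spaces are handled safely. Finding the candidate pairs in the first place is the part a real tool has to do efficiently.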

Summary: I had a fairly good approach, but everyone should use rdfind instead of my code.

My approach

You can check out (using Subversion or a web browser) my code at http://svn.asheesh.org/svn/public/code/merge_dups/ .

This approach has to stat() every file at least once, but many files don't have to be read at all. For my photos, this was a huge time-saver.

(Why delete the one with a longer filename? Usually that's the one in some obscure directory named "camera-backup" or "recovered-from-some-dying-computer".)

I trust my code. Plus, it is verbose, printing out what it is doing and why. And the entire program with comments, status message print-outs, and vertical spacing easily fits on my screen.

Other implementations

Today, I decided to go through Freshmeat to see if I could retire my code and just rely on someone else's. So I checked out the reasonable contenders from this search.

find_duplicates by Fredrik Hubinette

It uses the first few kilobytes of the file as a hash, which is probably more efficient than reading the whole thing. It is safe and reads the whole files before marking them as duplicates.

dmerge.cpp by Jonathan H Lundquist

I stopped caring when I realized it calls external programs. I doubt it does it in a correct/secure way, so forget it.

duff by Camilla Berglund

This looks really good, but it doesn't actually do the merging. It relies on a shell script to do the merging, and I don't trust the correctness of the shell script's handling of filenames (due to the whitespace-separated output format of duff itself).

Note to Camilla: If you provided a -z option (like find -print0) to duff, and made sure the shell script respected it, then it would be practically perfect.

fslint 2.14

It was so little fun to use I don't even want to talk about it. The benchmarks on the rdfind web page confirm this with data.

rdfind by Paul Sundvall (WINNER!)

Finding software like this is why I look for software not written by me.

Other tools I didn't fully review

Conclusion

rdfind looks great. Every once in a while, two hours are better spent doing research than re-inventing the wheel. This is one of those times where I was more useful to my life as a secretary than as a programmer.

[/note/software] permanent link and comments

Mon, 04 Aug 2008

Francisco

Francisco is the name of the very energetic hostel attendant at America del Sud El Calafate.

After offering me a key (literally) for the wireless, he told me the password.

"What are you doing there?" he asked me. "It's email," I answered.

"Email? And how can you see? I can't see any letters." (The fonts are pretty small on my laptop.) "What program is that?"

"Pine," I said. "It's called Alpine."

He paused for a moment, and reported, "You look like a hacker with that." He patted me on the shoulder and wandered off.

[/note/ba-2008] permanent link and comments

Argentina for two weeks

For those of you I haven't told, I'm in Argentina. I've been here since Friday July 31. The idea is to take a week's vacation before heading to Mar del Plata for a week of the Debian conference, Debconf. This year, Debconf is held in that beach resort town in the winter. From what I read, Mar del Plata is worth visiting even in this off season. On August 17, I'll be back in the Untied States (*).

The gracious Kragen Sitaker and Beatrice Murch are hosting me and my brother for a week in Buenos Aires. As a side note, right now I'm not in Buenos Aires but in a cold place called El Calafate.

On an overcast wintery day, B.A. looks like someone took a remix of Belgium and Paris and let it wear out a little more than you'd expect from the Continentals. On any sort of day, from Kragen's and Beatrice's roof, it looks like someone ported Blade Runner to Europe.

(*.) [sic]

[/note/ba-2008] permanent link and comments

Real DOS on a virtual disk

Sometimes you need to run DOS programs, like to flash BIOSs on your laptop. Sometimes, if you're Kragen, that lets you fix ACPI on your BIOS, giving you a hope that X will boot up more often than 1 in 3, sound will skip less, and the first PC card you insert will be assigned a valid IRQ. (The last one is particularly interesting: to get a working PC card before the promised joy of the BIOS update, you have to plug in one card, watch it get assigned the mostly broken IRQ 3, plug in a second card, watch it get assigned the useful IRQ 4, and then you can remove the first one. This is a good way to get a wifi card working.)

Here's a simple HOWTO for getting that going on a Linux machine without repartitioning or booting off external media.

I'll refer to aptitude; I'm assuming you're using a Debian/Ubuntu machine so that makes sense.

Step 1: Install syslinux

$ sudo aptitude install syslinux

Now memdisk is in /usr/lib/syslinux/memdisk .

You should copy it to /boot/ in case your root filesystem is encrypted:

$ sudo cp /usr/lib/syslinux/memdisk /boot/

Step 2: Get your DOS floppy in /boot

Debian packages FreeDOS in dosemu-freedos. Unfortunately that doesn't include a floppy image. Instead:

$ cd /boot
$ sudo wget http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.0/fdboot.img

Step 3: Configure GRUB

Put this in your /boot/grub/menu.lst and smoke it:

title FreeDOS
kernel /memdisk
initrd /fdboot.img
boot

Step 4: Reboot, and choose FreeDOS!

Ta-da, you're done.

More options

For bonus points, you can customize the floppy disk image. The easiest way to modify it is to mount it via loopback:

$ sudo mount -o loop,mode=777 /boot/fdboot.img /mnt/

Then you can copy files into /mnt/, and then when you're done:

$ sudo umount /mnt/

Ta-da, the image has been changed! (Thanks to Kragen for confirming that this actually works.)

The lame old way to customize the image is to use "mtools."
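
For completeness, a sketch of that mtools route, which needs no loopback mount. add_to_floppy and the MCOPY override variable are invented names; mcopy's -i flag and the "::" notation for the root directory of the image come from mtools itself.

```shell
#!/bin/sh
# add_to_floppy IMAGE FILE...
# Hypothetical helper: copy one or more files into the root directory
# of a FAT floppy image without mounting it.
add_to_floppy() {
    img=$1
    shift
    # -i makes mcopy operate directly on the image file;
    # "::" names the root directory inside that image.
    "${MCOPY:-mcopy}" -i "$img" "$@" ::
}
```

For example, add_to_floppy /boot/fdboot.img flash.exe would put a (made-up) flash.exe on the boot floppy, assuming you have write permission on the image.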

P.S. Thanks to Albert Lee for explaining this trick to me in the first place!

[/note/sysop] permanent link and comments