Thu, 21 Aug 2008
dd, dd_rescue, and ddrescue
The short answer: "Use GNU ddrescue. GNU stands for Quality."
dd is a classic UNIX utility to read from and write to files (often devices). Typically, one uses it to copy a hard disk to a file, or to image a hard drive by copying a backup onto it.
One hits a problem when the hard disk has errors. In this case, dd abruptly stops working in the middle, reporting an "Input/output error." But when the hard disk has errors, usually what you want is to get an image of all the blocks on the hard disk that are readable - not just the first few before the first error!
(Note for the pedantic: Yes, I know about dd conv=noerror,sync. Those options are so easy to misuse (mostly by forgetting one of the two) that they're worth avoiding.)
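To make "get every readable block" concrete, here is a toy sketch in Python. It is not how dd_rescue or GNU ddrescue work internally; the function name, the 512-byte block size, and the choice to zero-fill unreadable blocks are all assumptions made for illustration.

```python
def rescue_copy(src, dst, total_size, block_size=512):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    src and dst are seekable binary file objects. Returns the list of
    offsets whose reads raised OSError (for example "Input/output error").
    """
    bad_offsets = []
    for offset in range(0, total_size, block_size):
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            # Unreadable block: remember it and keep going instead of stopping.
            data = b""
            bad_offsets.append(offset)
        # Pad failed or short reads so later blocks stay at the right offset.
        dst.seek(offset)
        dst.write(data.ljust(want, b"\0"))
    return bad_offsets
```

The real tools go much further (retries, reading backwards, a log of what is still missing), which is the reason to use them rather than a loop like this.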
Two tools are available for this particular purpose. Confusingly, one is called ddrescue, and the other is called dd_rescue.
Around 2001, Kurt Garloff wrote dd_rescue. It does what dd does if you pass it some options, but it comes with instructions on how to use it to recover data from drives, such as by running it multiple times or backwards. A wrapper script called dd_rhelp automates that process.
When you're running dd_rescue on an obscure OS like Mac OS X 10.3 because you dropped your laptop in Uganda and the Linux partition grew bad blocks and you still want your data, you will find that dd_rhelp is written as a complicated shell script that relies on GNU versions of core system utilities. OS X provides non-GNU versions, and you will waste hours fiddling with compiling those utilities just so you can run some dumb shell script.
In the summer of 2004, the same summer as I dropped my laptop, Antonio Diaz Diaz wrote "ddrescue," a stand-alone C++ tool that does the same things as dd_rhelp, but more sanely and therefore more efficiently. It became an official GNU project. GNU ddrescue, like dd_rhelp, can keep a log file to let itself gracefully pick up after interruptions.
When your hard disk fails, you should turn to your backups. But if you need a tool like these, just remember: "GNU ddrescue."
$ sudo apt-get install gddrescue
[/note/sysop] permanent link and comments
Lamers
Kragen Sitaker and his wife Beatrice were very gracious in hosting me and my brother for a week in Buenos Aires.
I was looking for something on Kragen's website and found a ten-year-old discussion of how to find security problems in software. In it, he writes:
Body text last updated 1998-07-22. Recently has become the most popular page of mine, presumably because a bunch of lamers want to learn how to break into things. [...]
I wouldn't be surprised if calling 100-200 people a day `lamers' results in electronic attacks on me or my machine (kragen.dnaco.net.) All I can say is that people who do this would thereby demonstrate their lamosity.
Lamers, you say? Nelson took this picture of me a few years back. Look at the thumbs-up from the driver!
(Photo available for re-use under Creative Commons Attribution-ShareAlike 2.0.)
Note: Mako addressed this topic earlier this year, and then again more recently.
[/note/ba-2008] permanent link and comments
Tue, 19 Aug 2008
For Timo art with me
Chris Wakelin asks Timo of Dovecot to change his English usage:
I've been meaning to tell you that should be "Yeah" for an informal version of "Yes", otherwise it's a very archaic form of "Yes" or "Indeed" as in "Yea, though I walk in the valley of the shadow of death"!
Stewart Dean points out:
But Timo walks through the valley of the shadow for us all.....so maybe he's entitled.....
[/scribble/code] permanent link and comments
Sun, 17 Aug 2008
Fake Out in Buenos Aires
"Falso," he said.
I accepted the 100 peso (US$30) note back. The only places we had gotten 100 peso notes were ATMs.
I found a different one with a good watermark and handed it to him. (This happened a bit over a week ago.)
[/note/ba-2008] permanent link and comments
Fri, 15 Aug 2008
Hello Planet Debian
I have a face on Planet Debian!
(Thanks to John Wright for setting it up for me!)
[/note/debian] permanent link and comments
Wed, 13 Aug 2008
Sending mail from a laptop
I often find myself on what I would call "hostile" networks: They allow only very limited Internet access, like by blocking port 25 so I can't connect to my mail server. Maybe you're never on a filtered network, but your home ISP won't let you send mail out through it when you're away from home, and you want to send email directly from your laptop anyway.
Just do what I do! Let me explain.
Summary
- inetd listens on port 125
- Connections to it go through an SSH tunnel that executes "nc localhost 25" on some mail server
- (Optional) A real MTA runs on the laptop, so that I can send mail when offline; when mail delivery fails temporarily, Postfix queues the message until I get back online.
Justification
- Easy: Apps can be configured to use localhost port 25 (or port 125) with no password.
- Correct: Postfix (when using 25) handles sending mail when offline, and reattempts delivery for me.
- Secure: Encryption all the way through the network, with the icing on the cake that this all looks like SSH, so nosy networkers near your laptop can't even see that you're sending mail.
Implementation in Three Steps
Step 1: ssh tunnel
This is the hardest part. To make things simple, I create a dedicated user on each end.
On the remote server (server)
[me@laptop] $ ssh me@server
[me@server] $ sudo adduser tunnelendpoint
[me@server] $ sudo su - tunnelendpoint
[tunnelendpoint@server] $ mkdir .ssh
On the local machine (laptop)
[me@laptop] $ sudo adduser tunnelclient
[me@laptop] $ sudo su - tunnelclient
[tunnelclient@laptop] $ ssh-keygen -t rsa # make it passwordless
[tunnelclient@laptop] $ cat .ssh/id_rsa.pub | ssh tunnelendpoint@server 'mkdir -p .ssh ; chmod 0700 .ssh ; cat >> .ssh/authorized_keys'
On the remote server
[me@server] $ sudo su - tunnelendpoint
[tunnelendpoint@server] $ nano -w .ssh/authorized_keys
You'll see a key that starts with "ssh-rsa". Before that, add this string and leave a space before "ssh-rsa":
command="nc localhost 25",no-X11-forwarding,no-agent-forwarding,no-port-forwarding
(Note: "nc" is in the netcat package.)
On the local machine (laptop)
[tunnelclient@laptop] $ ssh tunnelendpoint@server
220 rose.makesad.us ESMTP Postfix (Debian/GNU): "every tragedy is a beauty that has passed"
Hooray! If you see a reply like mine that starts with "220", then all is well.
You're done with the hard part. Now the easy parts.
Step 2: inetd
[me@laptop] $ sudo aptitude install openbsd-inetd
Now edit /etc/inetd.conf to have this line:
127.0.0.1:125 stream tcp nowait tunnelclient /usr/bin/ssh ssh -q -T tunnelendpoint@server
Now restart inetd (sudo /etc/init.d/openbsd-inetd restart) and test it:
[me@laptop] $ telnet localhost 125
220 rose.makesad.us ESMTP Postfix (Debian/GNU): "every tragedy is a beauty that has passed"
Step 3: Postfix (optional)
This is my favorite part, but it's only necessary if you plan to send email when you're not connected to the Internet.
Just install Postfix, and add this to /etc/postfix/main.cf:
relayhost = 127.0.0.1:125
Restart Postfix and you should be set. Try sending some mail!
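If you'd rather test from a script than from a mail client, a minimal Python sketch might look like the following. The helper names and the example.org addresses are made up for illustration; the only things taken from this post are the host and port. It assumes the tunnel from Step 2 is listening on 127.0.0.1:125 (use port 25 instead if you want the message to go through the local Postfix queue).

```python
import smtplib
from email.message import EmailMessage

TUNNEL_HOST = "127.0.0.1"
TUNNEL_PORT = 125  # the inetd port from Step 2; 25 would be the local Postfix


def build_message(sender, recipient, subject, body):
    """Build a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_tunnel(msg, host=TUNNEL_HOST, port=TUNNEL_PORT):
    """Hand the message to whatever SMTP server answers on host:port.

    There is no login() call here: the client sends no password, because
    access is controlled by the SSH key behind the tunnel.
    """
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.send_message(msg)
```

Usage would be something like send_via_tunnel(build_message("me@example.org", "friend@example.org", "Test from the laptop", "Sent through the SSH tunnel.")), with real addresses substituted.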
Closing
I was inspired by a Debian Administration post, except I had my own ideas about the best way to do it. I still like my way best.
One problem with the above approach is that it requires root on "server". It would be possible to do the ssh tunnel thing without using a separate "tunnelendpoint" account, by adding that key to your regular account's authorized_keys instead.
[/note/sysop] permanent link and comments
Tue, 12 Aug 2008
Geocoding location
Writes Aldon Hynes:
A random thought to muck up the works... What about people posting locations from virtual worlds?
[/scribble/code] permanent link and comments
Sat, 09 Aug 2008
Finding duplicate files
Every once in a while, I know one file is duplicated in many places. This happens, for example, when I have imported photos from my camera into a photo management program and also stored a copy of them somewhere else. Sometimes I have downloaded files twice from the web.
Detecting duplicate files is not hard - you just compare the file contents. The problem is that with large files, and a large number of files, it can take a long time if you compare every file to every other file.
Because I needed to do this for a few gigabytes of photos, and everything I found was either something I didn't trust or too slow, I wrote my own. Once you detect duplicate files, you generally want to either delete all but one, or to "merge them" via hardlinks so that all the files exist, but they share storage space on disk.
Summary: I had a fairly good approach, but everyone should use rdfind instead of my code.
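For anyone unfamiliar with the hardlink trick, here is a rough Python sketch of merging one duplicate into another. The function name is hypothetical, it is not taken from my code or from rdfind, and it assumes both files live on the same filesystem (hard links cannot cross filesystems) and have already been verified as identical.

```python
import os
import tempfile


def merge_via_hardlink(keep, duplicate):
    """Replace `duplicate` with a hard link to `keep` so they share storage."""
    if os.path.samefile(keep, duplicate):
        return  # already the same inode, nothing to do
    directory = os.path.dirname(os.path.abspath(duplicate))
    # Reserve a unique temporary name next to the duplicate.
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    os.unlink(tmp)
    try:
        os.link(keep, tmp)          # new name for keep's data
        os.replace(tmp, duplicate)  # swap it in over the duplicate
    except OSError:
        if os.path.lexists(tmp):
            os.unlink(tmp)
        raise
```

Linking to a temporary name first and renaming it over the duplicate means the duplicate's path never disappears, even if the script is interrupted partway through.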
My approach
You can check out (using Subversion or a web browser) my code at http://svn.asheesh.org/svn/public/code/merge_dups/ .
- Organize all the files grouped by size (since only files of equal size can have equal contents).
- For each size that contains more than one file, calculate a hash (MD5) of each of those files.
- If any of the files have the same size and MD5, delete the one with the longer filename.
- Continue to the next file size.
This approach has to stat() every file at least once, but many files don't have to be read at all. For my photos, this was a huge time-saver.
(Why delete the one with a longer filename? Usually that's the one in some obscure directory named "camera-backup" or "recovered-from-some-dying-computer".)
I trust my code. Plus, it is verbose, printing out what it is doing and why. And the entire program with comments, status message print-outs, and vertical spacing easily fits on my screen.
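The steps above can be sketched in a few lines of Python. This is a fresh illustration, not the code in my Subversion repository: the function names are invented here, and it only reports duplicates (shortest path first in each group) rather than deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root that share both size and MD5.

    Each group is sorted so the shortest path comes first; under the rule
    above, that is the copy to keep and the rest are deletion candidates.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size: never read the file at all
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                groups.append(sorted(same, key=lambda p: (len(p), p)))
    return groups
```

Files with a unique size are only stat()ed, never opened, which is where the time savings come from.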
Other implementations
Today, I decided to go through Freshmeat to see if I could retire my code and just rely on someone else's. So I checked out the reasonable contenders from this search.
find_duplicates by Fredrik Hubinette
- Homepage: http://fredrik.hubbe.net/hacks/
- License: GPL v2 (good)
- Efficiency: Good (uses file sizes the way I do)
- Language: Pike (weird, but seems okay)
- Strategy: Check sizes; hash; verify by reading file; merge via hardlinks
- Sanity: High
- Rating: Good
It uses the first few kilobytes of the file as a hash, which is probably more efficient than reading the whole thing. It is safe and reads the whole files before marking them as duplicates.
dmerge.cpp by Jonathan H Lundquist
- Homepage: http://www.fluxsmith.com/cgi-bin/twiki/view/Jonathan/DMerge
- License: X11-like (good)
- Efficiency: Good (uses file sizes the way I do)
- Language: C++ (bearable)
- Strategy: Check sizes; hash by calling an external program; verify by calling "cmp"; ...
- Sanity: Low
- Rating: Don't use
I stopped caring when I realized it calls external programs. I doubt it does it in a correct/secure way, so forget it.
duff by Camilla Berglund
- Homepage: http://duff.sourceforge.net/
- License: zlib/libpng (MIT-esque) (good)
- Efficiency: Good
- Language: C (okay)
- Strategy: Check sizes; hash with first few bytes; verify by SHA1 or actual
- Sanity: High (comes with a man page; very tunable; great web site)
- Rating: Don't use
This looks really good, but it doesn't actually do the merging. It relies on a shell script to do the merging, and I don't trust the correctness of the shell script's handling of filenames (due to the whitespace-separated output format of duff itself).
Note to Camilla: If you provided a -z option (like find -print0) to duff, and made sure the shell script respected it, then it would be practically perfect.
fslint 2.14
- Efficiency: Seemed lame
- Rating: Don't use
- Explanation: I tried it, and then I said, "That's it, I'm writing my own."
It was so little fun to use I don't even want to talk about it. The benchmarks on the rdfind web page confirm this with data.
rdfind by Paul Sundvall (WINNER!)
- Homepage: http://www2.paulsundvall.net/rdfind/rdfind.html
- License: GPL (v2, probably) (good)
- Efficiency: Excellent
- Language: C++ (okay)
- Strategy: Check sizes; check first bytes; calculate SHA1s; delete dups or create symlinks or create hardlinks or print report
- Sanity: High - object-oriented, well-commented, includes man page, includes benchmarks
- Self-importance: High, but seems deserved
- Rating: Excellent, use it
Finding software like this is why I look for software not written by me.
Other tools I didn't fully review
- finddup by Heiner Steven <http://www.shelldorado.com/scripts/cmds/finddup>
  - Language: Shell, which probably means it has problems with complicated filenames
- clink by Michael Opdenacker <http://free-electrons.com/community/tools/utils/clink/>
  - Language: Python (yay!)
  - Does not support hard links, only symlinks, thereby (by the author's own admission) creates permissions problems
- dupfinder by Matthias Böhm <http://doubles.sourceforge.net/>
  - Sanity: Moderate to Low - thinks that not using hash functions makes it "much faster" than other programs
- dupmerge2 by Rolf Freitag (continuation of work from Phil Karn) <http://sourceforge.net/projects/dupmerge/>
  - Sanity: Moderate to Low - bundles a pre-compiled binary, which is just weird
- dupseek by Antonio Bellezza <http://www.beautylabs.net/software/dupseek.html>
  - Focus on interactive duplicate file removal. Probably good at that; I want correct, unattended operation.
- freedup by William Stearns <http://freedup.org/>
  - Looks fairly good, even though it's written in bash (freaks me out)
  - Offers an option to strip metadata and compare only file *contents* for MP3, MPEG4, MPC, JPEG, and Ogg (not FLAC, I guess), which is great.
Conclusion
rdfind looks great. Every once in a while, two hours are better spent doing research than re-inventing the wheel. This is one of those times when I was more useful to my life as a secretary than I would have been trying to be a programmer.
[/note/software] permanent link and comments
Mon, 04 Aug 2008
Francisco
Francisco is the name of the very energetic hostel attendant at America del Sud El Calafate.
After offering me a key (literally) for the wireless, he told me the password.
"What are you doing there?" he asked me. "It's email," I answered.
"Email? And how can you see? I can't see any letters." (The fonts are pretty small on my laptop.) "What program is that?"
"Pine," I said. "It's called Alpine."
He paused for a moment, and reported, "You look like a hacker with that." He patted me on the shoulder and wandered off.
[/note/ba-2008] permanent link and comments
Argentina for two weeks
For those of you I haven't told, I'm in Argentina. I've been here since Friday July 31. The idea is to take a week's vacation before heading to Mar del Plata for a week of the Debian conference, Debconf. This year, Debconf is held in that beach resort town in the winter. From what I read, Mar del Plata is worth visiting even in this off season. On August 17, I'll be back in the Untied States (*).
The gracious Kragen Sitaker and Beatrice Murch are hosting me and my brother for a week in Buenos Aires. As a side note, right now I'm not in Buenos Aires but in a cold place called El Calafate.
On an overcast wintery day, B.A. looks like someone took a remix of Belgium and Paris and let it wear out a little more than you'd expect from the Continentals. On any sort of day, from Kragen's and Beatrice's roof, it looks like someone ported Blade Runner to Europe.
(*.) [sic]
[/note/ba-2008] permanent link and comments
Real DOS on a virtual disk
Sometimes you need to run DOS programs, like to flash BIOSs on your laptop. Sometimes, if you're Kragen, that lets you fix ACPI on your BIOS, giving you a hope that X will boot up more often than 1 in 3, sound will skip less, and the first PC card you insert will be assigned a valid IRQ. (The last one is particularly interesting: to get a working PC card before the promised joy of the BIOS update, you have to plug in one card, watch it get assigned the mostly broken IRQ 3, plug in a second card, watch it get assigned the useful IRQ 4, and then you can remove the first one. This is a good way to get a wifi card working.)
Here's a simple HOWTO for getting that going on a Linux machine without repartitioning or booting off external media.
I'll refer to aptitude; I'm assuming you're using a Debian/Ubuntu machine so that makes sense.
Step 1: Install syslinux
$ sudo aptitude install syslinux
Now memdisk is in /usr/lib/syslinux/memdisk .
You should copy it to /boot/ in case your root filesystem is encrypted:
$ sudo cp /usr/lib/syslinux/memdisk /boot/
Step 2: Get your DOS floppy in /boot
Debian packages FreeDOS in dosemu-freedos. Unfortunately that doesn't include a floppy image. Instead:
$ cd /boot
$ sudo wget http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.0/fdboot.img
Step 3: Configure GRUB
Put this in your /boot/grub/menu.lst and smoke it:
title FreeDOS
kernel /memdisk
initrd /fdboot.img
boot
Step 4: Reboot, and choose FreeDOS!
Ta-da, you're done.
More options
For bonus points, you can customize the floppy disk image. The easiest way to modify is to mount it loopback:
$ sudo mount -o loop,mode=777 /boot/fdboot.img /mnt/
Then you can copy files into /mnt/, and then when you're done:
$ sudo umount /mnt/
Ta-da, the image has been changed! (Thanks to Kragen for confirming that this actually works.)
The lame old way to customize the image is to use "mtools."
P.S. Thanks to Albert Lee for explaining this trick to me in the first place!
