Surviving I/O Errors
While onsite last week on a contract, I ran into disk woes. I had worked many hours in a Parallels session. I reached a milestone, so I decided it would be a good time to shut down the virtual machine, clone it, and continue on a copy. Poor man’s version control (given the weird setup, it was about the best I could do). Imagine my surprise when my attempt to duplicate the machine image failed. I had installed a bunch of software into the image, which resulted in about 5GB of new data. Apparently some of that fresh data landed on a disk block that was about to go bad. About 114MB into my disk image, there was a 4K unreadable section. Nuts.
It was interesting how different software coped: Finder issued an error message and wouldn’t complete the copy. I won’t begrudge the Finder for throwing up its anthropomorphic hands on an IO error. The Finder is a widespread app with a wide user demographic, and design-wise I’d be hesitant to put any type of recovery options in there. A link to some help docs explaining your options would be nice for unsophisticated users, though.
My only real gripe here is a UI one: I hit an IO error when copying a file. If this doesn’t demand a red-stop-sign “error” dialog, I don’t know what does. The Finder’s dialog looks and feels more like a warning than an error.
For the rest of the onsite, I successfully used the rescue-cloned disk image. I’m not sure where that 4K of zeros wound up on the image’s file system, but I didn’t hit anything weird in the rest of my time. It kept me going until I returned home, where I can now clean-reinstall everything.
This lone bad file presented another problem: backing up. I use Retrospect nightly to back up, so I was interested in how it would cope with a bad file. Pretty much as well as one could hope: it backed up everything except that file and let me know about the “execution error”. While Retrospect’s terminology sucks, it Did The Right Thing.
The troubled drive in question is my main FireWire portable drive (where my home folder lives), so I was eager to completely move off it and stress-test it to see if this was just a random incident or a signal that I need to replace the drive wholesale. Of course, cloning drives is SuperDuper’s forte, so I attempted to use it to copy my portable drive onto a spare drive. Bad news: unlike Retrospect, SuperDuper halted on the IO error without an option to continue (the underlying […]).
Fortunately, Retrospect also has a drive cloning function. The UI is far less slick than SuperDuper’s, but it uses the same engine that continues in the face of IO errors. The result: an error log and a 114MB file, representing the beginning of the bedeviled image file up until the bad section.
I don’t have any statistics on how common IO errors are versus total drive failures, but it’s clear that a single simple IO error could put most users into a jam they couldn’t get out of. As a user, I feel like there should be high-level handling of this failure scenario. As a developer, however, I’m unsure about the best course of action. Zero-filling was the right solution for this case, but it certainly isn’t in many other cases, and it’s nontrivial for the software to make that decision for the user. Also as a developer, I fear any type of automated solution would provide a disastrous band-aid.
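To make the zero-filling idea concrete, here is a minimal Python sketch of a block-by-block copy that substitutes zeros for any block that fails to read. It is an illustration only, not the tool used for the rescue described above. The names `rescue_copy` and `read_block` and the 4K block size are assumptions made for the example, and `os.pread` limits it to POSIX systems.

```python
import os

BLOCK_SIZE = 4096  # assumption: matches the 4K unreadable section in the story


def rescue_copy(src_path, dst_path, block_size=BLOCK_SIZE, read_block=os.pread):
    """Copy src_path to dst_path one block at a time.

    Any block whose read raises OSError (for example EIO) is written as
    zeros instead. Returns the byte offsets of the zero-filled blocks so
    the caller can report what was lost.
    """
    bad_offsets = []
    fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = read_block(fd, want, offset)
                except OSError:
                    # Unreadable block: remember where, then zero-fill it.
                    data = b""
                    bad_offsets.append(offset)
                # Pad failed (or short) reads so later data keeps its offset.
                data = data.ljust(want, b"\0")
                dst.write(data)
                offset += want
    finally:
        os.close(fd)
    return bad_offsets
```

Returning the list of bad offsets rather than swallowing the errors is deliberate: the copy keeps going, but the caller still has to decide what to tell the user, which is exactly the hard part discussed here.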
I can easily imagine a drive slowly dying with the user clicking through multiple “There was a disk error. Rescue the affected file?” alerts until the drive completely falls over. You could put in wording advising the user to back up, though I don’t know how much difference it would make. In the cruel real world, some users tend to keep going until blocked, and it’s better they get hung up on an IO error than a disk failure.
Update: Alastair Houghton and Dave Nanian contribute to a followup.
Monday, June 19, 2006
Copyright © 1997-2010 Jonathan 'Wolf' Rentzsch. All rights reserved.