My god, it's full of stars!
 
March 8th, 2010

zip vs. tar + gzip

Just had the need to create an archive of a folder containing 91 large text files, totaling 370MB. Decided to pit zip against tar + gzip in a little speed test, using these commands:

tar cvzf awstats.tgz awstats
zip -9ry awstats.zip awstats

On the server in question, these were the elapsed times to accomplish this very similar task:

zip: One minute, 21 seconds
tar: 41 seconds

This is, in part, because tar only has to compress once, after concatenating all the bits together (but that’s not the full story). In contrast, zip has to compress each file individually. And resulting archive sizes?

-rw-r--r-- 1 cdt cdt 141877473 Mar 8 10:31 awstats.tgz
-rw-r--r-- 1 cdt cdt 140081519 Mar 8 10:29 awstats.zip

So zip did have a slight advantage in the output size. But wait... no fair! We used the “-9” option with zip for maximum compression. To make it more fair, let’s use the “-9” flag with gzip as well. To do that, I ran two consecutive commands (piping tar’s output through gzip -9 would also work):

$ tar cvf awstats.tar awstats ; gzip -9 awstats.tar

This caused the compression time for gzip to go way up; that command took 1:17 to run. But now the file sizes are approaching identical:

-rw-r--r-- 1 cdt cdt 140090837 Mar 8 10:42 awstats.tar.gz
-rw-r--r-- 1 cdt cdt 140081519 Mar 8 10:29 awstats.zip
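As an aside, the tar step and the maximum-compression gzip step can be combined into a single pass. Here is a minimal sketch using Python’s standard tarfile module. This is not the method used for the timings above, and the function name and the throwaway demo directory are made up for illustration:

```python
import os
import tarfile
import tempfile


def make_tgz(src_dir, out_path, level=9):
    """Write src_dir to a gzip-compressed tarball in one pass; return its size in bytes."""
    top = os.path.basename(os.path.normpath(src_dir))
    # compresslevel plays the same role as gzip's -1 .. -9 flags.
    with tarfile.open(out_path, "w:gz", compresslevel=level) as tar:
        tar.add(src_dir, arcname=top)
    return os.path.getsize(out_path)


# Demo on a temporary directory standing in for the real "awstats" folder.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "awstats")
    os.mkdir(src)
    with open(os.path.join(src, "sample.txt"), "w") as fh:
        fh.write("GET /index.html 200\n" * 5000)
    size = make_tgz(src, os.path.join(tmp, "awstats.tar.gz"), level=9)
    print(f"archive size: {size} bytes")
```

Passing a lower level (say, 6) trades some output size for speed, which is the same trade-off the timings above show.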

Of course these kinds of things are very circumstantial – doing a similar test on a folder full of pre-compressed files like MP3s would yield very different results (in that case you’d be way better off just using tar without gzip, and definitely not zip). But the upshot is that when trying to decide whether to use zip or tar + gzip, compression times and output sizes are close enough to just not matter in general usage.

Update: I did end up doing a later test on the same dir with bzip2. Result: significantly smaller file size:

-rw-r--r-- 1 cdt cdt 104698994 Mar 8 14:17 awstats.tar.bz2

but at the expense of much longer compression times. If I use gzip and bzip2 side by side on the same 370MB tar file, I get these times:

gzip: 41 seconds
bzip2: 1 minute 36 seconds

That makes bzip2 more than twice as slow as gzip (96 seconds vs. 41), though it does generate smaller output files.
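For the record, the ratios behind the gzip/bzip2 comparison are easy to work out from the figures quoted above. A small sketch (the variable names are mine, and it assumes the 41-second gzip run corresponds to the 141,877,473-byte awstats.tgz):

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


gzip_seconds = to_seconds(0, 41)
bzip2_seconds = to_seconds(1, 36)

gzip_bytes = 141877473   # awstats.tgz
bzip2_bytes = 104698994  # awstats.tar.bz2

# How many times longer bzip2 took than gzip.
slowdown = bzip2_seconds / gzip_seconds
# Fraction of the gzip archive size that bzip2 saved.
saving = 1 - bzip2_bytes / gzip_bytes

print(f"bzip2 took {slowdown:.2f}x as long as gzip")
print(f"bzip2 output was {saving:.1%} smaller")
```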

March 1st, 2010

Home Backup to the Cloud

Four years ago, long before Time Machine and the wide availability of cloud storage, I purchased a RAID/NAS for home backups. It’s done its job admirably, and has given us the confidence to back up the whole family without fear of drive failure. Even went as far as drilling holes in the floor and threading CAT-5 under the house so I could keep the Infrant in the closet, where it would make less noise.

It’s worked well, but the big problem it didn’t solve is the fire/flood/theft scenario. One good earthquake and all those images and videos of our child’s early years would be Gone Daddy Gone. Plus, my backup system was based on rsync. That worked fine, but was a bit too manual, and I had had occasional problems getting backups to complete to the non-Mac filesystem on the Infrant.

This problem had been hovering in the back of my mind for quite a while, when a dad at the local park mentioned that he had had success with Backblaze. For $5/month, you get hands-off unlimited backup of your entire system to their data center. Drive space is dirt cheap these days, so it’s tempting to rely on purchased drives, but let’s do the math. Let’s say you spend $100 for a 500GB drive. That’s the equivalent of 20 months of Backblaze service. If you go for the one-year commitment, you get the service for $4/month, so let’s say two years for the drive you just bought “cheap” to pay for itself. And you still haven’t got fire/flood/theft insurance. Seemed like a no-brainer to me, so I went for it.
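The math above, spelled out as a quick sketch. The function name is made up for illustration, and the prices are the 2010 figures quoted in this post:

```python
def months_of_service(drive_cost, monthly_fee):
    """Number of months of backup service that the price of one drive would buy."""
    return drive_cost / monthly_fee


drive_cost = 100  # dollars, for a 500GB drive

monthly = months_of_service(drive_cost, 5)  # month-to-month price
yearly = months_of_service(drive_cost, 4)   # with the one-year commitment

print(f"At $5/month: {monthly:.0f} months")
print(f"At $4/month: {yearly:.0f} months (about {yearly / 12:.1f} years)")
```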

My starter data set was 300GB – a healthy pile of bytes. Backblaze noted that the initial backup could take a couple of weeks, but in my case, the initial backup took more than three weeks, even over a fast broadband connection. After the initial backup is complete, incrementals happen quickly, with no interaction required.
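To put that in perspective, here is a rough back-of-the-envelope sketch of the average upload rate those numbers imply. It assumes decimal gigabytes (1GB = 10^9 bytes) and exactly 21 days. Since the backup actually took longer than three weeks, the real average was somewhat lower:

```python
def average_mbps(total_gb, days):
    """Average transfer rate in megabits per second for total_gb sent over the given days."""
    bits = total_gb * 1e9 * 8
    seconds = days * 24 * 60 * 60
    return bits / seconds / 1e6


rate = average_mbps(300, 21)
print(f"Average upload rate: about {rate:.2f} Mbit/s")
```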

Installation and backup management take place through a preference pane on the Mac. It’s elegant, but I did have some problems along the way. At a certain point, halfway through the initial backup period, the pref pane informed me that the backup was complete, even though it wasn’t. It continued to report this for the next 10 days, even though I could see the bztransmit process chugging away in the background. The pref pane provides a count of the number of files and their total size; to get this to update, I’d have to unmount and remount my external data drive, then wait 3-4 hours for the process to rescan volumes and report new information.

At this point, I’ve made it through the initial backup and have added 150MB of new data to the external drive. The preference pane does not report any change to the totals, even though I have confirmed that the newly added files are available on Backblaze servers. I also had a number of instances where the bztransmit process would swell to consume very large (> 2GB) amounts of memory. In some cases, the process memory would eventually come back down on its own. In others I had to manually kill all bz* processes and restart the backup process. It’s as if the backup process is running fine, but the preference pane is unaware of what those processes are actually doing. Annoying, but not a deal-breaker.

I corresponded with Backblaze tech support during the process, and found them super-responsive, and not afraid to share detailed technical analysis of the process. They weren’t able to answer all of my questions about why the pref pane didn’t seem to know what the backup process was actually doing, but they were super detailed and quick, and I appreciate that.

Despite these glitches, my test restores have all gone well.

There is one little financial hitch in my plan: That $4/month is only for one machine. I’ll have to spend more to be able to back up other computers in the house. I’m still mulling that one. In any case, it feels great to know that my backups are complete, even if disaster hits the home some day. And now that the glitches of the initial backup period have passed, it should be pretty smooth sailing ahead.

There are other cloud backup systems for the home out there, like CrashPlan and Amazon S3 with S3Hub. I haven’t tried them. If you have, what have your experiences been like?

February 23rd, 2010

delicious word cloud

wordle.net not only lets you generate tag clouds out of any chunk of text (which can be great for doing things like figuring out which keywords a politician emphasizes the most in a speech), it can also scan your delicious bookmarks to give you a weighted view of the kinds of things you keep track of. Kind of a zeitgeist snapshot of the inside of your head. It appears that I bookmark work-related/tech stuff almost exclusively. I do have a lot of non-tech bookmarks in delicious as well, but they’re drowned out in the frequency ranking by webdev stuff.

January 7th, 2010

NuForce uDAC

A few weeks ago, during a spell of unusually dry winter weather, I went to unplug a pair of Grado SR-80 headphones from my iMac. A spark of static electricity leapt from my fingers, I heard a brief crackling sound, and then… [silence]. From that moment forward, the headphone/speaker jack on the back of the Mac has refused to work, and only “Internal Speakers” shows up in the System Preferences Sound panel. My trusty work Mac had gone mute.

My only options were either to send the Mac in for repair or switch to USB audio output. I couldn’t afford to be without the Mac, and I was interested in hearing what kind of audio upgrade I’d get by bypassing the Mac’s internal digital-to-analog converter (DAC), so I hit up an audiophile friend for recommendations. Hit the jackpot when he suggested the NuForce μDAC (aka microDAC) – a handsome $99 outboard DAC smaller than a pack of smokes.

The unit arrived a few days later and I was blown away from the moment I plugged it in and enabled it in the Sound prefs Output panel. Digital audio has never sounded better on a computer I’ve owned. But since the original analog jack was fried, I had no way to directly compare the quality of the Mac’s native DAC with the new outboard. Today I sat down at someone else’s work Mac and did some A/B testing.

For the test, I chose two recordings:

  • Sonny Rollins: “I’m an Old Cowhand” (from Way Out West)
  • Beatles: “Because” (from Abbey Road 2009 Stereo Remaster)

(I chose these two because A) I love them and B) I had them on hand at 256kbps AAC, for best possible resolution).

Full disclosure: I appreciate great-sounding audio, but I’m far from a hardcore audiophile. For a balls-out audio tweak’s perspective on the μDAC, see HeadphoneAddict’s review at head-fi.org.

Just a few minutes into Cowhand, I noticed something I’d never heard before: The sound of the cork linings of the valves of Rollins’ saxophone tapping away as he played. It was subtle, but it had been there in the recording all along – I had just never noticed it. And that’s exactly the point – the differences are subtle, and you may not notice all of them unless you’re listening for them, but they’re present. And that subtlety adds up to an overall experience that’s simply more realistic, more nuanced than what you get with the cheaper DAC built into consumer PCs. It’s all about presence.

Likewise, I found the harmonies in Because fuller, richer, more bodied than they sounded through the Mac’s native DAC. The French horns far more alive and breathy, the harpsichord more twangy. Virtually everything about these two tracks sounded more engaging.

Another thing I noticed: Usually, near the end of a long day writing code, I feel the need to take the headphones off and rest my ears. I didn’t have that sensation today. I can’t say for sure, but I suspect that more natural sound is less fatiguing to the ears (and the brain’s processor).

One caveat, and this is true for any USB audio system attached to a computer: Because there’s no longer an analog sound channel for the computer to manipulate, you’ll lose the ability to control volume or to mute from the Mac’s keyboard. That habit has been ingrained for so many years I don’t even think about it, so retraining myself to adjust audio from the μDAC’s volume knob will take some getting used to. However, you can still use the volume control in iTunes itself, and it may be possible to re-map the keyboard’s audio control keys to tweak iTunes’ internal volume directly.

In any case, the NuForce μDAC is one of the best c-notes I’ve dropped on audio gear over the years. Recommended even if you haven’t fried your analog port.

December 22nd, 2009

(I Don’t Care About) Facebook and Privacy

I’m puzzling over the recent brouhaha regarding Facebook’s changes to their privacy policy. To be clear: I’m not puzzling over the changes (though they are confusing to the user who just wants to use the service instead of thinking about its internal minutiae) – I’m puzzling over the concern about them.

Blogs are 100% public. Twitter is 100% public. Posting on newsgroups and forums is 100% public. The web in general is a public space. I’m wondering WHY there are such dramatically different expectations on Facebook than everywhere else. Fine-grained control over exactly who gets to see exactly what? All of this comes down to a single problem: Millions of people apparently want to have a web presence and yet be private at the same time. Everywhere else online, it’s one or the other.

For me, it’s simple: If what you have to say shouldn’t be said to the whole world, then don’t say it online. In other words, the basic assumption is wrong to begin with. Facebook is trying to give you the sense that you can post online and control your privacy at the same time. It doesn’t work.

Actually, this problem isn’t limited to the web. When you walk down the street, you’re on public display. You don’t pick your nose in public because… well, you just don’t. You don’t need to be told that that’s something you do in private. If you have something private to say to someone, you whisper in their ear, or you call them. Or you email them. Don’t post it where others can see it.

The idea that I should be able to play online but not have to worry that my thoughts are completely public just seems… unrealistic. How many stories have you read about people being fired or worse over comments they’ve made on Facebook? Did their privacy settings protect them? No – things get out. The problem is not Facebook’s new privacy settings, but an epidemic of oversharing. It’s a problem that should be solved the same way we solve it in the real world – by being discreet – not by adding more dials and levers to our interactions.

Then there’s the question of reach. In general, people want to be heard. They pay close attention to the number of Facebook Friends or Twitter followers they currently have. Bloggers watch their traffic logs obsessively. Why? Because they want their thoughts to be heard as widely as possible. Guess what gives your thoughts the widest possible reach? Completely open platforms with no concept of privacy, like Twitter, blogs, and forums. In those spaces, it’s up to the user not to broadcast things they don’t want the whole world to see.

I’m personally glad that Facebook is gradually nudging users to share more content publicly, putting the brakes on this expectation that people can post online but not be public. When was the last time a Facebook post showed up in your Google search results? OK granted, I wouldn’t want most Facebook posts polluting my search results (there’s a whole lot of noise out there), but there’s also a lot of great content locked away behind the “privacy” firewall that really should be part of the public web — which is built on concepts of openness and transparency.

The fact that only people who “friend” me can see my content on Facebook is an annoyance to me, not a feature I cherish and wring my hands over. My dream “privacy” preference for Facebook would be a simple checkbox option reading “I acknowledge that I’m writing stuff on the web. Treat my content as such.”

Update 01/04: In an interview in front of a live audience, Facebook founder Mark Zuckerberg says if he were starting all over again, he’d make everyone’s information public. Because that is the “social norm.”

December 3rd, 2009

iTunes Remote Control

Scenario: Music collection on an iMac in the office on one end of the house, pumping music over Airport Express to stereo in the living room on the other. Need to be able to remotely navigate collection and control playback from a laptop in the living room.

Seemingly perfect solution: iTunes Remote app for iPhone, connecting to the office Mac via wi-fi. Close, but not quite. At first, iTunes Remote app seems like the perfect remote control, complete with album covers. But a real remote you can pick up and operate on a moment’s notice, no strings attached. The iTunes Remote app, on the other hand, takes around 10 seconds to re-connect to the remote library every time you want to use it. You wouldn’t accept that kind of delay from any other remote control, so iTunes Remote gets annoying fast.

Alternative 1: Enable iTunes Sharing on the office Mac, then launch a copy of iTunes on the living room laptop and access the shared library. Configure iTunes to send music from the laptop directly to the AEX. Problem solved? Not quite. I rely heavily on the ability to rate tracks as they roll through. 1 or 2 stars for the tracks I can live without, then periodically cull duds from the collection based on ratings. Tracks with 4 or 5 stars form the basis for my best playlists. Unfortunately, when connecting to a remote library in this way, you have read-only access, and no way to rate tracks on the remote box. Bzzzzzt, deal-breaker.

Alternative 2: Third-party software. There are a few shareware packages available in this niche, but the only one I found that worked reliably was Jonathan Beebe’s open source Remote iTunes. The interface is a stripped down clone of iTunes itself, but its remoting ability includes something iTunes does not – the ability to authenticate as an admin user. Enter the IP of the office Mac, a username and pass, and give it a few seconds to pull across the music library index. Once connected, it stays connected, and you get the ability to rate tunes on the remote system. It’s not perfect, but close enough for jazz.

I’d love for iTunes itself to grow this ability so I’d have access to all iTunes features. Alternatively, I’d kill (not literally) for a desktop version of the iPhone Remote app. But Remote iTunes gets the job done with less pain than anything else I’ve tried.

September 7th, 2009

Populate Mailman Lists from Django Projects

I spent much of the summer building an intranet in Django for Miles’ school. Since the school is a co-op, we need to keep track of a lot of stuff – charges, credits, and obligations, parents, students, teachers, family jobs, committee membership, the board, etc. etc. I’m happy with how the site came out, but unfortunately can’t share it here, since it’s a private site.

One of the goals of the rebuild was to put an end to the laborious manual process of maintaining the school’s multiple overlapping mailing lists. Since all of those relationships, people types, and groups were already stored in the intranet’s database, I figured it should be possible to run various queries and populate Mailman mailing lists from them directly. Due to the messy nature of the real world, the process was a lot trickier than it sounds on paper, but I eventually did get a smoothly working list generation system up and running, talking to our Django system and working with virtually no manual intervention. Members can update their own profiles and find that their mailing list subscription address has changed automatically a few hours later. Administrators can give someone a new family job or board position and that person will find themselves subscribed to the right mailing list for it later that day.

Since there isn’t much published out there on making these two systems (Django and Mailman) play nicely together, I decided to publish the scripts and document the recipe I used to get it all working. Hope someone finds the system useful.
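Those scripts aren’t reproduced here, but the core idea can be sketched in a few lines: take the set of addresses a query says should be on a list, compare it with the list’s current members, and work out who to add and who to remove. This is only an illustration under stated assumptions. The function name and sample addresses are hypothetical, and in a real setup the two inputs would come from a Django query and from Mailman respectively:

```python
def plan_sync(desired, current):
    """Return (to_add, to_remove) so the list's membership matches the desired set.

    Addresses are trimmed and lowercased before comparison so that
    "Pat@Example.org" and "pat@example.org" count as the same subscriber.
    """
    want = {addr.strip().lower() for addr in desired if addr.strip()}
    have = {addr.strip().lower() for addr in current if addr.strip()}
    return sorted(want - have), sorted(have - want)


# Hypothetical data: what the intranet says vs. what the list currently holds.
from_intranet = ["Pat@Example.org", "lee@example.org"]
on_the_list = ["lee@example.org", "old-parent@example.org"]

to_add, to_remove = plan_sync(from_intranet, on_the_list)
print("subscribe:", to_add)
print("unsubscribe:", to_remove)
```

Running something like this on a schedule is what would make a profile change show up on the mailing list a few hours later, with no manual step in between.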

June 1st, 2009

GMail vs. Mail.app

Confession: I’ve never liked webmail – I was a hardcore Eudora user for ages, then spent five years with BeOS desktop mail clients, then a year with Entourage on the Mac before finally switching four years ago to Apple’s Mail.app, with its flawless IMAP implementation. Every time I’ve tried the “next generation” of webmail clients, they’ve felt anemic to me, and I’ve felt like my workflow slowed way down – not because they were slow per se, but because of the dozens of small niceties you get with desktop clients that you don’t get with webmail. I’ve relegated webmail to something you use when you’re not at your own machine for some reason and/or aren’t able to take the two minutes it takes to configure IMAP at a foreign machine.

That’s why I’ve always been amazed to see how many developers and gear-heads use GMail. These are tech-savvy people, who I’d think would have the same frustrations with webmail that I do. What are they seeing that I’m not seeing? I totally get the convenience factor of being able to access my mail through any web browser, anywhere. I wouldn’t mind having that, but so far it hasn’t seemed worth the sacrifices. I know GMail keeps getting better, so thought it was finally time to give myself over to GMail for a week and see how it goes. Here are some notes on that experience.

n.b.: I’m using Google’s official list of keyboard shortcuts. I used the 3rd party tool A to G to convert Apple’s Address Book to CSV, then imported 1200 contacts into GMail’s contact system.

My list of GMail gripes, with a few faint praises in the mix:

- No way to change the default reading font. Really??? The default reading/writing font is just too small to be comfortable (for me), and it’s ridiculous that something this straightforward and ubiquitous in desktop clients would not be there. How hard can it be to give the user a choice of common font faces and sizes? Does not compute.

- No way to quote previous text before replying. Every desktop mail client I’ve used lets you select a block of text in a message, then hit Reply. Only the selected text appears in the reply. This is so core to netiquette and to my every day workflow that it seems like a non-negotiable feature. And yet no webmail client I’ve tried supports it. Not even GMail. No wonder over-quoting is such a problem these days. Later… OK, I discovered that this “feature” is actually available under Settings | Labs. When I enabled it, it complained that it could “not be loaded,” and continues to complain every time I exit the Settings menu, though it did work correctly in my first test. Cool, but why is it in Labs, as if it’s some kind of optional convenience that only a few people might want? How can this not be part of the default package? Core functionality.

- Inline photos. A family member sent 10 photos as attachments. When viewed in Mail.app they’re displayed inline, nice and large; GMail only shows thumbnails inline, though you can click “View all images” to see them full size on a separate page. There is of course no option to “Save all to iPhoto” in GMail. Since they were family photos, that’s exactly what I wanted to do.

- No preview pane. For realsies? I know of at least two webmail clients that have one (RoundCube, which is available on Birdhouse, and Apple’s mac.com (errr, me.com)). If they can do it, why can’t one of the most popular webmail clients of them all?

- More clicks to view the next message. When done viewing one message, if you click Delete or Archive, you’re taken back to the full message list, which lacks a preview pane. So you then need to click again to view the next message. This kind of “more clicks/keystrokes to accomplish common tasks” is all over the place in GMail.

- No way to turn external mail checking on/off. I now have GMail configured to work as a POP client to two external accounts (would have configured it as IMAP, but GMail doesn’t support that, even though you can use external clients to talk IMAP to GMail – weird). Now I’d like to have GMail stop checking those two external accounts for a while, without removing all the config info. Too bad – the only way to make it stop is apparently to delete the account completely. Grrr…

- Poor conversation threading. GMail does an OK job at this – better than other webmail clients, but nowhere near as clean visually or as easy to navigate as threaded discussions in desktop mail apps. And because GMail shows a thread all on one page (thanks again to no preview pane), deleting individual messages out of the thread takes a lot more scrolling and clicking than it does in a desktop client. GMail’s threading is a pale imitation of technology we’ve had on the desktop for years. However, I really do like being able to see my own replies automatically in the context of the thread, even without having explicitly cc:’d myself, and without having to dig through the Sent folder. But the ease of expanding and collapsing a thread, of jumping to the next unread message in a thread, of deleting individual messages from a thread… all vastly superior in Mail.app.

In Mail.app, a thread is indicated by the presence of an arrow in the left column.

Cmd-RightArrow expands the thread; spacebar jumps you to the next unread message in the thread. The actual conversation is shown in the Preview pane. It’s easy to delete individual messages from the thread.

- Keyboard shortcuts. Yes, there are some. Yes, they work for the most part. But they’re not as ubiquitous or as clean to use as the keyboard shortcuts in a desktop client. I found myself doing a lot more mousing in GMail than I’m accustomed to doing in email.

- Adding contacts. I get a message from someone who’s not in my Contacts list. If there’s a way to add this person to my Contacts list on the fly, I’m not seeing it (yes, I looked). Mail.app makes this common process trivial and intuitive.

- Moving messages between accounts. One of the ways I rely heavily on IMAP is the ability to drag and drop messages between various mailboxes and servers. If I receive a message at work that I want to handle at home, I drag it from calmail to birdhouse, and vice versa. If I want to pull something out of cold storage (e.g. from a local mail store) and put it back on a live mail server for handling, I can do that. GMail can be configured to talk to multiple accounts, but since it itself does not work like an IMAP client to foreign mail servers, it can’t do any kind of inter-server message moving. I guess the idea is that its model makes this kind of thing irrelevant, but it feels like a big missing piece of the modern mail experience.

- Integrated chat. Both GMail and Mail.app have this, but GMail clearly wins here when you’re at someone else’s computer since you don’t have to set up both the mail and chat clients (thanks @jrue for this point).

- “Send Again” feature. Not something you use a lot, but when you do, it’s a real time saver. Use this after sending a message to someone whose address has died and you want to try again to the right address, or when you left someone off the original cc: list. Mail.app and other desktop clients have it. GMail doesn’t.

- Breaks quoting. Let’s say you’ve got a paragraph of quoted text in an incoming message and you want to reply to it in two parts. In a desktop client, you put the cursor where you want to break the graf and hit Return. A new quote mark is automatically added to the beginning of the new line. Not in GMail – you end up with the first line that should be quoted suddenly unquoted. Later… turns out this does work properly in rich text mode in GMail, but not in plain text mode. But I prefer to stay in plain text mode, only switching to rich text mode when necessary.

While replying in plain text mode in GMail, insert cursor in the middle of a paragraph and hit Return to start your reply. The new line lacks a starting mail quote mark, breaking netiquette and readability for the recipient.

- No Data Detectors. OK, this is only available in Mail.app, not all desktop mail clients, but it really is a killer feature. Roll over any date or time in any format, or any person’s name or email address, even in a plain text message, and you get a little drop-down menu that lets you quickly add that item to your calendar or address book.

Data detectors do an amazing job of figuring out all the right fields — almost magic (try it with messages referencing “tomorrow” or “next Tuesday.”) GMail does have an “Add Event” option but it’s nowhere near as intelligent or as slick, and it works for the whole message, not for individual text snippets within the message. Big win for Mail.app.

- Partial word searches. The search feature in GMail is nice, but is not better than the one in Mail.app. Yes, Google is a bit faster at returning results, but not by much (yes, Apple’s Spotlight is *that* fast). But here’s the kicker – Google and GMail can’t do partial-word searches. So if I’m looking for an email that I know includes the word “question” but I just type “quest” [Return] into GMail search, it turns up nothing! Wildcard searches don’t work either. Very frustrating. Even on their native search turf, Google loses to Apple. Update: There are also types of searches Mail.app can’t do, such as combined OR statements. So let’s call this one a draw.

- End-of-line key combo. On the Mac, the standard keyboard shortcuts to jump the cursor to the start/end of the current line are Cmd-LeftArrow and Cmd-RightArrow. These don’t work in GMail. In fact, as far as I can tell there’s no keyboard shortcut to do this on the Mac in GMail. Which amounts to one more reason GMail is a lot more mouse dependent than using Mail.app or other desktop client. Can’t blame this on rich text editors either – WordPress uses a TinyMCE variant, and Cmd-RightArrow works there just fine. GMail is just broken in this respect.

- Ads in my email. They just bug me. I totally understand that that’s how I pay for the service. I get that. I still don’t like looking at them. Irritating. In fact, I found the whole GMail experience more cluttered and just… less elegant than working with a desktop client.

ADDED LATER

- Multiple windows. Sometimes I like to have two or more messages open at once, plus a compose window, so I can copy/paste bits around and between messages, or for reference while writing something new. Easy to do in a desktop client. Assumed I could do similar in GMail by cmd-clicking messages to open them in various tabs, but nope – GMail doesn’t allow that – forces you to only be looking at one thing at a time. Is that a feature they haven’t implemented yet, or an intentional limitation? Feels like the latter.

Upshot: I didn’t follow through on my promise to try GMail for a week. The frustration was too much to deal with, and I quit after four days. I’m back on Mail.app now. I probably missed out on some of GMail’s goodness, but overall, I left feeling exactly like I did going in. GMail has its advantages, but to me, it seems like they’re vastly outweighed by the absence of basic functionality and elegance present in all desktop mail clients (and by additional features in Mail.app) that I just missed too much. Feels good to be home.

April 6th, 2009

Headbanging with QuickTime

At the UC Berkeley Graduate School of Journalism and the Knight Digital Media Center, we’ve used QuickTime Streaming Server successfully for years. We mostly love it, but recently I’ve been banging my head against something that’s driving me nuts.

First, understand that .mov files on QTSS need to have a “hint” track added in order to enable genuine streaming. We run live webcasts with something called Wirecast Pro which lets us interleave titles, images, and output from a presenter’s desktop directly into live streams. It also records .mov files of those streams to disk for our webcast archives. After a conference ends, all I had to do was use QuickTime to add hint tracks to the recorded files and put them on our streaming server.

Recently we found that .mov files created with Wirecast would completely crash (hard!) QuickTime player when served from the streaming server into the browser. After much discussion on the QTSS mailing list, I was able to positively identify the problem as a bug in Apple’s hinting routine. Until the bug gets fixed, a developer at Apple recommended that I use the Penguin MP4 Encoder to add the hint tracks, rather than Apple tools.

That worked perfectly, but raised a separate problem – while the files will stream, they can no longer be played directly from the desktop (we offer a separate Download link). Attempting to play them results in an unhelpful “This movie file is apparently corrupt” message.

Thought I would go back to the drawing board and try to remove the hint tracks added by Penguin, so I could try a different approach. Can’t remove them in QuickTime since I can’t open them in QuickTime. Can’t remove them with the qtmedia command line binary that comes with OS X Server. Fortunately Penguin’s command-line tool does provide an “-unhint” option… but attempting to use that crashes Penguin.

So I’m stuck with a set of .mov files that play fine from QTSS but not locally, due to weirdo hint tracks. And I can’t find a tool to remove the hint tracks without crashing.

That’s my day so far. How’s yours?

March 27th, 2009

Longevity of Solid State Memory Cards?

Late last year, our house was broken into and a bunch of electronics were stolen, including the MiniDV video camera we had had since our wedding (fortunately the thief didn’t take all of our saved tapes). My video workflow over the past decade has consisted of shooting (judiciously), occasionally making a short web video, and putting the tape away in a cabinet for the archives.

When the camera was stolen, I replaced it with an HD camera that stores video data on SD cards. The usual workflow for SD-based cameras is that you extract what you need to disk when the card is full, then erase and re-use it. But I don’t always have time to do the reviewing and capturing every time, and don’t always feel comfortable erasing the card and starting over to shoot more footage. The question becomes, what is the best way to store this data long term?

I could of course buy another external hard drive dedicated to the task. They’re cheap enough, but experience teaches that disks are fallible, so then you get into the problem of having to back up what could quickly become terabytes of data.

Another solution would be to buy archival grade DVDs and copy data to them as cards fill up.

A final option would be to NOT reuse SD cards, but to replace them when full instead, and stack them in the cabinet for archival purposes just as I used to do with MiniDV tapes.

Doing some comparison shopping, it looks like the price difference between using archival DVDs and buying new SD cards is small enough to be negligible. The question then becomes, how do the shelf lives of these two media compare? If you search for information on the longevity of SD cards, you find lots of information about how they’re only good for a limited number of read/write operations before they start to fail… but that’s not what I’m interested in. I’m talking about writing to them once, only reading them a few times max, but storing them for years or decades. It’s surprisingly difficult to find information on how long data on an SD card will last if NOT used.

I’m confident they’d be fine for a few years. But what about 20? What about 50? (Yes, I want my kid to be able to access this data when he’s grown up, hopefully without going through the hoops I recently did dealing with my dad’s 60-year-old 8- and 16-mm film stock.)

Archival DVDs claim to be good for 100 years, and I’d be willing to trust that figure, or something like it, even though none of them have been around long enough for the estimate to be verified. But for convenience, I’d love to be able to skip the transfer step and just store SD cards long-term. Without information on that, I’m skittish about it.

Anyone have info on the long-term shelf life of unused SD cards?

December 14th, 2008

Video Service Compression Test

A quick comparison of video compression quality at three of the major video upload services. I posted the same video file to YouTube, Flickr, and Vimeo, and have added them here alongside the original for comparison. I think the results speak for themselves.

The original video was not shot with a video camera, but with a Canon SD1100S pocket still camera, which generated AVI files. I stitched a few together in QuickTime and saved the result as a QuickTime .mov. I did not alter any of the compression settings, and ended up with a file using the old standby codec Motion JPEG OpenDML at 640×480, 30fps, at a data rate of 15.75 Mbit/sec.

Because it’s 60MBs, I’m linking to the original rather than embedding it.
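As a rough sanity check, the file size and data rate together imply a clip of about half a minute. This is a minimal sketch, assuming the size is 60 × 10^6 bytes and that 15.75 Mbit/sec is an average over the whole file; both are assumptions, and the function name is made up for illustration.

```python
def implied_duration_seconds(size_megabytes, rate_mbit_per_sec):
    """Estimate clip length from file size and average data rate."""
    size_megabits = size_megabytes * 8  # 8 bits per byte
    return size_megabits / rate_mbit_per_sec


# Figures from the post: a 60 MB file at 15.75 Mbit/sec
duration = implied_duration_seconds(60, 15.75)
print(f"about {duration:.0f} seconds")
```

Audio and container overhead would shift the real number a little, so treat this as a ballpark only.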

Subject, by the way, is my son Miles (6) stomping in puddles on a rainy day at Jewel Lake in the Berkeley Hills.

YouTube clearly generates the worst results, with a huge amount of compression artifacts and general jerkiness:

To be fair, YouTube also offers a “high quality” version, which doesn’t look much (any?) better. Especially not compared to Flickr’s and Vimeo’s “normal” output.

Few people use Flickr Video, though the feature has been available for nearly a year. Results are definitely better than YouTube, but not as good as the original, and very similar to Vimeo (bottom).

I expected Vimeo to be the clear winner. Vimeo is known for excellent video quality (and the site design is excellent too). But now that I see them side by side, I’m having trouble finding much in the way of quality difference between Vimeo and Flickr. Downsides: It took Vimeo 70 minutes to make the video available after upload, and the tiny size of Vimeo’s social network means the video will get far less “drive-by” traffic than it will on YouTube.

September 3rd, 2008

Podcast Diet

Podcasting changed my life.

There, I said it. Melodramatic, but true. When free time is whittled down to razor-thin margins, something’s gotta give, and media consumption is often the first luxury to go. And, speaking for myself, when I’m tired at the end of the day and give myself an hour of couch time, I’m not exactly predisposed to turn to the news. “Man vs. Wild” is more like it.

The one chunk of time I get all to myself every day is the daily commute (by bike or walk+train), which amounts to just over an hour a day. A few years ago, commute time was music time, but podcasting changed all that.

With a weekly quota of five hours consumption time, didn’t take long to subscribe to more podcasts than I could possibly digest before the next week rolled around. But I continue to hone the subscription list. Here are some of the podcasts I’ve come to call friends:

Links are to related sites – search iTunes for these if podcast links aren’t obvious.

- This Week in Tech: Tech maven Leo Laporte used to do great shows at ZDTV, now runs his own tech news & info podcasting network. I appeared on his TV show a few times back in the BeOS days; now I’m just a faceless audience member. Show gets rambly and too conversational at times, but they do a good job of traversing the landscape, and there are plenty of hidden gems. Frequent co-host John Dvorak drives me crazy, despite his smarts.

- Podcacher: All about geocaching, with “Sonny and Sandy from sunny San Diego, CA.” Great production values. Love it when the adventures are huge, but get bored with all the geocoin talk (unfortunately fast-forwarding through casts and bicycling don’t go well together, especially since losing tactile control after moving to the iPhone). Still, lots of tips, excellent anecdotes, and occasional hardware reviews.

- Radiolab: I’ll go with their own description: “On Radio Lab, science meets culture and information sounds like music. Each episode of Radio Lab is an investigation — a patchwork of people, sounds, stories and experiences centered around One Big Idea.” I love what they do with sonic landscapes. I can’t think of a better example of utilizing the podcasting medium’s unique characteristics. The shows are mesmerizing, and welcome relief from my tech-heavy audio diet.

- This American Life: Everyone’s favorite NPR show. Excruciatingly wonderful overload of detail on the bizarre lives of ordinary Americans. Your soul needs this show.

- Slate Magazine Daily Podcast: They say it would be a waste of the medium’s potential to just have someone read stories into a microphone. I beg to differ. I don’t have time to read Slate, but love their journalism. I’m more than stoked to receive a digest version of the site through my ear-holes.

- FLOSS Weekly: Another Leo Laporte show, but in this one he gets out of the way and lets his guests do the talking. All open source, all the time. Usually interviews with leaders / founders / spokespeople for various major OSS initiatives. Great interviews recently with players from the Drizzle and Django camps.

- Stack Overflow: Who woulda thunk a pair of Windows-centric web developers would have captured my attention? But great insight here into the innards of web application construction. Geeks only.

- NPR: All Songs Considered: If you’re old-and-in-the-way like me, feeling like your musical soul isn’t getting fed the way it should, you could do a lot worse than subscribe to All Songs Considered – annotated rundown of recent (and sometimes not-so-recent) discoveries that remind you why music is Still Worth Paying Attention To.

- This Week in Django: Part of the reason I’ve been so quiet lately is that I’m deeply immersed in Django training, having inherited a fairly complex Django site at work (more on that another day). This podcast is pretty hardcore stuff, for Django developers only. Can’t pretend to understand it all, but right now it’s part of the immersion process, and is helping me gain scope on the Django landscape.

- The Wordpress Podcast: I spend more of my time (both at work and at home) tweaking on WordPress publication sites than anything else, and this is a great way to stay abreast of new plugins, security issues, techniques, etc. Wish it was more technical and had a faster pace, but it’s the best of the WordPress podcasts.

- Between the Lines: Back in my Ziff days, I worked for the amazing Dan Farber, who’s still going strong at ZD. This is my “check in with the veteran tech journalists” podcast, and is a serious distillation of goings-on in the tech world. Always a good listen.

Obviously there’s no way to fit all of these into the weekly commute hours, but I try. No time to digest more, but dying to know what podcasts have you gripped. Let me know.

Music: Minutemen :: Storm In My House

August 17th, 2008

Get Your Twitter Timeline into WordPress

After Twittering for a few months, I started to feel uncomfortable about not owning my data, and wanted an automated way to store a copy of each Tweet for posterity. Another installation of WordPress would be perfect as a Twitter backup repository (alternatively, you could copy all of your tweets to a dedicated category within your main WP installation, but I chose to do it in a separate install, since I wasn’t looking for integration with my main blog).

There were really two problems to solve:

1) Have new Tweets automatically hoovered into the WP backing store.
2) Get all of my older Tweets ported into the system as well.

Here’s the resulting site. It’s not really intended for public viewing – I don’t care if people browse it, but it’s really just a backup system in the form of a WordPress site.

Part 1 is pretty easy; Part 2 was more complicated. Here are recipes for both procedures.
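To give a feel for the shape of Part 1, here is a minimal sketch, assuming the timeline is available as an RSS feed and that each feed item becomes one post. It is not the recipe itself: the sample feed, the example.com URLs, and the `feed_to_posts` helper are all hypothetical, and a real setup would fetch the feed over HTTP and hand the results to WordPress.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example timeline</title>
    <item>
      <title>First example tweet</title>
      <link>http://example.com/status/1</link>
      <pubDate>Sun, 17 Aug 2008 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second example tweet</title>
      <link>http://example.com/status/2</link>
      <pubDate>Sun, 17 Aug 2008 11:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def feed_to_posts(feed_xml):
    """Turn each <item> in an RSS document into a simple post record."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return posts


posts = feed_to_posts(SAMPLE_FEED)
for post in posts:
    print(post["date"], "|", post["title"])
```

Keeping the original link with each record makes it easy to skip tweets that have already been imported on a later run.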

(more…)

August 13th, 2008

The Long Tail in My LR

Fried from a long day, then with a client until 11:00, much-needed couch time. Overwhelmed myself with Olympic opening ceremony last night, couldn’t take more. Then remembered – wasn’t Tivo about to grow a YouTube gland? Checked in and sure enough, a bazillion new vids were there, waiting to be inhaled.

As expected, video quality isn’t great blown up to HDTV size, and audio is sometimes out of sync with the video, but the range of human experience at your fingertips is mind-blowing. Started with a few Captain Beefheart clips, moved on to Django Reinhardt, then to Jacob Kaplan-Moss talking about Django at Google HQ in 2006. I’d never watch an hour-long video at the computer, too restless for that, but this works.

The long tail is in my living room.

P.S. Thanks to the WordPress dev team for creating the WP posting client for iPhone, which I’m tapping away at now – wallowing in luxuriant tech.

“The ink is never dry on these babies.”

July 24th, 2008

Drizzle vs. Oracle

For years, the MySQL project has been busy bolting on features to help it compete for attention/market-space with the big boys of relational database land (mainly adding triggers and stored procedures, but also lots of other smaller features). Result: MySQL gets more respect with every passing year, and is now one of the most widely-deployed databases in the world (with the exception of SQLite – does that count?). Other result: MySQL is becoming more monolithic, consuming more memory and system resources.

But wait… the beauty of MySQL was always that it was perfect for web applications, with ultra-fast reads (since web apps spend the bulk of their time reading from, not writing to the database). The majority of the database-backed web consists of weblogs, forums, and various content management systems, where none of the fancy stuff is needed. Modern developers put their logic in application code, not in databases. Isn’t MySQL getting a bit fat for the bulk of sites it serves?

Enter Drizzle, a slimmed-down, microkernel version of MySQL optimized for web applications, with all the cruft that most of us never think about or use removed. O’Reilly: MySQL forks: could Drizzle be the next of the new generation of relational database?

“Aker presents this step as a return to the quick and lightweight MySQL that made it popular in the first place, a database engine that may not appeal to large corporate back offices but can easily power web sites. I see it also as a step back to the philosophy that Aker calls “Databases without business logic”: let the application handle consistency and complex calculations instead of making the database do them. Trust your programmers.”
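To make “databases without business logic” concrete, here is a minimal sketch of a rule enforced in application code rather than in a trigger or stored procedure. It uses Python’s built-in sqlite3 module purely as a stand-in (not MySQL or Drizzle), and the tables, the `place_order` function, and the “never oversell” rule are all made up for illustration.

```python
import sqlite3


def place_order(conn, item, quantity):
    """The business rule lives here, in application code: never oversell."""
    row = conn.execute(
        "SELECT stock FROM inventory WHERE item = ?", (item,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown item: {item}")
    if quantity <= 0 or quantity > row[0]:
        raise ValueError("invalid quantity")
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE inventory SET stock = stock - ? WHERE item = ?",
            (quantity, item),
        )
        conn.execute(
            "INSERT INTO orders (item, quantity) VALUES (?, ?)",
            (item, quantity),
        )


# The database only stores rows: no triggers, no stored procedures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()

place_order(conn, "widget", 3)
remaining = conn.execute(
    "SELECT stock FROM inventory WHERE item = 'widget'"
).fetchone()[0]
print(remaining)  # 2
```

The tradeoff is that every program writing to these tables has to go through the same application code, since the database itself will not stop a bad write.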

So what ends up on the cutting room floor? Slashdot:

Aker has already selected particular functionality for removal: modes, views, triggers, prepared statements, stored procedures, query cache, data conversion inserts, access control lists and some data types.

Also interesting: “Aker stated that he is unwilling to support platforms without a proper GNU toolchain, such as Windows.” That means Drizzle will only run on Linux, BSD, and Mac OS.

Maybe it’s the company I keep, but I never seem to hear anything positive about working with the big databases. One person after another talks about working with Oracle and other large database systems as onerous, unnecessarily layered, annoying. Workmate Milan had this to say:

I did oracle database logic in EECS for a year or so and it was just a huge waste of time. I really started believing that the war for business logic in the db vs in the application really just amounted to oracle dba’s getting paid insane amounts of money to fiddle with PL/SQL triggers and procedures. Putting that logic in the application makes more sense to me and allows the application to remain in one, preferably OO (not Java, guess who*) language, and hence easier to maintain. It follows along with the rapid dev and ORM approach, which most developers see value in. DBAs on the other hand see their territory encroached upon. I will be a happy man when Oracle loses its grip over the business world. Oracle represents an aging empire that impedes progress.

* He’s referring to Python.

I can guarantee that of the 150+ sites running on both Birdhouse and the J-School, not a single one has any need for triggers, procedures, or any of the other non-core shiny stuff. Every site I’ve ever worked on would be perfectly happy running on a radically slimmer database, as would the vast majority of the web. Will be interesting to see this project evolve.

Music: The Kinks :: Brother