I just needed to create an archive of a folder containing 91 large text files, totaling 370 MB. I decided to pit zip against tar + gzip in a little speed test, using these commands:
tar cvzf awstats.tgz awstats
zip -9ry awstats.zip awstats
On the server in question, these were the elapsed times for the two commands:
zip: One minute, 21 seconds
tar: 41 seconds
This is in part because gzip only has to compress once, over the single stream tar produces by concatenating all the files, whereas zip has to compress each file individually (but that’s not the full story). And the resulting archive sizes?
-rw-r--r-- 1 cdt cdt 141877473 Mar 8 10:31 awstats.tgz
-rw-r--r-- 1 cdt cdt 140081519 Mar 8 10:29 awstats.zip
So zip did have a slight advantage in the output size. But wait... no fair! We used the “-9” option with zip for maximum compression, while tar’s z flag runs gzip at its default level (6). To make it more fair, let’s use the “-9” flag with gzip as well. A simple way to do that is to run two consecutive commands:
$ tar cvf awstats.tar awstats ; gzip -9 awstats.tar
This caused the compression time for gzip to go way up: the command took 1:17 to run, nearly matching zip’s 1:21. But now the file sizes are nearly identical:
-rw-r--r-- 1 cdt cdt 140090837 Mar 8 10:42 awstats.tar.gz
-rw-r--r-- 1 cdt cdt 140081519 Mar 8 10:29 awstats.zip
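For what it’s worth, the tar-then-gzip step can also be written as a single pipeline, which skips the intermediate .tar file. A minimal sketch, using a throwaway sample directory (hypothetical, standing in for awstats):

```shell
#!/bin/sh
# "sample" is a made-up stand-in for the awstats folder.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sample
awk 'BEGIN { for (n = 1; n <= 5000; n++) print "line", n }' > sample/a.txt

# tar writes the archive to stdout (-f -); gzip -9 compresses that stream.
tar cf - sample | gzip -9 > sample.tar.gz

# List the contents to confirm the archive is readable.
gzip -dc sample.tar.gz | tar tf -
```

Some tar builds have their own ways of passing a compression level through, but those vary by version, so the pipe is the portable choice.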
Of course these kinds of things are very circumstantial – doing a similar test on a folder full of pre-compressed files like MP3s would yield very different results (in that case you’d be way better off just using tar without gzip, and definitely not zip). But the upshot is that when trying to decide whether to use zip or tar + gzip, once both run at the same compression level, compression times and output sizes are close enough to just not matter in general usage.
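To get a feel for the pre-compressed case without digging up a folder of MP3s, here is a rough sketch that runs gzip -9 over random bytes and over repetitive text. The random file is only an assumed stand-in for already-compressed data, and the file names are made up for illustration:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Random bytes stand in for already-compressed data (an assumption;
# real MP3s behave similarly but not identically).
head -c 1000000 /dev/urandom > random.bin

# Repetitive text stands in for log files.
awk 'BEGIN { for (n = 1; n <= 50000; n++) print "GET /index.html 200" }' > text.log

# Report the size of each file before and after gzip -9.
for f in random.bin text.log; do
    before=$(wc -c < "$f" | tr -d ' ')
    after=$(gzip -9 -c "$f" | wc -c | tr -d ' ')
    echo "$f: $before -> $after bytes"
done
```

The text file should shrink dramatically while the random file barely changes, which is why spending CPU time on gzip buys you next to nothing for data that is already compressed.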
Update: I did end up doing a later test on the same dir with bzip2. Result: significantly smaller file size:
-rw-r--r-- 1 cdt cdt 104698994 Mar 8 14:17 awstats.tar.bz2
but at the expense of much longer compression times. Running gzip and bzip2 side by side on the same 370 MB tar file, I get these times:
gzip: 41 seconds
bzip2: 1 minute 36 seconds
That makes bzip2 more than twice as slow as gzip (96 seconds versus 41), though it does generate smaller output files.
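If you want to repeat the side-by-side test on your own data, something along these lines should do. It is only a sketch: the sample directory and its contents are hypothetical, and date +%s only resolves whole seconds, so point it at a reasonably large folder to get meaningful timings.

```shell
#!/bin/sh
# "sample" is a made-up stand-in for the awstats folder.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sample
awk 'BEGIN { for (n = 1; n <= 200000; n++) print "GET /page" (n % 50) " 200" }' > sample/access.txt
tar cf sample.tar sample

for tool in gzip bzip2; do
    # Skip any compressor that is not installed.
    command -v "$tool" >/dev/null 2>&1 || continue
    start=$(date +%s)
    # -c writes to stdout, so the same tar file is reused for each tool.
    # The output suffix is just the tool name, not the usual .gz/.bz2.
    "$tool" -c sample.tar > "sample.tar.$tool"
    end=$(date +%s)
    size=$(wc -c < "sample.tar.$tool" | tr -d ' ')
    echo "$tool: $((end - start)) s, $size bytes"
done
```

Both tools run at their default compression levels here; add -9 to the command inside the loop to compare them at maximum compression instead.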
bzip2 will create smaller files at the expense of time and CPU cycles.
7zip will really lower the file size, and speed is about on par with bzip2.
Just updated with a quickie comparison between gzip and bzip2. Wow – bzip2 is WAY slower. Sounds like 7zip would be the way to go unless time is of the essence, in which case you’d just stick with gzip.