Parallel MPI BZIP2 (MPIBZIP2)
Data Compression Software
by Jeff Gilchrist
NOTE: If you are looking for a parallel BZIP2 that works on multi-processor/multi-core/SMP machines, you should check out PBZIP2, which was designed for a multi-threaded shared-memory architecture.
License/Disclaimer
This software is distributed under a BSD-style license. For details, see the file COPYING. Use at your own risk. I take no responsibility for anything that happens to your data or equipment. Always test (bzip2 -tv) a compressed file containing important data before deleting the original to verify the compression was successful.
If you find this software useful or you are using it in a government/business/commercial environment, please consider making a donation to help support future improvements.
Download
Recent History
Contributions
- Bryan Stillwell <bryan [at] bokeoa {dot} com> - code cleanup
- Dru Lemley [http://lemley.net/smp.html] - help with large file support
- Joergen Ramskov <joergen [at] ramskov {dot} org> - initial version of man page
- Peter Cordes <peter [at] cordes {dot} ca> - code cleanup
- Jindrich Novy <jnovy [at] redhat {dot} com> - code cleanup and bug fixes
- Richard Russon <ntfs [at] flatcap {dot} org> - help fix decompression bug
- Paul Pluzhnikov <paul [at] parasoft {dot} com> - fixed minor memory leak
Benchmark Results
Usage
Run mpibzip2 for the help listing.
===================================================================
Usage: mpibzip2 [-1 .. -9] [-b#cdfktvV] <filename> <filename2> <filenameN>
 -b#      : where # is the file block size in 100k (default 9 = 900k)
 -c       : output to standard out (stdout)
 -d       : decompress file
 -f       : force, overwrite existing output file
 -k       : keep input file, don't delete
 -t       : test compressed file integrity
 -v       : verbose mode
 -V       : display version info for mpibzip2 then exit
 -1 .. -9 : set BWT block size to 100k .. 900k (default 900k)
Example: mpibzip2 -b15k myfile.tar
Example: mpibzip2 -v -5 myfile.tar second*.txt
Example: mpibzip2 -d myfile.tar.bz2
===================================================================
The mpibzip2 program is a parallel implementation of the bzip2 block-sorting file compressor that uses MPI and achieves significant speedup on cluster machines. The output is fully compatible with regular bzip2 data, so any file created with mpibzip2 can be decompressed by bzip2 and vice versa. Since mpibzip2 uses MPI, you will need to launch it using mpirun or a similar utility (e.g.: mpirun -np 4 mpibzip2 myfile.tar).
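The compatibility claim can be pictured with a small sketch. This is not mpibzip2's code. It only assumes that a parallel compressor can compress fixed-size file blocks independently and write the resulting bzip2 streams back to back, which standard bzip2 decoders accept. Python's bz2 module stands in for libbzip2, a thread pool stands in for the MPI worker processes, and the names compress_in_blocks and FILE_BLOCK_SIZE are made up for the illustration.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Assumption: "900k" means 900,000 bytes, as in bzip2's own block sizes.
FILE_BLOCK_SIZE = 900_000


def compress_in_blocks(data: bytes, block_size: int = FILE_BLOCK_SIZE,
                       workers: int = 4) -> bytes:
    """Compress each block as its own bzip2 stream and concatenate the streams."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        blocks = [b""]  # still emit one valid, empty bzip2 stream
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, blocks))
    return b"".join(streams)


if __name__ == "__main__":
    original = b"mpibzip2 example data\n" * 200_000
    packed = compress_in_blocks(original)
    # bz2.decompress() accepts concatenated streams, as the bzip2 tool does.
    assert bz2.decompress(packed) == original
    print(len(original), "->", len(packed), "bytes")
```

Because every block is a complete bzip2 stream, the pieces can be produced in any order on any machine, as long as they are written out in their original sequence.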
The default settings for mpibzip2 will work well in most cases. The only switch you will likely need to use is -d to decompress files.
Example 1: mpibzip2 -v myfile.tar
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use the default file block size of 900k and default BWT block size of 900k.
The program would report something like:
===================================================================
MPI BZIP2 v0.6 - by: Jeff Gilchrist [http://compression.ca]
[Jul. 18, 2007] (uses libbzip2 by Julian Seward)
** This is a BETA version - Use at your own risk! **
# CPUs: 1 Master, 94 Slaves
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 166604800 bytes
Compressing data...
Output Size: 29897521 bytes
Wall Clock: 2.385957 seconds
-------------------------------------------
===================================================================
Example 2: mpibzip2 -b15vk myfile.tar
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use a file block size of 1500k and a BWT block size of 900k. Verbose mode will be enabled so progress and other messages will be output to the display. The file "myfile.tar" will not be deleted after compression is finished.
MPI BZIP2 v0.6 - by: Jeff Gilchrist [http://compression.ca]
[Jul. 18, 2007] (uses libbzip2 by Julian Seward)
** This is a BETA version - Use at your own risk! **
# CPUs: 1 Master, 94 Slaves
BWT Block Size: 900k
File Block Size: 1500k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 166604800 bytes
Compressing data...
Output Size: 29897521 bytes
Wall Clock: 2.385957 seconds
-------------------------------------------
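The -b value in Example 2 is a multiplier of 100k, so -b15 selects 1500k file blocks. As a rough illustration of what that means for the 166604800-byte input in the sample output, the sketch below counts the file blocks at both settings. It assumes that 100k means 100,000 bytes (the convention bzip2 uses for its own block sizes) and that the last block may be shorter than the rest. file_block_count is a hypothetical helper, not part of mpibzip2.

```python
BYTES_PER_100K = 100_000  # assumption: "100k" = 100,000 bytes, as in bzip2


def file_block_count(input_size: int, b_value: int = 9) -> int:
    """Number of file blocks for a given input size and -b# value."""
    if b_value < 1:
        raise ValueError("-b value must be at least 1")
    block_bytes = b_value * BYTES_PER_100K
    # Integer ceiling division: a partial final block still counts as a block.
    return (input_size + block_bytes - 1) // block_bytes


if __name__ == "__main__":
    size = 166_604_800  # Input Size reported in the sample output
    print(file_block_count(size, 9))   # default 900k blocks -> 186
    print(file_block_count(size, 15))  # -b15, 1500k blocks  -> 112
```

Larger file blocks mean fewer units of work to hand out, which matters mainly when the block count gets close to the number of worker processes.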
===================================================================
Example 3: mpibzip2 -5 -v myfile.tar second*.txt
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use a BWT block size of 500k. Verbose mode will be enabled so progress and other messages will be output to the display. mpibzip2 will then use the same options to compress all other files that match the wildcard "second*.txt" in that directory.
MPI BZIP2 v0.6 - by: Jeff Gilchrist [http://compression.ca]
[Jul. 18, 2007] (uses libbzip2 by Julian Seward)
** This is a BETA version - Use at your own risk! **
# CPUs: 1 Master, 2 Slaves
BWT Block Size: 500k
File Block Size: 900k
-------------------------------------------
File #: 1 of 3
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 7428687 bytes
Compressing data...
Output Size: 3237105 bytes
Wall Clock: 15.127381 seconds
-------------------------------------------
File #: 2 of 3
Input Name: secondfile.txt
Output Name: secondfile.txt.bz2
Input Size: 5897 bytes
Compressing data...
Output Size: 3192 bytes
Wall Clock: 1.273381 seconds
-------------------------------------------
File #: 3 of 3
Input Name: secondbreakfast.txt
Output Name: secondbreakfast.txt.bz2
Input Size: 83531 bytes
Compressing data...
Output Size: 11832 bytes
Wall Clock: 3.672381 seconds
-------------------------------------------
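In Example 3 the shell expands second*.txt before mpibzip2 starts. When the command is launched from a script without a shell, the caller has to do that expansion itself. The sketch below builds the argument list for such a launch. It assumes an mpirun that accepts -np, as in the mpirun example earlier on this page, and build_command is a hypothetical helper.

```python
import glob


def build_command(num_procs: int, options: list[str],
                  patterns: list[str]) -> list[str]:
    """Return an argv list such as: mpirun -np 3 mpibzip2 -5 -v a.tar b.txt."""
    files: list[str] = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # As bash does by default, keep the pattern unchanged if nothing matches.
        files.extend(matches if matches else [pattern])
    return ["mpirun", "-np", str(num_procs), "mpibzip2", *options, *files]


if __name__ == "__main__":
    argv = build_command(3, ["-5", "-v"], ["myfile.tar", "second*.txt"])
    print(" ".join(argv))
    # To actually run it (requires MPI and mpibzip2 to be installed):
    # import subprocess; subprocess.run(argv, check=True)
```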
===================================================================
Example 4: mpibzip2 -dv myfile.tar.bz2
This example will decompress the file "myfile.tar.bz2" into the decompressed file "myfile.tar". The switches -b and -1..-9 are not valid for decompression.
MPI BZIP2 v0.6 - by: Jeff Gilchrist [http://compression.ca]
[Jul. 18, 2007] (uses libbzip2 by Julian Seward)
** This is a BETA version - Use at your own risk! **
# CPUs: 1 Master, 2 Slaves
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar.bz2
Output Name: myfile.tar
BWT Block Size: 900k
Input Size: 1123 bytes
Decompressing data...
Output Size: 6920 bytes
Wall Clock: 0.022775 seconds
-------------------------------------------
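Before an original is deleted, the disclaimer above recommends testing the compressed file (bzip2 -tv, or the -t switch of mpibzip2). For a scripted check that does not depend on either tool, one option is to decompress the archive with an independent decoder and compare the result with the original. The sketch below does this with Python's standard bz2 module. matches_original is a hypothetical helper, and unlike -t it needs the original file to still be present.

```python
import bz2
import hashlib


def _sha256(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_original(original_path: str, compressed_path: str) -> bool:
    """True if the .bz2 file decompresses cleanly to the original's contents."""
    with open(original_path, "rb") as raw:
        expected = _sha256(raw)
    try:
        with bz2.open(compressed_path, "rb") as unpacked:
            return _sha256(unpacked) == expected
    except (OSError, EOFError):
        # Raised by the bz2 module for corrupt or truncated data.
        return False
```

A caller would delete "myfile.tar" only after matches_original("myfile.tar", "myfile.tar.bz2") returns True.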
===================================================================
Bugs/Contact
If you would like to report any bugs or contact me related to the software you can reach me via e-mail at: