Parallel MPI BZIP2 (MPIBZIP2)

Data Compression Software

by Jeff Gilchrist



MPIBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses MPI and achieves significant speedup on cluster machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (i.e., anything compressed with mpibzip2 can be decompressed with bzip2). MPIBZIP2 should work on any system that has a pthreads-compatible C++ compiler (such as gcc). It has been tested on Linux and Solaris.
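
The compatibility claim can be illustrated without MPI. The sketch below is not mpibzip2's code. It assumes, as the file block size option suggests, that the input is split into fixed-size blocks that are compressed independently and written out in order. It uses Python's standard bz2 module, and the names compress_in_blocks and block_size are made up for illustration.

```python
import bz2

def compress_in_blocks(data: bytes, block_size: int = 900_000) -> bytes:
    """Compress each block as its own bzip2 stream and concatenate the streams."""
    streams = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # Each block could be handed to a different worker; here we loop serially.
        streams.append(bz2.compress(block, compresslevel=9))
    return b"".join(streams)

# Roughly 2.4 MB of sample input, so it spans several 900k blocks.
original = b"block-sorting compressors like repetitive input. " * 50_000
compressed = compress_in_blocks(original)

# A standard bzip2 decoder reads the concatenated streams as one file.
restored = bz2.decompress(compressed)
print(len(original), len(compressed), restored == original)
```

Because every block is a complete bzip2 stream, the order in which workers finish does not matter as long as the streams are written back in their original order.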

NOTE: If you are looking for a parallel BZIP2 that works on multi-processor/multi-core/SMP machines, you should check out PBZIP2, which was designed for a multi-threaded shared-memory architecture.

Screen Shot

MPIBZIP2 v0.6 Screen Shot


License/Disclaimer

This software is distributed under a BSD-style license. For details, see the file COPYING. Use at your own risk. I take no responsibility for anything that happens to your data or equipment. Always test (bzip2 -tv) a compressed file containing important data before deleting the original to verify the compression was successful.
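
The advice to test before deleting can also be scripted. The sketch below is one possible approach and is stricter than bzip2 -tv: it decompresses the .bz2 file and compares its SHA-1 with the original file's, and only then removes the original. The helper names and the throwaway demo files are assumptions made for illustration.

```python
import bz2
import hashlib
import os
import tempfile

def sha1_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def compressed_copy_matches(original_path: str, compressed_path: str) -> bool:
    """Return True only if the .bz2 file decompresses to the original's bytes."""
    with open(original_path, "rb") as plain:
        expected = sha1_of_stream(plain)
    try:
        with bz2.open(compressed_path, "rb") as packed:
            actual = sha1_of_stream(packed)
    except (OSError, EOFError):
        return False  # corrupt or truncated archive
    return actual == expected

# Demo with throwaway files standing in for real data.
workdir = tempfile.mkdtemp()
original_path = os.path.join(workdir, "myfile.tar")
compressed_path = original_path + ".bz2"
with open(original_path, "wb") as f:
    f.write(b"important data\n" * 10_000)
with open(compressed_path, "wb") as f:
    f.write(bz2.compress(b"important data\n" * 10_000))

if compressed_copy_matches(original_path, compressed_path):
    os.remove(original_path)  # delete only after verification succeeds
```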

If you find this software useful or you are using it in a government/business/commercial environment, please consider making a donation to help support future improvements.


Download

Click to download the latest version:
Source Code: MPIBZIP2 v0.6 (18 KB) [SHA-1: 6053e5d6b39c160c5da23e73075a57b08ea712e7]
[MD5: 97c21a36caeffcc9e77aff84ddcfe49d]
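
The published digests can be checked after downloading. The sketch below uses Python's hashlib and the two digests listed above. The archive file name in the final comment is hypothetical, since this page does not state it.

```python
import hashlib

# Digests published on this page for the v0.6 source archive.
EXPECTED_SHA1 = "6053e5d6b39c160c5da23e73075a57b08ea712e7"
EXPECTED_MD5 = "97c21a36caeffcc9e77aff84ddcfe49d"

def file_digest(path: str, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def download_is_intact(path: str) -> bool:
    """True only when both the SHA-1 and the MD5 match the published values."""
    return (file_digest(path, "sha1") == EXPECTED_SHA1
            and file_digest(path, "md5") == EXPECTED_MD5)

# Usage (hypothetical file name):
# print(download_is_intact("mpibzip2-0.6.tar.gz"))
```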

Recent History

v0.6 (Jul. 18, 2007)
  • First public release!


Contributions

- Bryan Stillwell <bryan [at] bokeoa {dot} com> - code cleanup
- Dru Lemley [http://lemley.net/smp.html] - help with large file support
- Joergen Ramskov <joergen [at] ramskov {dot} org> - initial version of man page
- Peter Cordes <peter [at] cordes {dot} ca> - code cleanup
- Jindrich Novy <jnovy [at] redhat {dot} com> - code cleanup and bug fixes
- Richard Russon <ntfs [at] flatcap {dot} org> - help fix decompression bug
- Paul Pluzhnikov <paul [at] parasoft {dot} com> - fixed minor memory leak


Benchmark Results

The following benchmark was performed on the SHARCNET computing network using a 128-CPU HP cluster with a Myrinet G2 interconnect. The cluster is based on dual 2.2 GHz Opteron 275 systems with 1 MB cache and 8 GB of system memory, running Linux kernel 2.6.9-22.7hp.1sp.XCsmp #1 SMP. The file being compressed is a 1875 MB binary database.

Benchmark results for compressing 1.83GB of data on a 2.2 GHz Opteron 275 based cluster.

Usage

Run mpibzip2 for the help listing.

===================================================================

Usage: mpibzip2 [-1 .. -9] [-b#cdfktvV] <filename> <filename2> <filenameN>

-b#: where # is the file block size in 100k (default 9 = 900k)
-c : output to standard out (stdout)
-d : decompress file
-f : force, overwrite existing output file
-k : keep input file, don't delete
-t : test compressed file integrity
-v : verbose mode
-V : display version info for mpibzip2 then exit
-1 .. -9 : set BWT block size to 100k .. 900k (default 900k)

Example: mpibzip2 -b15k myfile.tar
Example: mpibzip2 -v -5 myfile.tar second*.txt
Example: mpibzip2 -d myfile.tar.bz2

===================================================================
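
To get a feel for the -b option, the sketch below estimates how many file blocks an input would be split into. It assumes that "100k" means 100,000 bytes (the bzip2 convention for BWT block sizes) and that the last block may be partial. Neither detail is confirmed by this page, and the function name is made up for illustration.

```python
def file_block_count(input_bytes: int, b_option: int = 9,
                     unit_bytes: int = 100_000) -> int:
    """Number of file blocks for a given -b value (assumed unit: 100,000 bytes)."""
    if input_bytes < 0 or b_option < 1:
        raise ValueError("input size must be >= 0 and -b must be >= 1")
    block_bytes = b_option * unit_bytes
    # Ceiling division: a trailing partial block still counts as one block.
    return -(-input_bytes // block_bytes)

size = 166_604_800  # input size used in the examples on this page
print(file_block_count(size))               # default -b9
print(file_block_count(size, b_option=15))  # -b15
```

Under these assumptions the 166604800 byte input splits into 186 blocks at the default setting and 112 blocks with -b15. Larger blocks mean fewer units of work to hand out to the worker processes.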

The mpibzip2 program is a parallel implementation of the bzip2 block-sorting file compressor that uses MPI and achieves significant speedup on cluster machines. The output is fully compatible with regular bzip2 data, so any file created with mpibzip2 can be decompressed by bzip2 and vice versa. Since mpibzip2 uses MPI, you will need to launch it using mpirun or a similar utility (e.g., mpirun -np 4 mpibzip2 myfile.tar).
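
When mpibzip2 is driven from a script, the launch line can be assembled programmatically. The sketch below only builds the argument list and does not run anything. It uses the mpirun -np form and the switches from the usage listing, and assumes mpirun is the launcher on your system. The function name and its parameters are made up for illustration.

```python
import shlex

def mpibzip2_command(files, processes=4, decompress=False, keep=False,
                     verbose=False, file_block=None, bwt_level=None):
    """Build an argv list such as: mpirun -np 4 mpibzip2 myfile.tar"""
    if decompress and (file_block is not None or bwt_level is not None):
        # The -b and -1..-9 switches are not valid for decompression.
        raise ValueError("block size switches cannot be combined with -d")
    if bwt_level is not None and not 1 <= bwt_level <= 9:
        raise ValueError("BWT level must be between 1 and 9")
    cmd = ["mpirun", "-np", str(processes), "mpibzip2"]
    if decompress:
        cmd.append("-d")
    if file_block is not None:
        cmd.append(f"-b{file_block}")
    if bwt_level is not None:
        cmd.append(f"-{bwt_level}")
    if keep:
        cmd.append("-k")
    if verbose:
        cmd.append("-v")
    return cmd + list(files)

print(shlex.join(mpibzip2_command(["myfile.tar"])))
```

The resulting list can be passed to subprocess.run on a machine where MPI and mpibzip2 are installed.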

The default settings for mpibzip2 will work well in most cases. The only switch you will likely need to use is -d to decompress files.

Example 1:
mpibzip2 -v myfile.tar

This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use the default file block size of 900k and the default BWT block size of 900k.

The program would report something like:
===================================================================

MPI BZIP2 v0.6  -  by: Jeff Gilchrist [http://compression.ca]
[Jul. 18, 2007] (uses libbzip2 by Julian Seward)

** This is a BETA version - Use at your own risk! **

# CPUs: 1 Master, 94 Slaves
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2

Input Size: 166604800 bytes
Compressing data...
Output Size: 29897521 bytes
Wall Clock: 2.385957 seconds
-------------------------------------------

===================================================================
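
The figures in a verbose report are enough to derive a compression ratio and a throughput. The sketch below parses the three numeric fields from the sample report above. The parsing approach and the field names in the returned dictionary are illustrative assumptions, not part of mpibzip2.

```python
import re

REPORT = """\
Input Size: 166604800 bytes
Compressing data...
Output Size: 29897521 bytes
Wall Clock: 2.385957 seconds
"""

def summarize(report: str) -> dict:
    """Pull sizes and elapsed time out of a verbose report and derive two ratios."""
    def grab(label: str) -> float:
        match = re.search(rf"{re.escape(label)}:\s*([0-9.]+)", report)
        if match is None:
            raise ValueError(f"missing field: {label}")
        return float(match.group(1))

    in_bytes = grab("Input Size")
    out_bytes = grab("Output Size")
    seconds = grab("Wall Clock")
    return {
        "ratio": out_bytes / in_bytes,              # compressed size / original size
        "input_mb_per_s": in_bytes / seconds / 1e6,  # decimal megabytes per second
    }

stats = summarize(REPORT)
print(f"compressed to {stats['ratio']:.1%} of original, "
      f"{stats['input_mb_per_s']:.1f} MB/s of input")
```

For this sample the output is roughly 18% of the input size, processed at close to 70 MB/s of input.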

Example 2:
mpibzip2 -b15vk myfile.tar

This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use a file block size of 1500k and a BWT block size of 900k. Verbose mode will be enabled so progress and other messages will be output to the display. The file "myfile.tar" will not be deleted after compression is finished.

The program would report something like:
===================================================================

MPI BZIP2 v0.6  -  by: Jeff Gilchrist [http://compression.ca]
[Jul. 18, 2007] (uses libbzip2 by Julian Seward)

** This is a BETA version - Use at your own risk! **

# CPUs: 1 Master, 94 Slaves
BWT Block Size: 900k
File Block Size: 1500k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2

Input Size: 166604800 bytes
Compressing data...
Output Size: 29897521 bytes
Wall Clock: 2.385957 seconds
-------------------------------------------

===================================================================

Example 3:
mpibzip2 -5 -v myfile.tar second*.txt

This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use a BWT block size of 500k. Verbose mode will be enabled so progress and other messages will be output to the display. mpibzip2 will then use the same options to compress all other files that match the wildcard "second*.txt" in that directory.

The program would report something like:
===================================================================

MPI BZIP2 v0.6  -  by: Jeff Gilchrist [http://compression.ca]
[Jul. 18, 2007] (uses libbzip2 by Julian Seward)

** This is a BETA version - Use at your own risk! **

# CPUs: 1 Master, 2 Slaves
BWT Block Size: 500k
File Block Size: 900k
-------------------------------------------
File #: 1 of 3
Input Name: myfile.tar
Output Name: myfile.tar.bz2

Input Size: 7428687 bytes
Compressing data...
Output Size: 3237105 bytes
Wall Clock: 15.127381 seconds
-------------------------------------------
File #: 2 of 3
Input Name: secondfile.txt
Output Name: secondfile.txt.bz2

Input Size: 5897 bytes
Compressing data...
Output Size: 3192 bytes
Wall Clock: 1.273381 seconds
-------------------------------------------
File #: 3 of 3
Input Name: secondbreakfast.txt
Output Name: secondbreakfast.txt.bz2

Input Size: 83531 bytes
Compressing data...
Output Size: 11832 bytes
Wall Clock: 3.672381 seconds
-------------------------------------------

===================================================================

Example 4:
mpibzip2 -dv myfile.tar.bz2

This example will decompress the file "myfile.tar.bz2" into the decompressed file "myfile.tar". The switches -b and -1..-9 are not valid for decompression.

The program would report something like:
===================================================================

MPI BZIP2 v0.6  -  by: Jeff Gilchrist [http://compression.ca]
[Jul. 18, 2007] (uses libbzip2 by Julian Seward)

** This is a BETA version - Use at your own risk! **

# CPUs: 1 Master, 2 Slaves
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar.bz2
Output Name: myfile.tar

BWT Block Size: 900k
Input Size: 1123 bytes
Decompressing data...
Output Size: 6920 bytes
Wall Clock: 0.022775 seconds
-------------------------------------------

===================================================================

Bugs/Contact

If you would like to report any bugs or contact me related to the software you can reach me via e-mail at: MPIBZIP2 Contact Address


  • This web page is maintained by Jeff Gilchrist, Copyright © 2007-2009.
compression.ca