CL-BZIP2 — BZIP2 compression/decompression for Common Lisp

Introduction

cl-bzip2 provides CFFI bindings for libbzip2 — the bzip2 compression/decompression library. It comes with a BSD-style license.

Contents

  1. Download and installation
  2. Support and mailing lists
  3. Examples
    1. Compression
    2. Decompression
    3. Working with vectors
  4. The cl-bzip2 dictionary
  5. Limitations
  6. Acknowledgements

Download and installation

Current release: Version 0.1.0

You can also browse the darcs repository or get yourself a copy using darcs get:

darcs get http://common-lisp.net/project/cl-bzip2/darcs/cl-bzip2

cl-bzip2 has a couple of dependencies:

  1. CFFI: the Common Foreign Function Interface, which cl-bzip2 uses for its C bindings.
  2. libbzip2: the C library for compressing and decompressing data in the bzip2 format. Make sure the shared library (libbz2.so or libbz2.dylib) is installed on your machine. See the bzip2 manual for details.

To compile and load cl-bzip2, you can either use ASDF, or simply evaluate (load (compile-file "bzip2.lisp")) while in the cl-bzip2 source directory.
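
For the ASDF route, a minimal sketch might look like the following. The system name cl-bzip2 and the checkout location are assumptions (neither is stated above), so adjust them to match the .asd file in your copy. Evaluate the forms one at a time at the REPL.

```lisp
;; Make sure ASDF itself is available.
(require :asdf)

;; Assumption: the darcs checkout lives in lisp/cl-bzip2/ under your home
;; directory and contains an .asd file defining a system named "cl-bzip2".
(push (merge-pathnames "lisp/cl-bzip2/" (user-homedir-pathname))
      asdf:*central-registry*)

;; Compile (if needed) and load cl-bzip2 together with its CFFI dependency.
(asdf:load-system :cl-bzip2)
```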

Support and mailing lists

Send questions, bug reports, patches, and feature requests to the cl-bzip2-devel mailing list. Release announcements are made on the cl-bzip2-announce mailing list.

Examples

Compression

Use COMPRESS to compress data from a stream or pathname to another stream or pathname. Any stream you pass must be a binary stream. COMPRESS does not support string streams; for example, a stream supplied by WITH-INPUT-FROM-STRING will most likely not work at all.


;;; Compression usage

;;; No values are returned if execution was successful
;;; Using pathnames
CL-USER> (bzip2:compress #p"test.txt" #p"test.txt.bz2")
; No value

;;; Using binary streams
CL-USER> (with-open-file (in "test.txt" :direction :input :element-type '(unsigned-byte 8))
           (with-open-file (out "test.txt.bz2" :direction :output :element-type '(unsigned-byte 8)
                                :if-exists :supersede)
             (bzip2:compress in out)))
; No value

;;; Mixing stream and pathname
CL-USER> (with-open-file (in "test.txt" :direction :input :element-type '(unsigned-byte 8))
           (bzip2:compress in #p"test.txt.bz2"))
; No value

Decompression

Use DECOMPRESS to decompress data from a stream or pathname to another stream or pathname. As with COMPRESS, any stream you pass must be a binary stream.


;;; Decompression usage is similar to that for compression

;;; Using pathnames
CL-USER> (bzip2:decompress #p"test.txt.bz2" #p"test.txt")
; No value

;;; Using binary streams
CL-USER> (with-open-file (in "test.txt.bz2" :direction :input :element-type '(unsigned-byte 8))
           (with-open-file (out "test.txt" :direction :output :element-type '(unsigned-byte 8)
                                :if-exists :supersede)
             (bzip2:decompress in out)))
; No value
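
To check that a compress/decompress cycle reproduces the original file, something like the sketch below may help. FILES-IDENTICAL-P and ROUND-TRIP-OK-P are hypothetical helpers, not part of cl-bzip2. How cl-bzip2 treats an output pathname that already exists is not documented here, so the sketch assumes you pass names of files that do not exist yet.

```lisp
(defun files-identical-p (path-a path-b)
  "Return true if the files at PATH-A and PATH-B hold the same bytes."
  (with-open-file (a path-a :element-type '(unsigned-byte 8))
    (with-open-file (b path-b :element-type '(unsigned-byte 8))
      (and (= (file-length a) (file-length b))
           ;; Lengths match, so compare byte by byte until end of file.
           (loop for byte-a = (read-byte a nil nil)
                 for byte-b = (read-byte b nil nil)
                 while byte-a
                 always (eql byte-a byte-b))))))

(defun round-trip-ok-p (source compressed restored)
  "Compress SOURCE to COMPRESSED, decompress that to RESTORED, then compare
SOURCE and RESTORED. All three arguments are pathnames."
  (bzip2:compress source compressed)
  (bzip2:decompress compressed restored)
  (files-identical-p source restored))
```

For example, (round-trip-ok-p #p"test.txt" #p"roundtrip.bz2" #p"roundtrip.txt") should return true if both directions work.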

Working with vectors

Compression/decompression of vectors can easily be done with in-memory binary streams by using, for example, FLEXI-STREAMS.


CL-USER> (defvar *vec* #(66 104 97 107 99 104 111 100 105 32 109 97 116 32 107 97 114 111 46))
*VEC*

CL-USER> (flex:with-input-from-sequence (in *vec*)
           (flex:with-output-to-sequence (out)
             (bzip2:compress in out)))
#(66 90 104 57 49 65 89 38 83 89 188 189 88 250 0 0 1 149 128 64 1 16 0
  44 106 148 0 32 0 34 4 245 52 204 144 128 104 3 109 12 42 5 148 84
  110 113 190 46 228 138 112 161 33 121 122 177 244)

CL-USER> (flex:with-input-from-sequence (in *)
           (flex:with-output-to-sequence (out)
             (bzip2:decompress in out)))
#(66 104 97 107 99 104 111 100 105 32 109 97 116 32 107 97 114 111 46)

CL-USER> (equalp * *vec*)
T
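
If you do this often, the stream plumbing from the session above can be wrapped in two small functions. COMPRESS-OCTETS and DECOMPRESS-OCTETS are hypothetical names, not part of cl-bzip2; the sketch assumes cl-bzip2 and FLEXI-STREAMS are already loaded.

```lisp
(defun compress-octets (octets)
  "Return a vector holding the bzip2-compressed form of OCTETS."
  (flex:with-input-from-sequence (in octets)
    ;; WITH-OUTPUT-TO-SEQUENCE returns the collected octets, so the fact
    ;; that COMPRESS itself returns no values does not matter here.
    (flex:with-output-to-sequence (out)
      (bzip2:compress in out))))

(defun decompress-octets (octets)
  "Return a vector holding the decompressed form of the bzip2 data in OCTETS."
  (flex:with-input-from-sequence (in octets)
    (flex:with-output-to-sequence (out)
      (bzip2:decompress in out))))
```

With these, (equalp (decompress-octets (compress-octets *vec*)) *vec*) expresses the same round trip as the session above in one form.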

The cl-bzip2 dictionary

[Condition type]
bz-error

The default condition type for any BZIP2 compression/decompression related error.
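
A caller can trap this condition with HANDLER-CASE. The sketch below assumes that BZ-ERROR is exported from the BZIP2 package and is signalled as an error; SAFE-DECOMPRESS is a hypothetical helper, not part of cl-bzip2.

```lisp
(defun safe-decompress (in out)
  "Decompress IN to OUT. Return T on success, or NIL if a BZ-ERROR is signalled."
  (handler-case
      (progn
        (bzip2:decompress in out)
        t)
    ;; Assumption: BZ-ERROR is exported from the BZIP2 package.
    (bzip2:bz-error (condition)
      (format *error-output* "~&bzip2 failure: ~A~%" condition)
      nil)))
```

Other conditions, such as file errors from opening a pathname, are deliberately left unhandled so they still reach the debugger.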

[Function]
compress in out &key block-size-100k verbosity work-factor

Compresses data from IN to OUT. IN and OUT can each be either a binary stream or a pathname. This function returns no values.

BLOCK-SIZE-100K (default 9), VERBOSITY (default 0) and WORK-FACTOR (default 30) correspond to the parameters blockSize100k, verbosity and workFactor, respectively, for the libbzip2 function BZ2_bzCompressInit.

From the bzip2 manual:

Parameter blockSize100k specifies the block size to be used for compression. It should be a value between 1 and 9 inclusive, and the actual block size used is 100000 x this figure. 9 gives the best compression but takes most memory.

Parameter verbosity should be set to a number between 0 and 4 inclusive. 0 is silent, and greater numbers give increasingly verbose monitoring/debugging output.

Parameter workFactor controls how the compression phase behaves when presented with worst case, highly repetitive, input data. If compression runs into difficulties caused by repetitive data, the library switches from the standard sorting algorithm to a fallback algorithm. The fallback is slower than the standard algorithm by perhaps a factor of three, but always behaves reasonably, no matter how bad the input.
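
As a rough illustration of the keyword arguments, the sketch below computes the block size implied by BLOCK-SIZE-100K and wraps COMPRESS with the smallest block size. Both function names are hypothetical and not part of cl-bzip2.

```lisp
(defun block-size-in-bytes (block-size-100k)
  "Return the block size in bytes that corresponds to BLOCK-SIZE-100K."
  ;; The bzip2 manual allows values from 1 to 9 inclusive.
  (check-type block-size-100k (integer 1 9))
  (* 100000 block-size-100k))

(defun compress-low-memory (in out)
  "Compress IN to OUT using the smallest block size (100000 bytes).
This trades some compression ratio for lower memory use."
  (bzip2:compress in out :block-size-100k 1))
```

For instance, (compress-low-memory #p"test.txt" #p"test.txt.bz2") compresses with 100000-byte blocks, while the default of 9 corresponds to (block-size-in-bytes 9), that is, 900000 bytes.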

[Function]
decompress in out &key verbosity smallp

Decompresses data from IN to OUT. IN and OUT can each be either a binary stream or a pathname. This function returns no values.

VERBOSITY and SMALLP (default NIL) correspond to the parameters verbosity and small, respectively, for the libbzip2 function BZ2_bzDecompressInit.

For the meaning of VERBOSITY, see the documentation for COMPRESS. A non-NIL value for SMALLP corresponds to a non-zero value for the parameter small. Here’s what the bzip2 manual says about small:

If small is nonzero, the library will use an alternative decompression algorithm which uses less memory but at the cost of decompressing more slowly (roughly speaking, half the speed, but the maximum memory requirement drops to around 2300k).
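
On a memory-constrained machine, a call might look like the sketch below. DECOMPRESS-LOW-MEMORY is a hypothetical wrapper, not part of cl-bzip2.

```lisp
(defun decompress-low-memory (in out)
  "Decompress IN to OUT using libbzip2's slower, lower-memory algorithm."
  ;; A non-NIL :SMALLP maps to a non-zero `small' argument in libbzip2.
  (bzip2:decompress in out :smallp t))
```

Usage mirrors DECOMPRESS itself, for example (decompress-low-memory #p"test.txt.bz2" #p"test.txt").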

Limitations

As of now, cl-bzip2 works only with binary streams. I haven’t found an easy way to use string streams as input. If you know how that can be done, please let us know.
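
One possible workaround, sketched here rather than offered as a real fix, is to encode the string to octets first and then go through an in-memory binary stream. This relies on FLEXI-STREAMS (STRING-TO-OCTETS and OCTETS-TO-STRING); the two function names and the UTF-8 default are assumptions made for illustration.

```lisp
(defun compress-string (string &key (external-format :utf-8))
  "Encode STRING to octets and return its bzip2-compressed form as a vector."
  (let ((octets (flex:string-to-octets string
                                       :external-format external-format)))
    (flex:with-input-from-sequence (in octets)
      (flex:with-output-to-sequence (out)
        (bzip2:compress in out)))))

(defun decompress-to-string (octets &key (external-format :utf-8))
  "Decompress the bzip2 data in OCTETS and decode the result to a string."
  (flex:octets-to-string
   (flex:with-input-from-sequence (in octets)
     (flex:with-output-to-sequence (out)
       (bzip2:decompress in out)))
   :external-format external-format))
```

The whole string is held in memory, so this suits small inputs better than large ones. Use the same external format in both directions.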

Also, performance might not be as great as you expect. If you want to improve that, the guts of the code lie in COMPRESS-STREAM and COMPRESS-STREAM-AUX (for compression), and DECOMPRESS-STREAM and DECOMPRESS-STREAM-AUX (for decompression). Any help would be greatly appreciated!

Acknowledgements

Thanks to Julian Seward for the bzip2 compression format and the excellent libbzip2 interface.

Thanks also to Rakesh Pai for helping out with the CSS for this page. (And that became my introduction to CSS!) The markup for the cl-bzip2 dictionary was inspired by DOCUMENTATION-TEMPLATE.