cl-bzip2 provides CFFI bindings for libbzip2 — the bzip2 compression/decompression library. It comes with a BSD-style license.
Current release: Version 0.1.0
You can also browse the darcs repository or get yourself a copy using darcs get:
darcs get http://common-lisp.net/project/cl-bzip2/darcs/cl-bzip2
cl-bzip2 has a couple of dependencies:
To compile and load cl-bzip2, you can either use ASDF, or simply evaluate (load (compile-file "bzip2.lisp"))
while in the cl-bzip2 source directory.
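The ASDF route might look like the sketch below. It assumes the system is named cl-bzip2 and that its .asd file is somewhere ASDF can find it; neither detail is stated above, so adjust to your setup.

```lisp
;; Make sure ASDF itself is available (most implementations bundle it).
(require :asdf)

;; Assumption: the system is called "cl-bzip2" and its dependencies
;; are already installed where ASDF can locate them.
(asdf:load-system :cl-bzip2)
```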
Send questions, bug reports, patches, feature requests, etc. to cl-bzip2-devel. Release announcements are made on cl-bzip2-announce.
Use COMPRESS to compress data from a stream or pathname to another stream or pathname. Any stream argument must be a binary stream. COMPRESS does not work with string streams; for example, a stream created by WITH-INPUT-FROM-STRING will most likely not work at all.
;;; Compression usage
;;; No values are returned if execution was successful
;;; Using pathnames
CL-USER> (bzip2:compress #p"test.txt" #p"test.txt.bz2")
; No value
;;; Using binary streams
CL-USER> (with-open-file (in "test.txt" :direction :input :element-type '(unsigned-byte 8))
           (with-open-file (out "test.txt.bz2" :direction :output :element-type '(unsigned-byte 8))
             (bzip2:compress in out)))
; No value
;;; Mixing stream and pathname
CL-USER> (with-open-file (in "test.txt" :direction :input :element-type '(unsigned-byte 8))
           (bzip2:compress in #p"test.txt.bz2"))
; No value
Use DECOMPRESS to decompress data from a stream or pathname to another stream or pathname. As with COMPRESS, any stream argument must be a binary stream.
;;; Decompression usage is similar to that for compression
;;; Using pathnames
CL-USER> (bzip2:decompress #p"test.txt.bz2" #p"test.txt")
; No value
;;; Using binary streams
CL-USER> (with-open-file (in "test.txt.bz2" :direction :input :element-type '(unsigned-byte 8))
           (with-open-file (out "test.txt" :direction :output :element-type '(unsigned-byte 8))
             (bzip2:decompress in out)))
; No value
Compression/decompression of vectors can easily be done with in-memory binary streams by using, for example, FLEXI-STREAMS.
CL-USER> (defvar *vec* #(66 104 97 107 99 104 111 100 105 32 109 97 116 32 107 97 114 111 46))
*VEC*
CL-USER> (flex:with-input-from-sequence (in *vec*)
           (flex:with-output-to-sequence (out)
             (bzip2:compress in out)))
#(66 90 104 57 49 65 89 38 83 89 188 189 88 250 0 0 1 149 128 64 1 16 0
44 106 148 0 32 0 34 4 245 52 204 144 128 104 3 109 12 42 5 148 84
110 113 190 46 228 138 112 161 33 121 122 177 244)
CL-USER> (flex:with-input-from-sequence (in *)
           (flex:with-output-to-sequence (out)
             (bzip2:decompress in out)))
#(66 104 97 107 99 104 111 100 105 32 109 97 116 32 107 97 114 111 46)
CL-USER> (equalp * *vec*)
T
[Condition type]
bz-error
The default condition type for any BZIP2 compression/decompression related error.
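A minimal sketch of catching this condition with HANDLER-CASE follows. It assumes cl-bzip2 is already loaded and that BZ-ERROR is exported from the BZIP2 package; SAFE-DECOMPRESS is a hypothetical helper, not part of cl-bzip2.

```lisp
;; Assumes cl-bzip2 is loaded and BZIP2:BZ-ERROR is an exported condition type.
;; SAFE-DECOMPRESS is a hypothetical wrapper written for illustration only.
(defun safe-decompress (in out)
  "Decompress IN to OUT. Return T on success; on a BZ-ERROR return
NIL and the condition object as a second value."
  (handler-case
      (progn
        (bzip2:decompress in out)
        t)
    (bzip2:bz-error (condition)
      (values nil condition))))
```

The second return value can be printed or logged by the caller, for example with (format t "~A" condition).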
[Function]
compress in out &key block-size-100k verbosity work-factor
Compresses data from IN to OUT. IN or OUT can either be a binary stream or a pathname. This function doesn’t return any value.
BLOCK-SIZE-100K (default 9), VERBOSITY (default 0) and WORK-FACTOR (default 30) correspond to the parameters blockSize100k, verbosity and workFactor, respectively, for the libbzip2 function BZ2_bzCompressInit.
From the bzip2 manual:
Parameter blockSize100k specifies the block size to be used for compression. It should be a value between 1 and 9 inclusive, and the actual block size used is 100000 x this figure. 9 gives the best compression but takes most memory.
Parameter verbosity should be set to a number between 0 and 4 inclusive. 0 is silent, and greater numbers give increasingly verbose monitoring/debugging output.
Parameter workFactor controls how the compression phase behaves when presented with worst case, highly repetitive, input data. If compression runs into difficulties caused by repetitive data, the library switches from the standard sorting algorithm to a fallback algorithm. The fallback is slower than the standard algorithm by perhaps a factor of three, but always behaves reasonably, no matter how bad the input.
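To make the block-size arithmetic from the manual concrete, here is a small sketch. BLOCK-SIZE-IN-BYTES is a hypothetical helper (not part of cl-bzip2) that only restates the "100000 x this figure" rule and the 1 to 9 range quoted above.

```lisp
;; Hypothetical helper, not part of cl-bzip2: restates the rule from the
;; bzip2 manual that the actual block size is 100000 x blockSize100k.
(defun block-size-in-bytes (block-size-100k)
  "Return the block size in bytes for a BLOCK-SIZE-100K value between 1 and 9."
  (check-type block-size-100k (integer 1 9))
  (* 100000 block-size-100k))

;; The default of 9 therefore means 900000-byte blocks,
;; and the smallest setting of 1 means 100000-byte blocks.
(block-size-in-bytes 9)
(block-size-in-bytes 1)
```

With cl-bzip2 loaded, the smallest block size would then be requested with (bzip2:compress in out :block-size-100k 1), trading compression ratio for lower memory use.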
[Function]
decompress in out &key verbosity smallp
Decompresses data from IN to OUT. IN or OUT can either be a binary stream or a pathname. This function doesn’t return any value.
VERBOSITY and SMALLP (default NIL) correspond to the parameters verbosity and small, respectively, for the libbzip2 function BZ2_bzDecompressInit.
For the meaning of VERBOSITY, see the documentation for COMPRESS. A non-NIL value for SMALLP corresponds to a non-zero value for the parameter small. Here’s what the bzip2 manual says about small:
If small is nonzero, the library will use an alternative decompression algorithm which uses less memory but at the cost of decompressing more slowly (roughly speaking, half the speed, but the maximum memory requirement drops to around 2300k).
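As a rough illustration of the SMALLP keyword, the sketch below wraps DECOMPRESS so that the slower, low-memory algorithm is always requested. It assumes cl-bzip2 is loaded; DECOMPRESS-LOW-MEMORY is a hypothetical name, not part of the library.

```lisp
;; Assumes cl-bzip2 is loaded. DECOMPRESS-LOW-MEMORY is a hypothetical
;; convenience wrapper, shown only to illustrate the :SMALLP keyword.
(defun decompress-low-memory (in out)
  "Decompress IN to OUT, asking libbzip2 for its low-memory algorithm."
  (bzip2:decompress in out :smallp t))
```

It takes the same kinds of arguments as DECOMPRESS, so (decompress-low-memory #p"test.txt.bz2" #p"test.txt") should behave like the pathname example earlier, only slower and with a smaller memory footprint.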
As of now, cl-bzip2 works only with binary streams. I haven’t figured out an easy way to make it work with string streams (i.e. easily using them as input). If you know how that can be done, please let us know.
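One possible workaround, sketched here under assumptions, is to encode the string to octets first and then go through an in-memory binary stream, as in the FLEXI-STREAMS vector example. COMPRESS-STRING and DECOMPRESS-TO-STRING are hypothetical helpers, not part of cl-bzip2, and the sketch assumes both cl-bzip2 and FLEXI-STREAMS are loaded.

```lisp
;; Assumes cl-bzip2 and FLEXI-STREAMS are loaded.
;; Both functions are hypothetical helpers written for illustration.

(defun compress-string (string &key (external-format :utf-8))
  "Encode STRING to octets, compress them, and return a vector of octets."
  (let ((octets (flex:string-to-octets string :external-format external-format)))
    (flex:with-input-from-sequence (in octets)
      (flex:with-output-to-sequence (out)
        (bzip2:compress in out)))))

(defun decompress-to-string (octets &key (external-format :utf-8))
  "Decompress the bzip2 data in OCTETS and decode the result as a string."
  (flex:octets-to-string
   (flex:with-input-from-sequence (in octets)
     (flex:with-output-to-sequence (out)
       (bzip2:decompress in out)))
   :external-format external-format))
```

This does not make string streams themselves work; it only moves the character encoding step outside of cl-bzip2, so the same external format has to be used in both directions.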
Also, performance might not be as great as you expect. If you want to improve that, the guts of the code lie in COMPRESS-STREAM and COMPRESS-STREAM-AUX (for compression), and DECOMPRESS-STREAM and DECOMPRESS-STREAM-AUX (for decompression). Any help would be greatly appreciated!
Thanks to Julian Seward for the bzip2 compression format and the excellent libbzip2 interface.
Thanks also to Rakesh Pai for helping out with the CSS for this page. (And that became my introduction to CSS!) The markup for the cl-bzip2 dictionary was inspired by DOCUMENTATION-TEMPLATE.