Stream interface for data compression in Common Lisp


The chipz decompression library has an extremely useful function, make-decompressing-stream, which provides an interface (using Gray streams behind the scenes) to transparently decompress data read from a provided stream. This allows me to write a single function read-tag (which reads a single "tag" from a stream of structured binary data, much as Common Lisp's read function reads a single Lisp "form" from a stream) that works on both compressed and uncompressed data, e.g.:

;; Uncompressed data:
(read-tag in-stream)
;; Compressed data:
(read-tag (chipz:make-decompressing-stream 'chipz:zlib in-stream))

As far as I can tell, the API of the associated compression library, salza2, doesn't provide an equivalent interface out of the box for performing the reverse task. How would I implement such an interface myself? Let's call it make-compressing-stream. It would be used with my own complementary write-tag function, and provide the same benefits as for reading:

;; Uncompressed data:
(write-tag out-stream current-tag)
;; Compressed data:
(write-tag (make-compressing-stream 'salza2:zlib-compressor out-stream)
           current-tag)

In salza2's documentation (linked above), the overview says: "Salza2 provides an interface for creating a compressor object. This object acts as a sink for octets (either individual octets or vectors of octets), and a source for octets in a compressed data format. The compressed octet data is provided to a user-defined callback, which can write it to a stream, copy it to another vector, etc." For my current purposes, I only require compression in the zlib and gzip formats, for which standard compressors are provided.
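For what it's worth, here is a minimal sketch of what I imagine make-compressing-stream could look like on top of that interface, assuming the trivial-gray-streams portability library; compressing-stream and compressor-of are made-up names, a real version would need proper open/closed bookkeeping, and I'm not sure this is the right way to hook up the callback:

(defclass compressing-stream
    (trivial-gray-streams:fundamental-binary-output-stream)
  ((compressor :initarg :compressor :reader compressor-of)))

(defun make-compressing-stream (compressor-type out-stream)
  (make-instance 'compressing-stream
                 :compressor
                 (make-instance compressor-type
                                :callback (salza2:make-stream-output-callback
                                           out-stream))))

(defmethod stream-element-type ((stream compressing-stream))
  '(unsigned-byte 8))

(defmethod trivial-gray-streams:stream-write-byte
    ((stream compressing-stream) byte)
  (salza2:compress-octet byte (compressor-of stream))
  byte)

(defmethod trivial-gray-streams:stream-write-sequence
    ((stream compressing-stream) sequence start end &key)
  ;; Assumes SEQUENCE is an octet vector, as write-tag would produce.
  (salza2:compress-octet-vector sequence (compressor-of stream)
                                :start start
                                :end (or end (length sequence)))
  sequence)

(defmethod close ((stream compressing-stream) &key abort)
  (declare (ignore abort))
  ;; Flush pending output and write the trailing checksum.
  (salza2:finish-compression (compressor-of stream))
  (call-next-method))

The stream would have to be closed explicitly, since otherwise finish-compression is never called and the final block and checksum never get written.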

So here's how I think it could be done: firstly, convert the "tag" object into an octet vector; secondly, compress it using salza2:compress-octet-vector; and thirdly, provide a callback function that writes the compressed data directly to a file. From reading around, I think the first step can be achieved using flexi-streams:with-output-to-sequence - see here - but I'm really not sure about the callback function, despite looking at salza2's source. A sketch of what I have in mind follows. But here's the thing: a single tag can contain an arbitrary number of arbitrarily nested tags, and the "leaf" tags of this structure can each carry a sizeable payload; in other words, a single tag can amount to quite a lot of data.
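For concreteness, here is a minimal sketch of those three steps, assuming write-tag accepts a binary output stream as in my example above; write-compressed-tag is just a made-up name:

(defun write-compressed-tag (out-stream tag)
  ;; Step 1: serialize the tag to an in-memory octet vector.
  (let ((octets (flexi-streams:with-output-to-sequence (s)
                  (write-tag s tag))))
    ;; Steps 2 and 3: compress the octets, with a callback that
    ;; writes each compressed buffer straight to OUT-STREAM.
    (salza2:with-compressor
        (c 'salza2:zlib-compressor
           :callback (salza2:make-stream-output-callback out-stream))
      (salza2:compress-octet-vector octets c))))

Note that this buffers the entire serialized tag in memory before compressing, which is exactly what worries me below.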

So the tag->uncompressed-octets->compressed-octets->file conversion would ideally need to be performed in chunks, and this raises a question I don't know how to answer, namely: compression formats, as I understand it, tend to store a checksum of their payload data in the header or trailer. If I compress the data one chunk at a time and append each compressed chunk to the output file, surely there will be a header and a checksum for each chunk, as opposed to a single header and checksum for the entire tag's data, which is what I want? How can I solve this problem? Or is it already handled by salza2?
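(If salza2's compressors accumulate the checksum across calls, which is what I'd hope given that finish-compression is a separate step, then feeding every chunk to a single compressor should produce one header and one trailer for the whole stream. A sketch of what I mean, where next-chunk is a hypothetical function returning successive octet vectors of serialized tag data, or nil when done:

(defun write-chunks-compressed (out-stream)
  (salza2:with-compressor
      (c 'salza2:zlib-compressor
         :callback (salza2:make-stream-output-callback out-stream))
    (loop for chunk = (next-chunk)
          while chunk
          do (salza2:compress-octet-vector chunk c))))
;; FINISH-COMPRESSION runs when WITH-COMPRESSOR exits, emitting the
;; final deflate block and the single trailing checksum.

But I haven't verified that this is how the checksum actually behaves.)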

Thanks for your help, and sorry for the rambling :)

From what I understand, you can't directly decompress multiple chunks from a single file.

(defun bytes (&rest elements)
  (make-array (length elements)
              :element-type '(unsigned-byte 8)
              :initial-contents elements))

(defun compress (chunk &optional mode)
  (with-open-file (output #p"/tmp/compressed"
                          :direction :output
                          :if-exists mode
                          :if-does-not-exist :create
                          :element-type '(unsigned-byte 8))
    (salza2:with-compressor
        (c 'salza2:gzip-compressor
           :callback (salza2:make-stream-output-callback output))
      (salza2:compress-octet-vector chunk c))))

(compress (bytes 10 20 30) :supersede)
(compress (bytes 40 50 60) :append)

Now, /tmp/compressed contains two consecutive chunks of compressed data. Calling decompress reads the first chunk only:

(chipz:decompress nil 'chipz:gzip #p"/tmp/compressed")
=> #(10 20 30)

Looking at the source of chipz, the stream is read using an internal buffer, which means the bytes that follow the first chunk are already read but not decompressed. That explains why, when using two consecutive decompress calls on the same stream, the second one errors with a premature end of stream:

(with-open-file (input #p"/tmp/compressed"
                       :element-type '(unsigned-byte 8))
  (list
   #1=(multiple-value-list
       (ignore-errors (chipz:decompress nil 'chipz:gzip input)))
   #1#))

=> ((#(10 20 30))
    (NIL #<CHIPZ:PREMATURE-END-OF-STREAM {10155E2163}>))

I don't know how large your data is supposed to be, but if this ever becomes a problem, you might need to change the decompression algorithm so that, when it is in the done state (see inflate.lisp), enough data is returned for you to process the remaining bytes as a new chunk. Alternatively, you could compress to different files and bundle them with an archive format like tar (see https://github.com/froydnj/archive).
