Zlib compression library (Gauche Users’ Reference)

12.57 `rfc.zlib` - zlib compression library

Module: rfc.zlib ¶

This module provides bindings to the zlib compression library. Most features of zlib can be used through this module.

Zlib supports reading and writing of the zlib compressed data format (RFC1950), the DEFLATE compressed data format (RFC1951), and the GZIP file format (RFC1952). It also provides procedures to calculate CRC32 and Adler32 checksums.

Compression and decompression are done through specialized ports. There are a number of parameters to fine-tune compression; refer to the zlib documentation for the details.

Condition types

The following condition types are defined to represent errors during processing by zlib.

Condition Type: <zlib-error> ¶: {rfc.zlib} Subclass of <error> and superclass of the following condition types. This class is an abstract class to catch any of the zlib-specific errors. Zlib-specific errors raised by procedures in rfc.zlib are always an instance of (or a compound condition including) one of the following specific classes.

Condition Type: <zlib-need-dict-error> ¶

Condition Type: <zlib-stream-error> ¶

Condition Type: <zlib-data-error> ¶

Condition Type: <zlib-memory-error> ¶

Condition Type: <zlib-version-error> ¶

{rfc.zlib} Subclasses of <zlib-error>. These condition types correspond to zlib’s Z_NEED_DICT, Z_STREAM_ERROR, Z_DATA_ERROR, Z_MEM_ERROR, and Z_VERSION_ERROR return codes.

When an error occurs while reading data, a compound condition of a subclass of <zlib-error> and <io-read-error> is raised. When an error occurs without I/O, a simple condition of a subclass of <zlib-error> is raised. Errors unrelated to zlib, such as an invalid argument error, are raised as a simple <error> condition.
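As a minimal sketch of how these conditions might be caught, the following uses guard with condition-has-type?, which matches both simple and compound conditions. The helper name safe-inflate and the sample inputs are hypothetical, not part of rfc.zlib.

```scheme
(use rfc.zlib)

;; Hypothetical helper: returns the decompressed string, or a symbol
;; naming the kind of zlib error that was caught.
(define (safe-inflate str)
  (guard (e [(condition-has-type? e <zlib-data-error>) 'corrupted]
            [(condition-has-type? e <zlib-error>) 'zlib-error])
    (inflate-string str)))

(safe-inflate (deflate-string "hello"))   ; valid input decompresses normally
(safe-inflate "this is not zlib data")    ; invalid input ends up in a guard clause
```

The more specific clause comes first because every <zlib-data-error> is also a <zlib-error>.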

Compression/decompression ports

Class: <deflating-port> ¶

Class: <inflating-port> ¶

{rfc.zlib} Compression and decompression functions are provided via ports. A deflating port is an output port that compresses the output data. An inflating port is an input port that reads compressed data and decompresses it.

When an inflating port encounters corrupted compressed data, a compound condition of <io-read-error> and <zlib-data-error> is raised during the read operation.

Function: open-deflating-port drain :key compression-level buffer-size window-bits memory-level strategy dictionary owner? ¶

{rfc.zlib} Creates and returns an instance of <deflating-port>, an output port that compresses the output data and sends the compressed data to another output port drain. This combines the functionality of zlib’s deflateInit2() and deflateSetDictionary().

You can specify an exact integer between 1 and 9 (inclusive) for compression-level. A larger integer means a higher compression ratio. When omitted, a default compression level is used, which is usually 6.

The following constants are defined to specify compression-level conveniently:

Constant: Z_NO_COMPRESSION ¶
Constant: Z_BEST_SPEED ¶
Constant: Z_BEST_COMPRESSION ¶
Constant: Z_DEFAULT_COMPRESSION ¶: {rfc.zlib}

The buffer-size argument specifies the buffer size of the port in bytes. The default is 4096.

The window-bits argument specifies the size of the window as an exact integer. Typically the value should be between 8 and 15, inclusive, and it specifies the base-two logarithm of the window size used in compression. A larger number yields a better compression ratio but uses more memory. The default value is 15.

There are a couple of special modes that can be selected through window-bits. When an integer between -8 and -15 is given to window-bits, the port produces raw deflate data, which lacks the zlib header and trailer. In this case, the Adler32 checksum isn’t calculated. The actual window size is determined by the absolute value of window-bits.

When window-bits is between 24 and 31, the port uses GZIP encoding; that is, instead of the zlib wrapper, the compressed data is enveloped by a simple gzip header and trailer. The gzip header added in this case has no filename, comment, header CRC, or other data, and has a zero modification time and 255 (unknown) in the OS field. The zstream-adler32 procedure will return the CRC32 checksum instead of Adler32. The actual window size is determined by window-bits minus 16.
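The window-bits modes described above can be sketched with the string helpers documented later in this section, which forward their keyword arguments to the port constructors. The variable names and the sample text are illustrative only.

```scheme
(use rfc.zlib)

(define text "an example payload, an example payload")

;; Raw deflate: a negative window-bits omits the zlib header and trailer,
;; so the reading side must be told to expect raw data as well.
(define raw (deflate-string text :window-bits -15))
(inflate-string raw :window-bits -15)

;; Gzip envelope: 15 (window size) + 16 (gzip mode) = 31.
(define gz (deflate-string text :window-bits (+ 15 16)))
(gzip-decode-string gz)
```

Both calls are expected to give back the original text; only the framing around the compressed data differs.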

The memory-level argument specifies how much memory should be allocated to keep the internal state during compression. 1 uses the least memory, which makes compression slower and less effective. 9 uses the most memory and gives the fastest and best compression. The default value is 8.

To fine-tune the compression algorithm, you can use the strategy argument. The following constants are defined as valid values for strategy:

Constant: Z_DEFAULT_STRATEGY ¶: {rfc.zlib} The default strategy, suitable for most ordinary data.

Constant: Z_FILTERED ¶: {rfc.zlib} Suitable for data generated by filters. Filtered data consists mostly of small values with a random distribution, and this strategy makes the compression algorithm use more Huffman encoding and less string matching.

Constant: Z_HUFFMAN_ONLY ¶: {rfc.zlib} Forces Huffman encoding only (no string matching).

Constant: Z_RLE ¶: {rfc.zlib} Limits the match distance to 1 (that is, forces run-length encoding). It is almost as fast as Z_HUFFMAN_ONLY and gives better compression for PNG image data.

Constant: Z_FIXED ¶: {rfc.zlib} Prohibits dynamic Huffman encoding. It allows a simple decoder for special applications.

The choice of strategy only affects compression ratio and speed. Any choice produces correct and decompressible data.

You can give an initial dictionary to the dictionary argument to be used in compression. The compressor and decompressor must use exactly the same dictionary. See the zlib documentation for the details.

By default, a deflating port leaves drain open when the conversion is finished, that is, when the deflating port itself is closed. If you don’t want to bother closing drain, give a true value to the owner? argument; then drain is closed after the deflating port is closed and all data is written out.

Note: You must close a deflating port explicitly, or the compressed data may be truncated. When you leave a deflating port open to be GCed, the finalizer will close it; however, the order in which finalizers are called is nondeterministic, and it is possible that the drain port is closed before the deflating port is closed. In such cases, the deflating port’s attempt to flush the buffered data and trailer will fail.
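A minimal sketch of the explicit-close pattern, using a string port as the drain so that the example is self-contained. The helper name compress-via-port is hypothetical.

```scheme
(use rfc.zlib)

;; Hypothetical helper: compresses STR through a deflating port and
;; returns the compressed data as a string.
(define (compress-via-port str)
  (call-with-output-string
    (lambda (drain)
      (let ([out (open-deflating-port drain
                                      :compression-level Z_BEST_COMPRESSION)])
        (display str out)
        ;; Closing the deflating port flushes the buffered data and
        ;; writes the trailer. DRAIN stays open because owner? is not
        ;; given, so call-with-output-string can still collect its data.
        (close-output-port out)))))

(inflate-string (compress-via-port "hello hello hello"))
```

The deflating port is closed before the contents of the drain are collected; relying on the finalizer instead could leave the output incomplete.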

Function: open-inflating-port source :key buffer-size window-bits dictionary owner? ¶

{rfc.zlib} Takes an input port source from which compressed data can be read, and creates and returns a new instance of <inflating-port>, that is, a port from which the decompressed data can be read. This procedure covers the functionality of zlib’s inflateInit2() and inflateSetDictionary().

The meanings of buffer-size and owner? are the same as for open-deflating-port.

The meaning of window-bits is almost the same, except that if a value increased by 32 is given, the inflating port automatically detects whether the source stream is zlib or gzip by its header.

That is, you can specify a value between 8 and 15 to read zlib, 24 to 31 to read gzip, or 40 to 47 to use automatic detection.

The window bits must be equal to or greater than the window bits used to compress the source, or a <zlib-data-error> condition is raised. If you don’t know the compression parameters of the input (which is most likely the case), you need to specify the maximum value, i.e. 15 for zlib, 31 for gzip, or 47 to autodetect.

If the input data was compressed with a dictionary, the same dictionary must be given to the dictionary argument. Otherwise, a compound condition of <io-read-error> and <zlib-need-dict-error> will be raised.
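The following sketch reads through an inflating port with automatic format detection. The helper name decompress-any is hypothetical, and a string port stands in for a file or network source.

```scheme
(use rfc.zlib)

;; Hypothetical helper: decompresses STR without knowing in advance
;; whether it carries a zlib or a gzip wrapper.
(define (decompress-any str)
  (let* ([source (open-input-string str)]
         ;; 15 (largest window) + 32 (automatic detection) = 47
         [in (open-inflating-port source :window-bits 47 :owner? #t)])
    (let ([result (port->string in)])
      ;; With owner? true, closing IN also closes SOURCE.
      (close-input-port in)
      result)))

(decompress-any (deflate-string "abc"))      ; zlib wrapper
(decompress-any (gzip-encode-string "abc"))  ; gzip wrapper
```

Note that automatic detection covers only the zlib and gzip wrappers; raw deflate data still needs a negative window-bits value.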

Operations on inflating/deflating ports

Function: zstream-total-in xflating-port ¶

Function: zstream-total-out xflating-port ¶

Function: zstream-adler32 xflating-port ¶

Function: zstream-data-type xflating-port ¶

{rfc.zlib} The xflating-port argument must be either an inflating port or a deflating port; otherwise an error is raised.

These procedures return the value of the total_in, total_out, adler32, and data_type fields, respectively, of the z_stream structure associated with the given inflating or deflating port.

The value of data_type can be one of the following constants:

Constant: Z_BINARY ¶
Constant: Z_TEXT ¶
Constant: Z_ASCII ¶
Constant: Z_UNKNOWN ¶: {rfc.zlib}
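As a rough sketch, the counters can be inspected while a deflating port is still open. The helper name deflate-counters is hypothetical, and the exact numbers depend on buffering and on when the port is flushed, so no particular values are claimed here.

```scheme
(use rfc.zlib)

;; Hypothetical helper: writes STR to a deflating port, flushes it, and
;; returns a list of the total_in and total_out counters at that point.
(define (deflate-counters str)
  (let* ([drain (open-output-string)]
         [out (open-deflating-port drain)])
    (display str out)
    (flush out)   ; hand the buffered input to the compressor
    (let ([counts (list (zstream-total-in out)
                        (zstream-total-out out))])
      (close-output-port out)
      counts)))

(deflate-counters (make-string 1000 #\a))
```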

Function: zstream-params-set! deflating-port :key compression-level strategy ¶: {rfc.zlib} Changes the compression level and/or strategy during compression.

Function: zstream-dictionary-adler32 deflating-port ¶: {rfc.zlib} When a dictionary is given to open-deflating-port, the dictionary’s adler32 checksum is calculated. This procedure returns the checksum. If no dictionary has been given, this procedure returns #f.

Function: deflating-port-full-flush deflating-port ¶: {rfc.zlib} Flushes the data buffered in deflating-port and resets the compression state. The decompression routine can skip the data up to the full-flush point with inflate-sync.

Function: inflate-sync inflating-port ¶: {rfc.zlib} Skips the (possibly corrupted) compressed data up to the next full-flush point marked by deflating-port-full-flush. You may want to use this procedure when you get <zlib-data-error>. Returns the number of bytes skipped when the next full-flush point is found, or #f when the input reaches EOF before finding the next point.

Miscellaneous API

Function: zlib-version ¶: {rfc.zlib} Returns the version of zlib as a string.

Function: deflate-string string options … ¶: {rfc.zlib} Compresses the given string and returns zlib-compressed data in a string. All optional arguments are passed to open-deflating-port as they are.

Function: inflate-string string options … ¶: {rfc.zlib} Takes zlib-compressed data in a string, and returns decompressed data in a string. All optional arguments are passed to open-inflating-port as they are.

Function: gzip-encode-string string options … ¶
Function: gzip-decode-string string options … ¶: {rfc.zlib} Like deflate-string and inflate-string, but use the gzip format instead. This is the same as giving a value greater than 15 to the window-bits argument of deflate-string and inflate-string.
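A short sketch of the string helpers follows. The sample text is arbitrary, and string-size is Gauche’s built-in procedure that returns the size of a string in bytes.

```scheme
(use rfc.zlib)

(define text (make-string 1000 #\a))

;; Keyword options are forwarded to open-deflating-port.
(define z (deflate-string text :compression-level Z_BEST_COMPRESSION))
(string-size z)                   ; smaller than the 1000-byte input
(equal? (inflate-string z) text)

;; The same round trip using the gzip format.
(define gz (gzip-encode-string text))
(equal? (gzip-decode-string gz) text)
```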

Function: crc32 string :optional checksum ¶: {rfc.zlib} Returns CRC32 checksum of string. If optional checksum is given, the returned checksum is an update of checksum by string.

Function: adler32 string :optional checksum ¶

{rfc.zlib} Returns Adler32 checksum of string. If optional checksum is given, the returned checksum is an update of checksum by string.

Calculating Adler32 is faster than calculating CRC32, but Adler32 is known to produce an uneven distribution of hash values for small inputs. See RFC3309 for a detailed description. If it matters, use CRC32 instead.
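A brief sketch of both checksum procedures, including the incremental form. The input strings are the usual published test vectors for each algorithm.

```scheme
(use rfc.zlib)

(crc32 "123456789")     ; the standard CRC32 check value, #xcbf43926
(adler32 "Wikipedia")   ; #x11e60398

;; Incremental update: feeding the data in pieces gives the same result
;; as computing the checksum over the whole string at once.
(= (crc32 "456789" (crc32 "123"))
   (crc32 "123456789"))
```

The incremental form is useful when the data arrives in chunks and cannot be held in memory as a single string.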
