pserializer - portable serializer implementation

Shiro Kawai, shiro@acm.org
$Id: pserializer.txt,v 1.3 2000/01/26 11:13:39 shiro Exp $

Contents:
    OVERVIEW
    EXTERNAL INTERFACE
    SERIALIZED FORMAT
    EXTENDING THE SERIALIZER
    PORTING

Overview
========

This is a simple implementation for serializing standard Scheme objects:
booleans, pairs, symbols, strings, numbers, characters, vectors and the
empty list.

Serialization converts a Scheme structure into a byte stream that does
not depend on the running process. When the stream is read back
("deserialized"), it recovers a Scheme structure topologically equal to
the original one. That means the result has the same shape. Any pair,
vector, string or symbol that appeared more than once in the original
(in the eq? sense, including through a cycle) is also shared in the
result. The serialized form is useful for storing a Scheme structure in
a file (persistence) or for sending it over a network.

This implementation is intended to be portable to, and extensible for,
various Scheme implementations. To use this module, you need to define
a few hash-table functions and an error function. Example definitions
for SLIB and STk are shown in the source code, pserializer.scm. See the
"Porting" section below.

Although this implementation handles only the standard Scheme objects,
you can extend the serializer to accept other types of objects specific
to your Scheme implementation (e.g. records or classes). See the
"Extending the serializer" section below.

External Interface
==================

MAKE-OUTPUT-SERIALIZER port &optional extension                [function]

    Creates an output serializer on the output port PORT and returns it.
    The optional argument EXTENSION extends the serializer so that it
    recognizes implementation-dependent objects.

WRITE-TO-OUTPUT-SERIALIZER object serializer                   [function]

    Writes OBJECT to the output serializer SERIALIZER.

CALL-WITH-OUTPUT-SERIALIZER port proc &optional extension      [function]

    PROC must be a procedure that takes one argument, a serializer.
    This procedure creates an output serializer from PORT and EXTENSION
    and passes it to PROC.

MAKE-INPUT-SERIALIZER port &optional extension                 [function]

    Creates an input serializer (deserializer) on the input port PORT
    and returns it. The optional argument EXTENSION extends the
    deserializer so that it recognizes implementation-dependent objects.

READ-FROM-INPUT-SERIALIZER serializer                          [function]

    Reads one object from the input serializer SERIALIZER. Returns an
    eof object when it reaches the end of the input stream.

CALL-WITH-INPUT-SERIALIZER port proc &optional extension       [function]

    PROC must be a procedure that takes one argument, a serializer.
    This procedure creates an input serializer from PORT and EXTENSION
    and passes it to PROC.

REGISTER-OBJECT-TO-INPUT-SERIALIZER object serializer          [function]
        &optional key

    Adds OBJECT to the reference lookup table of SERIALIZER. This
    procedure is needed to extend the input serializer. See the
    "Extending the serializer" section below.

SERIALIZER->PORT serializer                                    [function]

    Returns the port associated with SERIALIZER.

Serialized format
=================

Numbers, booleans, characters and the empty list are written out the
same way as in the Scheme external representation. Every other object
is written as a tag indicating its type, followed by type-dependent
data.

These objects are also assigned reference numbers in their order of
appearance in the serializer, starting from 0. If the same object (in
the eq? sense) appears more than once, the second and later appearances
are written as a REFERENCE. A reference consists of the reference tag
followed by the number assigned to the object.

The following tags are currently used:

    y : symbol, followed by its name.
    p : pair, followed by its car and cdr.
    v : vector, followed by its length, then its elements.
    s : string, followed by the string itself.
    r : reference, followed by the reference number.

Here are a couple of examples.

    Form:       (1 2 "3" #(a b c) a)
    Serialized: p 1 p 2 p s "3" p v 3 y a y b y c p r 6 ()

    Form:       #0=(a b c . #0#)        ;; circular list
    Serialized: p y a p y b p y c r 0

In the first example, the final a is written as "r 6". The numbers are
assigned as follows:

    0, 1, 2 : the first three pairs
    3       : the string "3"
    4       : the fourth pair
    5       : the vector
    6       : the symbol a, the first time it appears inside the vector

In the second example, the cdr of the third pair is the first pair,
which is numbered 0.

Design note 1: To serialize a variable-length data structure, you need
a way to mark its end. There are two ways to do this. One is to put a
special terminator after the contents of the structure. The other is to
put the size of the structure before its contents.

The Scheme external representation uses the former method to mark the
end of lists and vectors. Pserializer uses the latter method, except
for lists, which are written as chains of pairs. The size prefix makes
references simpler to handle. Consider a recursive vector that contains
itself as one of its elements. Because the length comes first, the
reader can allocate the vector and give it a reference number before
reading any element. The element that refers back to the vector can
then be resolved.

Design note 2: The serialized form has no "magic number" or "header" to
indicate, for example, the version of the format. An application might
use multiple versions of the serialized format that are incompatible
with each other. In that case, it is up to the application implementor
to insert such information into the serialized output.

Extending the serializer
========================

You can extend the serializer to handle implementation-dependent
objects. To do so, pass an extension specification as the optional
argument EXTENSION to MAKE-{INPUT|OUTPUT}-SERIALIZER.

An extension specification is a list of reader/writer specifications.
A reader/writer specification is a list of four elements:

    - a tag symbol,
    - a test procedure,
    - a writer procedure,
    - a reader procedure.

You can choose any symbol as the tag except those already used for the
Scheme primitives, shown in the previous section.

When an output serializer encounters an object of unknown type, it
applies the test procedures to the object until one returns true. The
tests are tried in the order in which they appear in the extension
specification. The serializer then writes the associated tag and calls
the writer procedure with the serializer and the object to be written.
References are handled by the serializer itself, so the writer
procedure need not care about them. If no test succeeds, the serializer
reports an error.
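The following minimal sketch shows the output half in portable R7RS-small Scheme. It covers the tags, the reference numbering and the writer-side extension dispatch described above. It mirrors the documented MAKE-OUTPUT-SERIALIZER, WRITE-TO-OUTPUT-SERIALIZER and SERIALIZER->PORT interface, but it is not the code in pserializer.scm.

The following parts are all assumptions made for illustration:

    - the record layout,
    - the association-list reference table,
    - the single-space token separator,
    - the helpers emit, lookup, register!, write-extension and
      serialize-to-string.

```scheme
(import (scheme base) (scheme write))

;; Sketch only: record layout, alist table and space separator are assumed.
(define-record-type <output-serializer>
  (%make-output-serializer port table count started extension)
  output-serializer?
  (port serializer->port)
  (table serializer-table set-serializer-table!)     ; alist: object -> number
  (count serializer-count set-serializer-count!)     ; next reference number
  (started serializer-started? set-serializer-started!)
  (extension serializer-extension))                  ; list of (tag test writer reader)

(define (make-output-serializer port . extension)
  (%make-output-serializer port '() 0 #f
                           (if (pair? extension) (car extension) '())))

;; Write one token in WRITE syntax, separated from the previous one by a space.
(define (emit token s)
  (let ((port (serializer->port s)))
    (if (serializer-started? s)
        (write-char #\space port)
        (set-serializer-started! s #t))
    (write token port)))

;; Reference number of OBJ if it was written before (eq? comparison), else #f.
(define (lookup obj s)
  (let ((entry (assq obj (serializer-table s))))
    (and entry (cdr entry))))

;; Give OBJ the next reference number.
(define (register! obj s)
  (set-serializer-table! s (cons (cons obj (serializer-count s))
                                 (serializer-table s)))
  (set-serializer-count! s (+ 1 (serializer-count s))))

;; Try each extension's test in order; write its tag, then call its writer.
(define (write-extension obj s)
  (let loop ((specs (serializer-extension s)))
    (if (null? specs)
        (error "cannot serialize object" obj)
        (let ((tag (list-ref (car specs) 0))
              (test? (list-ref (car specs) 1))
              (writer (list-ref (car specs) 2)))
          (if (test? obj)
              (begin (emit tag s) (writer s obj))
              (loop (cdr specs)))))))

(define (write-to-output-serializer obj s)
  (cond
   ;; Immediate data: same syntax as Scheme's external representation.
   ((or (number? obj) (boolean? obj) (char? obj) (null? obj))
    (emit obj s))
   ;; Already numbered: write "r <number>".
   ((lookup obj s) => (lambda (n) (emit 'r s) (emit n s)))
   (else
    ;; Number the object before its contents, so cycles can refer back to it.
    (register! obj s)
    (cond
     ((symbol? obj) (emit 'y s) (emit obj s))
     ((string? obj) (emit 's s) (emit obj s))
     ((pair? obj)
      (emit 'p s)
      (write-to-output-serializer (car obj) s)
      (write-to-output-serializer (cdr obj) s))
     ((vector? obj)
      (emit 'v s)
      (emit (vector-length obj) s)
      (vector-for-each (lambda (e) (write-to-output-serializer e s)) obj))
     (else (write-extension obj s))))))

;; Hypothetical convenience wrapper for trying the format out.
(define (serialize-to-string obj . extension)
  (let* ((port (open-output-string))
         (s (apply make-output-serializer port extension)))
    (write-to-output-serializer obj s)
    (get-output-string port)))
```

With this sketch, (serialize-to-string '(1 2 "3" #(a b c) a)) returns the characters p 1 p 2 p s "3" p v 3 y a y b y c p r 6 (). This matches the first example in "Serialized format". The association list keeps the sketch self-contained, but every lookup is linear. A real port would instead use the PSERIALIZER:HASH-TABLE-* procedures described under "Porting".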

When an input serializer encounters an unknown tag, it looks for the
matching tag in the extension specification. If one is found, the
associated reader procedure is called with the input serializer. The
reader procedure is then responsible for reading the information,
reconstructing the object, registering it and returning it.

Registering the object is done by calling
REGISTER-OBJECT-TO-INPUT-SERIALIZER. It assigns a reference number to
the object so that the input serializer can refer to it later.
Registration must happen _before_ any new Scheme object is read from
the input serializer. This keeps the reference counter in sync with the
numbering used when the data was written, and it allows circular
structures to be reconstructed.

For example, a vector reader works in this order:

    1. It reads the length of the vector.
    2. It constructs a vector with unspecified contents.
    3. It registers the vector with the serializer.
    4. It reads the elements and fills in the vector.

You can't read the elements first and construct the vector afterwards.
If you did, the reference numbers would be wrong, and a vector that
contains itself could not be handled. For the same reason, you cannot
deserialize an object whose elements must be ready at construction
time.

An example of an extension is found in pserializer-stk.scm in the
distribution.

Porting
=======

The following implementation-dependent procedures must be defined in
the source. Example implementations for SLIB and STk are provided in
the original source.

PSERIALIZER:MAKE-HASH-TABLE                                    [function]

    Returns a new hash table.

PSERIALIZER:HASH-TABLE-GET hashtable key                       [function]

    Returns the value associated with KEY. KEY can be any Scheme object,
    and keys must be compared with eq?. If there is no entry for KEY, it
    must return #f.

PSERIALIZER:HASH-TABLE-PUT! hashtable key value                [function]

    Associates VALUE with KEY in HASHTABLE.

PSERIALIZER:ERROR format &rest args                            [function]

    Reports an error. FORMAT and ARGS are interpreted as by the format
    procedure found in Common Lisp and other Lisp dialects. Pserializer
    uses only the ~a and ~s directives.
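If neither SLIB nor STk is at hand, the four porting procedures could be written in plain R7RS-small Scheme. The sketch below is an assumption, not the code shipped with pserializer.

    - The "hash table" is a headed association list searched with assq.
      Keys are therefore compared with eq?, as required, but lookup is
      linear.
    - format-a/s is a hypothetical helper that understands only the ~a
      and ~s directives. Any other ~x sequence is copied through
      literally.

```scheme
(import (scheme base) (scheme write))

;; Assumed representation: a header pair whose cdr is an association
;; list.  assq compares keys with eq?, as the porting hooks require.
(define (pserializer:make-hash-table)
  (list '*table*))

(define (pserializer:hash-table-get table key)
  (let ((entry (assq key (cdr table))))
    (if entry (cdr entry) #f)))                       ; #f when KEY is absent

(define (pserializer:hash-table-put! table key value)
  (let ((entry (assq key (cdr table))))
    (if entry
        (set-cdr! entry value)                        ; overwrite existing entry
        (set-cdr! table (cons (cons key value) (cdr table))))))

;; Hypothetical helper: a tiny FORMAT that knows only ~a (display) and
;; ~s (write).  Any other ~x sequence is copied through literally.
(define (format-a/s fmt args)
  (let ((out (open-output-string))
        (len (string-length fmt)))
    (let loop ((i 0) (args args))
      (cond ((>= i len) (get-output-string out))
            ((and (char=? (string-ref fmt i) #\~) (< (+ i 1) len))
             (case (string-ref fmt (+ i 1))
               ((#\a #\A) (display (car args) out) (loop (+ i 2) (cdr args)))
               ((#\s #\S) (write (car args) out) (loop (+ i 2) (cdr args)))
               (else (write-char #\~ out) (loop (+ i 1) args))))
            (else (write-char (string-ref fmt i) out)
                  (loop (+ i 1) args))))))

;; Signal an R7RS error object whose message is the formatted string.
(define (pserializer:error fmt . args)
  (error (format-a/s fmt args)))
```

HASH-TABLE-GET signals a missing key by returning #f. As a result, storing #f as a value cannot be told apart from having no entry. For large structures, a native eq? hash table will be much faster than an association list. Examples are make-eq-hashtable in R6RS, or SRFI 69's make-hash-table given eq?.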