Serializer - Generic serializer interface for STk

version 0.3.3

Shiro Kawai (shiro@acm.org)



Copyright (C) 2000 Shiro Kawai (shiro@acm.org)

1. Introduction

This package provides an interface for a serializer, which converts Scheme objects to an external representation that can later be read back to reconstruct a structure topologically equal to the original one.

Object serialization is important when applications exchange data with each other through persistent storage or over networks. Some programming languages, such as Java and Python, support serialization in their standard libraries. Scheme does have a clearly defined external representation, but it doesn't handle shared substructures, circular references, or user-defined classes.

Having a standard serializer is handy, but it also has a drawback. The requirements for a serializer may differ among applications; some applications may need a human-readable ASCII representation, while others need a binary one, preferring serialization speed even at the cost of the portability of the serialized data.

Instead of providing a single implementation, this package defines an abstract class <serializer> for the common interface. Libraries which need a serializer can use the <serializer> class without knowing the actual implementation. You can implement your own serializer independently, and just plug it into your application.

A couple of implementations, <aserializer> and <dserializer>, come with this package.

2. Installation

The most recent version of this document and the package can be obtained at the following URL. (The package tarball includes the document as well).

You need STk 4.0.1 or above. STk is available at http://kaolin.unice.fr/stk/.

3. Using a serializer

The serializer interface is made available by this `require' form:

(require "serializer")

If you actually create an instance of a serializer, you need to load one of the actual implementations instead of the serializer module. An implementation called aserializer comes with the serializer package, so you can say the following instead of the above.

(require "aserializer")

3.1 Class <serializer>

Class: <serializer>
An abstract base class defining the serializer interface. This class implements predicates and accessors, but the actual serialization algorithm should be implemented by the subclasses.

This class has three slots.

Instance Variable: <serializer> port
A port associated with this serializer. If the serializer is an input serializer, it reads the data from this port. If the serializer is an output serializer, it writes the data to this port.

Instance Variable: <serializer> direction
This slot contains :in if this is an input serializer, or :out if this is an output serializer.

Instance Variable: <serializer> preserve-equality
This slot contains a boolean value indicating whether the serializer should preserve equality of the serialized form. See the description below.

A serializer is instantiated by the generic method make.

Generic method: make class &key :port :direction :preserve-equality

Create a serializer when class is a subclass of <serializer>.

The keyword argument :port is mandatory. It specifies the port the serializer is associated with.

The keyword argument :direction must be either :in or :out, specifying the direction of the serializer. If omitted, it is guessed by the direction of the port; the direction is :in if the port is an input port, and :out if the port is an output port. If port is a bidirectional port, direction should be specified (there's no bidirectional serializer). If the direction of port and direction contradict, an error is signalled.

In general, the serialized forms of two objects may or may not be equal even if the original objects are equal, since a serializer may insert extra information such as a timestamp into the serialized form. In other words, even if (equal? a b) is true for two objects a and b, (equal? (write-to-string-with-serializer a) (write-to-string-with-serializer b)) need not be true. (Of course, if you deserialize the serialized forms using read-from-string-with-serializer, they must produce objects equal? to each other.) However, a true value for the keyword argument :preserve-equality advises the serializer that, when it serializes objects equal? to each other, their serialized forms should be equal, too. This feature is used by, for example, a dbm interface which passes a serialized form as a key to the dbm database.

Two methods are defined to perform the actual serialization. Subclasses overload these methods.

Generic method: write-to-serializer (ser <serializer>) obj
An interface to serialize an object obj. A subclass must provide the actual implementation.

The types of objects that can be serialized depend on the implementation. For a general purpose serializer, however, the application programmer can assume that most standard Scheme objects, keywords, hash tables and STklos objects are serializable. See section 4. Serializer behavior for details.

This method may throw an error when it encounters an unserializable object.

Generic method: read-from-serializer (ser <serializer>)
An interface to retrieve an object from its serialized form. A subclass must provide the actual implementation.

If there are no more objects in the input serializer, an eof-object is returned.

The following utility methods are also defined.
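
The two methods can be combined into a round trip through a string port. The following is a minimal sketch, assuming STk's string ports (open-output-string, get-output-string, open-input-string) and assuming that <aserializer> accepts several objects written one after another on the same port; read-all is a hypothetical helper, not part of the package.

```scheme
(require "aserializer")

;; Serialize two objects into a string through an output serializer.
(define out-port (open-output-string))
(define out (make <aserializer> :port out-port :direction :out))
(write-to-serializer out '(1 "two" three))
(write-to-serializer out (vector 'a 'b))

;; Create an input serializer on the serialized text.
(define in (make <aserializer>
             :port (open-input-string (get-output-string out-port))
             :direction :in))

;; Hypothetical helper: collect objects until the eof-object is returned.
(define (read-all ser)
  (let loop ((obj (read-from-serializer ser)) (acc '()))
    (if (eof-object? obj)
        (reverse acc)
        (loop (read-from-serializer ser) (cons obj acc)))))

(define objects (read-all in))
```

If the assumptions hold, objects is a list of two elements equal? to the originals.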

Generic method: serializer? obj
Generic method: input-serializer? obj
Generic method: output-serializer? obj
Predicates. serializer? returns true iff obj is a serializer. input-serializer? returns true iff obj is a serializer and its direction is input. output-serializer? returns true iff obj is a serializer and its direction is output.

Generic method: port-of (obj <serializer>)
Returns a port associated to a serializer obj.

Generic method: direction-of (obj <serializer>)
Returns the direction (either :in or :out) of a serializer obj.
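
As a brief illustration of the predicates and accessors, the following sketch creates an output serializer without :direction, so the direction is guessed from the port. It assumes STk's open-output-string; the comments state what the descriptions above imply, not verified output.

```scheme
(require "aserializer")

(define out-port (open-output-string))

;; :direction is omitted, so it is guessed from the port (an output port).
(define ser (make <aserializer> :port out-port))

(serializer? ser)                 ; should be true
(output-serializer? ser)          ; should be true
(input-serializer? ser)           ; should be false
(serializer? 'not-a-serializer)   ; should be false
(eq? (port-of ser) out-port)      ; should be true
(direction-of ser)                ; should be :out
```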

Generic method: write-to-string-with-serializer (serializer-class <class>) obj &rest options
Generic method: write-to-file-with-serializer (serializer-class <class>) obj filename &rest options

Convenience functions to create a serialized form of a Scheme object obj. write-to-string-with-serializer returns a serialized form as a Scheme string. write-to-file-with-serializer writes out a serialized form to the file specified by filename.

A temporary output serializer of class serializer-class is created to serialize the object. Its port and direction are set appropriately. If extra arguments options are provided, they are passed to the constructor of the serializer (make) as well.

Generic method: read-from-string-with-serializer (serializer-class <class>) str &rest options
Generic method: read-from-file-with-serializer (serializer-class <class>) filename &rest options

Convenience functions to read a Scheme object from its serialized form. read-from-string-with-serializer takes a serialized form from a string str, and read-from-file-with-serializer from a file specified by filename.

A temporary input serializer of class serializer-class is created to deserialize the object. Its port and direction are set appropriately. If extra arguments options are provided, they are passed to the constructor of the serializer (make) as well.
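
For example, the string variants can round-trip a circular list in two calls. This is a sketch only; it assumes that <aserializer> handles circular references as the guidelines in section 4 suggest, and the variable names are illustrative.

```scheme
(require "aserializer")

;; A three-element list whose last cdr points back to its head.
(define ring (list 'a 'b 'c))
(set-cdr! (cddr ring) ring)

(define str  (write-to-string-with-serializer <aserializer> ring))
(define copy (read-from-string-with-serializer <aserializer> str))

;; The copy should have the same topology as the original:
;; three cdr steps lead back to the first pair.
(define same-shape? (eq? copy (cdddr copy)))

;; Extra arguments are handed to make, e.g. to request equal
;; serialized forms for equal? objects.
(define key (write-to-string-with-serializer <aserializer> '(1 2 3)
                                             :preserve-equality #t))
```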

3.2 Extending a serializer

A simple extension mechanism is provided so that the user can customize the serializer behavior on STklos objects without knowing the actual implementation.

Generic method: get-serializable-slots (obj <object>)
This method should return a list of symbols each of which represents a slot name to be serialized.

The output serializer retrieves the values of those slots by slot-ref. The input serializer uses slot-set! on those slots. The input serializer sets the slot values in the order of the list that get-serializable-slots returns.

The default method returns all slots except virtual ones.

Alternatively, an implementation of a serializer may define its own extension protocol. This is potentially more efficient, although it can be used only with that implementation.
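
The following sketch shows how a class might restrict which slots are serialized. The class <account> and its slots are hypothetical, and the STklos slot options used (:init-keyword, :initform) are assumed to be available in your STk version.

```scheme
(require "aserializer")

;; Hypothetical class: `cache' holds derived data that need not be saved.
(define-class <account> ()
  ((owner   :init-keyword :owner   :initform "")
   (balance :init-keyword :balance :initform 0)
   (cache   :initform #f)))

;; Serialize only owner and balance; the input serializer sets them
;; with slot-set! in this order.
(define-method get-serializable-slots ((obj <account>))
  '(owner balance))

(define acct (make <account> :owner "alice" :balance 100))
(slot-set! acct 'cache 'expensive-to-recompute)

(define copy
  (read-from-string-with-serializer
   <aserializer>
   (write-to-string-with-serializer <aserializer> acct)))

(slot-ref copy 'owner)     ; should be equal? to "alice"
(slot-ref copy 'balance)   ; should be 100
```

The state of the cache slot in copy depends on how the input serializer creates the instance, so the sketch does not rely on it.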

4. Serializer behavior

Various serialization algorithms can be implemented by subclassing <serializer> and overloading the write-to-serializer and read-from-serializer methods. We don't place any restrictions on the serializer behavior. An application programmer may implement a serializer that supports only a limited set of object types if that is all the application needs.

However, to ease the programming, we define a few general guidelines for the behavior of serializers. If you aim to implement a general purpose serializer, it is recommended to follow these guidelines.

4.1 Serializable objects

A general purpose serializer should support the following STk primitive types:

The items marked by (*) may have certain restrictions described in the following sections.

4.2 Object eq?-ness

When a serializer is created, it creates an internal object dictionary. For each object which goes through the serializer, it registers the object to the dictionary so that when it encounters the same (eq?) object again it can use a reference of the object to preserve eq?-ness.

Each serializer keeps its own dictionary, which is kept until the serializer is garbage-collected. Thus object eq?-ness is not preserved among different serializers.
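
The effect of the per-serializer dictionary can be sketched as follows. The example assumes <aserializer> follows this guideline; the comments describe what the text above implies rather than verified output.

```scheme
(require "aserializer")

;; The same string object appears twice in one structure.
(define s (string-copy "shared"))
(define data (list s s))

;; One serializer writes both occurrences, so the second one can be
;; recorded as a reference to the first.
(define copy
  (read-from-string-with-serializer
   <aserializer>
   (write-to-string-with-serializer <aserializer> data)))

(define shared-within? (eq? (car copy) (cadr copy)))   ; should be true

;; Two separate serializers each keep their own dictionary, so the
;; results are equal? but not eq? to each other.
(define copy-a
  (read-from-string-with-serializer
   <aserializer> (write-to-string-with-serializer <aserializer> s)))
(define copy-b
  (read-from-string-with-serializer
   <aserializer> (write-to-string-with-serializer <aserializer> s)))

(define shared-across? (eq? copy-a copy-b))            ; should be false
```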

There are other cases where object eq?-ness is not preserved:

4.3 STklos instance serialization

A general purpose serializer may take an STklos instance. The instance does not need to belong to a subclass of any particular class. However, to ease the implementation of serializers, it is desirable that the instances to be serialized follow these guidelines.

First, it must be possible to create the instance without any initialization parameters, and all the serializable slots must be settable by slot-set!. The input serializer may not know the initialization protocol of your object, so it may create an instance without initialization parameters and then fill in the slot values using slot-set!.

Second, full information about metaobjects may not be serialized. For the input serializer to work, the classes (and metaclasses) of the serialized object should be defined before calling read-from-serializer. With this assumption, the output serializer needs to write only minimal class information for the instances to be serialized.

The implementation may check whether the class in memory matches the one in the serialized form and throw an error if it does not (like Java), or may try to fill as many slots of the new class as possible from the values read from the serialized form.

5. Implementations

A couple of simple implementations are available in the package.

5.1 Aserializer

A simple implementation, aserializer, is provided with the serializer package. It follows the serializer guidelines. The serialized data is represented by ASCII characters and is portable among different architectures. It is not particularly optimized for speed or space.

Class: <aserializer>
Implements a serializer, based on a `portable serializer'.

An object is serialized as follows:

  1. Characters, numbers, booleans and the empty list are serialized to their standard external representation.
  2. Other objects are serialized as a tag (a one-letter symbol) followed by parameters. First, the object is registered in the object dictionary and assigned a serial number, counting from the first object the serializer encounters. If the object has already appeared, it is serialized as a tag `r' followed by its serial number. Otherwise, it is serialized as described below.
  3. If the object is a symbol, it is serialized as a tag `y' followed by the symbol name.
  4. If the object is a keyword, it is serialized as a tag `k' followed by the name of the keyword.
  5. If the object is a string, it is serialized as a tag `s' followed by the string.
  6. If the object is a pair, it is serialized as a tag `p', followed by the serialized car value and cdr value.
  7. If the object is a vector, it is serialized as a tag `v', followed by the number of elements, followed by the serialized form of each element.
  8. If the object is a hash table, it is serialized as a tag `h', followed by a list of key/value pairs.
  9. If the object is an STklos instance, it is serialized as a tag `i', followed by a classinfo, which is a cons of the name of the class and the list of names of the serializable slots, then followed by the actual slot values. In the current implementation, an unbound slot is serialized as if it had the value #f. A later version will properly treat it as an unbound slot. When an instance is read back, the slot values are recovered according to the classinfo. If the class implementation has been changed since the instance was serialized, the serializer fills only the slots whose names match the ones in the classinfo. If a slot is missing from the class implementation in memory, the value of the slot is simply discarded.
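
To make the role of the object dictionary concrete, rules 1 to 6 above can be modelled by a toy encoder in plain Scheme. This is not the code of <aserializer>: it returns a flat list of tokens instead of the real ASCII format, it covers only atoms, symbols, strings and pairs, and it assumes that serial numbers start at 0. The name make-encoder is hypothetical.

```scheme
;; Toy model of the tagging rules; each encoder has its own dictionary.
(define (make-encoder)
  (let ((dict '())     ; association list of (object . serial-number)
        (next 0))      ; assumption: serial numbers start at 0
    (define (encode obj)
      (cond ((or (char? obj) (number? obj) (boolean? obj) (null? obj))
             (list obj))                             ; rule 1: written as-is
            ((assq obj dict)                         ; already registered:
             => (lambda (entry) (list 'r (cdr entry)))) ; `r' + serial number
            (else
             ;; Register the object before descending into it, so that
             ;; circular references find it in the dictionary.
             (set! dict (cons (cons obj next) dict))
             (set! next (+ next 1))
             (cond ((symbol? obj) (list 'y (symbol->string obj)))
                   ((string? obj) (list 's obj))
                   ((pair? obj)
                    (let* ((a (encode (car obj)))    ; car first, then cdr
                           (d (encode (cdr obj))))
                      (append (list 'p) a d)))
                   (else (error "toy encoder: unsupported object"))))))
    encode))

(define encode (make-encoder))
(define s (string-copy "hi"))
(define tokens (encode (list s s)))
;; tokens is (p s "hi" p r 1 ())
```

The outer pair receives serial number 0 and the string serial number 1, so the second occurrence of the string is written as a reference, r 1, instead of a second copy.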

5.2 Dserializer

Class: <dserializer>
A default or dummy serializer implementation. This is just a wrapper around the STk built-in write* and read. Its capability is limited to the objects those functions can deal with: booleans, numbers, symbols, strings, lists and vectors. The output serializer doesn't complain if an object other than these types is passed, but such an object cannot be read back by the input serializer. Eq?-ness of strings is not preserved. Circular references and shared substructures are, however, properly treated.

In spite of these restrictions, this serializer is sometimes useful when you know that the object to be serialized meets the conditions, and it works fast.
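
A typical use is data made only of the supported types, as in the sketch below. The module name "dserializer" is an assumption modelled on "aserializer"; check the package for the actual file name.

```scheme
(require "dserializer")   ; assumption: module name mirrors "aserializer"

;; Only booleans, numbers, symbols, strings, lists and vectors are used.
(define config '((name "demo") (size 3) (flags #t #f)))

(define str  (write-to-string-with-serializer <dserializer> config))
(define copy (read-from-string-with-serializer <dserializer> str))

(equal? copy config)   ; should be true
```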

Index

<

  • <aserializer>
  • <dserializer>
  • <serializer>

d

  • direction-of

g

  • get-serializable-slots

i

  • input-serializer?

m

  • make

o

  • output-serializer?

p

  • port-of

r

  • read-from-file-with-serializer
  • read-from-serializer
  • read-from-string-with-serializer

s

  • serializer?

w

  • write-to-file-with-serializer
  • write-to-serializer
  • write-to-string-with-serializer

This document was generated on 5 December 2000 using texi2html 1.56k.