For Development HEAD DRAFT

9.40 gauche.vport - Virtual ports

Module: gauche.vport

Virtual ports, or procedural ports, are ports whose behavior can be programmed in Scheme.

This module provides two kinds of virtual ports: Fully virtual ports, in which every I/O operation invokes user-provided procedures, and virtual buffered ports, in which I/O operations are done on an internal buffer and user-provided procedures are called only when the buffer needs to be filled or flushed.

This module also provides virtual buffered ports backed up by a uniform vector, as an example of the feature.

Fully virtual ports

This type of virtual port is realized by the classes <virtual-input-port> and <virtual-output-port>. You can customize the port behavior by setting the appropriate slots with procedures.

Class: <virtual-input-port>

{gauche.vport} An instance of this class can be used as an input port. The behavior of the port depends on the settings of the instance slot values.

To work as a meaningful input port, at least one of the getb or getc slots must be set. Otherwise, the port returns EOF for all input requests.

Instance Variable of <virtual-input-port>: getb

If set, the value must be a procedure that takes no arguments. Every time binary input is required, the procedure is called.

The procedure must return an exact integer between 0 and 255 inclusive, or #f or an EOF object. If it returns an integer, it becomes the value read from the port. If it returns other values, the port returns EOF.

If character input is requested from the port and it doesn’t have the getc procedure, the port calls this procedure, possibly multiple times, to construct a whole character.
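As a minimal sketch of the getb slot, the following port reads bytes from a list. The helper name make-byte-list-port is hypothetical and only serves the illustration; the sample bytes are ASCII, so assembling a character needs just one call to getb.

```scheme
(use gauche.vport)

;; Hypothetical helper: an input port that yields the bytes in BYTES
;; (a list of exact integers between 0 and 255), then EOF.
(define (make-byte-list-port bytes)
  (make <virtual-input-port>
    :getb (lambda ()
            (if (null? bytes)
              (eof-object)
              (pop! bytes)))))

(define bp (make-byte-list-port '(65 66 67)))
(read-byte bp)   ; ⇒ 65
(read-char bp)   ; ⇒ #\B, assembled from the byte 66 obtained through getb
(read-byte bp)   ; ⇒ 67
(read-byte bp)   ; ⇒ EOF
```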

Instance Variable of <virtual-input-port>: getc

If set, the value must be a procedure that takes no arguments. Every time character input is required, the procedure is called.

The procedure must return a character, #f or an EOF object. If it returns a character, it becomes the value read from the port. If it returns other values, the port returns EOF.

If binary input is requested from the port and it doesn’t have the getb procedure, the port calls this procedure, converts the returned character into a byte sequence, and uses it as the binary value(s) read from the port.
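A character-based port can be sketched the same way. The helper make-char-source-port below is a hypothetical name, not part of the module; it feeds the characters of a string through getc, which is enough for character-level readers such as read-char and read.

```scheme
(use gauche.vport)

;; Hypothetical helper: an input port that yields the characters of STR.
(define (make-char-source-port str)
  (let ([chars (string->list str)])
    (make <virtual-input-port>
      :getc (lambda ()
              (if (null? chars)
                (eof-object)
                (pop! chars))))))

(define cp (make-char-source-port "hi"))
(read-char cp)   ; ⇒ #\h
(read-char cp)   ; ⇒ #\i
(read-char cp)   ; ⇒ EOF

(read (make-char-source-port "(1 2 3)"))   ; ⇒ (1 2 3)
```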

Instance Variable of <virtual-input-port>: gets

If set, the value must be a procedure that takes one argument, a positive exact integer. It is called when block binary input, such as read-uvector, is requested. It must return a (possibly incomplete) string of up to the specified size, or #f or an EOF object. If it returns an empty string, #f or an EOF object, the port assumes it has reached EOF. If it returns any other string, that string is used as the result of the block read. It shouldn’t return a string larger than the given size (Note: the size is counted in bytes, not in characters). This procedure exists for efficiency; if it is not provided, the port calls the getb procedure repeatedly to prepare the block of data. In some cases, providing block input can be much more efficient (e.g. when you’re reading from a chunk of memory).

You can leave this slot unset if you don’t need to take such advantage.

Instance Variable of <virtual-input-port>: ready

If set, the value must be a procedure that takes one boolean argument. It is called when char-ready? or byte-ready? is called on the port. The value returned from your procedure will be the result of these procedures.

The boolean argument is #t if char-ready? is called, or #f if byte-ready? is called.

If unset, char-ready? and byte-ready? always return #t on the port.

Instance Variable of <virtual-input-port>: close

If set, the value must be a procedure that takes no arguments. It is called when the port is closed. Return value is discarded. You can leave this unset if you don’t need to take an action when the port is closed.

This procedure may be called from a finalizer, so you have to write it carefully. See the note on finalization below.

Instance Variable of <virtual-input-port>: seek

If set, the value must be a procedure that takes two arguments, offset and whence. The meaning of them is the same as the arguments to port-seek (see Common port operations). The procedure must adjust the port’s internal read pointer so that the next read begins from the new pointer. It should return the updated pointer (the byte offset from the beginning of the port).

If unset, calls to port-seek and port-tell on this port return #f.

Note that this procedure may be called for the purpose of merely querying the current position, with 0 as offset and SEEK_CUR as whence. If your port knows the read pointer but cannot move it, you can still provide this procedure, which returns the current pointer position for such queries and returns #f for other arguments.

Class: <virtual-output-port>

{gauche.vport} An instance of this class can be used as an output port. The behavior of the port depends on the settings of the instance slot values.

To work as an output port, at least one of the putb or putc slots has to be set.

Instance Variable of <virtual-output-port>: putb

If set, the value must be a procedure that takes one argument, a byte value (exact integer between 0 and 255, inclusive). Every time binary output is required, the procedure is called. The return value of the procedure is ignored.

If this slot is not set and binary output is requested, the port may signal an <io-unit-error> error.

Instance Variable of <virtual-output-port>: putc

If set, the value must be a procedure that takes one argument, a character. Every time character output is required, the procedure is called. The return value of the procedure is ignored.

If this slot is not set but the putb slot is set, the virtual port decomposes the character into a sequence of bytes, then calls the putb procedure for each byte.
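The fallback from character output to putb can be observed with a port that has only the putb slot set. This is a small sketch; the names received and byte-sink are made up for the example, and an ASCII character is used so that it decomposes into a single byte.

```scheme
(use gauche.vport)

;; Bytes received by the port, most recent first.
(define received '())

;; A port with only putb; characters are decomposed into bytes.
(define byte-sink
  (make <virtual-output-port>
    :putb (lambda (b) (push! received b))))

(write-byte 1 byte-sink)
(write-char #\A byte-sink)   ; arrives at putb as the byte 65
(reverse received)           ; ⇒ (1 65)
```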

Instance Variable of <virtual-output-port>: puts

If set, the value must be a procedure that takes a (possibly incomplete) string. The return value of the procedure is ignored.

This is for efficiency. If this slot is not set, the virtual port calls putb or putc repeatedly to output a chunk of data. But if your code can perform chunked output efficiently, you can provide this procedure.
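As an illustration, the sketch below forwards output to a string port and provides both putc and puts, so that whole strings can be handed over in one call instead of character by character. The names sink and forwarding-port are assumptions of the example.

```scheme
(use gauche.vport)

;; Hypothetical destination for the forwarded output.
(define sink (open-output-string))

(define forwarding-port
  (make <virtual-output-port>
    :putc (lambda (c) (write-char c sink))
    ;; Receives a whole string when chunked output is requested.
    :puts (lambda (s) (display s sink))))

(write-char #\> forwarding-port)
(display "hello" forwarding-port)
(get-output-string sink)   ; ⇒ ">hello"
```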

Instance Variable of <virtual-output-port>: flush

If set, the value must be a procedure that takes no arguments. It is called when flushing a port is required (e.g. flush is called on the port, or the port is being closed).

This procedure is useful when your port does some sort of buffering, or needs to keep some state. If your port doesn’t do stateful operations, you can leave this unset.

This procedure may be called from a finalizer, and needs special care. See the note on finalization below.

Instance Variable of <virtual-output-port>: close

The same as <virtual-input-port>’s close slot.

Instance Variable of <virtual-output-port>: seek

The same as <virtual-input-port>’s seek slot.

Virtual buffered ports

This type of virtual port is realized by the classes <buffered-input-port> and <buffered-output-port>. You can customize the port behavior by setting the appropriate slots with procedures.

These ports have an internal buffer and only call Scheme procedures when the buffer needs to be filled or flushed. Generally this is far more efficient than calling Scheme procedures for every I/O operation. In fact, the internal buffering mechanism is the same as the one used by Gauche’s file I/O ports.

These ports use a u8vector as a buffer. See Uniform vectors for the details.

Class: <buffered-input-port>

{gauche.vport} An instance of this class behaves as an input port. It has the following instance slots. For a meaningful input port, you have to set at least the fill slot.

Instance Variable of <buffered-input-port>: fill

If set, it must be a procedure that takes one argument, a u8vector. It must fill the vector with data, starting from the beginning of the vector. It doesn’t need to fill the entire vector if there isn’t that much data. However, if there is remaining data, it must fill at least one byte; if the data isn’t readily available, it has to wait until some data becomes available.

The procedure must return the number of bytes it actually filled. It may return 0 or an EOF object to indicate the port has reached EOF.
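The contract of the fill procedure can be sketched with a port that reads from an in-memory u8vector. The helper make-u8-source-port is hypothetical; it copies as many bytes as fit into the buffer handed to it and returns 0 once the source is exhausted.

```scheme
(use gauche.vport)
(use gauche.uvector)

;; Hypothetical helper: a buffered input port reading from the u8vector SRC.
(define (make-u8-source-port src)
  (let ([pos 0]
        [len (u8vector-length src)])
    (make <buffered-input-port>
      :fill (lambda (buf)
              (if (>= pos len)
                0                       ; nothing left: signal EOF
                (let ([n (min (u8vector-length buf) (- len pos))])
                  ;; Copy src[pos, pos+n) to the beginning of buf.
                  (u8vector-copy! buf 0 src pos (+ pos n))
                  (set! pos (+ pos n))
                  n))))))

(define sp (make-u8-source-port (u8vector 10 20 30)))
(read-byte sp)   ; ⇒ 10
(read-byte sp)   ; ⇒ 20
(read-byte sp)   ; ⇒ 30
(read-byte sp)   ; ⇒ EOF
```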

Instance Variable of <buffered-input-port>: ready

If set, it must be a procedure that takes no arguments. The procedure must return a true value if there are some data readily available to read, or #f otherwise. Unlike fully virtual ports, you don’t need to distinguish binary and character I/O.

If this slot is not set, the port is regarded as always having data ready.

Instance Variable of <buffered-input-port>: close

If set, it must be a procedure that takes no arguments. The procedure is called when the virtual buffered port is closed. You don’t need to set this slot unless you need some cleaning up when the port is closed.

This procedure may be called from a finalizer, and needs special care. See the note on finalization below.

Instance Variable of <buffered-input-port>: filenum

If set, it must be a procedure that returns the underlying file descriptor number (an exact nonnegative integer). The procedure is called when port-file-number is called on the port.

If there’s no such underlying file descriptor, you can return #f, or you can leave this slot unset.

Instance Variable of <buffered-input-port>: seek

If set, it must be a procedure that takes two arguments, offset and whence. It works the same way as <virtual-input-port>’s seek procedure; see above.

This procedure may be called from a finalizer, and needs special care. See the note on finalization below.

Besides those slot values, you can pass an exact nonnegative integer as the :buffer-size keyword argument to the make method to set the size of the port’s internal buffer. If :buffer-size is omitted, or zero is passed, the system’s default buffer size (something like 8K) is used. :buffer-size is not an instance slot and you cannot set it after the instance of the buffered port is created. The following example specifies the buffered port to use a buffer of size 64K:

(make <buffered-input-port> :buffer-size 65536 :fill my-filler)

Class: <buffered-output-port>

{gauche.vport} An instance of this class behaves as an output port. It has the following instance slots. You have to set at least the flush slot.

Instance Variable of <buffered-output-port>: flush

If set, it must be a procedure that takes two arguments, a u8vector buffer and a flag. The procedure must output the data in the buffer to somewhere, and return the number of bytes actually output.

If the flag is false, the procedure may output less than the entire buffer (but at least one byte). If the flag is true, the procedure must output the entire buffer.
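A minimal sketch of a flush procedure follows. It appends the flushed bytes to a list and always consumes the whole buffer, so it satisfies the contract regardless of the flag. The names collected and collecting-port are made up for the example; the bytes are copied out of the buffer rather than keeping a reference to it.

```scheme
(use gauche.vport)
(use gauche.uvector)

;; All bytes flushed so far, in order.
(define collected '())

(define collecting-port
  (make <buffered-output-port>
    :flush (lambda (buf must-write-all?)
             ;; This sketch always consumes the whole buffer, so the
             ;; flag can be ignored.
             (set! collected (append collected (u8vector->list buf)))
             (u8vector-length buf))))

(display "abc" collecting-port)
(flush collecting-port)   ; pushes the buffered bytes through the flush procedure
collected                 ; ⇒ (97 98 99)
```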

Instance Variable of <buffered-output-port>: close

Same as <buffered-input-port>’s close slot.

Instance Variable of <buffered-output-port>: filenum

Same as <buffered-input-port>’s filenum slot.

Instance Variable of <buffered-output-port>: seek

Same as <buffered-input-port>’s seek slot.

Besides those slot values, you can pass an exact nonnegative integer as the :buffer-size keyword argument to the make method to set the size of the port’s internal buffer. See the description of <buffered-input-port> above for the details.

Uniform vector ports

The following procedures return a buffered input/output port backed by a uniform vector. The source or destination vector can be any type of uniform vector, but it will be aliased to a u8vector (see uvector-alias in Uvector conversion operations).

Used together with pack/unpack (see binary.pack - Packing binary data), these ports are useful for parsing or constructing binary data structures. They are also an example of using virtual ports; read gauche/vport.scm (or ext/vport/vport.scm in the source tree) if you’re curious about the implementation.

Function: open-input-uvector uvector

{gauche.vport} Returns an input port that reads the content of the given uniform vector uvector from its beginning. If a read operation reaches the end of uvector, EOF is returned. The seek operation is also implemented.
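For example, the following sketch reads from a u8vector and uses port-seek to skip ahead. The variable name in is arbitrary.

```scheme
(use gauche.vport)
(use gauche.uvector)

(define in (open-input-uvector (u8vector 1 2 3 4)))
(read-byte in)              ; ⇒ 1
(port-seek in 3 SEEK_SET)   ; move the read pointer to byte offset 3
(read-byte in)              ; ⇒ 4
(read-byte in)              ; ⇒ EOF
```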

Function: open-input-bytevector u8vector

[R7RS base] {gauche.vport} Similar to open-input-uvector, but the argument must be a u8vector. This is an R7RS base procedure.

Function: open-output-uvector :optional uvector :key extendable

{gauche.vport} Returns an output port that uses the given uvector as the storage for the data output to the port.

If uvector is completely filled, what happens after that depends on extendable - if it is false (default), the rest of data is discarded silently. If it is true, the storage is extended automatically to accommodate more data.

If you give a true value to extendable, you have to retrieve the result by get-output-uvector below, since the uvector you passed in won’t contain spilled data.

As a special case, you can omit the uvector argument; then a u8vector is used as the storage. In that case you can’t specify the extendable keyword argument, but it is assumed true, since it wouldn’t make sense otherwise. Use get-output-uvector to retrieve the stored result.

The seek operation is also implemented. Note that the meaning of the SEEK_END whence differs between extendable and fixed-size uvector ports. For extendable ports, the end position is just past the largest offset of the data ever written; if you open a port and write just one byte, the end position is the second byte, no matter how big the existing buffer is. For fixed-size uvector ports, on the other hand, the end position is fixed just past the end of the given buffer, no matter how much data you’ve written to it. In the latter case, you can’t seek to or past the end (you need to pass a negative offset with SEEK_END to port-seek).

Function: open-output-bytevector

[R7RS base] {gauche.vport} Same as open-output-uvector without arguments. Uses an extendable u8vector as the buffer. This is an R7RS base procedure.

Function: get-output-uvector port :key shared

{gauche.vport} If port is a port created by open-output-uvector, returns a uvector that contains accumulated data. If port is not a port created by open-output-uvector, #f is returned.

The returned uvector is the same type as the one passed to open-output-uvector, containing up to actually written data; it may be smaller than the uvector passed to open-output-uvector; it can be larger if the port is extendable.

If the type of uvector is other than s8vector and u8vector, written data that doesn’t fill up a whole element won’t be in the result. For example, if you use an s32vector to create the port, then write 7 bytes to it, get-output-uvector returns a single-element s32vector, because the last 3 bytes do not constitute a whole 32-bit integer.

By default, the returned vector is a fresh copy of the contents. Passing a true value to shared may avoid copying and allow the result to share storage with the one being used by port. If you do so, keep in mind that if you seek back and write to port subsequently, the content of the returned vector may change.
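The two extendable cases described above can be sketched as follows; out1 and out2 are arbitrary names. The first port is created without a uvector, the second starts from a 2-byte vector that is allowed to grow.

```scheme
(use gauche.vport)
(use gauche.uvector)

;; No uvector given: an extendable u8vector is used as the storage.
(define out1 (open-output-uvector))
(write-byte 10 out1)
(write-byte 20 out1)
(get-output-uvector out1)   ; ⇒ #u8(10 20)

;; A 2-byte vector that is allowed to grow.
(define out2 (open-output-uvector (make-u8vector 2 0) :extendable #t))
(for-each (lambda (b) (write-byte b out2)) '(1 2 3 4))
(get-output-uvector out2)   ; ⇒ #u8(1 2 3 4)
```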

Function: get-output-bytevector port

[R7RS base] {gauche.vport} Extracts the data written to a bytevector output port as a u8vector. The port must have been created by open-output-bytevector or open-output-uvector. This is an R7RS base procedure.

List ports

The following procedures allow you to use a list of characters or octets as the source of an input port. These are (a kind of) opposite of the port->list family (see Input utility functions) or the port->char-lseq family (see Lazy sequences).

Function: open-input-char-list char-list
Function: open-input-byte-list byte-list

{gauche.vport} Creates and returns an input port that uses the given list of characters or bytes as the source.

(read (open-input-char-list '(#\a #\b)))
 ⇒ ab

Function: get-remaining-input-list port

{gauche.vport} If port is the one created by open-input-char-list or open-input-byte-list, returns a list of remaining data that hasn’t been read yet. If the port already read everything, or the port is not the one created by open-input-char-list or open-input-byte-list, an empty list is returned.

A caveat: Gauche allows mixing binary input and textual input from the same port. If you read or even peek a byte from a port created from a character list, the port buffers a character and disassembles it to bytes; the disassembled character may not be included in the remaining input list.
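A short sketch of retrieving the unread part of a character list port, using only character input so that the caveat above does not apply; lp is an arbitrary name.

```scheme
(use gauche.vport)

(define lp (open-input-char-list '(#\a #\b #\c)))
(read-char lp)                  ; ⇒ #\a
(get-remaining-input-list lp)   ; ⇒ (#\b #\c)
```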

Generator ports

The following procedures allow you to use character generators or byte generators as a source of an input port. These are (a kind of) opposite of port->char-generator family (see Generator constructors).

Function: open-input-char-generator cgen
Function: open-input-byte-generator bgen

{gauche.vport} Creates and returns an input port that uses the given generator as the source. The cgen argument must be a generator that yields characters. The bgen argument must be a generator that yields bytes (exact integers between 0 and 255, inclusive). An error will be raised if the given generator yields an object of an incorrect type.

(read (open-input-char-generator (string->generator "foo")))
 ⇒ foo

Since the generators are objects relying on side effects, you shouldn’t use cgen or bgen after you pass them to those procedures; if you use them afterwards, the result is undefined.

Function: get-remaining-input-generator port

{gauche.vport} If port is the one created by open-input-char-generator or open-input-byte-generator, returns a generator that yields the characters or bytes that haven’t been read yet. If the port already read everything, an empty generator is returned.

Once you take the remaining input generator, you should no longer read from the input generator port; they share internal state, and mixing them will likely cause unexpected behavior. If side-effect-safe behavior is desired, use lazy sequences and input list ports.
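The following sketch reads one character from a generator port and then takes over the rest as a generator. It assumes gauche.generator for string->generator and generator->list; the names gp and rest-gen are arbitrary. After taking rest-gen, the port gp is not read again, as advised above.

```scheme
(use gauche.vport)
(use gauche.generator)

(define gp (open-input-char-generator (string->generator "abc")))
(read-char gp)   ; ⇒ #\a

;; From here on, use only the generator, not the port.
(define rest-gen (get-remaining-input-generator gp))
(generator->list rest-gen)   ; ⇒ (#\b #\c)
```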

Accumulator ports

Accumulators are dual to generators; an accumulator is a procedure that accepts one value at a time, and the end of the values is indicated by EOF. See scheme.generator - R7RS generators, for the basic operations of accumulators.

The following procedures turn an accumulator that accepts characters or octets into an output port.

Function: open-output-char-accumulator acc

{gauche.vport} Returns an output port that sends the output characters to an accumulator acc, which takes a character as the argument. When the returned output port is closed, EOF is passed to acc.

Note: The behavior is undefined if you try to perform binary output to the returned output port.

Function: open-output-byte-accumulator acc

{gauche.vport} Returns an output port that sends the output bytes to an accumulator acc, which takes a byte as the argument. When the returned output port is closed, EOF is passed to acc.

A character sent to the output port is converted to octets in Gauche’s native encoding.

Function: open-output-accumulator acc

{gauche.vport} The accumulator acc must accept a byte, a character or a string. Returns an output port that sends the output data to the acc. When the returned output port is closed, EOF is passed to acc.
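As a sketch, a hand-written accumulator can be turned into an output port as follows. The procedure collect and the variables chars and finished? are hypothetical; any procedure that accepts characters and an EOF object would do.

```scheme
(use gauche.vport)

(define chars '())       ; characters received, most recent first
(define finished? #f)    ; set once EOF has been passed

;; A hand-written accumulator: takes one character at a time,
;; and an EOF object to mark the end.
(define (collect x)
  (if (eof-object? x)
    (set! finished? #t)
    (push! chars x)))

(define ap (open-output-char-accumulator collect))
(display "abc" ap)
(close-output-port ap)   ; closing passes EOF to the accumulator
(reverse chars)          ; ⇒ (#\a #\b #\c)
finished?                ; ⇒ #t
```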

Note on finalization

If an unclosed virtual port is garbage collected, its close procedure is called (in case of virtual buffered ports, its flush procedure may also be called before close procedure). It is done by a finalizer of the port. Since it is a part of garbage-collection process (although the Scheme procedure itself is called outside of the garbage collector main part), it requires special care.


