6.22 Input and Output


6.22.1 Ports

Builtin Class: <port>

A port class. A port is Scheme’s abstraction of an I/O channel. Gauche extends ports in a number of ways so that they can be used in a wide range of applications.

Textual and binary I/O

R7RS defines textual and binary ports. In Gauche, most ports can mix both text I/O and binary I/O. It is cleaner to think of the two as distinct, for they are sources/sinks of different types of objects, and you don’t need to mix textual and binary I/O.

In practice, however, a port is often a tap to an untyped pool of bytes, and you may want to decide how to interpret it later. One example is the standard I/O; in a Unix-like environment, it’s up to the program to use the pre-opened ports for textual or binary I/O. R7RS defines that the initial ports for current-input-port etc. are textual ports; in Gauche, you can use them either way.

Conversion

Some ports can be used to convert a data stream from one format to another; one such application is the character code conversion ports provided by the gauche.charconv module (See section gauche.charconv - Character Code Conversion, for details).

Extra features

There are also ports with special functionality. A coding-aware port (See section Coding-aware ports) recognizes a special "magic comment" in the file to know which character encoding the file is written in. Virtual ports (See section gauche.vport - Virtual ports) allow you to program the behavior of the port in Scheme.


6.22.2 Port and threads

When Gauche is compiled with thread support, the builtin port operations lock the port, so that port accesses from multiple threads are serialized. (This is required by SRFI-18.) Here, "builtin port operations" are the port access functions that take a port and do some I/O or query on it, such as read/write, read-char/write-char, port->string, etc. Note that call-with-* and with-* procedures do not lock the port while calling the given procedure, since the procedure may pass a reference to the port to another thread, and Gauche wouldn’t know if that’s the case.

This means you don’t need to be too paranoid about ports in a multithreaded environment. However, keep in mind that this locking mechanism is meant to be a safety net that keeps the port’s internal state from breaking, not a general mutex mechanism. It assumes port accesses rarely conflict, and uses a spin lock to reduce the overhead in the majority of cases. If you know there will be more than one thread accessing the same port, you should use an explicit mutex to avoid conflicts.
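
As a minimal sketch of the explicit-mutex advice above, several threads could share one port through a wrapper like the following. The names log-mutex and log-line are hypothetical; make-mutex and with-locking-mutex are assumed to come from the gauche.threads module.

```scheme
(use gauche.threads)

;; Hypothetical names, for illustration only.
(define log-mutex (make-mutex))

;; Writes all items followed by a newline while holding log-mutex,
;; so that lines from different threads are not interleaved.
(define (log-line port . items)
  (with-locking-mutex log-mutex
    (lambda ()
      (for-each (lambda (item) (display item port)) items)
      (newline port))))
```

Every thread that writes to the shared port would then call log-line instead of calling display directly.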

Function: with-port-locking port thunk

Executes thunk, while making the calling thread hold the exclusive lock of port during the dynamic extent of thunk.

Calls to the builtin port functions while the lock is held bypass mutex operations and yield better performance.

Note that the lock is held during the dynamic extent of thunk; so, if thunk invokes a continuation captured outside of with-port-locking, the lock is released. If the continuation captured within thunk is invoked afterwards, the lock is re-acquired.

With-port-locking may be nested. The lock is valid during the outermost call of with-port-locking.

Note that this procedure uses the port’s built-in lock mechanism, which busy-waits when port accesses conflict. It should be used only for avoiding fine-grain lock overhead; use an explicit mutex if you know there will be conflicts.
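
The following sketch shows the intended use: grouping several small port operations so that the port is locked once rather than once per call. The helper name write-record is hypothetical.

```scheme
;; write-record is a hypothetical helper.  The four port operations
;; inside the thunk run while the calling thread holds the port lock.
(define (write-record port name value)
  (with-port-locking port
    (lambda ()
      (display name port)
      (display "=" port)
      (write value port)
      (newline port))))
```

For example, (write-record (current-output-port) "size" 42) prints size=42 followed by a newline.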


6.22.3 Common port operations

Function: port? obj
Function: input-port? obj
Function: output-port? obj

[R7RS] Returns true if obj is a port, an input port, or an output port, respectively. Port? is not listed among the R5RS standard procedures, but is mentioned in the "Disjointness of Types" section.

Function: port-closed? port

Returns true if port is a port and it is already closed. A closed port can’t be reopened.

Parameter: current-input-port
Parameter: current-output-port
Parameter: current-error-port

[R7RS] Returns the current input, output and error output port, respectively.

R7RS defines that the initial values of these ports are textual ports. In Gauche, initial ports can handle both textual and binary I/O.

Values of the current ports can be temporarily changed by parameterize (See section gauche.parameter - Parameters), though you might want the convenience procedures such as with-output-to-string or with-input-from-file in typical cases.

 
(use gauche.parameter)
(let1 os (open-output-string)
  (parameterize ((current-output-port os))
    (display "foo"))
  (get-output-string os))
 ⇒ "foo"

Parameter: standard-input-port
Parameter: standard-output-port
Parameter: standard-error-port

Returns the standard I/O ports as of the time the program started. These ports are the default values of current-input-port, current-output-port and current-error-port, respectively.

You can also change the values of these procedures with parameterize, but note that (1) the current-*-port parameters are initialized before program execution, so changing the values of standard-*-port won’t affect them, and (2) changing the values of these procedures only affects the Scheme world, and does not change the system-level stdio file descriptors that low-level libraries refer to.

Function: with-input-from-port port thunk
Function: with-output-to-port port thunk
Function: with-error-to-port port thunk

Calls thunk. During evaluation of thunk, the current input port, current output port and current error port are set to port, respectively. Note that port won’t be closed after thunk is executed.

Function: with-ports iport oport eport thunk

Does the above three functions at once. Calls thunk while the current input, output, and error ports are set to iport, oport, and eport, respectively. You may pass #f to any port argument(s) if you don’t need to alter the port(s).

Note that the ports won’t be closed after thunk is executed. (Unfortunately, recent Scheme standards added a similarly named procedure, call-with-port, which does close the port. See below.)
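
A small sketch of with-ports, using string ports so that it is self-contained. Input and output are redirected while #f leaves the current error port unchanged. The variable name out is arbitrary.

```scheme
(define out (open-output-string))

;; Inside the thunk, (read) takes its input from the string port and
;; (write) sends its output to out; the error port is left as is.
(with-ports (open-input-string "(1 2 3)") out #f
  (lambda ()
    (write (reverse (read)))))

(get-output-string out)   ; ⇒ "(3 2 1)"
```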

Function: close-port port
Function: close-input-port port
Function: close-output-port port

[R7RS] Closes the port. Close-port works on both input and output ports, while close-input-port and close-output-port work only on the respective kind of port and throw an error if another type of port is passed.

Theoretically, close-port alone would suffice; having all three is merely for historical reasons. R5RS has close-input-port and close-output-port; R6RS and R7RS support all three.

Function: call-with-port port proc

[R7RS] Calls proc with one argument, port. After proc returns, or it throws an uncaptured error, port is closed. Value(s) returned from proc will be the return value(s) of call-with-port.
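
The difference from the with-* procedures above is that call-with-port closes the port afterwards. A minimal illustration with an input string port (the names p and first-char are arbitrary):

```scheme
(define p (open-input-string "abc"))

;; read-char is called with p as its only argument; p is closed
;; once it returns.
(define first-char (call-with-port p read-char))

first-char         ; ⇒ #\a
(port-closed? p)   ; ⇒ #t
```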

Function: port-type port

Returns the type of port in one of the symbols file, string or proc.

Function: port-name port

Returns the name of port. If the port is associated to a file, it is the name of the file. Otherwise, it is some description of the port.

Function: port-buffering port
Function: (setter port-buffering) port buffering-mode

If port is a file port (i.e. (port-type port) returns file), these procedures get and set the port’s buffering mode. For input ports, the buffering mode may be one of :full, :modest or :none. For output ports, it may be one of :full, :line or :none. See section File ports, for explanation of those modes.

If port-buffering is applied to ports other than file ports, it returns #f. If the setter of port-buffering is applied to ports other than file ports, it signals an error.

Function: port-current-line port

Returns the current line count of port. This information is only available on file-based ports, and only as long as you’re doing sequential character I/O on them. Otherwise, this returns -1.

Function: port-file-number port

Returns an integer file descriptor, if the port is associated to the system file I/O. Returns #f otherwise.

Function: port-seek port offset :optional whence

If the given port allows random access, this procedure sets the read/write pointer of the port according to the given offset and whence, then returns the updated offset (number of bytes from the beginning of the data). If port is not random-accessible, #f is returned. In the current version, file ports and input string ports are fully random-accessible. You can only query the current byte offset of output string ports.

Note that port position is represented by byte count, not character count.

Seeking past the end of the data is allowed if port is an output file port. See the POSIX lseek(2) documentation for details of the behavior. For input file ports and input string ports, you can’t seek past the end of the data.

The whence argument must be a small integer that represents from where offset should be counted. The following constant values are defined.

SEEK_SET

Offset represents the byte count from the beginning of the data. This is the default behavior when whence is omitted.

SEEK_CUR

Offset represents the byte count relative to the current read/write pointer. If you pass 0 to offset, you can get the current port position without changing it.

SEEK_END

Offset represents the byte count relative to the end of the data.

Function: port-tell port

Returns the current read/write pointer of port in byte count, if port is random-accessible. Returns #f otherwise. This is equivalent to the following call:

 
(port-seek port 0 SEEK_CUR)

Note on the names: Port-seek is called seek, file-position or input-port-position/output-port-position on some implementations. Port-tell is called tell, ftell or set-file-position!. Some implementations have port-position for different functionality. CommonLisp has file-position, but it is not suitable for us since a port need not be a file port. Seek and tell reflect the POSIX names, and with the Gauche naming convention we could use sys-seek and sys-tell; however, a port deals with a higher level of abstraction than system calls, so I dropped those names and adopted new ones.
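
The following sketch exercises port-seek and port-tell on an input string port, which the description above lists as fully random-accessible. The content is plain ASCII, so byte offsets and character counts coincide; the names p and trace are arbitrary.

```scheme
(define p (open-input-string "abcdef"))

;; let* evaluates the bindings in order, so each step sees the
;; position left by the previous one.
(define trace
  (let* ((c1  (read-char p))               ; reads #\a
         (pos (port-tell p))               ; 1 byte consumed so far
         (to  (port-seek p 3))             ; whence defaults to SEEK_SET
         (c2  (read-char p))               ; reads #\d
         (end (port-seek p -1 SEEK_END))   ; one byte before the end
         (c3  (read-char p)))              ; reads #\f
    (list c1 pos to c2 end c3)))

trace   ; ⇒ (#\a 1 3 #\d 5 #\f)
```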

Function: copy-port src dst :key (unit 0) (size #f)

Copies data from an input port src to an output port dst, until eof is read from src.

The keyword argument unit may be zero, a positive exact integer, a symbol byte or a symbol char, to specify the unit of copying. If it is an integer, a buffer of that size (in case of zero, a system default size) is used to copy, using block I/O. Generally this is the fastest if you copy between normal files. If unit is a symbol byte, the copying is done byte by byte, using the C version of read-byte and write-byte. If unit is a symbol char, the copying is done character by character, using the C version of read-char and write-char.

If a nonnegative integer is given to the keyword argument size, it specifies the maximum amount of data to be copied. If unit is a symbol char, size specifies the number of characters. Otherwise, size specifies the number of bytes.
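
As a sketch of the unit and size arguments, the hypothetical helper copy-prefix below copies at most n characters from one string port to another.

```scheme
;; copy-prefix is a hypothetical helper.  With :unit 'char, :size
;; counts characters rather than bytes.
(define (copy-prefix str n)
  (call-with-output-string
    (lambda (out)
      (copy-port (open-input-string str) out :unit 'char :size n))))

(copy-prefix "hello, world" 5)   ; ⇒ "hello"
```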


6.22.4 File ports

Function: open-input-file filename :key if-does-not-exist buffering element-type encoding conversion-buffer-size
Function: open-output-file filename :key if-does-not-exist if-exists buffering element-type encoding conversion-buffer-size

[R7RS+] Opens a file filename for input or output, and returns an input or output port associated with it, respectively.

The keyword arguments specify precise behavior.

:if-exists

This keyword argument can be specified only for open-output-file, and specifies the action when the file named by filename already exists. One of the following values can be given.

:supersede

The existing file is truncated. This is the default behavior.

:append

The output data will be appended to the existing file.

:overwrite

The output data will overwrite the existing content. If the output data is shorter than the existing file, the rest of existing file remains.

:error

An error is signaled.

#f

No action is taken, and the function returns #f.

:if-does-not-exist

This keyword argument specifies the action when filename does not exist.

:error

An error is signaled. This is the default behavior of open-input-file.

:create

A file is created. This is the default behavior of open-output-file. The check of file existence and the creation are done atomically; you can exclusively create the file by specifying :error or #f to if-exists along with this option. You can’t specify this value for open-input-file.

#f

No action is taken, and the function returns #f.

:buffering

This argument specifies the buffering mode. The following values are allowed. The port’s buffering mode can be queried and set with port-buffering (See section Common port operations).

:full

Buffer the data as much as possible. This is the default mode.

:none

No buffering is done. Every time the data is written (to an output port) or read (from an input port), the underlying system call is used. Process’s standard error port is opened in this mode by default.

:line

This is valid only for output ports. The written data is buffered, but the buffer is flushed whenever a newline character is written. This is suitable for interactive output port. Process’s standard output port is opened in this mode by default. (Note that this differs from the line buffering mode of C stdio, which flushes the buffer as well when input is requested from the same file descriptor.)

:modest

This is valid only for input ports. This is almost the same as the mode :full, except that read-uvector may return less data than requested if the requested amount of data is not immediately available. (In the :full mode, read-uvector waits until the entire requested data is read.) This is suitable for a port connected to a pipe or network.

:element-type

This argument specifies the type of the file.

:character

The file is opened in "character" (or "text") mode.

:binary

The file is opened in "binary" mode.

In the current version, this argument is ignored and all files are opened in binary mode. This makes no difference on Unix platforms.

:encoding

This argument specifies character encoding of the file. The argument is a string or a symbol that names a character encoding scheme (CES).

For open-input-file, it can be a wildcard CES (e.g. *jp) to guess the file’s encoding heuristically (See section Autodetecting the encoding scheme), or #t, in which case we assume the input file itself has magic encoding comment and use open-coding-aware-port (See section Coding-aware ports).

If this argument is given, Gauche automatically loads the gauche.charconv module and converts the input/output characters as you read from or write to the port. See Supported character encoding schemes, for the details of character encoding schemes.

:conversion-buffer-size

This argument may be used with the encoding argument to specify the buffer size of character encoding conversion. It is passed as a buffer-size argument of the conversion port constructors (See section Conversion ports).

Usually you don’t need to give this argument; but if you need to guess the input file encoding, a larger buffer size may work better, since the guessing routine can have more data before deciding the encoding.

By combining the if-exists and if-does-not-exist flags, you can implement various actions:

 
(open-output-file "foo" :if-exists :error)
 ⇒ ;opens "foo" exclusively, or error

(open-output-file "foo" :if-exists #f)
 ⇒ ;opens "foo" exclusively, or returns #f

(open-output-file "foo" :if-exists :append
                        :if-does-not-exist :error)
 ⇒ ;opens "foo" for append only if it already exists

To check the existence of a file without opening it, use sys-access or file-exists? (See section File stats).

Note for portability: Some Scheme implementations (e.g. STk) allow you to specify a command as filename, and read from, or write to, the subprocess’s standard input/output. Some other scripting languages (e.g. Perl) have similar features. In Gauche, open-input-file and open-output-file strictly operate on files (what the underlying OS thinks of as files). However, you can use “process ports” to invoke another command in a subprocess and to communicate with it. See section Process ports, for details.

Function: call-with-input-file string proc :key if-does-not-exist buffering element-type encoding conversion-buffer-size
Function: call-with-output-file string proc :key if-does-not-exist if-exists buffering element-type encoding conversion-buffer-size

[R7RS+] Opens a file specified by string for input/output, and calls proc with one argument, the file port. When proc returns, or an error is signaled from proc that is not captured within proc, the file is closed.

The keyword arguments have the same meanings as those of open-input-file and open-output-file. Note that if you specify #f to if-exists and/or if-does-not-exist, proc may receive #f instead of a port object when the file is not opened.

Returns the value(s) proc returned.
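
Because proc may receive #f, a procedure passed together with :if-exists #f should check its argument. The sketch below uses a hypothetical helper, write-once, that writes a file only if it does not exist yet.

```scheme
;; write-once is a hypothetical helper: it writes text to path only if
;; no file exists there yet.  Returns #t on success, #f otherwise.
(define (write-once path text)
  (call-with-output-file path
    (lambda (out)
      ;; out is #f when the file already existed and nothing was opened.
      (and out
           (begin (display text out) #t)))
    :if-exists #f))
```

Calling write-once twice with the same path returns #t the first time and #f the second time, leaving the first content intact.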

Function: with-input-from-file string thunk :key if-does-not-exist buffering element-type encoding conversion-buffer-size
Function: with-output-to-file string thunk :key if-does-not-exist if-exists buffering element-type encoding conversion-buffer-size

[R7RS] Opens a file specified by string for input or output and makes the opened port the current input or output port, then calls thunk. The file is closed when thunk returns or an error is signaled from thunk that is not captured within thunk.

Returns the value(s) thunk returns.

The keyword arguments have the same meanings as those of open-input-file and open-output-file, except that when #f is given to if-exists or if-does-not-exist and opening the port fails, thunk isn’t called at all and #f is returned as the result of with-input-from-file and with-output-to-file.
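
A short round-trip sketch, assuming the current directory is writable. The file name is hypothetical and chosen only for this example.

```scheme
(define path "gr-example-data.tmp")   ; hypothetical file name

;; Inside the thunk, the current output port is the file.
(with-output-to-file path
  (lambda () (write '(1 2 3))))

;; read takes no arguments here, so it reads from the current input
;; port, which is the file.
(define data (with-input-from-file path read))   ; ⇒ (1 2 3)

(sys-unlink path)

;; The file is gone now; with :if-does-not-exist #f the thunk is not
;; called and #f is returned.
(define missing
  (with-input-from-file path read :if-does-not-exist #f))   ; ⇒ #f
```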

Notes on semantics of closing file ports: R7RS states, in the description of call-with-port et al., that "If proc does not return, then the port will not be closed automatically unless it is possible to prove that the port will never again be used for read or write operation."

Gauche’s implementation slightly misses this criterion; the mere fact that an uncaptured error is thrown in proc does not prove the port will never be used. Nevertheless, it is very difficult to imagine a situation in which you can do a meaningful operation on the port after such an error is signaled; you’d have no idea what kind of state the port is in. In practical programs, you should capture the error explicitly inside proc if you still want to do some meaningful operation with the port.

Note that if a continuation captured outside call-with-input-file et al. is invoked inside proc, the port is not closed. It is possible that control later returns into proc, if a continuation is captured in it (e.g. coroutines). The low-level exceptions (See section Handling exceptions) also don’t ensure closing the port.

Function: open-input-fd-port fd :key buffering name owner?
Function: open-output-fd-port fd :key buffering name owner?

Creates and returns an input or output port on top of the given file descriptor. Buffering specifies the buffering mode as described in open-input-file entry above; the default is :full. Name is used for the created port’s name and returned by port-name. A boolean flag owner? specifies whether fd should be closed when the port is closed.

Function: port-fd-dup! toport fromport

Interface to the system call dup2(2). Atomically closes the file descriptor associated to toport, creates a copy of the file descriptor associated to fromport, and sets the new file descriptor to toport. Both toport and fromport must be file ports. Before the original file descriptor of toport is closed, any buffered output (when toport is an output port) is flushed, and any buffered input (when toport is an input port) is discarded.

‘Copy’ means that, even though the two file descriptors differ in their values, they both point to the same entry in the system’s open file table. For example, they share the current file position; after port-fd-dup!, if you call port-seek on fromport, the change is also visible from toport, and vice versa. Note that this ‘sharing’ is at the system level; if either toport or fromport is buffered, the buffered contents are not shared.

This procedure is mainly intended for programs that need to control open file descriptors explicitly; e.g. a daemon process would want to redirect its I/O to a harmless device such as ‘/dev/null’, and a shell process would want to set up file descriptors before executing a child process.


6.22.5 String ports

String ports are ports that read from or write to memory.

Function: open-input-string string

[R7RS][SRFI-6] Creates an input string port that has the content string. This is a more efficient way to access a string in order than using string-ref with an incrementing index.

 
(define p (open-input-string "foo x"))
(read p) ⇒ foo
(read-char p) ⇒ #\space
(read-char p) ⇒ #\x
(read-char p) ⇒ #<eof>
(read-char p) ⇒ #<eof>
Function: get-remaining-input-string port

Port must be an input string port. Returns the remaining content of the input port. The internal pointer of port isn’t moved, so the subsequent read from port isn’t affected. If port has already reached EOF, a null string is returned.

 
(define p (open-input-string "abc\ndef"))
(read-line p)                  ⇒ "abc"
(get-remaining-input-string p) ⇒ "def"
(read-char p)                  ⇒ #\d
(read-line p)                  ⇒ "ef"
(get-remaining-input-string p) ⇒ ""

Function: open-output-string

[R7RS][SRFI-6] Creates an output string port. Anything written to the port is accumulated in the buffer, and can be obtained as a string by get-output-string. This is a far more efficient way to construct a string sequentially than pre-allocating a string and filling it with string-set!.

Function: get-output-string port

[R7RS][SRFI-6] Takes an output string port port and returns a string that has been accumulated to port so far. If byte data has been written to the port, this function re-scans the buffer to see if it constitutes a complete string; if not, an incomplete string is returned.

This doesn’t affect the port’s operation, so you can keep accumulating content to port after calling get-output-string.
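
A brief illustration of this behavior (the names out, first, and second are arbitrary):

```scheme
(define out (open-output-string))

(display "abc" out)
(define first (get-output-string out))    ; ⇒ "abc"

;; The port is still usable; new output is appended to what was
;; accumulated before.
(display "def" out)
(define second (get-output-string out))   ; ⇒ "abcdef"
```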

Function: call-with-input-string string proc
Function: call-with-output-string proc
Function: with-input-from-string string thunk
Function: with-output-to-string thunk

These utility functions are trivially defined as follows. The interface is parallel to the file port version.

 
(define (call-with-output-string proc)
  (let ((out (open-output-string)))
    (proc out)
    (get-output-string out)))

(define (call-with-input-string str proc)
  (let ((in (open-input-string str)))
    (proc in)))

(define (with-output-to-string thunk)
  (let ((out (open-output-string)))
    (with-output-to-port out thunk)
    (get-output-string out)))

(define (with-input-from-string str thunk)
  (with-input-from-port (open-input-string str) thunk))

Function: call-with-string-io str proc
Function: with-string-io str thunk
 
(define (call-with-string-io str proc)
  (let ((out (open-output-string))
        (in  (open-input-string str)))
    (proc in out)
    (get-output-string out)))

(define (with-string-io str thunk)
  (with-output-to-string
    (lambda ()
      (with-input-from-string str
        thunk))))
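
As a usage sketch of with-string-io, the hypothetical helper upcase-all below reads characters from the current input port (the string) and writes their uppercase versions to the current output port, whose accumulated content becomes the result.

```scheme
;; upcase-all is a hypothetical helper built on with-string-io.
(define (upcase-all str)
  (with-string-io str
    (lambda ()
      (let loop ((c (read-char)))
        (unless (eof-object? c)
          (write-char (char-upcase c))
          (loop (read-char)))))))

(upcase-all "hello")   ; ⇒ "HELLO"
```
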
Function: write-to-string obj :optional writer
Function: read-from-string string :optional start end

These convenience functions cover common idioms using string ports.

 
(write-to-string obj writer)
  ≡
  (with-output-to-string (lambda () (writer obj)))

(read-from-string string)
  ≡
  (with-input-from-string string read)

The default value of writer is the procedure write. The default values of start and end are 0 and the length of string, respectively.

Portability note: Common Lisp has these functions, with different optional arguments. STk has read-from-string without optional argument.
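
A few sample calls, sketched from the definitions above; the variable names are arbitrary.

```scheme
;; The default writer is write, so strings keep their quotes.
(define w1 (write-to-string '(a "b" 3)))      ; ⇒ "(a \"b\" 3)"

;; Passing display as the writer drops the quotes.
(define w2 (write-to-string "b" display))     ; ⇒ "b"

;; Only the first datum in the string is read.
(define r1 (read-from-string "(1 2) 3"))      ; ⇒ (1 2)

;; With start, reading begins at that character index.
(define r2 (read-from-string "xx(1 2)" 2))    ; ⇒ (1 2)
```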


6.22.6 Coding-aware ports

A coding-aware port is a special type of procedural input port that is used by load to read a program source. The port recognizes the magic comment to specify the character encoding of the program source, such as ;; -*- coding: utf-8 -*-, and makes an appropriate character encoding conversion. See Multibyte scripts for the details of coding magic comment.

Function: open-coding-aware-port iport

Takes an input port and returns an input coding-aware port, which basically just passes the data from iport through to its reader. However, if a magic comment appears within the first two lines of data from iport, the coding-aware port applies the necessary character encoding conversion to the rest of the data as they are read.

The passed port, iport, is "owned" by the created coding-aware port. That is, when the coding-aware port is closed, iport is also closed. The content read from iport is buffered in the coding-aware port, so other code shouldn’t read from iport.

By default, Gauche’s load uses a coding aware port to read the program source, so that the coding magic comment works for the Gauche source programs (see Loading Scheme file). However, since the mechanism itself is independent from load, you can use this port for other purposes; it is particularly useful to write a function that processes Scheme source programs which may have the coding magic comment.
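
As a sketch of such a function, the hypothetical helper read-source-forms below reads every top-level form from a source file through a coding-aware port. Since the coding-aware port owns the file port, closing it via call-with-port also closes the file.

```scheme
;; read-source-forms is a hypothetical helper.  It returns the list of
;; all top-level forms in the file, honoring a coding magic comment.
(define (read-source-forms path)
  (call-with-port (open-coding-aware-port (open-input-file path))
    (lambda (in)
      (let loop ((form (read in)) (acc '()))
        (if (eof-object? form)
            (reverse acc)
            (loop (read in) (cons form acc)))))))
```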


6.22.7 Input

For the input-related procedures, the optional iport argument must be an input port, and when omitted, the current input port is assumed.


6.22.7.1 Reading data

Function: read :optional iport

[R7RS] Reads an S-expression from iport and returns it. Gauche recognizes the lexical structure specified in R7RS, and some additional lexical structures listed in Lexical structure.

If iport has already reached the end of file, an eof object is returned.

The procedure reads up to the last character that constitutes the S-expression, and leaves the rest in the port. This differs from CommonLisp’s read, which consumes whitespace after the S-expression by default.

Function: read-with-shared-structure :optional iport
Function: read/ss :optional iport

[SRFI-38] These procedures are defined in srfi-38 to recognize shared substructure notation (#n=, #n#). Gauche’s builtin read recognizes the srfi-38 notation, so these are just synonyms to read; these are only provided for srfi-38 compatibility.

Function: read-char :optional iport

[R7RS] Reads one character from iport and returns it. If iport has already reached the end, returns an eof object. If the byte stream in iport doesn’t constitute a valid character, the behavior is undefined. (In the future, a port will have an option to deal with invalid characters.)

Function: peek-char :optional iport

[R7RS] Reads one character from iport and returns it, keeping the character in the port. If the byte stream in iport doesn’t constitute a valid character, the behavior is undefined. (In the future, a port will have an option to deal with invalid characters.)
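
The difference between peek-char and read-char can be seen in this small sketch (the names p and seen are arbitrary):

```scheme
(define p (open-input-string "ab"))

;; let* evaluates the bindings in order.
(define seen
  (let* ((k1 (peek-char p))    ; #\a, still left in the port
         (c1 (read-char p))    ; #\a, now consumed
         (c2 (read-char p))    ; #\b
         (k2 (peek-char p)))   ; eof object, the port is exhausted
    (list k1 c1 c2 (eof-object? k2))))

seen   ; ⇒ (#\a #\a #\b #t)
```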

Function: read-byte :optional iport

Reads one byte from an input port iport, and returns it as an integer in the range between 0 and 255. If iport has already reached EOF, an eof object is returned.

This is called read-u8 in R7RS.

Function: peek-byte :optional iport

Peeks one byte at the head of an input port iport, and returns it as an integer in the range between 0 and 255. If iport has already reached EOF, an eof object is returned.

This is called peek-u8 in R7RS.
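
The byte-level counterparts behave the same way. In this sketch the input is the ASCII string "AB", so the bytes are 65 and 66 (the names p and bytes are arbitrary).

```scheme
(define p (open-input-string "AB"))

(define bytes
  (let* ((k1 (peek-byte p))    ; 65, still left in the port
         (b1 (read-byte p))    ; 65, now consumed
         (b2 (read-byte p))    ; 66
         (b3 (read-byte p)))   ; eof object
    (list k1 b1 b2 (eof-object? b3))))

bytes   ; ⇒ (65 65 66 #t)
```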

Function: read-line :optional iport allow-byte-string?

[R7RS+] Reads one line (a sequence of characters terminated by newline or EOF) and returns a string. The terminating newline is not included. This function recognizes popular line terminators (LF only, CRLF, and CR only). If iport has already reached EOF, an eof object is returned.

If a byte sequence is read from iport which doesn’t constitute a valid character in the native encoding, read-line signals an error by default. However, if a true value is given to the argument allow-byte-string?, read-line returns a byte string (incomplete string) in such a case, without reporting an error. It is particularly useful if you read from a source whose character encoding is not yet known; for example, to read an XML document, you need to check the first line to see if there is a charset parameter so that you can then use an appropriate character conversion port. This optional argument is Gauche’s extension to R7RS.
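
To illustrate the line terminator handling described above, the hypothetical helper all-lines collects every line of a port. The sample input mixes LF, CRLF, and CR terminators and ends without one.

```scheme
;; all-lines is a hypothetical helper that reads lines until EOF.
(define (all-lines port)
  (let loop ((line (read-line port)) (acc '()))
    (if (eof-object? line)
        (reverse acc)
        (loop (read-line port) (cons line acc)))))

(all-lines (open-input-string "one\ntwo\r\nthree\rfour"))
;; ⇒ ("one" "two" "three" "four")
```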

Function: read-string nchars :optional iport

[R7RS] Reads nchars characters, or as many characters as available before EOF, and returns a string that consists of those characters. If the input has already reached EOF, an eof object is returned.
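
A sketch of the three cases, using a six-character input string port (the names p and chunks are arbitrary):

```scheme
(define p (open-input-string "abcdef"))

(define chunks
  (let* ((s1 (read-string 4 p))    ; "abcd"
         (s2 (read-string 4 p))    ; "ef", only two characters remain
         (s3 (read-string 4 p)))   ; eof object
    (list s1 s2 (eof-object? s3))))

chunks   ; ⇒ ("abcd" "ef" #t)
```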

Function: read-block nbytes :optional iport

This procedure is deprecated - use read-uvector instead (See section Uvector block I/O).

Reads nbytes bytes from iport, and returns an incomplete string consisting of those bytes. The returned string may be shorter than nbytes when iport doesn’t have enough bytes to fill it. If nbytes is zero, a null string is always returned.

If iport has already reached EOF, an eof object is returned.

If iport is a file port, the behavior of read-block differs by the buffering mode of the port (See section File ports, for the detailed explanation of buffering modes).

If you want to write a chunk of bytes to a port, you can use either display if the data is in a string, or write-uvector in gauche.uvector (See section Uvector block I/O) if the data is in a uniform vector.

Function: eof-object

[R7RS] Returns an EOF object.

Function: eof-object? obj

[R7RS] Returns true if obj is an EOF object.

Function: char-ready? :optional port

[R7RS] If a character is ready to be read from port, returns #t.

For now, this procedure actually checks only if the next byte is immediately available from port. If the next byte is part of a multibyte character, the attempt to read the whole character may block, even if char-ready? returns #t on the port. (It is unlikely to happen in usual situations, but theoretically it can. If this concerns you, use read-uvector to read the input as a byte sequence, then use an input string port to read characters.)

Function: byte-ready? :optional port

If one byte (octet) is ready to be read from port, returns #t.

In R7RS, this procedure is called u8-ready?


6.22.7.2 Reader lexical mode

Parameter: reader-lexical-mode

Gets/sets the reader lexical mode. Changing this parameter switches the behavior of the reader concerning some corner cases of the lexical syntax, where legacy Gauche syntax and R7RS syntax aren’t compatible.

In general, you don’t need to change this parameter directly. The lexical syntax matters at the read-time, while changing this parameter happens at the execution-time; unless you know the exact timing when each phase occurs, you might not get what you want.

The hash-bang directives #!gauche-legacy and #!r7rs indirectly affect this parameter; the first one sets the reader mode to legacy, and the second one to strict-r7.

The command-line argument -fwarn-legacy sets the default reader mode to warn-legacy.

Changes to this parameter are delimited within load; once load is done, the value of this parameter is reset to the value it had when load started.

The parameter takes one of the following symbols as a value.

permissive

This is the default mode. It tries to find a reasonable compromise between the two syntaxes.

In string literals, a hex escape sequence is first interpreted as R7RS lexical syntax. If the syntax doesn’t conform to the R7RS hex escape, it is interpreted as the legacy Gauche hex escape syntax. For example, "\x30;a" is read as "0a", for the hex escape sequence including the terminating semicolon is read as an R7RS hex escape sequence. It also reads "\x30a" as "0a", for the legacy Gauche hex escape always takes two hexadecimal digits without the terminator. With this mode, you can use the R7RS hex escape syntax for new code, and yet almost all legacy Gauche code can be read without a problem. However, if the legacy code has a two-digit hex escape followed by a semicolon, it is interpreted as R7RS syntax and an incompatibility arises.

Identifiers beginning with a colon are read as keywords. For the strict R7RS behavior, you need to use vertical-bar escaping (e.g. |:foo|) to have symbols beginning with colon. Note that this incompatibility will be addressed in the future version of Gauche, when keywords become a subtype of symbols.

strict-r7

Strict R7RS compatible mode. When the reader encounters the hash-bang directive #!r7rs, the rest of the file is read in this mode.

In this mode, Gauche’s extended lexical syntax will raise an error. Identifiers beginning with a colon are read as symbols.

Use this mode to read R7RS code with maximum compatibility.

legacy

The reader works as legacy Gauche did (version 0.9.3.3 and before). When the reader encounters the hash-bang directive #!gauche-legacy, the rest of the file is read in this mode.

This only matters when you want to read a two-digit hex escape followed by a semicolon as a character plus a semicolon, e.g. "\x30;a" as "0;a" instead of "0a". We expect such a sequence to appear rarely in code, but if you dump data in string literal format, you may get such a sequence (especially in incomplete string literals).

warn-legacy

The reader works as in the permissive mode, but warns if it reads legacy hex-escape syntax. This mode is the default when the -fwarn-legacy command-line argument is given to gosh.

This is useful for checking whether you have any incompatible escape sequences in your code.
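As a rough illustration of how the modes differ, the following sketch reads the same string literal under two modes. The helper read-literal-in-mode is hypothetical (it is not part of Gauche); it assumes the parameter can be set by calling it with one argument, as the Get/set description above suggests, and restores the previous value afterwards. The results noted in the comments are the ones stated in the mode descriptions above.

```scheme
;; Hypothetical helper, not part of Gauche: read TEXT with the reader
;; lexical mode temporarily set to MODE, then restore the old mode.
(define (read-literal-in-mode mode text)
  (let ((saved (reader-lexical-mode)))
    (dynamic-wind
      (lambda () (reader-lexical-mode mode))
      (lambda () (read-from-string text))
      (lambda () (reader-lexical-mode saved)))))

;; The text handed to the reader is the literal "\x30;a", quotes included.
(define literal "\"\\x30;a\"")

(define permissive-result (read-literal-in-mode 'permissive literal))
;; "0a" according to the description of permissive mode

(define legacy-result (read-literal-in-mode 'legacy literal))
;; "0;a" according to the description of legacy mode
```

Because the mode is changed only around the call to read-from-string, the timing caveat above (read-time versus execution-time) does not apply to this sketch.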



6.22.7.3 Read-time constructor

The read-time constructor, defined in SRFI-10, provides an easy way to create an external representation of user-defined structures.

Reader Syntax: #,(tag arg …)

[SRFI-10] Gauche maintains a global table that associates a tag (symbol) to a constructor procedure.

When the reader encounters this syntax, it reads arg …, finds a reader constructor associated with tag, and calls the constructor with arg … as arguments, then inserts the value returned by the constructor as the result of reading the syntax.

Note that this syntax is processed inside the reader—the evaluator doesn’t see any of args, but only sees the object the reader returns.

Function: define-reader-ctor tag procedure

[SRFI-10] Associates a reader constructor procedure with tag.

Examples:

 
(define-reader-ctor 'pi (lambda () (* (atan 1) 4)))

#,(pi) ⇒ 3.141592653589793

'(#,(pi)) ⇒ (3.141592653589793)

(define-reader-ctor 'hash
  (lambda (type . pairs)
    (let ((tab (make-hash-table type)))
      (for-each (lambda (pair)
                  (hash-table-put! tab (car pair) (cdr pair)))
                pairs)
      tab)))

(define table
 #,(hash eq? (foo . bar) (duh . dah) (bum . bom)))

table ⇒ #<hash-table eq? 0x80f9398>
(hash-table-get table 'duh) ⇒ dah

Combined with the write-object method (See section Output), it is easy to have a user-defined class written in a form that can be read back:

 
(define-class <point> ()
  ((x :init-value 0 :init-keyword :x)
   (y :init-value 0 :init-keyword :y)))

(define-method write-object ((p <point>) out)
  (format out "#,(<point> ~s ~s)" (ref p 'x) (ref p 'y)))

(define-reader-ctor '<point>
  (lambda (x y) (make <point> :x x :y y)))
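A round trip through write and read might then look like the following sketch. The definitions are repeated so that it runs on its own; the variable names p, text and q are only illustrative. Since the reading happens at run time through read-from-string, the timing issue described in the note below does not come into play here.

```scheme
(define-class <point> ()
  ((x :init-value 0 :init-keyword :x)
   (y :init-value 0 :init-keyword :y)))

;; Write a point in read-time constructor form.
(define-method write-object ((p <point>) out)
  (format out "#,(<point> ~s ~s)" (ref p 'x) (ref p 'y)))

;; Tell the reader how to rebuild a point from that form.
(define-reader-ctor '<point>
  (lambda (x y) (make <point> :x x :y y)))

(define p (make <point> :x 3 :y 4))
(define text (write-to-string p))    ; "#,(<point> 3 4)"
(define q (read-from-string text))   ; a new <point> instance
(list (ref q 'x) (ref q 'y))         ; (3 4)
```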

NOTE: The extent of the effect of define-reader-ctor is not specified in SRFI-10, and might pose a compatibility problem among implementations that support SRFI-10. (In fact, the very existence of define-reader-ctor is up to an implementation choice.)

In Gauche, at least for the time being, define-reader-ctor takes effect as soon as the form is compiled and evaluated. Since Gauche compiles and evaluates each toplevel form in order, a tag specified in define-reader-ctor can be used immediately after that. However, it doesn’t work if the call of define-reader-ctor and the use of the tag are enclosed in a begin form, because the entire begin form is compiled at once before being evaluated.

Other implementations may need to read the entire file before making its define-reader-ctor call effective. If so, it effectively prevents one from using define-reader-ctor and the defined tag in the same file. It is desirable to put the call of define-reader-ctor and the use of the tag in different files if possible.

Another issue with the current define-reader-ctor is that it modifies a global table of the Gauche system, hence it is not modular. Code written by different people might use the same tags and yield unexpected results. In future versions, Gauche may have some way to encapsulate the scope of a tag, although the author doesn’t have a clear idea yet.



6.22.7.4 Input utility functions

Function: port->string port
Function: port->list reader port
Function: port->string-list port
Function: port->sexp-list port

Generally useful input procedures. The API is taken from scsh and STk.

port->string reads port until EOF and returns the accumulated data as a string.

port->list applies reader on port repeatedly, until reader returns an EOF, then returns the list of objects reader returned.

port->string-list is a port->list specialized by read-line, and port->sexp-list is a port->list specialized by read.

If the input contains an octet sequence that doesn’t form a valid character in Gauche’s native character encoding, port->string and port->string-list may return incomplete string(s).
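A small sketch of these four procedures, using string ports as the input source. The variable names are only illustrative.

```scheme
;; The whole input as one string.
(define whole (port->string (open-input-string "hello\nworld")))
;; "hello\nworld"

;; One string per line, read with read-line.
(define lines (port->string-list (open-input-string "hello\nworld")))
;; ("hello" "world")

;; One object per datum, read with read.
(define sexps (port->sexp-list (open-input-string "(1 2) foo \"bar\"")))
;; ((1 2) foo "bar")

;; port->list with an explicit reader; here one character at a time.
(define chars (port->list read-char (open-input-string "abc")))
;; (#\a #\b #\c)
```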

Function: port-fold fn knil reader
Function: port-fold-right fn knil reader
Function: port-for-each fn reader
Function: port-map fn reader

Convenient iterators over the input read by reader.

Since these procedures are not really about ports, they are superseded by generator-fold, generator-fold-right, generator-for-each and generator-map, respectively. See section Folding generated values, for the details.

We provide these only for backward compatibility.
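For reference, a minimal sketch of how these iterators are called. The reader argument is a thunk; here read is used with no argument, so it reads from the current input port, which with-input-from-string binds to a string port.

```scheme
;; Sum all the numbers read from the input.
(define total
  (with-input-from-string "1 2 3 4"
    (lambda () (port-fold + 0 read))))
;; 10

;; Square each number read from the input.
(define squares
  (with-input-from-string "1 2 3 4"
    (lambda () (port-map (lambda (n) (* n n)) read))))
;; (1 4 9 16)
```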



6.22.8 Output



6.22.8.1 Layers of output routines

Gauche has quite a few output procedures which may confuse newcomers. The following table will help to understand how to use those procedures:

Object writers

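Procedures that write out Scheme objects. Although lower-level procedures exist, these are regarded as the basic layer of output routines, since they work on a generic Scheme object as a single unit. They come in two flavors: the write family (write, write-shared, write-simple), which produces an external representation that read can read back, and display and print, which produce a human-friendly representation.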

High-level formatting output

To produce output in specific width, alignment, etc: format. This corresponds to C’s printf.

Low-level type-specific output

Procedures that deal with raw data.



6.22.8.2 Object output

For the following procedures, the optional port argument must be an output port, and when omitted, the current output port is assumed.

Function: write obj :optional port
Function: write-shared obj :optional port
Function: write-simple obj :optional port

[R7RS] The write-family procedures are used to write an external representation of a Scheme object, which can be read back by the read procedure. The three procedures differ in how they handle shared or circular structures.

Write is circular-safe; that is, it uses datum label notation (#n= and #n#) to show cycles. It does not use datum label notation for non-circular structures that are merely shared (see the second example).

 
(let1 x (list 1)
  (set-cdr! x x)   ; create a cycle
  (write x))
 ⇒ shows #0=(1 . #0#)

(let1 x (list 1)
  (write (list x x)))
 ⇒ shows ((1) (1))

Write-shared is also circular-safe, and it also shows shared structures using datum labels. Use this if you need to preserve topology of a graph structure.

 
(let1 x (list 1)
  (write-shared (list x x)))
 ⇒ shows (#0=(1) #0#)

Finally, write-simple writes out the object recursively without taking account of shared or circular structures. This is fast, for it doesn’t need to scan the structure before actually writing out. However, it won’t stop when a circular structure is passed.
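The three procedures can be compared side by side with a sketch like the following. The helper written-by is hypothetical; it just captures what a writer procedure outputs into a string. The results in the comments follow the descriptions above and assume Gauche 0.9.4 or later.

```scheme
;; Hypothetical helper: capture the output of WRITER applied to OBJ.
(define (written-by writer obj)
  (call-with-output-string (lambda (out) (writer obj out))))

;; A list whose two elements are the same pair (shared, not circular).
(define shared (let ((x (list 1))) (list x x)))

;; A one-element circular list.
(define cyclic (let ((x (list 1))) (set-cdr! x x) x))

(define r1 (written-by write shared))         ; "((1) (1))"
(define r2 (written-by write-shared shared))  ; "(#0=(1) #0#)"
(define r3 (written-by write-simple shared))  ; "((1) (1))"
(define r4 (written-by write cyclic))         ; "#0=(1 . #0#)"

;; (written-by write-simple cyclic) is deliberately not evaluated:
;; it would not terminate.
```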

When these procedures encounter an object of a user-defined class, they call the generic function write-object.

Historical context: Write has been in Scheme standards, but handling of circular structures hasn’t been specified until R7RS. In fact, until Gauche 0.9.4, write diverged for circular structures. SRFI-38 introduced the datum-label notation and write-with-shared-structure and write/ss procedures to produce such notation, and Gauche supported it. R7RS clarified this issue, and Gauche 0.9.4 followed.

Function: write-with-shared-structure obj :optional port
Function: write/ss obj :optional port
Function: write* obj :optional port

[SRFI-38] These are aliases of write-shared above.

Gauche has long used the name write*, which is taken from STklos. SRFI-38 defines write-with-shared-structure and write/ss. These names are kept for backward compatibility. New code should use write-shared.

Function: display obj :optional port

[R7RS] Produces a human-friendly representation of an object obj to the output port.

If obj contains cycles, display uses datum-label notation.

When display encounters an object of a user-defined class, it calls the generic function write-object.

 
(display "\"Mahalo\", he said.")
 ⇒ shows "Mahalo", he said.

(let ((x (list "imua")))
  (set-cdr! x x)
  (display x))
 ⇒ shows #0=(imua . #0#)

Function: print expr …

Displays exprs (using display) to the current output port, then writes a newline.
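For instance, the following sketch captures what print writes by redirecting the current output port into a string. The variable name line is only illustrative.

```scheme
(define line
  (with-output-to-string
    (lambda () (print "total: " 42 " items"))))
;; line is "total: 42 items\n"
```

Note that the string arguments appear without double quotes, since print uses display.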

Method: write-object (obj <object>) port

You can customize how the object is printed out by this method.

Function: newline :optional port

[R7RS] Writes a newline character to port. This is equivalent to (write-char #\newline port) or (display "\n" port). It is kept for historical reasons.



6.22.8.3 Formatting output

Function: format port string arg …
Function: format string arg …

[SRFI-28+] Formats arg … according to string. This function is a subset of Common Lisp’s format function, with a few extensions. It is also a superset of SRFI-28, Basic format strings.

port specifies the destination: if it is an output port, the formatted result is written to it; if it is #t, the result is written to the current output port; if it is #f, the formatted result is returned as a string. port can be omitted, as in SRFI-28 format; this has the same effect as passing #f for port.

string is a string that contains format directives. A format directive is a character sequence that begins with a tilde, ‘~’, and ends with a specific directive character. A format directive takes the corresponding arg and formats it. The rest of string is copied to the output as is.

 
(format #f "the answer is ~s" 42)
  ⇒ "the answer is 42"
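The four ways of specifying the destination can be compared with a sketch like this one. Each variable below ends up holding the same string; the variable names are only illustrative.

```scheme
;; #f as the port: the result is returned as a string.
(define s1 (format #f "~a and ~s" "x" "y"))

;; Port omitted, as in SRFI-28: same as passing #f.
(define s2 (format "~a and ~s" "x" "y"))

;; #t as the port: written to the current output port,
;; which is captured into a string here.
(define s3
  (with-output-to-string
    (lambda () (format #t "~a and ~s" "x" "y"))))

;; An explicit output port.
(define s4
  (call-with-output-string
    (lambda (out) (format out "~a and ~s" "x" "y"))))

;; All four are "x and \"y\"".
```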

The format directive can take one or more parameters, separated by commas. A parameter may be an integer or a character; if it is a character, it must be preceded by a quote character. A parameter can be omitted, in which case the system default value is used. The interpretation of the parameters depends on the format directive.

Furthermore, a format directive can take two additional flags: atmark ‘@’ and colon ‘:’. One or both of them may modify the behavior of the format directive. Those flags must be placed immediately before the directive character.

If a character ‘v’ or ‘V’ is in the place of the parameter, the value of the parameter is taken from the format’s argument. The argument must be either an integer, a character, or #f (indicating that the parameter is effectively omitted).

Some examples:

~10,2s

A format directive ~s, with two parameters, 10 and 2.

~12,,,'*A

A format directive ~a, with 12 for the first parameter and a character ‘*’ for the fourth parameter. The second and third parameters are omitted.

~10@d

A format directive ~d, with 10 for the first parameter and ‘@’ flag.

~v,vx

A format directive ~x, whose first and second parameter will be taken from the arguments.
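As a sketch of the ‘v’ parameter, the following calls take the width and the padding character of a ~d directive from the argument list. The results in the comments follow from the rules above and the description of ~D below; the variable names are only illustrative.

```scheme
;; mincol is taken from the first argument (8).
(define v1 (format #f "|~vd|" 8 255))        ; "|     255|"

;; mincol and padchar are both taken from the arguments.
(define v2 (format #f "|~v,vd|" 8 #\0 255))  ; "|00000255|"

;; #f means the parameter is effectively omitted.
(define v3 (format #f "|~vd|" #f 255))       ; "|255|"
```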

The following is a complete list of the supported format directives. Either an upper case or a lower case character can be used for the format directive; they are not distinguished unless noted otherwise.

~mincol,colinc,minpad,padchar,maxcolA

ASCII output. The corresponding argument is printed by display. If an integer mincol is given, it specifies the minimum number of characters to be output; if the formatted result is shorter than mincol, whitespace is padded to the right (i.e. the result is left justified).

The colinc, minpad and padchar parameters, if given, control further padding. A character padchar replaces whitespace as the padding character. If an integer minpad is given and is greater than 0, at least minpad padding characters are used, regardless of the resulting width. If an integer colinc is given, the padding character is added (after minpad) in chunks of colinc characters, until the entire width exceeds mincol.

If atmark-flag is given, the format result is right justified, i.e. padding is added to the left.

The maxcol parameter, if given, limits the maximum number of characters to be written. If the length of formatted string exceeds maxcol, only maxcol characters are written. If colon-flag is given as well and the length of formatted string exceeds maxcol, maxcol - 4 characters are written and a string “ ...” is attached after it.

 
(format #f "|~a|" "oops")
  ⇒ "|oops|"
(format #f "|~10a|" "oops")
  ⇒ "|oops      |"
(format #f "|~10@a|" "oops")
  ⇒ "|      oops|"
(format #f "|~10,,,'*@a|" "oops")
  ⇒ "|******oops|"

(format #f "|~,,,,10a|" '(abc def ghi jkl))
  ⇒ "|(abc def gh|"
(format #f "|~,,,,10:a|" '(abc def ghi jkl))
  ⇒ "|(abc de ...|"

~mincol,colinc,minpad,padchar,maxcolS

S-expression output. The corresponding argument is printed by write. The semantics of parameters and flags are the same as ~A directive.

 
(format #f "|~s|" "oops")
  ⇒ "|\"oops\"|"
(format #f "|~10s|" "oops")
  ⇒ "|\"oops\"    |"
(format #f "|~10@s|" "oops")
  ⇒ "|    \"oops\"|"
(format #f "|~10,,,'*@s|" "oops")
  ⇒ "|****\"oops\"|"

~mincol,padchar,commachar,intervalD

Decimal output. The argument is formatted as a decimal integer. If the argument is not an integer, all parameters are ignored (after processing ‘v’ parameters) and it is formatted by the ~A directive.

If an integer parameter mincol is given, it specifies minimum width of the formatted result; if the result is shorter than it, padchar is padded on the left (i.e. the result is right justified). The default of padchar is a whitespace.

 
(format #f "|~d|" 12345)
  ⇒ "|12345|"
(format #f "|~10d|" 12345)
  ⇒ "|     12345|"
(format #f "|~10,'0d|" 12345)
  ⇒ "|0000012345|"

If atmark-flag is given, the sign ‘+’ is printed for the positive argument.

If colon-flag is given, every interval-th digit of the result is grouped and commachar is inserted between them. The default of commachar is ‘,’, and the default of interval is 3.

 
(format #f "|~:d|" 12345)
  ⇒ "|12,345|"
(format #f "|~,,'_,4:d|" -12345678)
  ⇒ "|-1234_5678|"

~mincol,padchar,commachar,intervalB

Binary output. The argument is formatted as a binary integer. The semantics of parameters and flags are the same as the ~D directive.

~mincol,padchar,commachar,intervalO

Octal output. The argument is formatted as an octal integer. The semantics of parameters and flags are the same as the ~D directive.
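A few illustrative calls for the binary and octal directives, plus the atmark-flag of ~D, sketched from the rules given for ~D above. The variable names are only illustrative.

```scheme
(define b1 (format #f "~b" 10))           ; "1010"
(define b2 (format #f "~8,'0b" 10))       ; "00001010"

;; Colon-flag: group every 4th digit with an underscore.
(define b3 (format #f "~,,'_,4:b" 255))   ; "1111_1111"

(define o1 (format #f "~o" 64))           ; "100"

;; Atmark-flag: print the sign of a positive number.
(define d1 (format #f "~@d" 12345))       ; "+12345"
```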

~mincol,padchar,commachar,intervalX
~mincol,padchar,commachar,intervalx

Hexadecimal output. The argument is formatted as a hexadecimal integer. If ‘X’ is used, upper case letters are used for the digits greater than 9. If ‘x’ is used, lower case letters are used. The semantics of parameters and flags are the same as the ~D directive.

 
(format #f "~8,'0x" 259847592)
  ⇒ "0f7cf5a8"
(format #f "~8,'0X" 259847592)
  ⇒ "0F7CF5A8"

~count*

Moves the argument counter count times forward, effectively skipping the next count arguments. The default value of count is 1, which skips the next argument. If the colon-flag is given, moves the argument counter backwards; e.g. ~:* makes the next directive process the last argument again. If the atmark-flag is given, count specifies the absolute position of the argument, starting from 0.
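The following sketch shows the three forms of the directive. The results in the comments are derived from the description above; the examples are chosen so that the argument counter ends at the last argument.

```scheme
;; Skip one argument: 2 is never printed.
(define j1 (format #f "~a ~*~a" 1 2 3))      ; "1 3"

;; Back up one argument: x is printed twice.
(define j2 (format #f "~a ~:*~a" 'x))        ; "x x"

;; Jump to absolute position 2 (counting from 0).
(define j3 (format #f "~2@*~a" 'a 'b 'c))    ; "c"
```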



6.22.8.4 Low-level output

Function: write-char char :optional port

[R7RS] Write a single character char to the output port port.

Function: write-byte byte :optional port

Write a byte byte to the port. byte must be an exact integer in the range 0 to 255.

This procedure is called write-u8 in R7RS.

Function: flush :optional port
Function: flush-all-ports

Output the buffered data in port, or all ports, respectively.

The function "flush" goes by various names in different Scheme implementations: force-output (Scsh, SCM), flush-output (Gambit), or flush-output-port (Bigloo). The name flush is taken from STk and STklos. R7RS calls this flush-output-port.
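A minimal sketch combining these procedures on a string port, relying on the fact that most Gauche ports accept both textual and binary output. The variable names are only illustrative.

```scheme
(define out (open-output-string))

(write-char #\a out)   ; one character
(write-byte 98 out)    ; one octet; 98 is the ASCII code of #\b
(flush out)            ; push out anything still buffered

(define result (get-output-string out))
;; result is "ab"
```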



This document was generated on July 19, 2014 using texi2html 1.82.