Input and output (Gauche Users’ Reference)

6.21 Input and Output

6.21.1 Ports

Builtin Class: <port> ¶

The port class. A port is Scheme’s abstraction of an I/O channel. Gauche extends ports in a number of ways so that they can be used in a wide range of applications.

Textual and binary I/O

R7RS defines textual and binary ports. In Gauche, most ports can mix both text I/O and binary I/O. It is cleaner to think of the two as distinct, since they are sources/sinks of different types of objects, and usually you don’t need to mix textual and binary I/O.

In practice, however, a port is often a tap to an untyped pool of bytes, and you may want to decide how to interpret it later. One example is the standard I/O; in a Unix-like environment, it’s up to the program to use the pre-opened ports for textual or binary I/O. R7RS defines that the initial ports for current-input-port etc. are textual ports; in Gauche, you can use them either way.

Conversion

Some ports can be used to convert a data stream from one format to another; one such application is the character code conversion ports provided by the gauche.charconv module (see gauche.charconv - Character Code Conversion, for details). SRFI-181 also provides this feature (see Transcoded ports).

Extra features

There are also ports with special functionality. A coding-aware port (see Coding-aware ports) recognizes a special "magic comment" in the file to know which character encoding the file is written in. Virtual ports (see gauche.vport - Virtual ports) allow you to program the behavior of the port in Scheme. This feature is also provided by SRFI-181 (see srfi.181 - Custom ports).

Ports can be combined to provide a communication channel. See control.plumbing - Plumbing ports, for utilities to do so.

6.21.2 Port and threads

The builtin port operations lock the port internally, so that port accesses from multiple threads are serialized. (This is required by SRFI-18.) Here, "builtin port operations" are the port access functions that take a port and do some I/O or query on it, such as read/write, read-char/write-char, port->string, etc. Note that call-with-* and with-* procedures do not lock the port while calling the given procedure, since the procedure may pass the reference to the port to another thread, and Gauche wouldn’t know if that’s the case.

This means you don’t need to be too paranoid about ports in a multithreaded environment. However, keep in mind that this locking mechanism is meant to be a safety net that keeps the port’s internal state from breaking, not a general mutex mechanism. It assumes port accesses rarely conflict, and uses a spin lock to reduce the overhead in the majority of cases. If you know there will be more than one thread accessing the same port, you should use an explicit mutex to avoid conflicts.
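
As a rough sketch of the explicit-mutex advice above, the following serializes writers with a mutex from the gauche.threads module. The names log-mutex and log-line are made up for this illustration, and the sketch assumes a Gauche build with thread support.

```scheme
(use gauche.threads)

;; Hypothetical names: log-mutex and log-line exist only for this example.
(define log-mutex (make-mutex))

;; Writes MSG and a newline to PORT as one unit. Threads sharing PORT
;; wait on log-mutex instead of contending on the port's spin lock.
(define (log-line port msg)
  (with-locking-mutex log-mutex
    (lambda ()
      (display msg port)
      (newline port))))
```

Every thread that writes to the shared port would have to go through log-line for the mutex to be effective.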

Function: with-port-locking port thunk ¶

Executes thunk, while making the calling thread hold the exclusive lock of port during the dynamic extent of thunk.

Calls to the builtin port functions while the lock is held bypass mutex operations and yield better performance.

Note that the lock is held during the dynamic extent of thunk; so, if thunk invokes a continuation captured outside of with-port-locking, the lock is released. If the continuation captured within thunk is invoked afterwards, the lock is re-acquired.

With-port-locking may be nested. The lock is valid during the outermost call of with-port-locking.

Note that this procedure uses the port’s built-in lock mechanism, which busy-waits when port accesses conflict. It should be used only for avoiding fine-grained lock overhead; use an explicit mutex if you know there will be conflicts.
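
A minimal sketch of the intended use: grouping several small writes under one lock so that each builtin call doesn't have to lock the port by itself. The helper name write-record is hypothetical.

```scheme
;; Hypothetical helper: writes each element of FIELDS to PORT followed
;; by a tab, then a newline, holding PORT's lock for the whole record.
(define (write-record port fields)
  (with-port-locking port
    (lambda ()
      (for-each (lambda (f)
                  (write f port)
                  (write-char #\tab port))
                fields)
      (newline port))))
```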

6.21.3 Common port operations

Function: port? obj ¶
Function: input-port? obj ¶
Function: output-port? obj ¶: [R7RS base] Returns true if obj is a port, an input port, or an output port, respectively. Port? is not listed in the R5RS standard procedures, but is mentioned in the "Disjointness of Types" section.

Function: port-closed? port ¶: Returns true if port is a port and is already closed. A closed port can’t be reopened.

Parameter: current-input-port ¶

Parameter: current-output-port ¶

Parameter: current-error-port ¶

[R7RS base] Returns the current input, output and error output port, respectively.

R7RS defines that the initial values of these ports are textual ports. In Gauche, initial ports can handle both textual and binary I/O.

Values of the current ports can be temporarily changed by parameterize (see Parameters), though you might want the convenience procedures such as with-output-to-string or with-input-from-file in typical cases.

(let1 os (open-output-string)
  (parameterize ((current-output-port os))
    (display "foo"))
  (get-output-string os))
 ⇒ "foo"

Parameter: current-trace-port ¶

A parameter that holds an output port to which debug traces go. The initial value is the same as the initial value of current-error-port.

The debug-print feature (see Debugging aid) and the macro trace feature (see Tracing macro expansion) use this port.

Parameter: standard-input-port ¶

Parameter: standard-output-port ¶

Parameter: standard-error-port ¶

Returns the standard I/O ports at the time the program started. These ports are the default values of current-input-port, current-output-port and current-error-port, respectively.

You can also change the values of these parameters with parameterize, but note that (1) the current-*-port parameters are initialized before program execution, so changing the values of standard-*-port won’t affect them, and (2) changing the values of these parameters only affects the Scheme world, and does not change the system-level stdio file descriptors that low-level libraries refer to.

Function: with-input-from-port port thunk ¶
Function: with-output-to-port port thunk ¶
Function: with-error-to-port port thunk ¶: Calls thunk. During evaluation of thunk, the current input port, current output port and current error port are set to port, respectively. Note that port won’t be closed after thunk is executed.

Function: with-ports iport oport eport thunk ¶

Does the above three functions at once. Calls thunk while the current input, output, and error ports are set to iport, oport, and eport, respectively. You may pass #f to any port argument(s) if you don’t need to alter the port(s).

Note that the ports won’t be closed after thunk is executed. (Unfortunately, recent Scheme standards added a similarly named procedure, call-with-port, which does close the port. See below.)
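
As an illustration of passing #f for a port that should stay as it is, here is a small sketch that redirects both output and error output to one string port. The helper name capture-output-and-errors is hypothetical.

```scheme
;; Hypothetical helper: runs THUNK with the current output and current
;; error ports both redirected to one string port, leaves the current
;; input port untouched (#f), and returns everything that was written.
(define (capture-output-and-errors thunk)
  (let ((out (open-output-string)))
    (with-ports #f out out thunk)
    (get-output-string out)))

(capture-output-and-errors
  (lambda ()
    (display "result")
    (display "warning" (current-error-port))))
```

The string port stays open after with-ports returns, which is why get-output-string can still be called on it.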

Function: close-port port ¶

Function: close-input-port port ¶

Function: close-output-port port ¶

[R7RS base] Closes the port. Close-port works on both input and output ports, while close-input-port and close-output-port work only on the respective kind of port and throw an error if the other kind of port is passed.

Theoretically, close-port alone would suffice; having all three is merely for historical reasons. R5RS has close-input-port and close-output-port; R6RS and R7RS support all three.

Function: call-with-port port proc ¶: [R7RS base] Calls proc with one argument, port. After proc returns, or it throws an uncaptured error, port is closed. Value(s) returned from proc will be the return value(s) of call-with-port.
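
A short sketch of the closing behavior, using a string port so that it doesn't depend on any file. The variable names are chosen for this example only.

```scheme
(define p (open-input-string "first line\nsecond line"))

;; call-with-port passes p to read-line and returns its value.
(define line (call-with-port p read-line))

;; After the call, p has been closed by call-with-port.
(define closed-after? (port-closed? p))
```

Here line is the string "first line", and closed-after? is true.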

Function: port-type port ¶: Returns the type of port in one of the symbols file, string or proc.

Function: port-name port ¶: Returns the name of port. If the port is associated to a file, it is the name of the file. Otherwise, it is some description of the port.

Function: port-buffering port ¶

Function: (setter port-buffering) port buffering-mode ¶

If port is a file port (i.e. (port-type port) returns file), these procedures get and set the port’s buffering mode. For input ports, the buffering mode may be one of :full, :modest or :none. For output ports, it may be one of :full, :line or :none. See File ports, for explanation of those modes.

If port-buffering is applied to ports other than file ports, it returns #f. If the setter of port-buffering is applied to ports other than file ports, it signals an error.
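
The following sketch shows one way to guard the setter so that it is only applied to file ports. The helper name make-unbuffered! is hypothetical.

```scheme
;; Hypothetical helper: switches PORT to unbuffered mode when it is a
;; file port, and leaves any other kind of port alone. Returns the
;; resulting buffering mode, which is #f for non-file ports.
(define (make-unbuffered! port)
  (when (eq? (port-type port) 'file)
    (set! (port-buffering port) :none))
  (port-buffering port))
```

The :none mode is used here because it is valid for both input and output file ports.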

Function: port-link! input-port output-port ¶

Function: port-unlink! port ¶

An input port and an output port can be linked. After they are linked, an attempt to read from input-port flushes output-port’s buffer. A port cannot be linked to more than one port.

By default, the standard input port is linked to the standard output port.

The port-link! procedure links the given input port and output port. Neither port may already be linked; otherwise an error is signaled.

The port-unlink! procedure removes the link between the given port and another port, if there’s a link. If the given port is not linked, it does nothing.

You can use the link slot of the port to see if a port is linked or not, and if linked, to which port.

Function: port-current-line port ¶: Returns the current line count of port. This information is only available on file-based ports, and only as long as you’re doing sequential character I/O on it. Otherwise, this returns -1.

Function: port-file-number port :optional dup? ¶

Returns an integer file descriptor, if the port is associated to the system file I/O. Returns #f otherwise.

If a true value is passed to dup?, the procedure calls dup(2) and returns a duplicated file descriptor. In that case, the returned file descriptor can be closed independently of the port, and it is the caller’s responsibility to close it (by sys-close) once it’s done.

Function: port-position port ¶

[SRFI-192] Returns the current position of port. For the input port, the current position is where the next data will be read from, and for the output port, it is where the next data will be written to. (Note that peek-char/peek-byte won’t move the current position).

If port is a simple file port or a string port, the position is represented as an integer byte offset from the top of the stream. For more complicated ports, such as ones doing character encoding conversion, the port position may not be available, or the returned object may not correspond to the byte offset. If port is a procedural (virtual) port (see gauche.vport - Virtual ports), the returned value can be an arbitrary Scheme object, which is only valid as an argument to set-port-position!.

An error is thrown if port doesn’t support port positions. You can use port-has-port-position? to check if the port supports port positions.

For portable code: SRFI-192 defines the following conditions.

The return value is an arbitrary Scheme object and you should treat it as an opaque object. In general, it is only safe to pass it to set-port-position! on the same port.
If the port is a binary port and the returned position is a nonnegative exact integer, it represents the byte offset of the position.

Function: port-has-port-position? port ¶: [SRFI-192] Returns #t iff the port supports port-position procedure.

Function: set-port-position! port pos ¶

[SRFI-192] Sets the current position of port to pos. For the input port, the current position is where the next data will be read from, and for the output port, it is where the next data will be written to.

Interpretation of pos is up to the port, and generally it must be an object returned previously by port-position on the same port. If port is a simple file port (a port opened on file, without CES conversion) or an input string port, you may pass a nonnegative exact integer as the byte offset.

An error is signaled if the position is not settable, or pos is not acceptable as a position for port. You can check if position of port is settable by port-has-set-port-position!?.

Function: port-has-set-port-position!? port ¶: [SRFI-192] Returns #t iff the port supports set-port-position! procedure.
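
A small sketch of saving and restoring a position with these procedures, on an input string port (one of the port types described above as accepting positions). It assumes a Gauche version that provides the SRFI-192 procedures as builtins; the variable names are for illustration only.

```scheme
(define p (open-input-string "abcdef"))

(read-char p)                     ; consumes #\a
(define mark (port-position p))   ; remember the position before #\b
(read-char p)                     ; consumes #\b
(read-char p)                     ; consumes #\c

;; Rewind only if the port says its position can be set.
(when (port-has-set-port-position!? p)
  (set-port-position! p mark))

(define again (read-char p))      ; reads #\b a second time
```

The value of mark is treated as opaque here and is only passed back to the same port, which is the portable usage described above.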

Function: port-seek port offset :optional whence ¶

This procedure is deprecated. Use port-position and set-port-position! in new code.

If the given port allows random access, this procedure sets the read/write pointer of the port according to the given offset and whence, then returns the updated offset (number of bytes from the beginning of the data). If port is not random-accessible, #f is returned. In the current version, file ports and input string ports are fully random-accessible. You can only query the current byte offset of output string ports.

Note that port position is represented by byte count, not character count.

If port is an output file port, you may seek past the end of the data. See the POSIX lseek(2) documentation for details of the behavior. For input file ports and input string ports, you can’t seek past the end of the data.

The whence argument must be a small integer that represents from where offset should be counted. The following constant values are defined.

SEEK_SET: Offset represents the byte count from the beginning of the data. This is the default behavior when whence is omitted.
SEEK_CUR: Offset represents the byte count relative to the current read/write pointer. If you pass 0 to offset, you can get the current port position without changing it.
SEEK_END: Offset represents the byte count relative to the end of the data.

Function: port-tell port ¶

Returns the current read/write pointer of port in byte count, if port is random-accessible. Returns #f otherwise. This is equivalent to the following call:

(port-seek port 0 SEEK_CUR)

Note on the names: Port-seek is called seek, file-position or input-port-position/output-port-position on some implementations. Port-tell is called tell, ftell or set-file-position!. Some implementations have port-position for different functionality. CommonLisp has file-position, but it is not suitable for us since port need not be a file port. Seek and tell reflect the POSIX names, and with the Gauche naming convention we could use sys-seek and sys-tell; however, a port deals with a higher level of abstraction than system calls, so I dropped those names and adopted new ones.

Function: copy-port src dst :key (unit 0) (size #f) ¶

Copies data from an input port src to an output port dst, until eof is read from src.

The keyword argument unit may be zero, a positive exact integer, the symbol byte or the symbol char, to specify the unit of copying. If it is an integer, a buffer of that size (or, in the case of zero, a system default size) is used to copy, using block I/O. Generally this is the fastest way to copy between normal files. If unit is the symbol byte, the copying is done byte by byte, using the C versions of read-byte and write-byte. If unit is the symbol char, the copying is done character by character, using the C versions of read-char and write-char.

If a nonnegative integer is given to the keyword argument size, it specifies the maximum amount of data to be copied. If unit is the symbol char, size specifies the number of characters. Otherwise, size specifies the number of bytes.

Returns the number of characters copied when unit is the symbol char. Otherwise, returns the number of bytes copied.
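
To illustrate the unit and size arguments together, here is a sketch that copies a limited number of characters between two string ports. The helper name take-chars is hypothetical.

```scheme
;; Hypothetical helper: returns at most N leading characters of STR,
;; copying character by character from an input string port to an
;; output string port.
(define (take-chars str n)
  (call-with-output-string
    (lambda (out)
      (copy-port (open-input-string str) out :unit 'char :size n))))

(take-chars "hello world" 5)
```

With :unit 'char, size counts characters rather than bytes, so the last expression yields "hello".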

6.21.4 File ports

6.21.4.1 Opening file ports

Function: open-input-file filename :key if-does-not-exist buffering element-type encoding conversion-buffer-size conversion-illegal-output ¶

Function: open-output-file filename :key if-does-not-exist if-exists buffering element-type encoding conversion-buffer-size conversion-illegal-output ¶

[R7RS+ file] Opens a file filename for input or output, and returns an input or output port associated with it, respectively.

The keyword arguments specify precise behavior.

:if-exists

This keyword argument can be specified only for open-output-file, and specifies the action to take when the file named by filename already exists. One of the following values can be given.

:supersede: The existing file is truncated. This is the default behavior.
:append: The output data will be appended to the existing file.
:overwrite: The output data will overwrite the existing content. If the output data is shorter than the existing file, the rest of existing file remains.
:error: An error is signaled.
#f: No action is taken, and the function returns #f.

:if-does-not-exist

This keyword argument specifies the action when filename does not exist.

:error: An error is signaled. This is the default behavior of open-input-file.
:create: A file is created. This is the default behavior of open-output-file. The check of file existence and the creation are done atomically; you can exclusively create the file by specifying :error or #f to if-exists along with this option. You can’t specify this value for open-input-file.
#f: No action is taken, and the function returns #f.

:buffering

This argument specifies the buffering mode. The following values are allowed. The port’s buffering mode can be read and set with port-buffering (see Common port operations).

:full: Buffer the data as much as possible. This is the default mode.
:none: No buffering is done. Every time data is written (to an output port) or read (from an input port), the underlying system call is used. The process’s standard error port is opened in this mode by default.
:line: This is valid only for output ports. The written data is buffered, but the buffer is flushed whenever a newline character is written. This is suitable for interactive output ports. The process’s standard output port is opened in this mode by default. (Note that this differs from the line buffering mode of C stdio, which also flushes the buffer when input is requested from the same file descriptor.)
:modest: This is valid only for input ports. This is almost the same as the mode :full, except that read-uvector may return less data than requested if the requested amount of data is not immediately available. (In the :full mode, read-uvector waits until all the requested data has been read.) This is suitable for a port connected to a pipe or network.

:element-type

This argument specifies the type of the file.

:binary: The file is opened in "binary" mode. (This is the default)
:character: The file is opened in "character" (or "text") mode.

Note: This flag makes a difference only on Windows-native platforms, and only affects the treatment of line terminators. In character mode, writing #\newline to the output causes CR + LF characters to be written, instead of just LF, and reading a CR + LF sequence returns just #\newline.

On Unix, both modes are the same.

Note that Gauche doesn’t distinguish character (textual) port and binary port. So this flag really matters only on Windows line terminators.

:encoding

This argument specifies character encoding of the file. The argument is a string or a symbol that names a character encoding scheme (CES).

For open-input-file, it can be a wildcard CES (e.g. *jp) to guess the file’s encoding heuristically (see Autodetecting the encoding scheme), or #t, in which case we assume the input file itself has magic encoding comment and use open-coding-aware-port (see Coding-aware ports).

If this argument is given, Gauche automatically loads the gauche.charconv module and converts the input/output characters as you read from or write to the port. See Supported character encoding schemes, for the details of character encoding schemes.

If this argument is omitted, the value of the parameter default-file-encoding is assumed for the file’s character encoding (see Character encoding of file I/O).

:conversion-buffer-size

This argument may be used with the encoding argument to specify the buffer size of character encoding conversion. It is passed as a buffer-size argument of the conversion port constructors (see Conversion ports).

Usually you don’t need to give this argument; but if you need to guess the input file encoding, a larger buffer size may work better, since the guessing routine can have more data before deciding the encoding.

:conversion-illegal-output

This argument may be used with the encoding argument to specify the behavior when the source character can’t be mapped to the destination character. It must be either a symbol raise or a symbol replace, and is passed as illegal-output argument to the conversion port. See Conversion ports, for the details.

By combining the if-exists and if-does-not-exist flags, you can implement various actions:

(open-output-file "foo" :if-exists :error)
 ⇒ ;opens "foo" exclusively, or error

(open-output-file "foo" :if-exists #f)
 ⇒ ;opens "foo" exclusively, or returns #f

(open-output-file "foo" :if-exists :append
                        :if-does-not-exist :error)
 ⇒ ;opens "foo" for append only if it already exists

To check the existence of a file without opening it, use sys-access or file-exists? (see File stats).

Note for portability: Some Scheme implementations (e.g. STk) allow you to specify a command as filename and read from, or write to, the subprocess’s standard input/output. Some other scripting languages (e.g. Perl) have similar features. In Gauche, open-input-file and open-output-file strictly operate on files (what the underlying OS regards as files). However, you can use “process ports” to invoke another command in a subprocess and to communicate with it. See Process ports, for details.

Function: call-with-input-file string proc :key if-does-not-exist buffering element-type encoding conversion-buffer-size ¶

Function: call-with-output-file string proc :key if-does-not-exist if-exists buffering element-type encoding conversion-buffer-size ¶

[R7RS+ file] Opens a file specified by string for input/output, and calls proc with one argument, the file port. When proc returns, or an error is signaled from proc that is not captured within proc, the file is closed.

The keyword arguments have the same meanings as in open-input-file and open-output-file. Note that if you specify #f to if-exists and/or if-does-not-exist, proc may receive #f instead of a port object when the file is not opened.

Returns the value(s) proc returned.
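
A sketch of handling the #f case mentioned above: the procedure checks its argument before writing. The helper name write-once is hypothetical.

```scheme
;; Hypothetical helper: writes TEXT to PATH only if PATH does not exist
;; yet. Returns #t when the file was created and written, and #f when
;; the file already existed (in which case the procedure receives #f
;; instead of a port).
(define (write-once path text)
  (call-with-output-file path
    (lambda (out)
      (and out
           (begin (display text out) #t)))
    :if-exists #f))
```

Because the default for if-does-not-exist is :create for output, a first call creates the file and a second call with the same path leaves it untouched.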

Function: with-input-from-file string thunk :key if-does-not-exist buffering element-type encoding conversion-buffer-size ¶

Function: with-output-to-file string thunk :key if-does-not-exist if-exists buffering element-type encoding conversion-buffer-size ¶

[R7RS file] Opens a file specified by string for input or output and makes the opened port the current input or output port, then calls thunk. The file is closed when thunk returns or an error is signaled from thunk that is not captured within thunk.

Returns the value(s) thunk returns.

The keyword arguments have the same meanings as in open-input-file and open-output-file, except that when #f is given to if-exists or if-does-not-exist and opening the port fails, thunk isn’t called at all and #f is returned as the result of with-input-from-file and with-output-to-file.

Notes on semantics of closing file ports: R7RS states, in the description of call-with-port et al., that "If proc does not return, then the port will not be closed automatically unless it is possible to prove that the port will never again be used for read or write operation."

Gauche’s implementation does not quite meet this criterion; the mere fact that an uncaptured error is thrown in proc does not prove the port will never be used. Nevertheless, it is very difficult to think of a situation in which you can do a meaningful operation on the port after such an error is signaled; you’d have no idea what kind of state the port is in. In practical programs, you should capture the error explicitly inside proc if you still want to do some meaningful operation with the port.

Note that if a continuation captured outside call-with-input-file et al. is invoked inside proc, the port is not closed. It is possible that control later returns into proc, if a continuation is captured in it (e.g. coroutines). The low-level exceptions (see Low-level exception handling mechanism) also don’t ensure closing the port.

6.21.4.2 Character encoding of file I/O

Parameter: default-file-encoding ¶

A character encoding name assumed when you omit the encoding keyword argument in file port opening procedures, such as open-input-file, call-with-output-file, etc. The default value is utf-8.

In most circumstances, you don’t need to change the value of this parameter. These days, utf-8 has become the de facto standard in data exchange, so you need to specify the encoding argument explicitly only when you know you’re dealing with non-utf-8 data.

However, suppose you have an existing system that assumes, say, latin-1 encoding, and you’ve been using Gauche with the none CES encoding (which means Gauche just treats an octet as a character). Gauche has used utf-8 exclusively as its internal encoding since 1.0, and if you switch to it, you would need to check every file-opening call to make sure it has :encoding latin-1. That would be a lot of work.

You can change this parameter to latin-1 instead, and the existing calls work as if :encoding latin-1 were attached, unless they already have an explicit :encoding.

Note that source code is loaded using coding-aware ports (see Coding-aware ports), with the default encoding being Gauche’s native encoding. The value of this parameter has no effect on that. Use the encoding magic comment to specify the encoding of source code.

6.21.4.3 File descriptor ports

Function: open-input-fd-port fd :key buffering name owner? ¶

Function: open-output-fd-port fd :key buffering name owner? ¶

Creates and returns an input or output port on top of the given file descriptor. Buffering specifies the buffering mode as described in open-input-file entry above; the default is :full. Name is used for the created port’s name and returned by port-name.

A boolean flag owner? specifies whether fd should be closed when the port is closed. If it is #f, closing port doesn’t close fd. If it is #t, closing port automatically closes fd. It can also be the symbol dup, in which case fd is duplicated (by the dup(2) system call) and the resulting port owns the new fd, which will be closed automatically, but fd itself remains open and can be closed independently.

Function: port-fd-dup! toport fromport ¶

Interface to the system call dup2(2). Atomically closes the file descriptor associated to toport, creates a copy of the file descriptor associated to fromport, and sets the new file descriptor to toport. Both toport and fromport must be file ports. Before the original file descriptor of toport is closed, any buffered output (when toport is an output port) is flushed, and any buffered input (when toport is an input port) is discarded.

‘Copy’ means that, even though the two file descriptors differ in their values, they both point to the same entry in the system’s open file table. For example, they share the current file position; after port-fd-dup!, if you call port-seek on fromport, the change is also visible from toport, and vice versa. Note that this ‘sharing’ is at the system level; if either toport or fromport is buffered, the buffered contents are not shared.

This procedure is mainly intended for programs that need to control open file descriptors explicitly; e.g. a daemon process would want to redirect its I/O to a harmless device such as /dev/null, and a shell process would want to set up file descriptors before executing a child process.

6.21.5 String ports

String ports are ports that read from or write to memory instead of a file or device.

Function: open-input-string string :key name ¶

[R7RS base][SRFI-6] Creates an input string port whose content is string. This is a more efficient way to access a string sequentially than using string-ref with an incrementing index.

(define p (open-input-string "foo x"))
(read p) ⇒ foo
(read-char p) ⇒ #\space
(read-char p) ⇒ #\x
(read-char p) ⇒ #<eof>
(read-char p) ⇒ #<eof>

The name keyword argument is a Gauche extension. By default, the created port is named (input string port). The name is mainly used for debugging. You can specify an alternative name with this argument. By Gauche’s convention, file ports have the source file path as their name, so port names meant for debugging information should be parenthesized so that they aren’t taken as pathnames.

gosh> (open-input-string "")
#<iport (input string port) 0x215c0c0>
gosh> (open-input-string "" :name "(user input)")
#<iport (user input) 0x22a4e40>

Function: get-remaining-input-string port ¶

Port must be an input string port. Returns the remaining content of the input port. The internal pointer of port isn’t moved, so subsequent reads from port aren’t affected. If port has already reached EOF, an empty string is returned.

(define p (open-input-string "abc\ndef"))
(read-line p)                  ⇒ "abc"
(get-remaining-input-string p) ⇒ "def"
(read-char p)                  ⇒ #\d
(read-line p)                  ⇒ "ef"
(get-remaining-input-string p) ⇒ ""

Function: open-output-string :key name ¶

[R7RS base][SRFI-6] Creates an output string port. Anything written to the port is accumulated in the buffer, and can be obtained as a string by get-output-string. This is a far more efficient way to construct a string sequentially than pre-allocating a string and filling it with string-set!.

The name keyword argument is a Gauche extension. By default, the created port is named (output string port). The name is mainly used for debugging. You can specify an alternative name with this argument. By Gauche’s convention, file ports have the source file path as their name, so port names meant for debugging information should be parenthesized so that they aren’t taken as pathnames.

gosh> (open-output-string)
#<oport (output string port) 0x22a4c00>
gosh> (open-output-string :name "(temporary output)")
#<oport (temporary output) 0x22a49c0>

Function: get-output-string port ¶

[R7RS base][SRFI-6] Takes an output string port port and returns the string that has been accumulated in port so far. If byte data has been written to the port, this function re-scans the buffer to see if it constitutes a complete string; if not, an incomplete string is returned.

This doesn’t affect the port’s operation, so you can keep accumulating content to port after calling get-output-string.
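
A brief sketch of that behavior: the port keeps accumulating after a snapshot has been taken. The variable names are for illustration only.

```scheme
(define out (open-output-string))

(display "abc" out)
(define first-snapshot (get-output-string out))    ; "abc"

;; The port is still usable after get-output-string.
(display "def" out)
(define second-snapshot (get-output-string out))   ; "abcdef"
```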

Function: call-with-input-string string proc ¶

Function: call-with-output-string proc ¶

Function: with-input-from-string string thunk ¶

Function: with-output-to-string thunk ¶

These utility functions are trivially defined as follows. The interface is parallel to the file port version.

(define (call-with-output-string proc)
  (let ((out (open-output-string)))
    (proc out)
    (get-output-string out)))

(define (call-with-input-string str proc)
  (let ((in (open-input-string str)))
    (proc in)))

(define (with-output-to-string thunk)
  (let ((out (open-output-string)))
    (with-output-to-port out thunk)
    (get-output-string out)))

(define (with-input-from-string str thunk)
  (with-input-from-port (open-input-string str) thunk))

Function: call-with-string-io str proc ¶

Function: with-string-io str thunk ¶

(define (call-with-string-io str proc)
  (let ((out (open-output-string))
        (in  (open-input-string str)))
    (proc in out)
    (get-output-string out)))

(define (with-string-io str thunk)
  (with-output-to-string
    (lambda ()
      (with-input-from-string str
        thunk))))
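
As a usage sketch for the two definitions above, the following helpers transform a string by reading from one string port and writing to another. The names upcase-via-ports and number-lines are hypothetical.

```scheme
;; with-string-io: the thunk reads from the current input port and
;; writes to the current output port.
(define (upcase-via-ports str)
  (with-string-io str
    (lambda ()
      (let loop ((c (read-char)))
        (unless (eof-object? c)
          (write-char (char-upcase c))
          (loop (read-char)))))))

;; call-with-string-io: the procedure receives both ports explicitly.
(define (number-lines str)
  (call-with-string-io str
    (lambda (in out)
      (let loop ((n 1) (line (read-line in)))
        (unless (eof-object? line)
          (format out "~a: ~a\n" n line)
          (loop (+ n 1) (read-line in)))))))

(upcase-via-ports "abc")    ; "ABC"
(number-lines "a\nb")       ; "1: a\n2: b\n"
```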

Function: write-to-string obj :optional writer ¶

Function: read-from-string string :optional start end ¶

These convenience functions cover common idioms using string ports.

(write-to-string obj writer)
  ≡
  (with-output-to-string (lambda () (writer obj)))

(read-from-string string)
  ≡
  (with-input-from-string string read)

The default value of writer is the procedure write. The default values of start and end are 0 and the length of string, respectively.

Portability note: Common Lisp has these functions, with different optional arguments. STk has read-from-string without optional argument.
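
A few sample calls, as a sketch of the equivalences above. The variable names are for illustration only.

```scheme
;; The default writer is write, so strings keep their quotes.
(define written (write-to-string '(1 "two")))            ; "(1 \"two\")"

;; Passing display as the writer drops the quotes.
(define displayed (write-to-string '(1 "two") display))  ; "(1 two)"

;; read-from-string reads one S-expression from the string.
(define datum (read-from-string "(a b) c"))              ; (a b)
```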

6.21.6 Coding-aware ports

A coding-aware port is a special type of procedural input port that is used by load to read a program source. The port recognizes the magic comment to specify the character encoding of the program source, such as ;; -*- coding: utf-8 -*-, and makes an appropriate character encoding conversion. See Multibyte scripts for the details of coding magic comment.

Function: open-coding-aware-port iport ¶

Takes an input port and returns an input coding-aware port, which basically just passes the data from iport through to its reader. However, if a magic comment appears within the first two lines of data from iport, the coding-aware port applies the necessary character encoding conversion to the rest of the data as it is read.

The passed port, iport, is "owned" by the created coding-aware port. That is, when the coding-aware port is closed, iport is also closed. The content read from iport is buffered in the coding-aware port, so other code shouldn’t read from iport.

By default, Gauche’s load uses a coding-aware port to read the program source, so that the coding magic comment works for Gauche source programs (see Loading Scheme file). However, since the mechanism itself is independent of load, you can use this port for other purposes; it is particularly useful for writing a function that processes Scheme source programs which may have the coding magic comment.
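
As a minimal sketch of such a function: the helper below reads all toplevel forms from a source file through a coding-aware port. The name read-source-forms and its path argument are hypothetical, chosen only for this illustration.

```scheme
;; read-source-forms is a hypothetical helper, not part of Gauche.
;; It returns the list of S-expressions in the file at PATH,
;; honoring a coding magic comment if the file has one.
(define (read-source-forms path)
  (let ((port (open-coding-aware-port (open-input-file path))))
    (unwind-protect
        (port->sexp-list port)
      ;; Closing the coding-aware port also closes the file port it owns.
      (close-port port))))
```

Because the coding-aware port owns the underlying file port, the sketch closes only the outer port.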

6.21.7 Input

For the input-related procedures, the optional iport argument must be an input port, and when omitted, the current input port is assumed.

6.21.7.1 Reading data

Function: read :optional iport ¶

[R7RS base] Reads an S-expression from iport and returns it. Gauche recognizes the lexical structure specified in R7RS, and some additional lexical structures listed in Lexical structure.

If iport has already reached the end of file, an eof object is returned.

The procedure reads up to the last character that constitutes the S-expression, and leaves the rest in the port. This differs from Common Lisp’s read, which consumes whitespace after the S-expression by default.
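
The following small example, with an arbitrary input string, shows that the whitespace after the datum stays in the port:

```scheme
(call-with-input-string "(a b)   c"
  (lambda (in)
    (let* ((datum (read in))        ; reads up to the closing paren
           (next  (read-char in)))  ; the following space is still there
      (list datum next))))
;; ⇒ ((a b) #\space)
```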

Function: read-with-shared-structure :optional iport ¶
Function: read/ss :optional iport ¶: [SRFI-38] These procedures are defined in SRFI-38 to recognize shared substructure notation (#n=, #n#). Gauche’s builtin read recognizes the SRFI-38 notation, so these are just synonyms of read; they are provided only for SRFI-38 compatibility.

Function: read-char :optional iport ¶: [R7RS base] Reads one character from iport and returns it. If iport has already reached the end, returns an eof object. If the byte stream in iport doesn’t constitute a valid character, the behavior is undefined. (In the future, a port will have an option to deal with invalid characters.)

Function: peek-char :optional iport ¶: [R7RS base] Reads one character from iport and returns it, keeping the character in the port. If the byte stream in iport doesn’t constitute a valid character, the behavior is undefined. (In the future, a port will have an option to deal with invalid characters.)

Function: read-byte :optional iport ¶

Function: read-u8 :optional iport ¶

[R7RS base] Reads one byte from an input port iport, and returns it as an integer in the range between 0 and 255. If iport has already reached EOF, an eof object is returned.

This is traditionally called read-byte, and R7RS calls it read-u8. You can use either.

Function: peek-byte :optional iport ¶

Function: peek-u8 :optional iport ¶

[R7RS base] Peeks one byte at the head of an input port iport, and returns it as an integer in the range between 0 and 255. If iport has already reached EOF, an eof object is returned.

This is traditionally called peek-byte, and R7RS calls it peek-u8. You can use either.
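
A short illustration of the byte-level procedures, reading from a string port that holds the two ASCII bytes of "AB":

```scheme
(call-with-input-string "AB"
  (lambda (in)
    (let* ((p (peek-u8 in))     ; 65, not consumed
           (a (read-u8 in))     ; 65
           (b (read-byte in))   ; 66
           (e (read-byte in)))  ; EOF
      (list p a b (eof-object? e)))))
;; ⇒ (65 65 66 #t)
```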

Function: read-line :optional iport allow-byte-string? ¶

[R7RS base] Reads one line (a sequence of characters terminated by newline or EOF) and returns a string. The terminating newline is not included. This function recognizes popular line terminators (LF only, CRLF, and CR only). If iport has already reached EOF, an eof object is returned.

If a byte sequence is read from iport which doesn’t constitute a valid character in the native encoding, read-line signals an error by default. However, if a true value is given to the argument allow-byte-string?, read-line returns a byte string (incomplete string) in such a case, without reporting an error. It is particularly useful if you read from a source whose character encoding is not yet known; for example, to read an XML document, you need to check the first line to see if there is an encoding declaration so that you can then use an appropriate character conversion port. This optional argument is Gauche’s extension to R7RS.

Function: read-string nchars :optional iport ¶: [R7RS base] Reads nchars characters, or as many characters as available before EOF, and returns a string that consists of those characters. If the input has already reached EOF, an eof object is returned.
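
The following illustrates read-line with mixed line terminators and read-string near EOF; the input strings are arbitrary samples.

```scheme
;; LF and CRLF are both recognized; the last line ends at EOF.
(call-with-input-string "one\ntwo\r\nthree"
  (lambda (in)
    (let* ((a (read-line in))
           (b (read-line in))
           (c (read-line in))
           (d (read-line in)))
      (list a b c (eof-object? d)))))
;; ⇒ ("one" "two" "three" #t)

;; read-string returns fewer characters when EOF comes first.
(call-with-input-string "hello"
  (lambda (in)
    (let* ((a (read-string 3 in))
           (b (read-string 10 in))
           (c (read-string 1 in)))
      (list a b (eof-object? c)))))
;; ⇒ ("hel" "lo" #t)
```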

Function: consume-trailing-whitespaces :optional iport ¶

Reads and discards consecutive whitespace characters up to (including) the first EOL, from iport. If iport is omitted, the current input port is used.

This is mainly for an interactive REPL when the input is line buffered. Suppose you’re reading the user’s input using read. If the input is line buffered, read won’t see any input until the user types RET. It means that if the user types (read-line)RET, then read returns (read-line), and if you evaluate it immediately, it reads the RET.

That is not what users typically expect; the user wants to type the input for read-line. Hence, it is useful to consume the trailing RET after reading the user’s initial input.
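
A minimal sketch of such a loop follows. The name tiny-repl is hypothetical, and the loop deliberately omits error handling; it is not how gosh's own REPL is implemented.

```scheme
;; tiny-repl is a hypothetical name used only for this sketch.
(define (tiny-repl)
  (let loop ()
    (display "> ")
    (flush)
    (let ((expr (read)))
      (unless (eof-object? expr)
        ;; Discard the rest of the line the user typed, so that an
        ;; expression such as (read-line) waits for fresh input.
        (consume-trailing-whitespaces)
        (write (eval expr (interaction-environment)))
        (newline)
        (loop)))))
```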

Function: read-block nbytes :optional iport ¶

This procedure is deprecated - use read-uvector instead (see Uvector block I/O).

Reads nbytes bytes from iport, and returns an incomplete string consisting of those bytes. The returned string may be shorter than nbytes when iport doesn’t have enough bytes to fill it. If nbytes is zero, a null string is always returned.

If iport has already reached EOF, an eof object is returned.

If iport is a file port, the behavior of read-block differs by the buffering mode of the port (see File ports, for a detailed explanation of buffering modes).

If the buffering mode is :full, read-block waits until nbytes of data have been read, unless it reaches EOF first.
If the buffering mode is :modest or :none, read-block may return a string shorter than nbytes even before reaching EOF, when the entire data is not available immediately.

If you want to write a chunk of bytes to a port, you can use either display if the data is in a string, or write-uvector in gauche.uvector (see Uvector block I/O) if the data is in a uniform vector.
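
As a sketch of the recommended replacement, the following reads up to four bytes with read-uvector instead of read-block. The sample input and the variable name chunk are illustrative.

```scheme
(use gauche.uvector)

;; Read up to 4 bytes from the port into a fresh u8vector.
(define chunk
  (call-with-input-string "abcdef"
    (lambda (in)
      (read-uvector <u8vector> 4 in))))

chunk  ; ⇒ #u8(97 98 99 100)
```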

Function: eof-object ¶: [R7RS base] Returns an EOF object.

Function: eof-object? obj ¶: [R7RS base] Returns true if obj is an EOF object.

Function: char-ready? :optional port ¶

[R7RS base] If a character is ready to be read from port, returns #t.

For now, this procedure actually checks only whether the next byte is immediately available from port. If the next byte is a part of a multibyte character, the attempt to read the whole character may block, even if char-ready? returns #t on the port. (It is unlikely to happen in usual situations, but theoretically it can. If you are concerned, use read-uvector to read the input as a byte sequence, then use an input string port to read characters.)

Function: byte-ready? :optional port ¶

Function: u8-ready? :optional port ¶

[R7RS base] If one byte (octet) is ready to be read from port, returns #t.

This is traditionally called byte-ready?, and R7RS calls it u8-ready?. You can use either.

6.21.7.2 Reader lexical mode

Parameter: reader-lexical-mode ¶

Get/set the reader lexical mode. Changing this parameter switches behavior of the reader concerning some corner cases of the lexical syntax, where legacy Gauche syntax and R7RS syntax aren’t compatible.

In general, you don’t need to change this parameter directly. The lexical syntax matters at the read-time, while changing this parameter happens at the execution-time; unless you know the exact timing when each phase occurs, you might not get what you want.

The hash-bang directives #!gauche-legacy and #!r7rs indirectly affect this parameter; the first one sets the reader mode to legacy, and the second one to strict-r7.

The command-line argument -fwarn-legacy sets the default reader mode to warn-legacy.

A change to this parameter during load is confined to that load; once load is done, the value of this parameter is reset to the value it had when load started.

The parameter takes one of the following symbols as a value.

permissive

This is the default mode. It tries to find a reasonable compromise between the two syntaxes.

In string literals, a hex escape sequence is first interpreted as R7RS lexical syntax. If the syntax doesn’t conform to the R7RS hex escape, it is interpreted as the legacy Gauche hex escape syntax. For example, "\x30;a" is read as "0a", because the hex escape sequence including the terminating semicolon is read as an R7RS hex escape sequence. It also reads "\x30a" as "0a", because the legacy Gauche hex escape always takes two hexadecimal digits without a terminator. With this mode, you can use the R7RS hex escape syntax for new code, and yet almost all legacy Gauche code can be read without a problem. However, if the legacy code has a two-digit hex escape followed by a semicolon, it is interpreted as R7RS syntax and an incompatibility arises.

strict-r7

Strict R7RS compatible mode. When the reader encounters the hash-bang directive #!r7rs, the rest of file is read with this mode.

In this mode, Gauche’s extended lexical syntax will raise an error.

Use this mode to ensure the code can be read on other R7RS implementations.

legacy

The reader works as the legacy Gauche (version 0.9.3.3 and before). When the reader encounters the hash-bang directive #!gauche-legacy, the rest of file is read with this mode.

This only matters when you want to read a two-digit hex escape followed by a semicolon as a character plus a semicolon, e.g. "\x30;a" as "0;a" instead of "0a". We expect such a sequence rarely appears in code, but if you dump data in a string literal format, you may have such a sequence (especially in incomplete string literals).

warn-legacy

The reader works as the permissive mode, but warns if it reads legacy hex-escape syntax. This mode is default when -fwarn-legacy command-line argument is given to gosh.

This is useful to check if you have any incompatible escape sequence in your code.
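
The string literals from the description above can be checked directly. This sketch assumes the source is read in the default permissive mode; under strict-r7 or legacy the second or first literal, respectively, would be treated differently.

```scheme
;; R7RS hex escape, terminated by a semicolon.
(string=? "\x30;a" "0a")    ; ⇒ #t
(string-length "\x30;a")    ; ⇒ 2

;; Legacy two-digit hex escape, accepted as a fallback in permissive mode.
(string=? "\x30a" "0a")     ; ⇒ #t
```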

6.21.7.3 Read-time constructor

Read-time constructor, defined in SRFI-10, provides an easy way to create an external representation of user-defined structures.

Reader Syntax: #,(tag arg …) ¶

[SRFI-10] Gauche maintains a global table that associates a tag (symbol) to a constructor procedure.

When the reader encounters this syntax, it reads arg …, finds a reader constructor associated with tag, and calls the constructor with arg … as arguments, then inserts the value returned by the constructor as the result of reading the syntax.

Note that this syntax is processed inside the reader—the evaluator doesn’t see any of args, but only sees the object the reader returns.

Function: define-reader-ctor tag procedure ¶

[SRFI-10] Associates a reader constructor procedure with tag.

Examples:

(define-reader-ctor 'pi (lambda () (* (atan 1) 4)))

#,(pi) ⇒ 3.141592653589793

'(#,(pi)) ⇒ (3.141592653589793)

(define-reader-ctor 'hash
  (lambda (type . pairs)
    (let ((tab (make-hash-table type)))
      (for-each (lambda (pair)
                  (hash-table-put! tab (car pair) (cdr pair)))
                pairs)
      tab)))

(define table
 #,(hash eq? (foo . bar) (duh . dah) (bum . bom)))

table ⇒ #<hash-table eq? 0x80f9398>
(hash-table-get table 'duh) ⇒ dah

Combined with write-object method (see Output), it is easy to make a user-defined class written in the form it can be read back:

(define-class <point> ()
  ((x :init-value 0 :init-keyword :x)
   (y :init-value 0 :init-keyword :y)))

(define-method write-object ((p <point>) out)
  (format out "#,(<point> ~s ~s)" (ref p 'x) (ref p 'y)))

(define-reader-ctor '<point>
  (lambda (x y) (make <point> :x x :y y)))

NOTE: The extent of the effect of define-reader-ctor is not specified in SRFI-10, and might pose a compatibility problem among implementations that support SRFI-10. (In fact, the very existence of define-reader-ctor is up to an implementation choice.)

In Gauche, at least for the time being, define-reader-ctor takes effect as soon as the form is compiled and evaluated. Since Gauche compiles and evaluates each toplevel form in order, the tag specified in define-reader-ctor can be used immediately after that. However, it doesn’t work if the call of define-reader-ctor and the use of tag are enclosed in a begin form, because the entire begin form is compiled at once before being evaluated.

Other implementations may need to read the entire file before making its define-reader-ctor call effective. If so, it effectively prevents one from using define-reader-ctor and the defined tag in the same file. It is desirable to separate the call of define-reader-ctor and the use of tag into different files if possible.

Another issue with the current define-reader-ctor is that it modifies a global table of the Gauche system, hence it is not modular. Code written by different people might use the same tags, and yield an unexpected result. In future versions, Gauche may have some way to encapsulate the scope of tag, although the author doesn’t have a clear idea yet.

6.21.7.4 Input utility functions

Function: port->string port ¶

Function: port->list reader port ¶

Function: port->string-list port ¶

Function: port->sexp-list port ¶

Generally useful input procedures. The API is taken from scsh and STk.

port->string reads port until EOF and returns the accumulated data as a string.

port->list applies reader on port repeatedly, until reader returns an EOF, then returns the list of objects reader returned. Note that port isn’t closed.

port->string-list is a port->list specialized by read-line, and port->sexp-list is a port->list specialized by read.

If the input contains an octet sequence that doesn’t form a valid character in Gauche’s native character encoding, port->string and port->string-list may return incomplete string(s). If you want to deal with binary data, consider using port->uvector in gauche.uvector (see Uvector block I/O).
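
For illustration, here are the four procedures applied to string ports with arbitrary sample input:

```scheme
(call-with-input-string "line 1\nline 2\n" port->string)
;; ⇒ "line 1\nline 2\n"

(call-with-input-string "line 1\nline 2\n" port->string-list)
;; ⇒ ("line 1" "line 2")

(call-with-input-string "1 (2 3) four" port->sexp-list)
;; ⇒ (1 (2 3) four)

(call-with-input-string "abc"
  (lambda (in) (port->list read-char in)))
;; ⇒ (#\a #\b #\c)
```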

Function: port-fold fn knil reader ¶

Function: port-fold-right fn knil reader ¶

Function: port-for-each fn reader ¶

Function: port-map fn reader ¶

Convenient iterators over the input read by reader.

Since these procedures are not really about ports, they are superseded by generator-fold, generator-fold-right, generator-for-each and generator-map, respectively. See Folding generated values, for the details.

We provide these only for the backward compatibility.
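
A brief sketch of these iterators next to a generator-based equivalent. Here reader is a thunk (read or read-line) that reads from the current input port; the sample inputs are arbitrary.

```scheme
(use gauche.generator)

;; Sum all numbers read from the current input port.
(with-input-from-string "1 2 3 4"
  (lambda () (port-fold + 0 read)))
;; ⇒ 10

;; Map over lines.
(with-input-from-string "a\nb\nc"
  (lambda () (port-map string-upcase read-line)))
;; ⇒ ("A" "B" "C")

;; The same fold written with generator-fold; a reader thunk that
;; returns EOF at the end already behaves as a generator.
(with-input-from-string "1 2 3 4"
  (lambda () (generator-fold + 0 read)))
;; ⇒ 10
```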

6.21.8 Output

6.21.8.1 Layers of output routines

Gauche has quite a few output procedures which may confuse newcomers. The following table will help to understand how to use those procedures:

Object writers

Procedures that write out Scheme objects. Although there exist more low-level procedures, these are regarded as the basic layer of output routines, since they work on a generic Scheme object as a single unit. They come in two flavors:

Write-family procedures: write, write-shared, write-simple. These produce the external representation of Scheme objects, which can generally be read back by read with as little loss of information as possible¹. The external representation of most Scheme objects is the same as the form in which you write literal data in a program, so this is the default way of writing Scheme objects out.
Display-family procedures: display, print, newline. These are to produce plain-text output suitable for human readers.

High-level formatting output

To produce output in specific width, alignment, etc: format. This corresponds to C’s printf.

Low-level type-specific output

Procedures that deal with raw data.

To output a character or a byte: write-char, write-byte.
To output a string or an array of binary data: write-string, write-uvector.
To flush the output buffer: flush, flush-all-ports.
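
To make the difference between the layers concrete, this small example writes the same string and character through the write family, the display family, and a low-level character writer:

```scheme
(with-output-to-string
  (lambda ()
    (write "hi")          ; external representation, with quotes
    (write-char #\space)  ; low-level single character output
    (display "hi")        ; plain text
    (write-char #\space)
    (write #\a)           ; #\a
    (write-char #\space)
    (display #\a)))       ; a
;; ⇒ "\"hi\" hi #\\a a"
```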

6.21.8.2 Output controls

Class: <write-controls> ¶

You can control several aspects of Lisp structure output via <write-controls> object. The object output routines (e.g. write, display) and the high-level output routines (e.g. format) can take optional write-controls.

The following example may give you some ideas on what write controls can do:

(write '(1 10 100 1000)
       (make-write-controls :base 16 :radix #t))
 prints (#x1 #xa #x64 #x3e8)

(write (iota 100)
       (make-write-controls :length 5))
 prints (0 1 2 3 4 ...)

The make-write-controls procedure returns a write-controls object, which has the following slots (those slot names are taken from Common Lisp’s print control variables):

Instance Variable of <write-controls>: length ¶: If this slot has a nonnegative integer, it determines the maximum number of items displayed for lists and vectors (including uniform vectors). If the sequence has more elements than the limit, an ellipsis is printed in place. If this slot is #f (default), sequence will be written out fully.

Instance Variable of <write-controls>: level ¶: If this slot has a nonnegative integer, it determines the maximum depth of the structure (lists and vectors) to be displayed. If the structure has deeper nodes, they will be printed as #. If this slot is #f (default), no depth limit is enforced.

Instance Variable of <write-controls>: base ¶: This slot must have an integer between 2 and 36, inclusive, and specifies the radix used to print exact numbers. The default value is 10.

Instance Variable of <write-controls>: radix ¶: This slot must have a boolean value. If it is true, radix prefix is always printed before exact numbers. The default value is #f.

Instance Variable of <write-controls>: pretty ¶: If this slot has a true value, pretty printing is used; that is, newlines and indentation are inserted so that nested data structures fit within the specified column width.

Instance Variable of <write-controls>: width ¶: If this slot has a nonnegative integer, it specifies the display column width used for pretty printing.

Instance Variable of <write-controls>: indent ¶

This slot must be a nonnegative exact integer. If this is greater than zero, and the pretty slot is true, that number of whitespace characters is written out after the pretty printer emits a newline. That is, the second and subsequent lines are indented. If the pretty slot is false, this slot is ignored and no indentation is done.

Note that the first line is not indented; the intention is that pretty printing begins at a certain column, so that no whitespace insertion is needed for the first line but the rest of the lines need to be indented to align with the beginning of the pretty print.

Instance Variable of <write-controls>: string-length ¶: This slot must be a nonnegative exact integer or #f. If its value is an integer, string literals longer than the value are truncated during output; an ellipsis is shown after truncated content, before closing double-quotes.

A write-controls object is immutable. If you need a controls object with a slight variation of an existing controls object, use write-controls-copy.

Note: When we introduced the <write-controls> object in 0.9.5, we used slot names such as print-length, print-pretty etc., mirroring Common Lisp’s special variables. However, the print- part is redundant, as the slots belong to a class dedicated to print control. So we changed the slot names as of 0.9.6. The procedures make-write-controls and write-controls-copy accept both old and new names for backward compatibility. Old code that directly refers to the slots needs to be rewritten (we think there isn’t much of it). We’ll drop the old name support in the 1.0 release.

Function: make-write-controls :key length level base radix pretty width indent string-length ¶: Creates and returns a write-controls object.

Function: write-controls-copy controls :key length level base radix pretty width indent string-length ¶: Returns a copy of another write-controls object controls. If keyword arguments are given, those values override the original values.
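
Since a write-controls object is immutable, variations are derived by copying. The sketch below uses the hypothetical variable names short and short-hex, and assumes the slots can be read with slot-ref under the names listed above.

```scheme
;; A controls object that limits sequences to 5 elements.
(define short (make-write-controls :length 5))

;; A copy that keeps the length limit and switches to hexadecimal.
(define short-hex (write-controls-copy short :base 16 :radix #t))

(write (iota 100) short)      ; prints (0 1 2 3 4 ...)
(newline)
(write (iota 100) short-hex)  ; same length limit, numbers in hexadecimal
(newline)

(slot-ref short 'base)        ; ⇒ 10  (the original is unchanged)
(slot-ref short-hex 'base)    ; ⇒ 16
(slot-ref short-hex 'length)  ; ⇒ 5   (inherited from short)
```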

Note: The high-level output procedures can be recursively called via the write-object method. In that case, the write controls of the root output call are automatically inherited by the recursive output calls to the same port.

6.21.8.3 Object output

For the following procedures, the optional port argument must be an output port, and when omitted, the current output port is assumed.

Some procedures take port/controls argument, which can be either an output port or <write-controls> object. For example, write takes up to two such optional arguments; that is, you can call it as (write obj), (write obj port), (write obj controls), (write obj port controls) or (write obj controls port). When omitted, the port is assumed to be the current output port, and the controls is assumed to be the default controls.

Function: write obj :optional port/controls1 port/controls2 ¶

Function: write-shared obj :optional port/controls1 port/controls2 ¶

Function: write-simple obj :optional port/controls1 port/controls2 ¶

[R7RS+ write] The write-family procedures are used to write an external representation of Scheme object, which can be read back by read procedure. The three procedures differ in a way to handle shared or circular structures.

Write is circular-safe; that is, it uses datum label notation (#n= and #n#) to show cycles. It does not use datum label notation for non-circular structures that are merely shared (see the second example).

(let1 x (list 1)
  (set-cdr! x x)   ; create a cycle
  (write x))
 ⇒ shows #0=(1 . #0#)

(let1 x (list 1)
  (write (list x x)))
 ⇒ shows ((1) (1))

Write-shared is also circular-safe, and it also shows shared structures using datum labels. Use this if you need to preserve topology of a graph structure.

(let1 x (list 1)
  (write-shared (list x x)))
 ⇒ shows (#0=(1) #0#)

Finally, write-simple writes out the object recursively without taking account of shared or circular structures. This is fast, for it doesn’t need to scan the structure before actually writing out. However, it won’t stop when a circular structure is passed.

When these procedures encounter an object of a user-defined class, they call the generic function write-object.

Historical context: Write has been in Scheme standards, but handling of circular structures hasn’t been specified until R7RS. In fact, until Gauche 0.9.4, write diverged for circular structures. SRFI-38 introduced the datum-label notation and write-with-shared-structure and write/ss procedures to produce such notation, and Gauche supported it. R7RS clarified this issue, and Gauche 0.9.4 followed.

Function: write-with-shared-structure obj :optional port ¶

Function: write/ss obj :optional port ¶

Function: write* obj :optional port ¶

[SRFI-38] These are aliases of write-shared above.

Gauche has long used the name write*, which is taken from STklos. SRFI-38 defines write-with-shared-structure and write/ss. These names are kept for backward compatibility. New code should use write-shared.

Function: display obj :optional port/controls1 port/controls2 ¶

[R7RS write] Produces a human-friendly representation of an object obj to the output port.

If obj contains cycles, display uses datum-label notation.

When display encounters an object of a user-defined class, it calls the generic function write-object.

(display "\"Mahalo\", he said.")
 ⇒ shows "Mahalo", he said.

(let ((x (list "imua")))
  (set-cdr! x x)
  (display x))
 ⇒ shows #0=(imua . #0#)

Function: print expr … ¶: Displays exprs (using display) to the current output port, then writes a newline.
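
For example, with arbitrary sample arguments; note that each argument is rendered as by display, so strings and characters inside the list lose their quoting:

```scheme
(print "x = " 42 ", items: " '(1 "two" #\3))
;; prints: x = 42, items: (1 two 3)
;; followed by a newline
```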

Function: pprint obj :key port controls width length level newline indent ¶

Pretty prints obj to port, which defaults to the current output port. The same effect is achieved by passing the write procedure a write control with the pretty slot set to #t (in fact, that is how pprint is implemented), but this procedure provides a more convenient interface when you want to play with the pretty printer.

By default, pprint prints a newline after writing obj. You can suppress this newline by passing #f to newline keyword argument.

To customize pretty printing, you can pass a write control object to the controls keyword argument (the pretty slot of controls is ignored; the output is always printed prettily). Furthermore, you can override the width, length, level, and indent slots of controls. If you omit controls, a reasonable default value is assumed. See Output controls, for the details of write controls.

(pprint (make-list 6 '(gauche droite)))
 ⇒ prints
  ((gauche droite) (gauche droite) (gauche droite) (gauche droite)
   (gauche droite) (gauche droite))

(pprint (make-list 6 '(gauche droite)) :width 20)
 ⇒ prints
  ((gauche droite)
   (gauche droite)
   (gauche droite)
   (gauche droite)
   (gauche droite)
   (gauche droite))

(pprint (make-list 6 '(gauche droite)) :length 3)
 ⇒ prints
  ((gauche droite) (gauche droite) (gauche droite) …)

(pprint (make-list 6 '(gauche droite)) :level 1)
 ⇒ prints
  (# # # # # #)

Method: write-object (obj <object>) port ¶: You can customize how the object is printed out by this method.

Function: newline :optional port ¶: [R7RS base] Writes a newline character to port. This is equivalent to (write-char #\newline port) or (display "\n" port). It is kept for historical reasons.

Function: fresh-line :optional port ¶

If the output column is at the beginning of a line, does nothing and returns #f. Otherwise, writes a newline character and returns #t. It is handy when you want to make sure the next output starts from the beginning of a line, but do not want to produce an extra blank line.

This procedure is taken from Common Lisp.

Not all situations allow this procedure to know the port’s output column; in particular, if you mix textual and binary output, the column tracking becomes unreliable. If this procedure can’t determine the column position, it emits a newline character.
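
A typical use is a reporting helper that must start on a fresh line no matter what was printed before it. The name report is hypothetical; the intended effect, when the port can track its column, is that no blank line appears between the two reports.

```scheme
;; report is a hypothetical helper for this sketch.
(define (report . items)
  (fresh-line)               ; move to a new line only if needed
  (for-each display items)
  (newline))

(display "partial output")   ; leaves the column in mid-line
(report "result: " 42)       ; fresh-line emits a newline first
(report "done")              ; already at the beginning of a line
```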

6.21.8.4 Formatting output

Function: format [d/c1 [d/c2]] format-string arg … ¶

[SRFI-28+] Format arg … according to format-string. This function is a subset of CommonLisp’s format function, with a bit of extension. It is also a superset of SRFI-28, Basic format strings.

The procedure has an unusual signature. The two optional arguments, d/c1 and d/c2, can be an object to specify the destination, or a <write-controls> object to tailor ~s and ~a output (see Output controls). You can give none, either one of them, or both, in any order.

Note: SRFI-28’s format does not take optional arguments before format-string.

If you give an object to specify the destination, it must be either an output port or a boolean value. If it is an output port, the formatted result is written to it; if it is #t, the result is written to the current output port; if it is #f, the formatted result is returned as a string. The destination can be omitted, as in SRFI-28 format; this has the same effect as giving #f.

format-string is a string that contains format directives. A format directive is a character sequence that begins with a tilde, ‘~’, and ends with some specific character. A format directive takes the corresponding arg and formats it. The rest of format-string is copied to the output as is.

(format #f "the answer is ~s" 42)
  ⇒ "the answer is 42"
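
The optional leading arguments can be exercised as follows; the messages and the :length value are arbitrary.

```scheme
;; Destination #f or omitted: the result is returned as a string.
(format #f "~a and ~s" "str" "str")  ; ⇒ "str and \"str\""
(format "~a and ~s" "str" "str")     ; ⇒ "str and \"str\""

;; Destination #t: written to the current output port.
(format #t "to current output: ~a\n" 42)

;; Destination is an output port.
(format (current-error-port) "to stderr: ~a\n" 42)

;; A write-controls object may be given alone or with a destination,
;; in either order; here ~s output is truncated after 3 list elements.
(format (make-write-controls :length 3) "~s" (iota 10))
(format #f (make-write-controls :length 3) "~s" (iota 10))
(format (make-write-controls :length 3) #f "~s" (iota 10))
```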

The format directive can take one or more parameters, separated by comma characters. A parameter may be an integer or a character; if it is a character, it should be preceded by a quote character. A parameter can be omitted, in which case the system default value is used. The interpretation of the parameters depends on the format directive.

Furthermore, a format directive can take two additional flags: atmark ‘@’ and colon ‘:’. One or both of them may modify the behavior of the format directive. Those flags must be placed immediately before the directive character.

If a character ‘v’ or ‘V’ is in the place of the parameter, the value of the parameter is taken from the format’s argument. The argument must be either an integer, a character, or #f (indicating that the parameter is effectively omitted).

Some examples:

~10,2s: A format directive ~s, with two parameters, 10 and 2.
~12,,,'*A: A format directive ~a, with 12 for the first parameter and a character ‘*’ for the fourth parameter. The second and third parameters are omitted.
~10@d: A format directive ~d, with 10 for the first parameter and ‘@’ flag.
~v,vx: A format directive ~x, whose first and second parameter will be taken from the arguments.
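
The ‘v’ parameter can be illustrated with a few calls; the widths and padding characters here are arbitrary. Each ‘v’ consumes one argument before the argument that the directive itself formats.

```scheme
;; mincol taken from the argument list.
(format #f "|~vd|" 6 42)            ; ⇒ "|    42|"

;; Both mincol and padchar taken from the argument list.
(format #f "|~v,vd|" 6 #\0 42)      ; ⇒ "|000042|"

;; #f means the parameter is effectively omitted.
(format #f "|~vd|" #f 42)           ; ⇒ "|42|"

;; First and fourth parameters of ~a supplied as arguments.
(format #f "|~v,,,va|" 8 #\. "ab")  ; ⇒ "|ab......|"
```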

The following is a complete list of the supported format directives. Either upper case or lower case character can be used for the format directive; usually they have no distinction, except noted.

~A

Parameters: mincol,colinc,minpad,padchar,maxcol

Ascii output. The corresponding argument is printed by display. If an integer mincol is given, it specifies the minimum number of characters to be output; if the formatted result is shorter than mincol, whitespace is padded on the right (i.e. the result is left justified).

The colinc, minpad and padchar parameters, if given, control further padding. A character padchar replaces whitespace as the padding character. If an integer minpad is given and is greater than 0, at least minpad padding characters are used, regardless of the resulting width. If an integer colinc is given, padding characters are added (after minpad) in chunks of colinc characters, until the entire width exceeds mincol.

If atmark-flag is given, the format result is right justified, i.e. padding is added to the left.

The maxcol parameter, if given, limits the maximum number of characters to be written. If the length of formatted string exceeds maxcol, only maxcol characters are written. If colon-flag is given as well and the length of formatted string exceeds maxcol, maxcol - 4 characters are written and a string “ ...” is attached after it.

(format #f "|~a|" "oops")
  ⇒ "|oops|"
(format #f "|~10a|" "oops")
  ⇒ "|oops      |"
(format #f "|~10@a|" "oops")
  ⇒ "|      oops|"
(format #f "|~10,,,'*@a|" "oops")
  ⇒ "|******oops|"

(format #f "|~,,,,10a|" '(abc def ghi jkl))
  ⇒ "|(abc def gh|"
(format #f "|~,,,,10:a|" '(abc def ghi jkl))
  ⇒ "|(abc de ...|"

~S

Parameters: mincol,colinc,minpad,padchar,maxcol

S-expression output. The corresponding argument is printed by write. The semantics of parameters and flags are the same as ~A directive.

(format #f "|~s|" "oops")
  ⇒ "|\"oops\"|"
(format #f "|~10s|" "oops")
  ⇒ "|\"oops\"    |"
(format #f "|~10@s|" "oops")
  ⇒ "|    \"oops\"|"
(format #f "|~10,,,'*@s|" "oops")
  ⇒ "|****\"oops\"|"

~W

Parameters: None

Write prettily. Without flags, it just works like ~S. If a colon flag is given, it works like pretty print, i.e. as if the pretty slot of the write context is true. If an atmark flag is given, it works as if the length and level slots of the write context are #f, i.e. it temporarily turns off the length and level limits.

The indent of pretty print is set at the position where ~W begins, so that if the output spans multiple lines, the second and following lines are indented properly.

(format #t "Result: ~:w\n" (make-list 40 'aaaa))
 ⇒ prints:
Result: (aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa
         aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa
         aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa)

~C

Parameters: None

Character output. The argument must be a character, or an error is signaled. If no flags are given, the character is printed with display. If atmark-flag is given, the character is printed with write.
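
For instance:

```scheme
(format #f "~c" #\a)       ; ⇒ "a"        (as by display)
(format #f "~@c" #\a)      ; ⇒ "#\\a"     (as by write)
(format #f "~@c" #\space)  ; ⇒ "#\\space"
```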

~D

Parameters: mincol,padchar,commachar,interval

Decimal output. The argument is formatted as a decimal integer. If the argument is not an integer, all parameters are ignored (after processing ‘v’ parameters) and it is formatted by the ~A directive.

If an integer parameter mincol is given, it specifies minimum width of the formatted result; if the result is shorter than it, padchar is padded on the left (i.e. the result is right justified). The default of padchar is a whitespace.

(format #f "|~d|" 12345)
  ⇒ "|12345|"
(format #f "|~10d|" 12345)
  ⇒ "|     12345|"
(format #f "|~10,'0d|" 12345)
  ⇒ "|0000012345|"

If atmark-flag is given, the sign ‘+’ is printed for the positive argument.

If colon-flag is given, every interval-th digit of the result is grouped and commachar is inserted between them. The default of commachar is ‘,’, and the default of interval is 3.

(format #f "|~:d|" 12345)
  ⇒ "|12,345|"
(format #f "|~,,'_,4:d|" -12345678)
  ⇒ "|-1234_5678|"

~B

Parameters: mincol,padchar,commachar,interval

Binary output. The argument is formatted as a binary integer. The semantics of parameters and flags are the same as the ~D directive.

~O

Parameters: mincol,padchar,commachar,interval

Octal output. The argument is formatted as an octal integer. The semantics of parameters and flags are the same as the ~D directive.
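Since ~B and ~O take the same parameters and flags as ~D, the following sketch simply reuses the ~D patterns shown earlier with a different radix. The expected results assume the defaults described for ~D (commachar ‘,’ and interval 3).

```scheme
;; Binary output, plain and zero-padded to 8 columns.
(format #f "~b" 10)       ; ⇒ "1010"
(format #f "~8,'0b" 10)   ; ⇒ "00001010"

;; Colon flag: digits are grouped by three from the right.
(format #f "~:b" 255)     ; ⇒ "11,111,111"

;; Octal output, plain and with an explicit plus sign.
(format #f "~o" 64)       ; ⇒ "100"
(format #f "~@o" 8)       ; ⇒ "+10"
```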

~X

~x

Parameters: mincol,padchar,commachar,interval

Hexadecimal output. The argument is formatted as a hexadecimal integer. If ‘X’ is used, uppercase letters are used for the digits 10 and above. If ‘x’ is used, lowercase letters are used. The semantics of parameters and flags are the same as the ~D directive.

(format #f "~8,'0x" 259847592)
  ⇒ "0f7cf5a8"
(format #f "~8,'0X" 259847592)
  ⇒ "0F7CF5A8"

~R

~r

Parameters: radix,mincol,padchar,commachar,interval

Radix. If the argument is an exact number, the directive ~nR prints out the argument in radix n. For example, ~10R is the same as ~D. The radix parameter must be between 2 and 36, inclusive. The other parameters are the same as for the ~D directive.

If the argument is other than an exact number, it is formatted just as ~A.

If uppercase R is used, uppercase letters are used for base-11 and above. If lowercase r is used, lowercase letters are used.

If atmark-flag is given, it behaves differently; it prints out the numeric argument in Roman numerals, e.g. MMXXIII for 2023. With both colon-flag and atmark-flag, it uses a variation of Roman numerals (IIII for 4 instead of IV, for example). The argument must be a positive exact integer in this case. Common Lisp only supports up to 4,999 since the Roman numeral 5,000 can’t be written in ASCII. Unicode has Roman numerals up to 100,000, however, so we support up to 499,999. Again, the distinction between R and r is reflected in the ASCII letters used in the output.

If neither radix nor atmark-flag is given, Common Lisp prints the number in English. Gauche doesn’t support this yet, and an error is raised.

(format #f "~36R" 123456789) ⇒ "21I3V9"
(format #f "~36r" 123456789) ⇒ "21i3v9"
(format #f "~@R" 1999)  ⇒ "MCMXCIX"
(format #f "~@:R" 1999) ⇒ "MDCCCCLXXXXVIIII"
(format #f "~@r" 1999)  ⇒ "mcmxcix"
(format #f "~@R" 123456) ⇒ "ↈↂↂMMMCDLVI"

~F

Parameters: width,digits,scale,ovfchar,padchar

Floating-number output. If the argument is a real number, it is formatted as a decimal floating number. The width parameter defines the width of the field; the number is written out right-justified, padded on the left with padchar, whose default is #\space. When the formatted output can’t fit in width, ovfchar is output width times if it is given, or the entire output is shown if ovfchar is omitted.

(format "~6f" 3.14)          ⇒ "  3.14"
(format "~6f" 3.141592)      ⇒ "3.141592"
(format "~6,,,'#f" 3.141592) ⇒ "######"
(format "~6,,,,'*f" 3.14)    ⇒ "**3.14"

The digits parameter specifies the number of digits shown after the decimal point. It must be a nonnegative integer. When omitted, enough digits to identify the flonum uniquely are generated (the same as with write and display; when you read back the number, you’ll get exactly the same flonum).

(format "~6,3f" 3.141592)    ⇒ " 3.142"
(format "~6,0f" 3.141592)    ⇒ "    3."
(format "~10,4f" 355/113)    ⇒ "    3.1416"
(format "~10,4f" 3)          ⇒ "    3.0000"

If the scale parameter is given, the argument is multiplied by (expt 10 scale) before printing.

If the @ flag is given, a plus sign is printed before a non-negative number.

(format "~8,3@f" 3.141592)   ⇒ "  +3.142"

When digits is smaller than the number of digits required to represent the flonum unambiguously, we round at the digits+1 position. By default, rounding is based on the value the flonum represents; that is, we choose the rounded value closer to the actual value of the flonum. This can sometimes lead to unintuitive results, however. Suppose you want to round 1.15 at the 100ths position (that is, round to the nearest 10th). Unlike what you learned in elementary math class, it gives you 1.1. That’s because the flonum represented by 1.15 is actually a tiny bit smaller than 1.15, so it’s closer to 1.1 than to 1.2. We show it as 1.15 since no other flonum is closer to 1.15.

In casual applications, however, users may be perplexed by this behavior. So we support another rounding mode, which we call notational rounding. It is based on the notation used for the flonum. In that mode, rounding 1.15 to the nearest 10th yields 1.2. You can get it by adding the : flag.

(format "~6,1f" 1.15)  ⇒ "   1.1"
(format "~6,1:f" 1.15) ⇒ "   1.2"

~$

Parameters: digits,idigits,width,padchar

Floating-point formatting suitable for currency display. The digits parameter specifies the number of digits after the decimal point (default 2). The idigits parameter specifies the minimum number of digits before the decimal point (default 1). The width parameter specifies the minimum number of characters for the entire display; if the number of printed characters is smaller than that, padchar is displayed on the left to fill the blank. The default of width is 0, and the default of padchar is #\space.

If @ flag is given and the argument is nonnegative, + is displayed.

If : flag is given, the sign is displayed first, before any padding characters.

If the argument isn’t a real number, the object is formatted as if ~wD directive is given (where w is width).

Gauche specific: The number is rounded with notational rounding (see the description of ~F above for the rounding mode).

(map (cut format "~$" <>) '(1.23 4.5 6))
  ⇒ ("1.23" "4.50" "6.00")
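The parameters and the @ flag can be illustrated with a short sketch. The results in the comments follow from the parameter descriptions above (digits, idigits, width, padchar) and have not been checked against every Gauche version.

```scheme
;; digits = 2, idigits = 1, width = 8: padded on the left with spaces.
(format "~2,1,8$" 3.14159)      ; ⇒ "    3.14"

;; The same, with '*' given as padchar.
(format "~2,1,8,'*$" 3.14159)   ; ⇒ "****3.14"

;; digits = 3: three digits after the decimal point.
(format "~3$" 2.5)              ; ⇒ "2.500"

;; @ flag: an explicit plus sign for a nonnegative argument.
(format "~@$" 3.5)              ; ⇒ "+3.50"
```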

~?

Parameters: None

Recursive formatting. The argument for this directive must be a string, which is interpreted as a format string. The arguments for that format string are taken from the next argument, which must be a list.

(format "~s~?~s" '< "~s ~s" '(a b) '>)
  ⇒ "<a b>"

If the @ flag is given, the arguments for that format string are taken directly from the following arguments.

(format "~s~@?~s" '< "~s ~s" 'a 'b '>)
  ⇒ "<a b>"

~*

Parameter: count

Moves the argument counter forward count times, effectively skipping the next count arguments. The default value of count is 1, which skips the next argument. If a colon-flag is given, the argument counter moves backwards, e.g. ~:* makes the next directive process the last argument again. If an atmark-flag is given, count specifies the absolute position of the argument, starting from 0.
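The three ways of moving the argument counter can be sketched as follows. The examples are illustrative only; each one is written so that all arguments end up consumed, and the results follow from the description above.

```scheme
;; Skip forward: the second argument is never printed.
(format #f "~a ~*~a" 1 2 3)            ; ⇒ "1 3"

;; Back up: the same argument is processed twice.
(format #f "~a ~:*~a" 1)               ; ⇒ "1 1"

;; Absolute position: jump back to argument 0 and start over.
(format #f "~a ~a ~0@*~a ~a" 'x 'y)    ; ⇒ "x y x y"
```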

~P

Parameters: none

Plural. If the argument is the exact integer 1, do nothing. Otherwise, print s.

If an atmark-flag is given, print y for the argument 1, and print ies for other argument values.

If a colon-flag is given, back up one argument before processing.

Note: 1.0 is regarded as plural.

(format "~a book~:p, ~a book~:p, ~a pon~:@p, ~a pon~:@p" 1 2 1 2)
 ⇒ "1 book, 2 books, 1 pony, 2 ponies"

~&

Parameter: count

Fresh line. If the output position is at the beginning of a line, do nothing; otherwise, output #\newline, just like the fresh-line procedure (see Object output).

If count is given and greater than 1, output (- count 1) newline characters after the fresh line operation. If count is 0, do nothing.

~~

~%

~t

~|

Parameter: count

Output a #\~, #\newline, #\tab and #\page, respectively. If count is given, output the specified number of characters.
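A few of these directives in use, as a minimal sketch. The results in the comments follow directly from the description above; ~| is omitted only because a page character is awkward to show in a string literal.

```scheme
;; A literal tilde.
(format #f "100~~")    ; ⇒ "100~"

;; One newline, and two newlines via the count parameter.
(format #f "a~%b")     ; ⇒ "a\nb"
(format #f "a~2%b")    ; ⇒ "a\n\nb"

;; A tab character.
(format #f "a~tb")     ; ⇒ "a\tb"

;; The count parameter works for ~~ as well.
(format #f "~3~")      ; ⇒ "~~~"
```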

~ + newline

Parameters: none

Ignored newline. As Scheme supports backslash-newline escape in string literals, this is not necessary, but provided for Common Lisp compatibility. Without flags, the newline after the tilde, and any following whitespace characters (char-whitespace?) are ignored. With a colon flag, the newline immediately after the tilde is ignored but the following whitespace characters are preserved. With an atmark flag, the newline immediately after the tilde is kept but the following whitespace characters are ignored.

(format "abc~\n   def") ⇒ "abcdef"
(format "abc~:\n   def") ⇒ "abc   def"
(format "abc~@\n   def") ⇒ "abc\ndef"

(format "abc~:\n   \ndef") ⇒ "abc   \ndef"

~[

Parameters: none

Conditional formatting. It can take one of three forms:

~[fmt0~;fmt1~;...~]: Consume one argument. If it is not an exact integer, an error is signaled. If it is 0, the format string fmt0 is selected. If it is 1, fmt1 is selected; and so on. If there’s no corresponding format string, it acts like an empty format string by default. However, if the last separator is ~:; instead of ~;, the following format string acts as the ’default’ format string.
~:[fmt-alternative~;fmt-consequence~]: Consume one argument. If it is #f, fmt-alternative is selected. If it is true, fmt-consequence is selected. (Note that this is the reverse of if. Think of false as 0 and true as 1.)
~@[fmt-consequence~]: Peek one argument. If it is #f, consume it and ignore fmt-consequence. If it is true, process fmt-consequence. Hence, fmt-consequence usually contains a directive to consume one argument.

(map (cut format "~[zero~;one~;two~]" <>) '(0 1 2 3))
  ⇒ ("zero" "one" "two" "")
(map (cut format "~[zero~;one~;two~:;other~]" <>) '(0 1 2 3 -1))
  ⇒ ("zero" "one" "two" "other" "other")
(map (cut format "~:[no~;yes~]" <>) '(#f #t))
  ⇒ ("no" "yes")
(map (cut format "~@[I got ~a~]" <>) '(#f rhythm))
  ⇒ ("" "I got rhythm")

6.21.8.5 Low-level output

Function: write-char char :optional port ¶: [R7RS base] Write a single character char to the output port port.

Function: write-byte byte :optional port ¶

Function: write-u8 byte :optional port ¶

[R7RS base] Write a byte byte to the port. byte must be an exact integer in range between 0 and 255.

This is traditionally called write-byte, and R7RS calls it write-u8. You can use either.

Function: write-string string :optional oport start end ¶: [R7RS base] If the optional start and end arguments are omitted, it is the same as (display string oport). The optional arguments restrict the range of string to be written.
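The following sketch combines the procedures above on a single string port. It assumes that call-with-output-string (see String ports) is used to collect the output, and it relies on Gauche allowing text and binary output to be mixed on the same port; the bytes are chosen to be plain ASCII so the result is an ordinary string.

```scheme
(call-with-output-string
  (lambda (port)
    (write-char #\A port)                      ; one character
    (write-u8 66 port)                         ; byte 66 is ASCII "B"
    (write-byte 67 port)                       ; same operation, traditional name
    (write-string "hello world" port 5 11)))   ; characters 5 to 10: " world"
;; ⇒ "ABC world"
```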

Function: flush :optional port ¶

Function: flush-all-ports ¶

Output the buffered data in port, or all ports, respectively.

R7RS’s flush-output-port is the same as flush. The scheme.base module defines the name as an alias to flush (see scheme.base - R7RS base library).

The function "flush" goes by a variety of names in other Scheme implementations: force-output (Scsh, SCM), flush-output (Gambit), or flush-output-port (Bigloo). The name flush is taken from STk and STklos.
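A typical use of flush is to make a prompt visible before the program blocks on input. The procedure name prompt below is hypothetical and used only for illustration; it assumes read-line reads from the current input port.

```scheme
;; prompt is a hypothetical helper, not part of Gauche.
(define (prompt message)
  (display message)
  (flush)        ; push the buffered prompt out before waiting for input
  (read-line))
```

Called as (prompt "Continue? "), it prints the message, flushes the current output port, and returns the line the user typed.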

Footnotes

(1)