For Development HEAD DRAFT

12.75 text.parse - Parsing input stream

Module: text.parse

A collection of utilities that do simple parsing from an input port. The API is inspired by, and compatible with, Oleg Kiselyov’s input parsing library (http://okmij.org/ftp/Scheme/parsing.html). His library is used in many other libraries, notably SSAX, a full-Scheme XML parser/generator (http://okmij.org/ftp/Scheme/xml.html).

You can use this module in place of his input-parse.scm and look-for-str.scm.

I reimplemented the functions to be efficient on Gauche. In particular, the use of string-set! is avoided entirely. I also extended the interface a bit so that the functions accept character sets and predicates, as well as lists of characters.

These functions work sequentially on the given input port, that is, they read from the port as much as they need, without buffering extra characters.

Function: find-string-from-port? str in-port :optional max-no-chars

{text.parse} Looks for the string str in the input port in-port. The optional argument max-no-chars limits the number of characters read from the port; if omitted, the search continues until EOF.

If str is found, this function returns the number of characters it has read. The next read from in-port returns the character that follows str. If str is not found, it returns #f.

Note: Although this procedure has ‘?’ in its name, it may return a non-boolean value, contrary to the Scheme convention.
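The following is a minimal sketch of a successful and a failed search. The input text and the variable names (p, count, after, missing) are made up for illustration.

```scheme
(use text.parse)

;; An in-memory port holding illustrative text.
(define p (open-input-string "hello world"))

;; Five characters (h, e, l, l, o) are read to reach the end of the match.
(define count (find-string-from-port? "lo" p))

;; The port is now positioned just after the match,
;; so this reads the space that follows "hello".
(define after (read-char p))

;; When the string never appears, the result is #f.
(define missing
  (find-string-from-port? "xyz" (open-input-string "hello world")))
```

Because the success value is a count rather than #t, test the result with a plain `if` or `and` instead of comparing it to #t.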

Function: peek-next-char :optional port

{text.parse} Discards the current character and peeks the next character from port. Useful to look ahead one character. If port is omitted, the current input port is used.
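As a small illustration, assuming a made-up three-character input, the sketch below shows that the peeked character is still available to the next read:

```scheme
(use text.parse)

(define p (open-input-string "abc"))

;; Drops #\a, then peeks at #\b without consuming it.
(define looked (peek-next-char p))

;; Since #\b was only peeked, a following read still yields it.
(define got (read-char p))
```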

In the following functions, char-list refers to one of the following: a character set, or a list of characters, character sets, and/or the symbol *eof*.

It denotes a set of characters. If the symbol *eof* is included, the EOF condition is also included. Without *eof*, the EOF condition is regarded as an error.

Function: assert-curr-char char-list string :optional port

{text.parse} Reads a character from port. If it is included in char-list, returns the character. Otherwise, signals an error with a message containing string. If port is omitted, the current input port is used.
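A possible use, sketched with hypothetical input and a hypothetical error message, is checking for an expected delimiter. The second form wraps a deliberate mismatch in guard so that the error can be observed without aborting.

```scheme
(use text.parse)

(define p (open-input-string "=42"))

;; The next character is #\=, which is in the char-list, so it is returned.
(define ch (assert-curr-char '(#\=) "expected '=' after key" p))

;; A mismatch signals an error; guard turns it into #t for this sketch.
(define failed?
  (guard (e (else #t))
    (assert-curr-char '(#\=) "expected '=' after key"
                      (open-input-string "x"))
    #f))
```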

Function: skip-until char-list/number :optional port

{text.parse} char-list/number is either a char-list or a number. If it is a number, the function reads that many characters and returns #f; if the input is not long enough, an error is signaled. If char-list/number is a char-list, the function reads from port until it sees a character that belongs to the char-list, and returns that character. If port is omitted, the current input port is used.
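Both calling forms can be sketched as follows. The input string and variable names are illustrative only.

```scheme
(use text.parse)

(define p (open-input-string "key: value"))

;; char-list form: read up to and including the first colon,
;; which is the value returned.
(define stop (skip-until '(#\:) p))

;; Number form: consume exactly one character (the space) and return #f.
(define skipped (skip-until 1 p))

;; The port now stands at the start of "value".
(define next-ch (read-char p))
```

Note that the list `'(#\:)` does not contain *eof*, so running out of input before a colon would signal an error.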

Function: skip-while char-list :optional port

{text.parse} Reads from port until it sees a character that does not belong to char-list. The character remains in the stream. If it reaches EOF, an EOF is returned. If port is omitted, the current input port is used.

This example skips whitespace in the input. The next read from port returns the first non-whitespace character.

(skip-while #[\s] port)

Function: next-token prefix-char-list break-char-list :optional comment port

{text.parse} Skips any number of characters in prefix-char-list, then collects characters until it sees one that belongs to break-char-list. The collected characters are returned as a string. The break character remains in the port.

If the function encounters EOF and *eof* is not included in break-char-list, an error is signaled, with comment included in the message.
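A minimal sketch of splitting space-separated words, assuming a made-up input and an arbitrary comment string:

```scheme
(use text.parse)

(define p (open-input-string "   foo bar"))

;; Skip leading spaces, then collect characters up to a space or EOF.
(define tok1 (next-token '(#\space) '(#\space *eof*) "reading a word" p))

;; The space after "foo" was left in the port, so this call
;; skips it as a prefix character before collecting "bar".
(define tok2 (next-token '(#\space) '(#\space *eof*) "reading a word" p))
```

Including *eof* in the break list lets the last word end at the end of input instead of signaling an error.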

Function: next-token-of char-list/pred :optional port

{text.parse} Reads and collects characters as long as they belong to char-list/pred, then returns them as a string. The first character that doesn’t belong to char-list/pred remains in the port.

char-list/pred may be a char-list or a predicate that takes a character. If it is a predicate, each character is passed to it, and the character is regarded as belonging to char-list/pred when the predicate returns a true value.
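The two argument forms might be used as sketched below. The input is illustrative, and the char? check in the predicate is a defensive assumption, in case the predicate is handed a non-character such as an EOF object.

```scheme
(use text.parse)

(define p (open-input-string "123abc"))

;; Character-set form: collect a run of decimal digits.
(define digits (next-token-of #[0-9] p))

;; Predicate form: collect a run of alphabetic characters.
(define letters
  (next-token-of (lambda (c) (and (char? c) (char-alphabetic? c))) p))
```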

Function: read-string n :optional port

{text.parse} This is like built-in read-string (see Reading data), except that this returns "" when the input already reached EOF.

Provided for compatibility with code that depends on Oleg’s library.
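The difference at EOF can be sketched as follows, assuming that importing text.parse makes its read-string the one in effect in the current module. The input string is made up.

```scheme
(use text.parse)

(define p (open-input-string "abc"))

;; Reads the first two characters.
(define first-two (read-string 2 p))

;; Fewer characters remain than requested, so only the rest is returned.
(define last-one (read-string 5 p))

;; The port is already at EOF: this version yields an empty string
;; where the built-in would yield an EOF object.
(define at-eof (read-string 5 p))
```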


