text.parse - Parsing input stream
A collection of utilities for simple parsing from an input port. The API is inspired by, and compatible with, Oleg Kiselyov’s input parsing library (http://okmij.org/ftp/Scheme/parsing.html). His library is used in many other libraries, notably SSAX, a full-Scheme XML parser/generator (http://okmij.org/ftp/Scheme/xml.html).
You can use this module in place of his input-parse.scm and look-for-str.scm.
I reimplemented the functions to be efficient on Gauche.
In particular, the use of string-set! is avoided entirely.
I extended the interface a bit so that the functions accept character sets
and predicates, as well as lists of characters.
These functions work sequentially on the given input port, that is, they read from the port as much as they need, without buffering extra characters.
{text.parse}
Looks for the string str in the input port in-port.
The optional argument max-no-chars limits the number of
characters read from the port; if omitted, the search continues
until EOF.
If str is found, this function returns the number of characters
it has read. The next read from in-port returns the character
immediately following str. If str is not found, it returns #f.
Note: Although this procedure has ‘?’ in its name,
it may return a non-boolean value, contrary to the Scheme convention.
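The following is a minimal sketch of the search described above. It assumes this entry documents `find-string-from-port?`, the name exported by Gauche's text.parse (the entry heading is not shown here). The port `p` and the sample text are illustrative only.

```scheme
(use text.parse)

;; An in-memory string port stands in for any input port.
(define p (open-input-string "hello world"))

;; n is the number of characters read, or #f when the string is absent.
(define n (find-string-from-port? "lo" p))

;; The port is now positioned just past the match.
(define next (read-char p))

;; The result is a count rather than #t, so test it as a generalized boolean.
(when n
  (print "read " n " characters; the next character is " (write-to-string next)))
```

Because a successful search returns a number, code that compares the result with `#t` using `eq?` will not behave as intended; treat any non-`#f` value as success.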
{text.parse}
Discards the current character and peeks the next character from port.
Useful to look ahead one character.
If port is omitted, the current input port is used.
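A short sketch of the one-character lookahead, assuming this entry documents `peek-next-char` from Gauche's text.parse (the entry heading is not shown here). The port and input string are illustrative.

```scheme
(use text.parse)

(define p (open-input-string "abc"))

;; Drops #\a, then peeks at #\b without consuming it.
(define peeked (peek-next-char p))

;; The peeked character is still in the port, so read-char returns it again.
(define consumed (read-char p))

(print (list peeked consumed))
```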
In the following functions, char-list refers to one of the following:
a character set, or a list of characters that may also include the symbol *eof*.
It denotes a set of characters. If the symbol *eof* is
included, the EOF condition is also included. Without *eof*,
the EOF condition is regarded as an error.
{text.parse}
Reads a character from port. If it is included in char-list,
returns the character. Otherwise, signals an error with a message
containing string.
If port is omitted, the current input port is used.
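A minimal sketch of both outcomes, assuming this entry documents `assert-curr-char` from Gauche's text.parse (the entry heading is not shown here). The message text and the `'rejected` marker are made up for illustration.

```scheme
(use text.parse)

(define p (open-input-string "(a b)"))

;; The first character is #\( and is in the list, so it is returned.
(define open-paren
  (assert-curr-char '(#\() "expected an opening parenthesis" p))

;; The next character is #\a, which is not in the list, so an error is
;; signaled; guard turns it into a plain value for this demonstration.
(define second-try
  (guard (e [else 'rejected])
    (assert-curr-char '(#\() "expected an opening parenthesis" p)))

(print (list open-paren second-try))
```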
{text.parse}
char-list/number is either a char-list or a number.
If it is a number, it reads that many characters and returns #f.
If the input is not long enough, an error is signaled.
If char-list/number is a char-list, it reads from port
until it sees a character that belongs to the char-list.
Then the character is returned.
If port is omitted, the current input port is used.
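The two argument forms can be sketched as follows, assuming this entry documents `skip-until` from Gauche's text.parse (the entry heading is not shown here). The input string is illustrative.

```scheme
(use text.parse)

(define p (open-input-string "key=value"))

;; char-list form: read until a character in the list is seen; that
;; character is the return value.
(define stop (skip-until '(#\=) p))

;; number form: skip a fixed count of characters; the return value is #f.
(define skipped (skip-until 2 p))

(print (list stop skipped))
```

Note that neither call includes `*eof*`, so running out of input in either form would signal an error rather than return quietly.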
{text.parse}
Reads from port until it sees a character that does not
belong to char-list. The character remains in the stream.
If it reaches EOF, an EOF is returned.
If port is omitted, the current input port is used.
This example skips whitespace in the input. The next read from port returns the first non-whitespace character.
(skip-while #[\s] port)
{text.parse}
Skips any number of characters in prefix-char-list,
then collects the characters until it sees break-char-list.
The collected characters are returned as a string.
The break character remains in the port.
If the function encounters EOF and *eof* is not included in
break-char-list, an error is signaled with comment
included in the message.
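A sketch of reading two space-separated words, assuming this entry documents `next-token` from Gauche's text.parse with the argument order prefix-char-list, break-char-list, comment, port (the entry heading is not shown here). The comment strings are illustrative.

```scheme
(use text.parse)

(define p (open-input-string "   foo bar"))

;; Skip leading spaces, then collect characters up to the next space.
;; The breaking space stays in the port.
(define tok1 (next-token '(#\space) '(#\space) "reading first word" p))

;; The last word ends at EOF, so *eof* must be listed as a break
;; condition; otherwise an error mentioning the comment is signaled.
(define tok2 (next-token '(#\space) '(#\space *eof*) "reading last word" p))

(print (list tok1 tok2))
```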
{text.parse}
Reads and collects characters as long as
they belong to char-list/pred, then returns them as a string.
The first character that doesn’t belong to char-list/pred remains
on the port.
char-list/pred may be a char-list or a predicate that takes a character. If it is a predicate, each character is passed to it, and the character is regarded as belonging to char-list/pred when the predicate returns a true value.
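Both argument forms can be sketched as follows, assuming this entry documents `next-token-of` from Gauche's text.parse (the entry heading is not shown here). The input is chosen so that each call stops at a non-matching character before EOF.

```scheme
(use text.parse)

(define p (open-input-string "123abc"))

;; Predicate form: collect characters while char-numeric? holds.
(define digits (next-token-of char-numeric? p))

;; char-list form: collect characters while they are in the list.
(define letters (next-token-of '(#\a #\b) p))

;; The first non-matching character is left on the port.
(define leftover (read-char p))

(print (list digits letters leftover))
```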
{text.parse}
This is like the built-in read-string (see Reading data),
except that it returns "" when the input has already reached EOF.
Provided for compatibility with code that depends on Oleg’s library.