sxml.ssax - Functional XML parser
sxml.* modules are the adaptation of
Oleg Kiselyov’s SXML framework (http://okmij.org/ftp/Scheme/xml.html),
which is based on the S-expression representation of XML structure.
SSAX is the parser part of the SXML framework. The following is quoted from the SSAX webpage:
A SSAX functional XML parsing framework consists of a DOM/SXML parser, a SAX parser, and a supporting library of lexing and parsing procedures. The procedures in the package can be used separately to tokenize or parse various pieces of XML documents. The framework supports XML Namespaces, character, internal and external parsed entities, attribute value normalization, processing instructions and CDATA sections. The package includes a semi-validating SXML parser : a DOM-mode parser that is an instantiation of a SAX parser (called SSAX).
The current version is based on the SSAX CVS version newer than
the last 'official' release of the SXML toolset (4.9), and on the
SXML-gauche-0.9 package, which was based on SXML-4.9.
There is an important change from that release:
the API now uses the lowercase prefix ssax: instead of the uppercase SSAX:.
The difference matters since Gauche is case sensitive by default.
Alias names are defined for backward compatibility,
but the use of the uppercase-prefixed names is deprecated.
I derived the content of this part of the manual from SSAX source code, just by converting its comments into texinfo format. The original text is by Oleg Kiselyov. Shiro Kawai should be responsible for any typographical error or formatting error introduced by conversion.
The manual entries are ordered in a "bottom-up" way, beginning from
the lower-level constructs and moving towards the high-level utilities.
If you just want to parse an XML document and obtain SXML,
check out ssax:xml->sxml in SSAX Highest-level parsers - XML to SXML.
• SSAX data types
• SSAX low-level parsing code
• SSAX higher-level parsers and scanners
• SSAX Highest-level parsers - XML to SXML
SSAX data types
TAG-KIND: a symbol 'START, 'END, 'PI, 'DECL, 'COMMENT, 'CDSECT
or 'ENTITY-REF that identifies a markup token.
UNRES-NAME: a name (called GI in the XML Recommendation) as given in an XML
document for a markup token: start-tag, PI target, attribute name.
If a GI is an NCName, UNRES-NAME is this NCName converted into
a Scheme symbol. If a GI is a QName, UNRES-NAME is a pair of
symbols: (PREFIX . LOCALPART)
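The two shapes of an UNRES-NAME can be illustrated with a small sketch. The helper string->unres-name below is hypothetical (it is not part of sxml.ssax) and assumes string-index from SRFI-13; it only mimics the mapping described above for a name string that has already been extracted, and does no validation of the name.

```scheme
(use srfi-13)   ; for string-index

;; Hypothetical helper, for illustration only: map a name string to
;; the UNRES-NAME shape described above.
(define (string->unres-name str)
  (let ((pos (string-index str #\:)))
    (if pos
        ;; QName: a pair (PREFIX . LOCALPART)
        (cons (string->symbol (substring str 0 pos))
              (string->symbol (substring str (+ pos 1) (string-length str))))
        ;; NCName: a single symbol
        (string->symbol str))))

(string->unres-name "title")    ; title
(string->unres-name "b:title")  ; (b . title)
```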
An expanded name, a resolved version of an UNRES-NAME. For an element or an attribute name with a non-empty namespace URI, RES-NAME is a pair of symbols, (URI-SYMB . LOCALPART). Otherwise, it’s a single symbol.
ELEM-CONTENT-MODEL: a symbol:
ANY           anything goes, expect an END tag.
EMPTY-TAG     no content, and no END-tag is coming.
EMPTY         no content, expect the END-tag as the next token.
PCDATA        expect character data only, and no children elements.
MIXED
ELEM-CONTENT
URI-SYMB: a symbol representing a namespace URI, or another symbol chosen
by the user to represent the URI. In the former case,
URI-SYMB is created by %-quoting of bad URI characters and
converting the resulting string into a symbol.
A list representing namespaces in effect. An element of the list has one of the following forms:
(prefix uri-symb . uri-symb)
or,
(prefix user-prefix . uri-symb)
user-prefix is a symbol chosen by the user to represent the URI.
(#f user-prefix . uri-symb)
Specification of the user-chosen prefix and a uri-symbol.
(*DEFAULT* user-prefix . uri-symb)
Declaration of the default namespace.
(*DEFAULT* #f . #f)
Un-declaration of the default namespace. This notation represents overriding of the previous declaration.
A NAMESPACES list may contain several elements for the same PREFIX. The one closest to the beginning of the list takes effect.
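The "closest to the beginning wins" rule is the ordinary behavior of an association list lookup. The following is a minimal sketch, not the library's own code: the list contents and the helper lookup-prefix are made up for illustration.

```scheme
;; A made-up NAMESPACES list. Newer declarations are consed onto the
;; front, so they shadow older ones for the same prefix.
(define namespaces
  '((b B2 . urn:example:new)        ; inner redeclaration of prefix b
    (*DEFAULT* #f . #f)             ; default namespace un-declared
    (b B1 . urn:example:old)        ; outer declaration of prefix b
    (*DEFAULT* D . urn:example:default)))

;; Hypothetical helper: return the (user-prefix . uri-symb) part of the
;; first entry for PREFIX, or #f if the prefix is not declared at all.
(define (lookup-prefix prefix namespaces)
  (cond ((assq prefix namespaces) => cdr)
        (else #f)))

(lookup-prefix 'b namespaces)          ; (B2 . urn:example:new)
(lookup-prefix '*DEFAULT* namespaces)  ; (#f . #f)
(lookup-prefix 'c namespaces)          ; #f
```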
An ordered collection of (NAME . VALUE) pairs, where NAME is a RES-NAME or an UNRES-NAME. The collection is an ADT.
STR-HANDLER: a procedure of three arguments, (string1 string2 seed),
returning a new seed.
The procedure is supposed to handle a chunk of character data
string1 followed by a chunk of character data string2.
string2 is a short string, often "\n" and even "".
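As a minimal sketch of a handler with this signature, the procedure below accumulates the chunks in a list, most recent first. The name collect-chunks is an assumption made for this example; the two nested calls merely simulate what a parser would do when it threads the seed through successive invocations.

```scheme
;; A STR-HANDLER that accumulates the chunks, in reverse order, in a list.
(define (collect-chunks string1 string2 seed)
  (if (string=? string2 "")
      (cons string1 seed)
      (cons string2 (cons string1 seed))))

;; Simulate two successive invocations, threading the seed through.
(define seed
  (collect-chunks "second line" ""
                  (collect-chunks "first line" "\n" '())))

;; Reversing the seed restores document order.
(apply string-append (reverse seed))  ; "first line\nsecond line"
```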
ENTITIES: an assoc list of pairs (named-entity-name . named-entity-body),
where named-entity-name is a symbol under which the entity was
declared, and named-entity-body is either a string, or
(for an external entity) a thunk that will return an
input port (from which the entity can be read).
named-entity-body may also be #f. This is an indication that a
named-entity-name is currently being expanded. A reference to
this named-entity-name will be an error: violation of the
WFC nonrecursion.
XML-TOKEN: a record with two slots, kind and head. This record represents markup, which, according to the XML Recommendation, "takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, and processing instructions."
kind: a TAG-KIND.
head: an UNRES-NAME. For xml-tokens of kinds 'COMMENT and 'CDSECT, the head is #f.
For example,
<P>                     => kind='START, head='P
</P>                    => kind='END, head='P
<BR/>                   => kind='EMPTY-EL, head='BR
<!DOCTYPE OMF ...>      => kind='DECL, head='DOCTYPE
<?xml version="1.0"?>   => kind='PI, head='xml
&my-ent;                => kind='ENTITY-REF, head='my-ent
Character references are not represented by xml-tokens as these references are transparently resolved into the corresponding characters.
A record with three slots, elems, entities, and notations.
The record represents a datatype of an XML document: the list of declared elements and their attributes, declared notations, list of replacement strings or loading procedures for parsed general entities, etc. Normally an xml-decl record is created from a DTD or an XML Schema, although it can be created and filled in in many other ways (e.g., loaded from a file).
elems: an (assoc) list of decl-elem, or #f. The latter instructs
the parser to do no validation of elements and attributes.
decl-elem: declaration of one element: (elem-name elem-content decl-attrs);
elem-name is an UNRES-NAME for the element.
elem-content is an ELEM-CONTENT-MODEL.
decl-attrs is an ATTLIST of (attr-name . value) associations.
This element can declare a user procedure to handle parsing of an
element (e.g., to do a custom validation, or to build a hash of
IDs as they’re encountered).
decl-attr: an element of an ATTLIST, declaration of one attribute: (attr-name content-type use-type default-value).
attr-name is an UNRES-NAME for the declared attribute;
content-type is a symbol: CDATA, NMTOKEN, NMTOKENS, ...; or a list of strings for the enumerated type.
use-type is a symbol: REQUIRED, IMPLIED, or FIXED.
default-value is a string for the default value, or #f if not given.
{sxml.ssax} Utility procedures to deal with attribute list, which keeps name-value association.
{sxml.ssax} A constructor and a predicate for a XML-TOKEN record.
{sxml.ssax} Accessor macros of a XML-TOKEN record.
SSAX low-level parsing code
These procedures deal with primitive lexical units (Names, whitespaces, tags)
and with pieces of more generic productions. Most of these parsers
must be called in an appropriate context. For example, ssax:complete-start-tag
must be called only when the start-tag has been detected and its GI
has been read.
{sxml.ssax} Skip the S (whitespace) production as defined by
[3] S ::= (#x20 | #x9 | #xD | #xA)
The procedure returns the first non-whitespace character it encounters while scanning the port. This character is left on the input stream.
{sxml.ssax}
Check to see if a-char may start an NCName.
{sxml.ssax}
Read an NCName starting from the current position in the port and
return it as a symbol.
{sxml.ssax}
Read a (namespace-) Qualified Name, QName, from the current
position in the port.
From REC-xml-names:
[6] QName ::= (Prefix ':')? LocalPart
[7] Prefix ::= NCName
[8] LocalPart ::= NCName
Return: an UNRES-NAME.
{sxml.ssax}
The prefix of the pre-defined XML namespace, i.e. 'xml.
{sxml.ssax}
This procedure starts parsing of a markup token. The current position
in the stream must be #\<. This procedure scans enough of the input stream
to figure out what kind of a markup token it is seeing. The procedure returns
an xml-token structure describing the token. Note that, in general, reading
of the current markup is not finished! In particular, no attributes of
the start-tag token are scanned.
Here’s a detailed breakdown of the return values and the position in the port when that particular value is returned:
PI-token: only the PI-target is read. To finish the Processing Instruction and disregard it, call ssax:skip-pi. ssax:read-attributes may be useful as well (for PIs whose content is attribute-value pairs).
END-token: The end tag is read completely; the current position is right after the terminating #\> character.
COMMENT: is read and skipped completely. The current position is right after "-->" that terminates the comment.
CDSECT: The current position is right after "<![CDATA[". Use ssax:read-cdata-body to read the rest.
DECL: We have read the keyword (the one that follows "<!") identifying this declaration markup. The current position is after the keyword (usually a whitespace character).
START-token: We have read the keyword (GI) of this start tag. No attributes are scanned yet. We don’t know if this tag has an empty content either. Use ssax:complete-start-tag to finish parsing of
the token.
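The following is a small sketch of inspecting the returned token for the START and END cases listed above. It assumes this procedure is the one the text elsewhere calls by the name ssax:read-markup-token, and that the accessor macros of the XML-TOKEN record are named xml-token-kind and xml-token-head (those names do not appear in the text above, so treat them as assumptions). The helper first-token is made up for the example.

```scheme
(use sxml.ssax)

;; Read one markup token from a string; the port must be positioned at #\<.
(define (first-token str)
  (call-with-input-string str ssax:read-markup-token))

;; Start tag: only the GI is consumed, the attributes are still unread.
(define start-token (first-token "<book id='1'>"))
(xml-token-kind start-token)   ; expected: START
(xml-token-head start-token)   ; expected: book

;; End tag: consumed completely, including the closing #\>.
(define end-token (first-token "</book>"))
(xml-token-kind end-token)     ; expected: END
(xml-token-head end-token)     ; expected: book
```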
{sxml.ssax}
The current position is inside a PI. Skip to the end of the PI.
{sxml.ssax}
The current position is right after reading the PITarget. We read the
body of the PI and return it as a string. The port will point to the
character right after the '?>' combination that terminates the PI.
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
{sxml.ssax}
The current position in the port is inside an internal DTD subset
(e.g., after reading #\[ that begins an internal DTD subset).
Skip until the "]>" combination that terminates this DTD.
{sxml.ssax}
This procedure must be called after we have read the string "<![CDATA["
that begins a CDATA section. The current position must be the first
position of the CDATA body. This function reads lines of the CDATA
body and passes them to a STR-HANDLER, a character data consumer.
The str-handler is a STR-HANDLER, a procedure string1 string2 seed.
The first argument, string1, never contains a newline.
The second argument, string2, often will. On the first invocation of
the STR-HANDLER, the seed is the one passed to ssax:read-cdata-body
as the third argument. The result of this first invocation will be
passed as the seed argument to the second invocation of the line
consumer, and so on. The result of the last invocation of the
STR-HANDLER is returned by ssax:read-cdata-body. Note the
similarity to the fundamental 'fold' iterator.
Within a CDATA section all characters are taken at their face value,
with only three exceptions:
CR, LF, and CRLF are treated as line delimiters, and passed as a single #\newline to the STR-HANDLER.
The "]]>" combination is the end of the CDATA section.
&gt; is treated as an embedded #\> character.
Note, &lt; and &amp; are not specially recognized (and are not expanded)!
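A hedged sketch of the fold-like protocol described above: the handler collect and the wrapper cdata-body->string are names invented for this example, and the input string stands for what remains on the port after "<![CDATA[" has been consumed. Because the exact chunking is up to the parser, the sketch only relies on the concatenation of all chunks.

```scheme
(use sxml.ssax)

;; STR-HANDLER that pushes both chunks onto the seed (a reversed list).
(define (collect string1 string2 seed)
  (cons string2 (cons string1 seed)))

;; Read a CDATA body from STR and return its text as one string.
(define (cdata-body->string str)
  (call-with-input-string str
    (lambda (port)
      (apply string-append
             (reverse (ssax:read-cdata-body port collect '()))))))

;; "<" is taken at face value; reading stops after the "]]>" terminator.
(cdata-body->string "a < b]]>rest")   ; expected: "a < b"
```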
{sxml.ssax}
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
This procedure must be called after we have read "&#"
that introduces a char reference.
The procedure reads this reference and returns the corresponding char.
The current position in the port will be after the ";" that terminates
the char reference.
Faults detected: WFC: XML-Spec.html#wf-Legalchar.
According to Section "4.1 Character and Entity References" of the XML Recommendation:
"[Definition: A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.]"
Therefore, we use a ucscode->char function to convert a character
code into the character, regardless of the current character
encoding of the input stream.
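The code-to-character step can be sketched in plain Scheme. The helper char-ref->char is hypothetical and is not the library's ucscode->char; it assumes integer->char accepts Unicode code points (true for a Gauche built with its default UTF-8 internal encoding) and it performs none of the Legalchar checks mentioned above.

```scheme
;; Hypothetical helper: DIGITS is the text between "&#" (or "&#x") and ";".
;; HEX? tells whether the reference used the "&#x" form.
(define (char-ref->char digits hex?)
  (integer->char (string->number digits (if hex? 16 10))))

(char-ref->char "65" #f)   ; #\A   (from "&#65;")
(char-ref->char "41" #t)   ; #\A   (from "&#x41;")
```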
{sxml.ssax} Expand and handle a parsed-entity reference
The result is the one returned by content-handler or str-handler.
Faults detected:
WFC: XML-Spec.html#wf-entdeclared
WFC: XML-Spec.html#norecursion
{sxml.ssax}
This procedure reads and parses a production Attribute*
[41] Attribute ::= Name Eq AttValue
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[25] Eq ::= S? '=' S?
The procedure returns an ATTLIST of Name (as UNRES-NAME), Value (as string) pairs. The current character on the port is a non-whitespace character that is not an ncname-starting character.
Note the following rules to keep in mind when reading an 'AttValue': "Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize it as follows:
a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20
to the normalized value, except that only a single #x20 is appended for a
"#xD#xA" sequence that is part of an external parsed entity or the
literal entity value of an internal parsed entity."
Faults detected:
WFC: XML-Spec.html#CleanAttrVals
WFC: XML-Spec.html#uniqattspec
{sxml.ssax} Convert an unres-name to a res-name given the appropriate namespaces declarations. The last parameter apply-default-ns? determines if the default namespace applies (for instance, it does not for attribute names)
Per REC-xml-names/#nsc-NSDeclared, the "xml" prefix is considered pre-declared
and bound to the namespace name "http://www.w3.org/XML/1998/namespace".
This procedure tests for the namespace constraints: http://www.w3.org/TR/REC-xml-names/#nsc-NSDeclared.
{sxml.ssax} Convert a uri-str to an appropriate symbol.
{sxml.ssax}
This procedure is to complete parsing of a start-tag markup. The
procedure must be called after the start tag token has been
read. Tag is an UNRES-NAME.
Elems is an instance of xml-decl::elems;
it can be #f to tell the function to do no validation of elements
and their attributes.
This procedure returns several values:
a RES-NAME.
the element’s attributes, an ATTLIST of (res-name . string) pairs. The list does not include xmlns attributes.
the input list of namespaces, amended with namespace (re-)declarations contained within the start-tag being parsed.
an ELEM-CONTENT-MODEL.
On exit, the current position in the port will be the first character after
the #\> that terminates the start-tag markup.
Faults detected:
VC: XML-Spec.html#enum
VC: XML-Spec.html#RequiredAttr
VC: XML-Spec.html#FixedAttr
VC: XML-Spec.html#ValueType
WFC: XML-Spec.html#uniqattspec (after namespaces prefixes are resolved)
VC: XML-Spec.html#elementvalid
WFC: REC-xml-names/#dt-NSName
Note, although the XML Recommendation does not explicitly say it, xmlns and xmlns: attributes don’t have to be declared (although they can be declared, to specify their default value).
{sxml.ssax}
This procedure parses an ExternalID production.
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
This procedure is supposed to be called when an ExternalID is expected;
that is, the current character must be either #\S or #\P, which start
a SYSTEM or PUBLIC token respectively. This procedure returns the
SystemLiteral as a string. A PubidLiteral is disregarded if present.
SSAX higher-level parsers and scanners
They parse productions corresponding to the whole (document) entity or its higher-level pieces (prolog, root element, etc).
{sxml.ssax}
Scan the Misc production in the context:
[1] document ::= prolog element Misc*
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[27] Misc ::= Comment | PI | S
This function should be called in the prolog or epilog contexts.
In these contexts, whitespaces are completely ignored.
The return value from ssax:scan-Misc is either a PI-token,
a DECL-token, a START token, or EOF.
Comments are ignored and not reported.
{sxml.ssax} This procedure is to read the character content of an XML document or an XML element.
[43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
To be more precise, the procedure reads CharData, expands CDSect
and character entities, and skips comments. The procedure stops
at a named reference, EOF, at the beginning of a PI, or a start/end tag.
port: a port to read.
expect-eof?: a boolean indicating if EOF is normal, i.e., the character data may be terminated by the EOF. EOF is normal while processing a parsed entity.
str-handler: a STR-HANDLER.
seed: an argument passed to the first invocation of STR-HANDLER.
The procedure returns two results: seed and token.
The seed is the result of the last invocation of str-handler, or the original seed if str-handler was never called.
Token can be either an eof-object (this can happen only if
expect-eof? was #t), or an xml-token describing the point where the procedure stopped: a start or end tag, a named entity reference, or the beginning of a PI. In the last case, it’s up to the application to read or skip through the rest of this PI.
CDATA sections and character references are expanded inline and
never returned. Comments are silently disregarded.
As the XML Recommendation requires, all whitespace in character data
must be preserved. However, a CR character (#xD) must be disregarded
if it appears before a LF character (#xA), or replaced by a #xA character
otherwise. See Secs. 2.10 and 2.11 of the XML Recommendation. See also
the canonical XML Recommendation.
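The CR/LF rule in the last paragraph can be sketched as a plain string transformation. This is only an illustration of the rule, not the way the parser implements it (the parser works on a port); the name normalize-newlines is made up for the example.

```scheme
;; Illustration of the end-of-line rule: drop a CR that precedes an LF,
;; and turn any other CR into an LF. All other characters are kept.
(define (normalize-newlines str)
  (let loop ((chars (string->list str)) (acc '()))
    (cond ((null? chars) (list->string (reverse acc)))
          ((char=? (car chars) #\return)
           (if (and (pair? (cdr chars)) (char=? (cadr chars) #\newline))
               (loop (cdr chars) acc)                    ; CR before LF: drop it
               (loop (cdr chars) (cons #\newline acc)))) ; lone CR: becomes LF
          (else (loop (cdr chars) (cons (car chars) acc))))))

(normalize-newlines "a\r\nb\rc")   ; "a\nb\nc"
```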
{sxml.ssax} Make sure that token is of anticipated kind and has anticipated gi. Note gi argument may actually be a pair of two symbols, Namespace URI or the prefix, and of the localname. If the assertion fails, error-cont is evaluated by passing it three arguments: token kind gi. The result of error-cont is returned.
SSAX Highest-level parsers - XML to SXML
These parsers are a set of syntactic forms to instantiate a SSAX parser.
A user can instantiate the parser to do the full validation, or
no validation, or any particular validation. The user specifies
which PIs they want to be notified about. The user tells what to do
with the parsed character and element data. The latter handlers
determine if the parsing follows a SAX or a DOM model.
{sxml.ssax}
Create a parser to parse and process one Processing Instruction (PI).
My-pi-handlers:
An assoc list of pairs (PI-TAG . PI-HANDLER)
where PI-TAG is an NCName symbol, the PI target, and
PI-HANDLER is a procedure port pi-tag seed
where port points to the first symbol after the PI target.
The handler should read the rest of the PI up to and including
the combination '?>' that terminates the PI. The handler should
return a new seed.
One of the PI-TAGs may be the symbol *DEFAULT*. The corresponding
handler will handle PIs that no other handler will. If the
*DEFAULT* PI-TAG is not specified, ssax:make-pi-parser will make
one, which skips the body of the PI.
The output of ssax:make-pi-parser is a procedure
port pi-tag seed,
that will parse the current PI according to the user-specified handlers.
{sxml.ssax} Create a parser to parse and process one element, including its character content or children elements. The parser is typically applied to the root element of a document.
my-new-level-seed: procedure elem-gi attributes namespaces expected-content seed
where elem-gi is a RES-NAME of the element
about to be processed.
This procedure is to generate the seed to be passed
to handlers that process the content of the element.
my-finish-element: procedure elem-gi attributes namespaces parent-seed seed
This procedure is called when parsing of elem-gi is finished.
The seed is the result from the last content parser (or
from my-new-level-seed if the element has empty content).
Parent-seed is the same seed as was passed to my-new-level-seed.
The procedure is to generate a seed that will be the result
of the element parser.
my-char-data-handler: A STR-HANDLER.
my-pi-handlers: See ssax:make-pi-parser above.
The generated parser is a:
procedure start-tag-head port elems entities
namespaces preserve-ws? seed.
The procedure must be called after the start tag token has been
read. Start-tag-head is an UNRES-NAME from the start-element tag.
Elems is an instance of xml-decl::elems.
See ssax:complete-start-tag::preserve-ws?
Faults detected:
VC: XML-Spec.html#elementvalid
WFC: XML-Spec.html#GIMatch
{sxml.ssax} Create an XML parser, an instance of the XML parsing framework. This will be a SAX, a DOM, or a specialized parser depending on the supplied user-handlers.
user-handler-tag is a symbol that identifies a procedural expression that follows the tag. Given below are tags and signatures of the corresponding procedures. Not all tags have to be specified. If some are omitted, reasonable defaults will apply.
tag: DOCTYPE
handler-procedure: port docname systemid internal-subset? seed
If internal-subset? is #t, the current position in the port
is right after we have read #\[ that begins the internal DTD subset.
We must finish reading of this subset before we return
(or must call skip-internal-subset if we aren’t interested in reading it).
The port at exit must be at the first symbol after the whole
DOCTYPE declaration.
The handler-procedure must generate four values:
elems entities namespaces seed
See xml-decl::elems for elems. It may be #f to switch off the validation.
namespaces will typically contain USER-PREFIXes for selected URI-SYMBs.
The default handler-procedure skips the internal subset,
if any, and returns (values #f '() '() seed).
tag: UNDECL-ROOT
handler-procedure: elem-gi seed
where elem-gi is an UNRES-NAME of the root element. This procedure
is called when an XML document under parsing contains no DOCTYPE
declaration.
The handler-procedure, as a DOCTYPE handler procedure above,
must generate four values:
elems entities namespaces seed
The default handler-procedure returns (values #f '() '() seed).
tag: DECL-ROOT
handler-procedure: elem-gi seed
where elem-gi is an UNRES-NAME of the root element. This procedure
is called when an XML document under parsing does contain the DOCTYPE
declaration.
The handler-procedure must generate a new seed (and verify
that the name of the root element matches the doctype, if the handler
so wishes).
The default handler-procedure is the identity function.
tag: NEW-LEVEL-SEED
handler-procedure: see ssax:make-elem-parser, my-new-level-seed
tag: FINISH-ELEMENT
handler-procedure: see ssax:make-elem-parser, my-finish-element
tag: CHAR-DATA-HANDLER
handler-procedure: see ssax:make-elem-parser, my-char-data-handler
tag: PI
handler-procedure: see ssax:make-pi-parser.
The default value is '().
The generated parser is a
procedure PORT SEED
This procedure parses the document prolog and then exits to an element parser (created by ssax:make-elem-parser) to handle the rest.
[1] document ::= prolog element Misc*
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[27] Misc ::= Comment | PI | S
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
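As a hedged sketch of a SAX-style instantiation, the parser below builds no tree at all and simply counts elements. It assumes the macro described here is ssax:make-parser (the name used elsewhere in this section) and that the three handlers shown are enough, with defaults covering the rest; count-elements is a name invented for the example.

```scheme
(use sxml.ssax)

;; The seed is a running count of elements seen so far.
(define (count-elements port)
  ((ssax:make-parser
    NEW-LEVEL-SEED          ; entering an element: count it
    (lambda (elem-gi attributes namespaces expected-content seed)
      (+ seed 1))
    FINISH-ELEMENT          ; leaving an element: keep the running count
    (lambda (elem-gi attributes namespaces parent-seed seed)
      seed)
    CHAR-DATA-HANDLER       ; character data does not affect the count
    (lambda (string1 string2 seed)
      seed))
   port 0))

(call-with-input-string "<a><b/><c>text</c></a>" count-elements)
;; expected: 3
```

Returning a list or tree from the handlers instead of a number would give the DOM-style behavior mentioned above.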
A few utility procedures that turned out useful.
{sxml.ssax} Given the list of fragments (some of which are text strings), reverse the list and concatenate adjacent text strings.
{sxml.ssax}
Given the list of fragments (some of which are text strings),
reverse the list and concatenate adjacent text strings.
We also drop "insignificant" whitespace, that is, whitespace
in front of, behind and between elements. The whitespace that
is included in character data is not affected.
We use this procedure to "intelligently" drop "insignificant"
whitespace in the parsed SXML. If strict compliance with
the XML Recommendation regarding whitespace is desired, please
use the ssax:reverse-collect-str procedure instead.
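The "reverse and concatenate adjacent strings" step can be sketched in plain Scheme. The procedure reverse-collect below is an illustration written for this purpose, not the library's implementation, and it does not drop any whitespace.

```scheme
;; Illustration only: FRAGMENTS is in reverse document order, as
;; accumulated by consing onto a seed. Returns the fragments in
;; document order with runs of adjacent strings joined into one string.
(define (reverse-collect fragments)
  (let loop ((rest fragments) (result '()) (strs '()))
    ;; Push the pending run of strings (if any) onto the result.
    (define (flush)
      (if (null? strs)
          result
          (cons (apply string-append strs) result)))
    (cond ((null? rest) (flush))
          ((string? (car rest))
           (loop (cdr rest) result (cons (car rest) strs)))
          (else
           (loop (cdr rest) (cons (car rest) (flush)) '())))))

(reverse-collect '("c" "b" (br) "a"))   ; ("a" (br) "bc")
```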
{sxml.ssax}
This is an instance of a SSAX parser above that returns an SXML
representation of the XML document to be read from port.
Namespace-prefix-assig is a list of (USER-PREFIX . URI-STRING)
that assigns USER-PREFIXes to certain namespaces identified by
particular URI-STRINGs. It may be an empty list.
The procedure returns an SXML tree. The port points to the
first character after the root element.
Here’s a simple example:
(call-with-input-string
    "<book>
       <title>Land of Lisp</title>
       <author>Conrad Barski</author>
       <publisher>No Starch Press</publisher>
     </book>
     <book>
       <title>Programming Gauche</title>
       <author>Kahua Project</author>
       <author>Shiro Kawai</author>
       <publisher>O'Reilly Japan</publisher>
     </book>"
  (^p (ssax:xml->sxml p '())))
 ⇒
(*TOP*
 (book (title "Land of Lisp")
       (author "Conrad Barski")
       (publisher "No Starch Press"))
 (book (title "Programming Gauche")
       (author "Kahua Project")
       (author "Shiro Kawai")
       (publisher "O'Reilly Japan")))
The entire document is put in a pseudo node *TOP*, since
the document may contain more than one toplevel node.
The *TOP* node can also keep the meta information.
In the following example, the XML declaration is kept under
a pseudo node *PI* (processing instructions).
(call-with-input-string
    "<?xml version=\"1.0\" encoding=\"utf-8\"?>
     <book>
       <title>Programming Gauche</title>
       <author>Kahua Project</author>
       <author>Shiro Kawai</author>
       <publisher>O'Reilly Japan</publisher>
     </book>"
  (^p (ssax:xml->sxml p '())))
 ⇒
(*TOP*
 (*PI* xml "version=\"1.0\" encoding=\"utf-8\"")
 (book (title "Programming Gauche")
       (author "Kahua Project")
       (author "Shiro Kawai")
       (publisher "O'Reilly Japan")))
Namespaces are recognized, and their aliases are fully expanded to the URI by default:
(call-with-input-string
    "<b:book xmlns:b=\"https://example.com/book/\">
       <b:title>Programming Gauche</b:title>
       <b:author>Kahua Project</b:author>
       <b:author>Shiro Kawai</b:author>
       <b:publisher>O'Reilly Japan</b:publisher>
     </b:book>"
  (^p (ssax:xml->sxml p '())))
 ⇒
(*TOP*
 (https://example.com/book/:book
  (https://example.com/book/:title "Programming Gauche")
  (https://example.com/book/:author "Kahua Project")
  (https://example.com/book/:author "Shiro Kawai")
  (https://example.com/book/:publisher "O'Reilly Japan")))
(This is because namespace aliases can have nested scopes, so just keeping aliases loses information.)
However, tags with fully expanded namespace prefixes are cumbersome. You can provide your own namespace aliases for more compact output with the namespace-prefix-assig argument.
(call-with-input-string
    "<b:book xmlns:b=\"https://example.com/book/\">
       <b:title>Programming Gauche</b:title>
       <b:author>Kahua Project</b:author>
       <b:author>Shiro Kawai</b:author>
       <b:publisher>O'Reilly Japan</b:publisher>
     </b:book>"
  (^p (ssax:xml->sxml p '((Book . "https://example.com/book/")))))
 ⇒
(*TOP*
 (@ (*NAMESPACES* (Book "https://example.com/book/")))
 (Book:book
  (Book:title "Programming Gauche")
  (Book:author "Kahua Project")
  (Book:author "Shiro Kawai")
  (Book:publisher "O'Reilly Japan")))