sxml.sxpath
- SXML query language ¶SXPath is a query language for SXML, an instance of XML Information set (Infoset) in the form of s-expressions.
It is originally written by Oleg Kiselyov, and improved by Dmitry Lizorkin and Kirill Lisovsky. This module also incorporates various procedures written for SXPath by Dmitry Lizorkin and Kirill Lisovsky.
Current version is based on sxpathlib.scm,v 3.915, sxpath.scm,v 1.1, and sxpath-ext.scm,v 1.911.
This manual is mostly derived from the comments in the original source files.
The module consists of three layers.
• SXPath basic converters and applicators: | ||
• SXPath query language: | ||
• SXPath extension: |
A converter is a function
type Converter = Node|Nodeset -> Nodeset
A converter can also play a role of a predicate: in that case, if a
converter, applied to a node or a nodeset, yields a non-empty
nodeset, the converter-predicate is deemed satisfied. Throughout
this file a nil nodeset is equivalent to #f
in denoting a failure.
{sxml.sxpath
}
Returns #t
if given object is a nodeset.
{sxml.sxpath
}
If x is a nodeset - returns it as is, otherwise wrap it in a list.
{sxml.sxpath
}
Predicate which returns #t
if obj is SXML element, otherwise returns #f
.
{sxml.sxpath
}
The function ntype-names??
takes a list of acceptable node names as a
criterion and returns a function, which, when applied to a node,
will return #t
if the node name is present in criterion list
and #f
otherwise.
ntype-names?? :: ListOfNames -> Node -> Boolean
{sxml.sxpath
}
The function ntype??
takes a type criterion and returns
a function, which, when applied to a node, will tell if the node satisfies
the test.
ntype?? :: Crit -> Node -> Boolean
The criterion crit is one of the following symbols:
id
tests if the Node has the right name (id)
@
tests if the Node is an attributes-list.
*
tests if the Node is an Element.
*text*
tests if the Node is a text node.
*data*
tests if the Node is a data node (text, number, boolean, etc., but not pair).
*PI*
tests if the Node is a PI
node.
*COMMENT*
tests if the Node is a COMMENT
node.
*ENTITY*
tests if the Node is a ENTITY
node.
*any*
#t
for any type of Node.
{sxml.sxpath
}
This function takes a namespace-id, and returns a predicate
Node -> Boolean
, which is #t
for nodes with this very namespace-id.
ns-id is a string.
(ntype-namespace-id?? #f)
will be #t
for nodes with non-qualified names.
{sxml.sxpath
}
This function takes a predicate and returns it inverted .
That is if the given predicate yields #f
or ’()
the inverted one
yields the given node (#t
) and vice versa.
{sxml.sxpath
}
Curried equivalence converter-predicates, i.e.
((node-eq? a) b) ≡ (eq? a b) ((node-equal? a) b) ≡ (equal? a b)
{sxml.sxpath
}
node-pos:: N -> Nodeset -> Nodeset, or node-pos:: N -> Converter
Select the N’th element of a Nodeset and return as a singular Nodeset;
Return an empty nodeset if the Nth element does not exist.
((node-pos 1) Nodeset)
selects the node at the head of the Nodeset,
if exists;
((node-pos 2) Nodeset)
selects the Node after that, if
exists.
N can also be a negative number: in that case the node is picked from
the tail of the list.
((node-pos -1) Nodeset)
selects the last node of a non-empty nodeset;
((node-pos -2) Nodeset)
selects the last but one node, if exists.
{sxml.sxpath
}
filter:: Converter -> Converter
A filter applicator, which introduces a filtering context. The argument
converter is considered a predicate, with either #f
or nil result meaning failure.
{sxml.sxpath
}
take-until:: Converter -> Converter, or take-until:: Pred -> Node|Nodeset -> Nodeset
Given a converter-predicate and a nodeset, apply the predicate to
each element of the nodeset, until the predicate yields anything but
#f
or nil. Return the elements of the input nodeset that have
been processed
till that moment (that is, which fail the predicate).
take-until
is a variation of the filter above:
take-until
passes
elements of an ordered input set till (but not including) the first
element that satisfies the predicate.
The nodeset returned by ((take-until (not pred)) nset)
is a subset –
to be more precise, a prefix – of the nodeset returned by
((filter pred) nset)
.
{sxml.sxpath
}
take-after:: Converter -> Converter, or take-after:: Pred -> Node|Nodeset -> Nodeset
Given a converter-predicate and a nodeset, apply the predicate to
each element of the nodeset, until the predicate yields anything but
#f
or
nil. Return the elements of the input nodeset that have not been processed:
that is, return the elements of the input nodeset that follow the first
element that satisfied the predicate.
take-after
along with take-until
partition an input nodeset into three
parts: the first element that satisfies a predicate, all preceding
elements and all following elements.
{sxml.sxpath
}
Apply proc to each element of lst and return the list of results.
If proc returns a nodeset, splice it into the result.
From another point of view,
map-union
is a function Converter->Converter,
which places an argument-converter in a joining context.
{sxml.sxpath
}
node-reverse :: Converter, or node-reverse:: Node|Nodeset -> Nodeset
Reverses the order of nodes in the nodeset. This basic converter is needed to implement a reverse document order (see the XPath Recommendation).
{sxml.sxpath
}
node-trace:: String -> Converter
(node-trace title)
is an identity converter. In addition it prints out
a node or nodeset it is applied to, prefixed with the ’title’.
This converter is very useful for debugging.
What follow are Converter combinators, higher-order functions that transmogrify a converter or glue a sequence of converters into a single, non-trivial converter. The goal is to arrive at converters that correspond to XPath location paths.
From a different point of view, a combinator is a fixed, named pattern of applying converters. Given below is a complete set of such patterns that together implement XPath location path specification. As it turns out, all these combinators can be built from a small number of basic blocks: regular functional composition, map-union and filter applicators, and the nodeset union.
{sxml.sxpath
}
select-kids:: Pred -> Node -> Nodeset
Given a Node, return an (ordered) subset its children that satisfy the Pred (a converter, actually).
select-kids:: Pred -> Nodeset -> Nodeset
The same as above, but select among children of all the nodes in the Nodeset.
{sxml.sxpath
}
node-self:: Pred -> Node -> Nodeset, or node-self:: Converter -> Converter
Similar to select-kids but apply to the Node itself rather than to its children. The resulting Nodeset will contain either one component, or will be empty (if the Node failed the Pred).
{sxml.sxpath
}
node-join:: [LocPath] -> Node|Nodeset -> Nodeset, or node-join:: [Converter] -> Converter
join the sequence of location steps or paths as described in the title comments above.
{sxml.sxpath
}
node-reduce:: [LocPath] -> Node|Nodeset -> Nodeset, or node-reduce:: [Converter] -> Converter
A regular functional composition of converters.
From a different point of view,
((apply node-reduce converters) nodeset)
is equivalent to
(foldl apply nodeset converters)
i.e., folding, or reducing, a list of converters with the nodeset
as a seed.
{sxml.sxpath
}
node-or:: [Converter] -> Converter
This combinator applies all converters to a given node and
produces the union of their results.
This combinator corresponds to a union, ’|
’ operation for XPath
location paths.
{sxml.sxpath
}
node-closure:: Converter -> Converter
Select all descendants of a node that satisfy a converter-predicate.
This combinator is similar to select-kids
but applies to
grand... children as well.
This combinator implements the "descendant::
" XPath axis.
Conceptually, this combinator can be expressed as
(define (node-closure f) (node-or (select-kids f) (node-reduce (select-kids (ntype?? '*)) (node-closure f))))
This definition, as written, looks somewhat like a fixpoint, and it
will run forever. It is obvious however that sooner or later
(select-kids (ntype?? '*))
will return an empty nodeset. At
this point further iterations will no longer affect the result and
can be stopped.
{sxml.sxpath
}
Evaluates an abbreviated SXPath. Returns a procedure that when applied
on a node or nodeset will return a nodeset matching the given path.
sxpath:: AbbrPath -> Converter, or sxpath:: AbbrPath -> Node|Nodeset -> Nodeset
AbbrPath is a list or a string. If it is a list, it is translated to the full SXPath according to the following rewriting rules. More informal explanation follows shortly. If it is a string, it is an XPath query.
Note that these are abstract rules to show how it works, and not the running code examples. The nonterminals sxpath1 and sxpathr don’t exist as APIs. The term txpath is an internal function that interprets XPath query given as a string.
(sxpath '()) -> (node-join) (sxpath '(path-component ...)) -> (node-join (sxpath1 path-component) (sxpath '(...))) (sxpath1 '//) -> (node-or (node-self (ntype?? '*any*)) (node-closure (ntype?? '*any*))) (sxpath1 '(equal? x)) -> (select-kids (node-equal? x)) (sxpath1 '(eq? x)) -> (select-kids (node-eq? x)) (sxpath1 '(or@ ...)) -> (select-kids (ntype-names?? (cdr '(or@ ...)))) (sxpath1 '(not@ ...)) -> (select-kids (sxml:invert (ntype-names?? (cdr '(not@ ...))))) (sxpath1 '(ns-id:* x)) -> (select-kids (ntype-namespace-id?? x)) (sxpath1 ?symbol) -> (select-kids (ntype?? ?symbol)) (sxpath1 ?string) -> (txpath ?string) (sxpath1 procedure) -> procedure (sxpath1 '(?symbol ...)) -> (sxpath1 '((?symbol) ...)) (sxpath1 '(path reducer ...)) -> (node-reduce (sxpath path) (sxpathr reducer) ...) (sxpathr number) -> (node-pos number) (sxpathr path-filter) -> (filter (sxpath path-filter))
SXPath in its simplest form is a list of path components. The result
procedure will follow the same path and return the matching node
list. For example (one two three)
will find element one
then two
inside it and three
inside element
two
. The equivalent XPath would be one/two/three
.
There are a few special path components (see ntype??
for the
complete list):
*
matches an element node.
//
matches any one or many consecutive path components.
*text*
matches a text node (text()
in XPath).
*data*
matches any data node (e.g. text, number, boolean, etc., but not pair).
@
selects the attribute list node.
A path component could be a list in one of these forms:
(equal? x)
matches if the node under examination matches x using
node-equal?
(eq? x)
matches if the node under examination matches x using
node-eq?
(or@ ...)
matches if the element name is one of the specified symbols.
(not@ ...)
matches if the element name is not one of the specified symbols.
(ns-id:* x)
matches the node if it’s with namespace x
(<path> n)
matches the n-th node matching same path component. n
starts from 1. Negative numbers start from the end of the node list
backward. This is path[n]
syntax in XPath.
(<path> (<predicate>...))
matches a path component path and (sxpath (<predicate>...))
on those nodes are not empty. This is path[predicate...]
syntax
in XPath.
If the path component is a string, it is interpreted as an XPath query string.
If the path component is a procedure, the procedure takes three arguments: the nodeset being examined, the root node and the variable bindings.
The root node is usually the entire sxml being applied. However if you apply the result sxpath procedure with two arguments, root-node will be the second argument.
When applied with three arguments, the variable bindings are the third one. This lets you pass arguments to the procedure.
;; select all <book> elements whose style attribute value is equal to ;; the <bookstore> element's specialty attribute value. (sxpath "//book[/bookstore/@specialty=@style]") ;; a similar query but this time make sure specialty of _all_ ;; bookstores is matched (let ([match-specialty (lambda (node root var-binding) (let ([style (car ((sxpath '(@ style *text*)) node))] [all-specialty ((sxpath '(bookstore @ specialty *text*)) node)]) (fold (lambda (specialty last-result) (and last-result (string=? style specialty))) #t all-specialty)))]) (sxpath `(// (book (,match-specialty))))) ;; select all <bookstore> elements that are inside top-level <book> ;; element (sxpath '(book bookstore)) ;; select all <bookstore> elements from anywhere (sxpath '(// bookstore)) ;; select attribute "name" in the top-level <book> element (sxpath '(book @ name)) ;; select all <bookstore> and <bookshop> elements that are inside ;; top-level <book> element (sxpath '(book (or@ bookstore bookshop))) ;; select all elements except <movie> that are inside top-level <book> ;; element (sxpath '(book (not@ movie @))) ;; select the attribute "name" of the second <bookstore> element (sxpath '(book (bookstore 2) @ name)) ;; select the attribute "name" of all <bookstore> elements that has ;; attribute "recommended" (sxpath '(book (bookstore (@ recommended)) @ name)) ;; select the attribute "name" of all <bookstore> elements whose ;; "rating" attribute is 3 (sxpath '(book (bookstore (@ rating (eq? 3))) @ name)) ;; select the attribute "rating" whose value is greater than 3 from ;; all <bookstore> elements (let ([greater (lambda (nodeset root-node var-binding) (filter (lambda (node) (> (string->number (sxml:string-value node)) 3)) nodeset))]) (sxpath `(book bookstore @ rating ,greater)))
Some wrapper functions around sxpath
:
{sxml.sxpath
}
sxpath
always returns a list, which is #t
in Scheme.
if-sxpath
returns #f
instead of empty list.
{sxml.sxpath
}
Returns first node found, if any.
Otherwise returns #f
.
{sxml.sxpath
}
Returns first node found, if any.
Otherwise returns empty list.
{sxml.sxpath
}
Built an index as a list of
(ID_value . element)
pairs for given
node. lpaths are location paths for attributes of type ID.
SXML counterparts to W3C XPath Core Functions Library.
{sxml.sxpath
}
The counterpart to XPath string
function (section 4.2 XPath Rec.)
Converts a given object to a string.
NOTE:
{sxml.sxpath
}
The counterpart to XPath boolean
function (section 4.3 XPath Rec.)
Converts its argument to a boolean.
{sxml.sxpath
}
The counterpart to XPath number
function (section 4.4 XPath Rec.)
Converts its argument to a number
NOTE:
string->number
conversion is not IEEE 754 round-to-nearest.
{sxml.sxpath
}
Returns a string value for a given node in accordance to
XPath Rec. 5.1 - 5.7
{sxml.sxpath
}
According to XPath specification 2.3, this test is true for any
XPath node.
For SXML auxiliary lists and lists of attributes has to be excluded.
{sxml.sxpath
}
Returns the list of attributes for a given SXML node.
Empty list is returned if the given node is not an element,
or if it has no list of attributes
{sxml.sxpath
}
Select SXML element by its unique IDs. (XPath Rec. 4.1)
Returns a converter that takes object,
which is a nodeset or a datatype which can be converted to a string by means
of a ’string
’ function.
id-index is ( (id-value . element) (id-value . element) ... )
.
This index is used for selection of an element by its unique ID.
Comparators for XPath objects:
{sxml.sxpath
}
A helper for XPath equality operations: =
, !=
bool-op, number-op and
’string-op are comparison operations for
a pair of booleans, numbers and strings respectively.
{sxml.sxpath
}
Counterparts of XPath equality operations: =
, !=
,
using default equality tests.
{sxml.sxpath
}
Creates a relational operation ( <
, >
, <=
, >=
)
for two XPath objects.
op
is comparison procedure: <
, >
, <=
or >=
.
XPath axises. An order in resulting nodeset is preserved.
{sxml.sxpath
}
Attribute axis.
{sxml.sxpath
}
Child axis.
This function is similar to ’select-kids
’, but it returns an empty
child-list for PI, Comment and Entity nodes.
{sxml.sxpath
}
Parent axis.
Given a predicate, it returns a function
RootNode -> Converter
which yields a
node -> parent
converter then applied to a rootnode.
Thus, such a converter may be constructed using
((sxml:parent test-pred) rootnode)
and returns a parent of a node it is applied to.
If applied to a nodeset, it returns the
list of parents of nodes in the nodeset. The rootnode does not have
to be the root node of the whole SXML tree – it may be a root node
of a branch of interest.
The parent::
axis can be used with any SXML node.
{sxml.sxpath
}
Ancestor axis
{sxml.sxpath
}
Ancestor-or-self axis
{sxml.sxpath
}
Descendant axis
{sxml.sxpath
}
Descendant-or-self axis
{sxml.sxpath
}
Following axis
{sxml.sxpath
}
Following-sibling axis
{sxml.sxpath
}
Namespace axis
{sxml.sxpath
}
Preceding axis
{sxml.sxpath
}
Preceding-sibling axis
Popular shortcuts:
{sxml.sxpath
}
((sxml:child sxml:node?) nodeset)
{sxml.sxpath
}
((select-kids sxml:element?) nodeset)