For Gauche 0.9.5


Next: , Previous: , Up: Library modules - Utilities   [Contents][Index]

12.45 sxml.sxpath - SXML Query Language

Module: sxml.sxpath

SXPath is a query language for SXML, an instance of XML Information set (Infoset) in the form of s-expressions.

It is originally written by Oleg Kiselyov, and improved by Dmitry Lizorkin and Kirill Lisovsky. This module also incorporates various procedures written for SXPath by Dmitry Lizorkin and Kirill Lisovsky.

Current version is based on sxpathlib.scm,v 3.915, sxpath.scm,v 1.1, and sxpath-ext.scm,v 1.911.

This manual is mostly derived from the comments in the original source files.

The module consists of three layers.

  1. Basic converters and applicators, which provides the means to access and translate SXML tree.
  2. High-level query language compiler, which takes abbreviated SXPath and returns a Scheme function that selects a nodeset that satisfies the specified path from the given nodeset.
  3. Extension libraries, which implements SXML counterparts to W3C XPath Core Functions Library.

Next: , Previous: , Up: SXML Query Language   [Contents][Index]

12.45.1 SXPath basic converters and applicators

A converter is a function

  type Converter = Node|Nodeset -> Nodeset

A converter can also play a role of a predicate: in that case, if a converter, applied to a node or a nodeset, yields a non-empty nodeset, the converter-predicate is deemed satisfied. Throughout this file a nil nodeset is equivalent to #f in denoting a failure.

Function: nodeset? x

Returns #t if given object is a nodeset.

Function: as-nodeset x

If x is a nodeset - returns it as is, otherwise wrap it in a list.

Function: sxml:element? obj

Predicate which returns #t if obj is SXML element, otherwise returns #f.

Function: ntype-names?? crit

The function ntype-names?? takes a list of acceptable node names as a criterion and returns a function, which, when applied to a node, will return #t if the node name is present in criterion list and #f otherwise.

 ntype-names?? :: ListOfNames -> Node -> Boolean
Function: ntype?? crit

The function ntype?? takes a type criterion and returns a function, which, when applied to a node, will tell if the node satisfies the test.

  ntype?? :: Crit -> Node -> Boolean

The criterion crit is one of the following symbols:

id

tests if the Node has the right name (id)

@

tests if the Node is an attributes-list.

*

tests if the Node is an Element.

*text*

tests if the Node is a text node.

*data*

tests if the Node is a data node (text, number, boolean, etc., but not pair).

*PI*

tests if the Node is a PI node.

*COMMENT*

tests if the Node is a COMMENT node.

*ENTITY*

tests if the Node is a ENTITY node.

*any*

#t for any type of Node.

Function: ntype-namespace-id?? ns-id

This function takes a namespace-id, and returns a predicate Node -> Boolean, which is #t for nodes with this very namespace-id. ns-id is a string. (ntype-namespace-id?? #f) will be #t for nodes with non-qualified names.

Function: sxml:invert pred

This function takes a predicate and returns it inverted . That is if the given predicate yields #f or ’() the inverted one yields the given node (#t) and vice versa.

Function: node-eq? other
Function: node-equal? other

Curried equivalence converter-predicates, i.e.

  ((node-eq? a) b)    ≡ (eq? a b)
  ((node-equal? a) b) ≡ (equal? a b)
Function: node-pos n
 node-pos:: N -> Nodeset -> Nodeset, or
 node-pos:: N -> Converter

Select the N’th element of a Nodeset and return as a singular Nodeset; Return an empty nodeset if the Nth element does not exist. ((node-pos 1) Nodeset) selects the node at the head of the Nodeset, if exists; ((node-pos 2) Nodeset) selects the Node after that, if exists. N can also be a negative number: in that case the node is picked from the tail of the list. ((node-pos -1) Nodeset) selects the last node of a non-empty nodeset; ((node-pos -2) Nodeset) selects the last but one node, if exists.

Function: sxml:filter pred?
 filter:: Converter -> Converter

A filter applicator, which introduces a filtering context. The argument converter is considered a predicate, with either #f or nil result meaning failure.

Function: take-until pred?
 take-until:: Converter -> Converter, or
 take-until:: Pred -> Node|Nodeset -> Nodeset

Given a converter-predicate and a nodeset, apply the predicate to each element of the nodeset, until the predicate yields anything but #f or nil. Return the elements of the input nodeset that have been processed till that moment (that is, which fail the predicate). take-until is a variation of the filter above: take-until passes elements of an ordered input set till (but not including) the first element that satisfies the predicate. The nodeset returned by ((take-until (not pred)) nset) is a subset – to be more precise, a prefix – of the nodeset returned by ((filter pred) nset).

Function: take-after pred?
take-after:: Converter -> Converter, or
take-after:: Pred -> Node|Nodeset -> Nodeset

Given a converter-predicate and a nodeset, apply the predicate to each element of the nodeset, until the predicate yields anything but #f or nil. Return the elements of the input nodeset that have not been processed: that is, return the elements of the input nodeset that follow the first element that satisfied the predicate. take-after along with take-until partition an input nodeset into three parts: the first element that satisfies a predicate, all preceding elements and all following elements.

Function: map-union proc lst

Apply proc to each element of lst and return the list of results. If proc returns a nodeset, splice it into the result.

From another point of view, map-union is a function Converter->Converter, which places an argument-converter in a joining context.

Function: node-reverse node-or-nodeset
node-reverse :: Converter, or
node-reverse:: Node|Nodeset -> Nodeset

Reverses the order of nodes in the nodeset. This basic converter is needed to implement a reverse document order (see the XPath Recommendation).

Function: node-trace title
 node-trace:: String -> Converter

(node-trace title) is an identity converter. In addition it prints out a node or nodeset it is applied to, prefixed with the ’title’. This converter is very useful for debugging.

What follow are Converter combinators, higher-order functions that transmogrify a converter or glue a sequence of converters into a single, non-trivial converter. The goal is to arrive at converters that correspond to XPath location paths.

From a different point of view, a combinator is a fixed, named pattern of applying converters. Given below is a complete set of such patterns that together implement XPath location path specification. As it turns out, all these combinators can be built from a small number of basic blocks: regular functional composition, map-union and filter applicators, and the nodeset union.

Function: select-kids test-pred?
select-kids:: Pred -> Node -> Nodeset

Given a Node, return an (ordered) subset its children that satisfy the Pred (a converter, actually).

select-kids:: Pred -> Nodeset -> Nodeset

The same as above, but select among children of all the nodes in the Nodeset.

Function: node-self pred
 node-self:: Pred -> Node -> Nodeset, or
 node-self:: Converter -> Converter

Similar to select-kids but apply to the Node itself rather than to its children. The resulting Nodeset will contain either one component, or will be empty (if the Node failed the Pred).

Function: node-join . selectors
 node-join:: [LocPath] -> Node|Nodeset -> Nodeset, or
 node-join:: [Converter] -> Converter

join the sequence of location steps or paths as described in the title comments above.

Function: node-reduce . converters
 node-reduce:: [LocPath] -> Node|Nodeset -> Nodeset, or
 node-reduce:: [Converter] -> Converter

A regular functional composition of converters. From a different point of view, ((apply node-reduce converters) nodeset) is equivalent to (foldl apply nodeset converters) i.e., folding, or reducing, a list of converters with the nodeset as a seed.

Function: node-or . converters
 node-or:: [Converter] -> Converter

This combinator applies all converters to a given node and produces the union of their results. This combinator corresponds to a union, ’|’ operation for XPath location paths.

Function: node-closure test-pred?
 node-closure:: Converter -> Converter

Select all descendants of a node that satisfy a converter-predicate. This combinator is similar to select-kids but applies to grand... children as well. This combinator implements the "descendant::" XPath axis. Conceptually, this combinator can be expressed as

 (define (node-closure f)
      (node-or
        (select-kids f)
	 (node-reduce (select-kids (ntype?? '*)) (node-closure f))))

This definition, as written, looks somewhat like a fixpoint, and it will run forever. It is obvious however that sooner or later (select-kids (ntype?? '*)) will return an empty nodeset. At this point further iterations will no longer affect the result and can be stopped.


Next: , Previous: , Up: SXML Query Language   [Contents][Index]

12.45.2 SXPath query language

Function: sxpath abbrpath . ns-binding

Evaluates an abbreviated SXPath

 sxpath:: AbbrPath -> Converter, or
 sxpath:: AbbrPath -> Node|Nodeset -> Nodeset

AbbrPath is a list. It is translated to the full SXPath according to the following rewriting rules:

 (sxpath '()) -> (node-join)
 (sxpath '(path-component ...)) ->
                (node-join (sxpath1 path-component) (sxpath '(...)))
 (sxpath1 '//) -> (node-or
                     (node-self (ntype?? '*any*))
                     (node-closure (ntype?? '*any*)))
 (sxpath1 '(equal? x)) -> (select-kids (node-equal? x))
 (sxpath1 '(eq? x))    -> (select-kids (node-eq? x))
 (sxpath1 '(or@ ...))  -> (select-kids (ntype-names??
                                          (cdr '(or@ ...))))
 (sxpath1 '(not@ ...)) -> (select-kids (sxml:invert
                                         (ntype-names??
                                          (cdr '(not@ ...)))))
 (sxpath1 '(ns-id:* x)) -> (select-kids
                                      (ntype-namespace-id?? x))
 (sxpath1 ?symbol)     -> (select-kids (ntype?? ?symbol))
 (sxpath1 ?string)     -> (txpath ?string)
 (sxpath1 procedure)   -> procedure
 (sxpath1 '(?symbol ...)) -> (sxpath1 '((?symbol) ...))
 (sxpath1 '(path reducer ...)) ->
                (node-reduce (sxpath path) (sxpathr reducer) ...)
 (sxpathr number)      -> (node-pos number)
 (sxpathr path-filter) -> (filter (sxpath path-filter))

Some wrapper functions around sxpath:

Function: if-sxpath path

sxpath always returns a list, which is #t in Scheme. if-sxpath returns #f instead of empty list.

Function: if-car-sxpath path

Returns first node found, if any. Otherwise returns #f.

Function: car-sxpath path

Returns first node found, if any. Otherwise returns empty list.

Function: sxml:id-alist node . lpaths

Built an index as a list of (ID_value . element) pairs for given node. lpaths are location paths for attributes of type ID.


Previous: , Up: SXML Query Language   [Contents][Index]

12.45.3 SXPath extension

SXML counterparts to W3C XPath Core Functions Library.

Function: sxml:string object

The counterpart to XPath string function (section 4.2 XPath Rec.) Converts a given object to a string. NOTE:

  1. When converting a nodeset - a document order is not preserved
  2. number->string function returns the result in a form which is slightly different from XPath Rec. specification
Function: sxml:boolean object

The counterpart to XPath boolean function (section 4.3 XPath Rec.) Converts its argument to a boolean.

Function: sxml:number obj

The counterpart to XPath number function (section 4.4 XPath Rec.) Converts its argument to a number NOTE:

  1. The argument is not optional (yet?).
  2. string->number conversion is not IEEE 754 round-to-nearest.
  3. NaN is represented as 0.
Function: sxml:string-value node

Returns a string value for a given node in accordance to XPath Rec. 5.1 - 5.7

Function: sxml:node? node

According to XPath specification 2.3, this test is true for any XPath node. For SXML auxiliary lists and lists of attributes has to be excluded.

Function: sxml:attr-list obj

Returns the list of attributes for a given SXML node. Empty list is returned if the given node is not an element, or if it has no list of attributes

Function: sxml:id id-index

Select SXML element by its unique IDs. (XPath Rec. 4.1) Returns a converter that takes object, which is a nodeset or a datatype which can be converted to a string by means of a ’string’ function.

id-index is ( (id-value . element) (id-value . element) ... ).

This index is used for selection of an element by its unique ID.

Comparators for XPath objects:

Function: sxml:equality-cmp bool-op number-op string-op

A helper for XPath equality operations: = , != bool-op, number-op and ’string-op are comparison operations for a pair of booleans, numbers and strings respectively.

Function: sxml:equal? a b
Function: sxml:not-equal? a b

Counterparts of XPath equality operations: = , !=, using default equality tests.

Function: sxml:relational-cmp op

Creates a relational operation ( < , > , <= , >= ) for two XPath objects. op is comparison procedure: < , > , <= or >=.

XPath axises. An order in resulting nodeset is preserved.

Function: sxml:attribute test-pred?

Attribute axis.

Function: sxml:child test-pred?

Child axis. This function is similar to ’select-kids’, but it returns an empty child-list for PI, Comment and Entity nodes.

Function: sxml:parent test-pred?

Parent axis.

Given a predicate, it returns a function RootNode -> Converter which yields a node -> parent converter then applied to a rootnode.

Thus, such a converter may be constructed using ((sxml:parent test-pred) rootnode) and returns a parent of a node it is applied to. If applied to a nodeset, it returns the list of parents of nodes in the nodeset. The rootnode does not have to be the root node of the whole SXML tree – it may be a root node of a branch of interest. The parent:: axis can be used with any SXML node.

Function: sxml:ancestor test-pred?

Ancestor axis

Function: sxml:ancestor-or-self test-pred?

Ancestor-or-self axis

Function: sxml:descendant test-pred?

Descendant axis

Function: sxml:descendant-or-self test-pred?

Descendant-or-self axis

Function: sxml:following test-pred?

Following axis

Function: sxml:following-sibling test-pred?

Following-sibling axis

Function: sxml:namespace test-pred?

Namespace axis

Function: sxml:preceding test-pred?

Preceding axis

Function: sxml:preceding-sibling test-pred?

Preceding-sibling axis

Popular shortcuts:

Function: sxml:child-nodes nodeset
((sxml:child sxml:node?) nodeset)
Function: sxml:child-elements nodeset
((select-kids sxml:element?) nodeset)

Previous: , Up: SXML Query Language   [Contents][Index]