XML, XPath & XSLT - part 2

**Dormilich** · Sep 15 '08, 01:01 PM

4 XPath

4.1 XPath basics – walk the line
An XPath expression (e.g. a Location Path) evaluates to an object of one of four types:
• node-set (an unordered collection of nodes without duplicates)
• boolean (true or false)
• number (a floating-point number)
• string (a sequence of UCS characters)

A node-set may contain 0, 1 or more nodes.

A ‘single’ expression (Location Step) consists of three parts: an axis, a node-test and zero or more conditions (predicates):

Code:

axis::node-test conditions

of which only the node-test is mandatory. The node test can be expanded to

Code:

axis::namespace:nodename conditions

‘Axis’ and ‘namespace’ have default values that are used if not specified:
• default value for ‘axis’ is “child”
• default value for ‘namespace’ is the NULL namespace (this may be a source of confusion later)

If you want to narrow down your result further (building a LocationPath), you can apply another ‘single’ expression to the result node-set of the first ‘single’ expression. To do so, add the second expression after the first expression, separated by a (forward) slash:

Code:

expression1/expression2

You could compare that to the path you use when accessing files in a directory tree.

If a syntactically valid expression doesn’t match anything in the document it is applied to, it returns an empty node-set rather than raising an error.
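
To see a location path and the empty result in action, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath (child steps, //, ., .. and a few predicates). The sample address book and its element names are made up for this illustration; it is not one of the ExNN.xml files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (an assumption, not Ex02.xml itself).
SAMPLE = """
<addressbook>
  <entry>
    <name><first>John</first><last>Doe</last></name>
  </entry>
  <entry>
    <name><first>Erika</first><last>Mustermann</last></name>
  </entry>
</addressbook>
"""

# The element the expression is called on acts as the context node.
addressbook = ET.fromstring(SAMPLE)

# expression1/expression2/expression3: each step narrows the previous result.
last_names = [node.text for node in addressbook.findall("entry/name/last")]

# A path that matches nothing yields an empty result, not an error.
no_match = addressbook.findall("entry/name/nickname")

print(last_names)
print(no_match)
```

Because the result is simply empty, the calling code has to check for that case itself if a missing node should count as a problem.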

4.1.1 The context node - build your home (base)
This is the node you use as a starting point. Which node it actually is depends on the object you call the expression on (therefore you need a host language capable of XPath, e.g. XSL, JavaScript or others).
In XPath itself, there is only one absolute starting point defined. It is the root node, which you can access via a single (forward) slash at the beginning of the expression:

Code:

/

(In Ex02.xml this expression selects the root node; its document element is addressbook.)

4.1.2 The axis - where do you want to go today?
An axis expresses a relation between the context node and the target node(s). This is like family business.

In XPath there are 13 axes defined:
• child
• descendant
• parent
• ancestor
• following-sibling
• preceding-sibling
• following
• preceding
• attribute
• namespace
• self
• descendant-or-self
• ancestor-or-self

A detailed description of the axes is given in the XPath 1.0 specification.

Note: The ancestor, descendant, following, preceding and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap and together they contain all the nodes in the document.

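Reverse axes are: ancestor, ancestor-or-self, preceding and preceding-sibling; all others are forward axes. You need this information to count nodes correctly: on a forward axis positions are counted in document order, on a reverse axis in reverse document order (starting with the node nearest to the context node).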

Example: If you have 3 sibling elements,

Code:

<foo/>
<bar/>
<item/>

and your context node is item, then preceding-sibling::*[1] is bar and preceding-sibling::*[2] is foo.
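
ElementTree has no preceding-sibling axis, so the following Python sketch emulates it by hand just to show how positions are counted on a reverse axis. The helper name preceding_siblings and the wrapping root element are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# The three siblings from the example, wrapped in an assumed <root> parent.
root = ET.fromstring("<root><foo/><bar/><item/></root>")
item = root.find("item")

def preceding_siblings(parent, node):
    """Emulate preceding-sibling::* (hypothetical helper): the siblings
    before `node`, nearest first, i.e. in reverse document order."""
    siblings = list(parent)
    index = siblings.index(node)
    return siblings[:index][::-1]

axis = preceding_siblings(root, item)

# XPath positions start at 1, Python indexes start at 0.
first = axis[1 - 1].tag   # preceding-sibling::*[1]
second = axis[2 - 1].tag  # preceding-sibling::*[2]
print(first, second)
```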

Originally posted by Dormilich

From here on it gets less descriptive; I just haven't had the time to write all the explanations yet.

4.1.3 The node test - tell me what you eat and I tell you what you are
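node-name – name of the node (e.g. entry, en:city, de)
* – any node of the axis' principal node type (attribute for the attribute axis, namespace for the namespace axis, element for all other axes)
node() – any node of any type
text() – any text node
comment() – any comment node
processing-instruction(literal?) – any processing instruction (with the target name literal, if given)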

XPath’s document tree contains nodes of seven node types:
• root node
• element node
• text node
• attribute node
• namespace node
• processing instruction node
• comment node

Note: The string-value of the root and element node is the concatenation of the string-values of all text node descendants of the root/element node in document order.

Originally posted by Dormilich

missing example
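
One possible example for the string-value note, as a minimal Python sketch: ElementTree's itertext() walks all text descendants of an element in document order, so joining its output gives the same result as the element's string-value. The entry element and its content are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with nested children; note the single space
# between </first> and <last>, which is a text node of its own.
entry = ET.fromstring(
    "<entry><name><first>John</first> <last>Doe</last></name>"
    "<city>Berlin</city></entry>"
)

# Concatenate every text node descendant in document order.
string_value = "".join(entry.itertext())
print(string_value)
```

Nothing is inserted between the text nodes, which is why "Doe" and "Berlin" run together in the result.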

4.1.3.1 The node-name - who am I? (local name)
The node-name is the element’s or attribute’s name without the namespace prefix and the colon.

4.1.3.2 The namespace - that's not me, it's him!
A namespace is attached to an element using a namespace prefix, which stands before the node name, separated by a colon. The namespace prefix must be declared on the element that uses the prefix or on one of its ancestors (often the root element).

To declare a namespace (bind a namespace-URI to a namespace-prefix) you attach a URI to the prefix using the xmlns attribute:

Code:

xmlns:prefix="namespace-URI"

If a default namespace is declared (xmlns="default-namespace-URI"), it applies to the declaring element and to all descendant elements that have no prefix (unless a descendant redeclares it). /3/ Attribute nodes do not have a default namespace (i.e. an unprefixed attribute has the NULL namespace, even if its element has a namespace).
/3/ - attribute nodes are a part of the element node and not child nodes

Code:

<addressbook xmlns:en="http://bytes.com/languages/english">
	<en:city/> <!-- 'en' namespace -->
	<city/>    <!-- NULL or default namespace -->
</addressbook>

Code:

<en:city xmlns:en="http://bytes.com/languages/english"/>

Example: expressions within Ex03.xml

Code:

child::entry
ancestor::entry/following::de:*
entry/name/last
/descendant-or-self::en:city

Originally posted by Dormilich

maybe there's a way to make a better example
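
As one possible step towards a better example, here is a minimal Python sketch of the namespace rules from this section, using the addressbook snippet with the en prefix. The second document, its default namespace URI urn:example:default, the zip attribute and the prefix d are assumptions made for this illustration. Note that the prefix used in the expression is bound by the caller, not by the document.

```python
import xml.etree.ElementTree as ET

EN = "http://bytes.com/languages/english"

# The addressbook snippet: one prefixed and one unprefixed <city>.
book = ET.fromstring(
    '<addressbook xmlns:en="' + EN + '"><en:city/><city/></addressbook>'
)

in_en = book.findall("en:city", {"en": EN})  # 'en' namespace
in_null = book.findall("city")               # NULL namespace

# Assumed variant with a default namespace (the URI is made up).
book2 = ET.fromstring(
    '<addressbook xmlns="urn:example:default"><city zip="10115"/></addressbook>'
)

# An unprefixed name in the expression means the NULL namespace: no match.
unprefixed = book2.findall("city")
# Binding any prefix to the default namespace URI finds the element.
bound = book2.findall("d:city", {"d": "urn:example:default"})

print(len(in_en), len(in_null), len(unprefixed), len(bound))
print(bound[0].tag, bound[0].attrib)  # the attribute name carries no namespace
```

The empty result for the unprefixed city in the second document is the source of confusion mentioned for the default value of ‘namespace’.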

4.1.4 Shortcuts
Since the full notation can get rather verbose, there are some shortcuts for often-used expressions.

Code:

/	(document root)
.	self::node()
..	parent::node()
//	/descendant-or-self::node()/
@	attribute::

So 
	//phone/@areacode 
is the same as 
	/descendant-or-self::node()/child::phone/attribute::areacode.
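
The shortcuts can be tried out with a short Python sketch. ElementTree only accepts relative paths on an element and cannot return attribute nodes, so the sketch starts with . and reads the attribute from each selected element instead. The sample data is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; element and attribute values are assumptions.
book = ET.fromstring(
    "<addressbook>"
    "<entry><name>Doe</name><phone areacode='030'>1234</phone></entry>"
    "<entry><name>Roe</name><phone areacode='089'>5678</phone></entry>"
    "</addressbook>"
)

# .//phone  =  self::node()/descendant-or-self::node()/child::phone
phones = book.findall(".//phone")

# Stand-in for .//phone/@areacode: read the attribute per element.
areacodes = [phone.get("areacode") for phone in phones]

# .//phone/..  =  the parent of every phone element
parents = [parent.tag for parent in book.findall(".//phone/..")]

print(areacodes)
print(parents)
```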

4.2 XPath advanced – rose-coloured glasses

4.2.1 The predicate – conditional expressions - A burger without tomato, additional cheese and much mustard, please
Sometimes you wish to select a node that meets a special condition, like an element whose attribute has a certain value. To do so, you filter the result of your XPath expression with one or more predicates. A predicate consists of a condition expression between square brackets.

Code:

expression[condition]
expression[condition1][condition2]

where expression is a valid XPath expression (ok, what else should it be…) and condition is evaluated for each node of the result: only nodes for which it evaluates to true (see Boolean functions below) are kept. If no node passes, an empty node-set is returned.

Conditions:

Code:

[expression2]
[function]
[expression operator value]
[expression operator function]
[expression operator expression]
[function operator value]
[function operator function]
[function operator expression]

Note: Of course there are also shortcuts:

Code:

[position() = number]
[position() = function1]

are equal to

Code:

[number]
[function1]

(as long as function1 returns a number.)

If expression returns a node-set, you can select a single node by its position (that’s like accessing an array). However, counting starts with 1 /2/. This is especially useful with the preceding-sibling and following-sibling axes.

/2/ because 0 evaluates to false (imho)

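function (an XPath function) and expression2 must return a value that converts to true (i.e. not false, the empty string or the empty node-set). A numeric result is not converted this way; it is compared with position(), as in the shortcut above.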

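Since < is not allowed in XML outside of markup and CDATA sections (and > is usually escaped along with it), the two have to be written as &lt; and &gt; when an expression is embedded in an XML document, e.g. in an XSLT stylesheet.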
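
Some of the predicate forms from this section can be tried with Python's ElementTree, which understands attribute tests, child tests, positions and last(), but not the full predicate syntax. The sample data, the type attribute and the helper last_names are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; element names and the type attribute are assumptions.
book = ET.fromstring(
    "<addressbook>"
    "<entry type='private'><last>Doe</last><phone>1234</phone></entry>"
    "<entry type='business'><last>Roe</last><phone>5678</phone></entry>"
    "<entry type='private'><last>Poe</last></entry>"
    "</addressbook>"
)

def last_names(path):
    """Return the <last> text of every entry selected by `path`."""
    return [entry.findtext("last") for entry in book.findall(path)]

by_attribute = last_names("entry[@type='private']")   # [expression operator value]
by_child = last_names("entry[phone]")                  # [expression2]
by_position = last_names("entry[2]")                   # [number], counting from 1
by_last = last_names("entry[last()]")                  # [function]
chained = last_names("entry[@type='private'][phone]")  # [condition1][condition2]

print(by_attribute, by_child, by_position, by_last, chained)
```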

4.2.2 XPath functions - any ideas here?
A function will be described here as

Code:

(return type) function-name(argument)

If an argument is optional, the corresponding regular expression quantifier is appended (?, *).

Node-set functions

Code:

(number)   last()
(number)   position()
(number)   count(node-set)
(node-set) id(object)
(string)   local-name(node-set?)
(string)   namespace-uri(node-set?)
(string)   name(node-set?)

String functions

Code:

(string)   string(object?)
(string)   concat(string, string, string*)
(boolean)  starts-with(string, string)
(boolean)  contains(string, string)
(string)   substring-before(string, string)
(string)   substring-after(string, string)
(string)   substring(string, number, number?)
(number)   string-length(string?) /4/ defaults to the context node’s string-value
(string)   normalize-space(string?)
(string)   translate(string, string, string)
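
Since ElementTree does not evaluate XPath functions, the following Python sketch only mirrors the behaviour of a few string functions with hand-written equivalents. The function names are my own, and the sketch covers plain cases only.

```python
import re

# XPath treats exactly these four characters as whitespace.
XPATH_WHITESPACE = " \t\r\n"

def substring_before(text, marker):
    """substring-before(): text before the first marker, '' if absent."""
    index = text.find(marker)
    return text[:index] if index >= 0 else ""

def substring_after(text, marker):
    """substring-after(): text after the first marker, '' if absent."""
    index = text.find(marker)
    return text[index + len(marker):] if index >= 0 else ""

def normalize_space(text):
    """normalize-space(): trim both ends, collapse inner whitespace runs."""
    stripped = text.strip(XPATH_WHITESPACE)
    return re.sub("[" + XPATH_WHITESPACE + "]+", " ", stripped)

date = "1999/04/01"
year = substring_before(date, "/")
rest = substring_after(date, "/")
tidy = normalize_space("  John \n  Doe ")
has_prefix = date.startswith("1999")  # starts-with(date, '1999')
has_slash = "/" in date               # contains(date, '/')
length = len(tidy)                    # string-length(tidy)

print(year, rest, repr(tidy), has_prefix, has_slash, length)
```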

Boolean functions

Code:

(boolean)  boolean(object)
	The function returns true if:
	•	number is neither 0 nor NaN (negative numbers are true as well)
	•	node-set not empty
	•	string length is non-zero


(boolean)  not(object)
(true)     true()
(false)    false()
(boolean)  lang(string)
Returns true if the language given by the nearest xml:lang attribute (on the context node or, failing that, on its nearest ancestor carrying one) is the same as or a sublanguage of string, ignoring case.
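
The conversion rules of boolean() can be written down as a small Python sketch. It is a hand-written stand-in, not a real XPath engine: Python bool, int/float, str and list are assumed to play the roles of boolean, number, string and node-set, and the name xpath_boolean is made up.

```python
import math

def xpath_boolean(value):
    """Sketch of XPath 1.0 boolean() for Python stand-in types."""
    if isinstance(value, bool):          # boolean: unchanged
        return value
    if isinstance(value, (int, float)):  # number: neither zero nor NaN
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):           # string: non-zero length
        return len(value) > 0
    if isinstance(value, list):          # node-set: not empty
        return len(value) > 0
    raise TypeError("unsupported stand-in type")

checks = {
    "negative number": xpath_boolean(-1),
    "zero": xpath_boolean(0.0),
    "NaN": xpath_boolean(math.nan),
    "empty string": xpath_boolean(""),
    "string 'false'": xpath_boolean("false"),
    "empty node-set": xpath_boolean([]),
}
print(checks)
```

Note that the string 'false' converts to true, because only the length of a string matters.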

Number functions

Code:

(number)   number(object?)
	If the argument is omitted, object is the context node.

(number)   sum(node-set)
	The string-value of each node is converted to a number first.

(number)   floor(number)
(number)   ceiling(number)
(number)   round(number)
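
A rough Python sketch of sum() and round(), again with hand-written stand-ins rather than a real XPath engine. The order document and the helper names are assumptions, and xpath_number accepts a few spellings (such as exponents) that XPath's number() would turn into NaN.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical document; the third price has surrounding whitespace.
order = ET.fromstring(
    "<order><price>1.50</price><price>2.25</price><price> 3 </price></order>"
)

def xpath_number(text):
    """Rough stand-in for number(string): NaN if the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return math.nan

def xpath_round(value):
    """round() for ordinary finite values: ties go towards positive
    infinity, unlike Python's built-in round()."""
    return math.floor(value + 0.5)

# sum(price): convert each node's string-value to a number, then add up.
total = sum(xpath_number("".join(p.itertext())) for p in order.findall("price"))

print(total, xpath_round(2.5), xpath_round(-2.5), round(2.5))
```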

Originally posted by Dormilich

Now the examples are definitely missing

comments/improvements welcome

there's still so much to...

Dormilich