4 XPath
4.1 XPath basics – walk the line
An XPath expression returns an object of one of the following types (the most important kind of expression, the Location Path, returns a node-set):
• node-set (an unordered collection of nodes without duplicates)
• boolean (true or false)
• number (a floating-point number)
• string (a sequence of UCS characters)
A node-set may contain 0, 1 or more nodes.
A ‘single’ expression (Location Step) consists of three parts: an axis, a node-test and zero or more conditions (predicates):
Code:
axis::node-test conditions
of which only the node-test is mandatory. The node-test can be expanded to
Code:
axis::namespace:nodename conditions
‘Axis’ and ‘namespace’ have default values that are used if not specified:
• default value for ‘axis’ is “child”
• default value for ‘namespace’ is the NULL namespace (this may be a source of confusion later)
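To make the three parts and the default axis concrete, here is a minimal Python sketch that splits a single Location Step into axis, node-test and predicates. The helper parse_step and its regular expression are hypothetical and only for illustration; they ignore abbreviations such as @ and do not handle nested brackets.

```python
import re

# Hypothetical helper for illustration only: splits ONE location step.
# It does not handle abbreviations (@, ., ..) or nested brackets.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?"      # optional "axis::"
    r"(?P<test>[^\[\]]+)"             # node-test, e.g. entry, en:city, *
    r"(?P<preds>(?:\[[^\]]*\])*)$"    # zero or more [predicates]
)

def parse_step(step):
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not a simple location step: {step!r}")
    axis = match.group("axis") or "child"   # "child" is the default axis
    predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
    return axis, match.group("test"), predicates

print(parse_step("entry"))                # ('child', 'entry', [])
print(parse_step("following::de:*[1]"))   # ('following', 'de:*', ['1'])
```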
If you want to narrow down your result further (building a Location Path), you can apply another Location Step to the result node-set of the first one. To do so, add the second expression after the first, separated by a (forward) slash:
Code:
expression1/expression2
You could compare that to the pattern you use when accessing files in a directory.
If an expression doesn’t match anything in the document it is applied to, it returns an empty node-set. A syntactically valid XPath expression that matches nothing is therefore not an error.
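A small Python sketch of both points, using the standard library module xml.etree.ElementTree (which implements only a subset of XPath). The sample document is made up for this example. A path made of several steps narrows the result, and a path that matches nothing yields an empty list instead of an error.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; the context node is the addressbook element.
doc = ET.fromstring(
    "<addressbook>"
    "<entry><name><last>Smith</last></name></entry>"
    "<entry><name><last>Jones</last></name></entry>"
    "</addressbook>"
)

# Three steps, each applied to the result of the previous one.
last_names = [node.text for node in doc.findall("entry/name/last")]

# No entry has a phone child: the result is empty, no exception is raised.
no_match = doc.findall("entry/phone")

print(last_names)  # ['Smith', 'Jones']
print(no_match)    # []
```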
4.1.1 The context node - build your home (base)
This is the node that you use as a starting point. Which node it actually is depends on the object on which you call the expression (therefore you need a script/language, e.g. XSL, JavaScript, or others capable of XPath).
In XPath itself, there is only one absolute starting point defined. It is the root node, which you can access via a single (forward) slash at the beginning of the expression:
Code:
/
(In Ex02.xml this is the root node of the document, whose element child is addressbook)
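How much the context node matters can be seen in a short Python sketch with xml.etree.ElementTree (an XPath subset; the sample document is invented for this example). The same relative expression returns different results depending on the element it is called on.

```python
import xml.etree.ElementTree as ET

# Invented sample document.
doc = ET.fromstring(
    "<addressbook>"
    "<entry><name>Smith</name></entry>"
    "<entry><name>Jones</name></entry>"
    "</addressbook>"
)
first_entry = doc.find("entry")

# Same expression "name", two different context nodes.
from_addressbook = doc.findall("name")       # addressbook has no name child
from_entry = first_entry.findall("name")     # the first entry has one

print(len(from_addressbook))             # 0
print([n.text for n in from_entry])      # ['Smith']
```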
4.1.2 The axis - where do you want to go today?
An axis expresses a relation between the context node and the target node(s). This is like family business.
In XPath there are 13 axes defined:
• child
• descendant
• parent
• ancestor
• following-sibling
• preceding-sibling
• following
• preceding
• attribute
• namespace
• self
• descendant-or-self
• ancestor-or-self
A detailed description of the axes is given here [ ].
Note: The ancestor, descendant, following, preceding and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap and together they contain all the nodes in the document.
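Reverse axes are ancestor, ancestor-or-self, preceding and preceding-sibling; all others are forward axes. You need this information to count nodes correctly: in a predicate, positions on a forward axis are counted in document order, positions on a reverse axis in reverse document order.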
Example: If you have 3 sibling elements,
Code:
<foo/> <bar/> <item/>
and your context node is item, then preceding-sibling::*[2] is foo.
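The counting on a reverse axis can be simulated in plain Python. xml.etree.ElementTree does not support the preceding-sibling axis, so the function preceding_siblings below is a hypothetical stand-in, and the wrapper element list is an assumption made to give the three siblings a parent.

```python
import xml.etree.ElementTree as ET

# The wrapper element "list" is an assumption; it only provides a parent.
parent = ET.fromstring("<list><foo/><bar/><item/></list>")

def preceding_siblings(parent, node):
    """Hypothetical stand-in for the preceding-sibling axis."""
    siblings = list(parent)
    index = siblings.index(node)
    # Reverse axis: the nearest sibling comes first.
    return siblings[:index][::-1]

item = parent.find("item")
axis = preceding_siblings(parent, item)
second = axis[2 - 1]          # XPath positions start at 1, Python at 0

print([e.tag for e in axis])  # ['bar', 'foo']
print(second.tag)             # foo
```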
4.1.3 The node test - tell me what you eat and I tell you what you are
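node-name – name of the node (e.g. entry, en:city, de)
* – any node of the axis’s principal node type (attribute for the attribute axis, namespace for the namespace axis, element for all other axes)
node() – any node, whatever its type
text() – any text node
comment() – any comment node
processing-instruction(literal?) – any processing instruction, optionally only those whose target name equals literal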
XPath’s document tree contains nodes of seven node types:
• root node
• element node
• text node
• attribute node
• namespace node
• processing instruction node
• comment node
Note: The string-value of the root and element node is the concatenation of the string-values of all text node descendants of the root/element node in document order.
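As a sketch of this rule in Python: xml.etree.ElementTree has no string-value function, but joining the pieces returned by itertext() gives the same concatenation of text in document order. The sample element is made up.

```python
import xml.etree.ElementTree as ET

# Made-up element with mixed content.
para = ET.fromstring("<p>Hello <b>big</b> world</p>")

# Concatenate all descendant text in document order.
string_value = "".join(para.itertext())

print(string_value)  # Hello big world
```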
4.1.3.1 The node-name - who am I? (local name)
The node-name is the element’s or attribute’s name without the namespace prefix and the colon.
4.1.3.2 The namespace - that's not me, it's him!
A namespace is attached to the element using a namespace prefix, which stands before the node name, separated by a colon. The namespace prefix must be declared on the element that uses the prefix or on one of its ancestors (often the root element).
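To declare a namespace (bind a namespace URI to a namespace prefix) you attach a URI to the prefix using the xmlns attribute:
Code:
xmlns:prefix="namespace-URI"
If the default namespace is declared (xmlns="default-namespace-URI") it also applies to all child elements that are not explicitly attached to another namespace. Attribute nodes /3/ do not have a default namespace (i.e. they have the NULL namespace, even if the element has a namespace).
/3/ - attribute nodes are a part of the element node and not child nodes
Code:
<addressbook xmlns:en="http://bytes.com/languages/english">
<en:city/> <!-- 'en' namespace -->
<city/> <!-- NULL or default namespace -->
</addressbook>
Code:
<en:city xmlns:en="http://bytes.com/languages/english"/>
Example: expressions within Ex03.xml
Code:
child::entry
ancestor::entry/following::de:*
entry/name/last
/descendant-or-self::en:city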
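A Python sketch of the namespace behaviour, again with xml.etree.ElementTree. The prefix e in the expression and the URI urn:example:default are assumptions chosen for this example. It shows that the expression is matched by namespace URI rather than by the prefix used in the document, and that an unprefixed attribute stays in no namespace even when its element is in the default namespace.

```python
import xml.etree.ElementTree as ET

EN = "http://bytes.com/languages/english"
doc = ET.fromstring(
    f'<addressbook xmlns:en="{EN}"><en:city/><city/></addressbook>'
)

# The prefix in the expression ("e") is bound separately from the
# prefix in the document ("en"); only the URI has to agree.
prefixed = doc.findall("e:city", {"e": EN})
unprefixed = doc.findall("city")          # matches only the NULL namespace

print(prefixed[0].tag)    # {http://bytes.com/languages/english}city
print(unprefixed[0].tag)  # city

# urn:example:default is a made-up URI for the default namespace.
entry = ET.fromstring('<entry xmlns="urn:example:default" id="1"/>')
print(entry.tag)          # {urn:example:default}entry
print(entry.attrib)       # {'id': '1'}
```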
4.1.4 Shortcuts
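Since the full notation may be a bit exhausting, there are some shortcuts for often-used expressions.
Code:
/    (document root)
.    self::node()
..   parent::node()
//   /descendant-or-self::node()/
@    attribute::
So //phone/@areacode is the same as /descendant-or-self::node()/child::phone/attribute::areacode.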
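Some of the shortcuts can be tried out in Python with xml.etree.ElementTree. This is only a sketch on an invented document, and the module has limits: paths must be relative (hence .// instead of //), and @ is only understood inside predicates, so the attribute value is read with get() here.

```python
import xml.etree.ElementTree as ET

# Invented sample document.
doc = ET.fromstring(
    "<addressbook>"
    '<entry><phone areacode="030">123</phone></entry>'
    '<entry><phone areacode="089">456</phone></entry>'
    "</addressbook>"
)

phones = doc.findall(".//phone")                 # all phone descendants
areacodes = [p.get("areacode") for p in phones]  # stand-in for /@areacode
parents = doc.findall(".//phone/..")             # .. steps to the parent
same_node = doc.find(".")                        # . is the context node

print(areacodes)                   # ['030', '089']
print([e.tag for e in parents])    # ['entry', 'entry']
print(same_node is doc)            # True
```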
4.2 XPath advanced – rose-coloured glasses
4.2.1 The predicate – conditional expressions - A burger without tomato, additional cheese and much mustard, please
Sometimes you wish to select a node that meets a special condition, like an element whose attribute has a certain value. To do so, you filter the result of your XPath expression with one or more predicates. A predicate consists of a condition between square brackets.
Code:
expression[condition]
expression[condition1][condition2]
where expression is a valid XPath expression (ok, what else should it be…) and condition must evaluate to true (see Boolean functions below) for a node to be kept; if no node passes, an empty node-set is returned.
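Conditions:
Code:
[expression2]
[function]
[expression operator value]
[expression operator function]
[expression operator expression]
[function operator value]
[function operator function]
[function operator expression]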
Note: Of course there are also shortcuts.
Code:
[position() = number]
[position() = function1]
are equal to
Code:
[number]
[function1]
(as long as function1 returns a number.)
If expression1 returns a node-set you can select a single node by selecting its number (that’s like accessing an array). However, counting starts with 1 /2/. This is especially useful with the preceding-sibling and following-sibling axes.
/2/ because 0 evaluates to false (imho)
function1 (an XPath function) and expression2 must return a value that evaluates to true (neither false, 0, NaN, the empty string nor the empty node-set).
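Since < is not allowed in XML except as a tag delimiter (and in CDATA sections), and > is customarily escaped as well, they are to be masked with &lt; and &gt; when the expression is embedded in an XML document (e.g. in an XSL stylesheet).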
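A few predicates in action, sketched in Python with xml.etree.ElementTree on an invented document. The module supports only simple predicates (position, last(), attribute and child-text tests), and its position predicates must follow a tag name, so this shows the idea rather than the full XPath behaviour.

```python
import xml.etree.ElementTree as ET

# Invented sample document.
doc = ET.fromstring(
    "<addressbook>"
    '<entry type="private"><name>A</name></entry>'
    '<entry type="work"><name>B</name></entry>'
    '<entry type="private"><name>C</name></entry>'
    "</addressbook>"
)

def names(path):
    """Return the name text of every entry selected by path."""
    return [e.findtext("name") for e in doc.findall(path)]

print(names("entry[1]"))                 # ['A']  counting starts with 1
print(names("entry[last()]"))            # ['C']
print(names("entry[@type='private']"))   # ['A', 'C']
print(names("entry[name='B']"))          # ['B']
print(names("entry[@type='none']"))      # []  nothing passes the condition
```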
4.2.2 XPath functions - any ideas here?
A function will be described here as
Code:
(return type) function-name(argument)
If an argument is optional, the corresponding regular-expression quantifier is appended (?, *).
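Node-set functions
Code:
(number) last()
(number) position()
(number) count(node-set)
(node-set) id(object)
(string) local-name(node-set?)
(string) namespace-uri(node-set?)
(string) name(node-set?)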
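xml.etree.ElementTree does not implement these functions, but count(), local-name() and namespace-uri() are easy to imitate in Python, since the module stores element names as {namespace-URI}local-name. The helpers local_name and namespace_uri below are hypothetical names for this sketch.

```python
import xml.etree.ElementTree as ET

EN = "http://bytes.com/languages/english"
doc = ET.fromstring(
    f'<addressbook xmlns:en="{EN}"><en:city/><en:city/><city/></addressbook>'
)

def namespace_uri(node):
    """Imitates namespace-uri(): '' for the NULL namespace."""
    tag = node.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def local_name(node):
    """Imitates local-name(): the name without namespace part."""
    tag = node.tag
    return tag.split("}", 1)[1] if tag.startswith("{") else tag

children = doc.findall("*")
count = len(children)                      # imitates count(*)

print(count)                                    # 3
print([local_name(c) for c in children])        # ['city', 'city', 'city']
print([namespace_uri(c) for c in children])
```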
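String functions
Code:
(string) string(object?)
(string) concat(string, string, string*)
(boolean) starts-with(string, string)
(boolean) contains(string, string)
(string) substring-before(string, string)
(string) substring-after(string, string)
(string) substring(string, number, number?)
(number) string-length(string?) /4/ defaults to the context node’s string-value
(string) normalize-space(string?)
(string) translate(string, string, string)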
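Three of the string functions behave differently from their nearest Python relatives, so here is a rough Python sketch of them. The function names are made up, xpath_substring assumes integer arguments, and xpath_translate assumes that no character occurs twice in its second argument.

```python
import re

def xpath_substring(text, start, length=None):
    """Sketch of substring(): positions are 1-based (integers assumed)."""
    end = len(text) + 1 if length is None else start + length
    return "".join(
        ch for pos, ch in enumerate(text, start=1) if start <= pos < end
    )

def xpath_normalize_space(text):
    """Sketch of normalize-space(): collapse and trim XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

def xpath_translate(text, source, target):
    """Sketch of translate(): unmatched source characters are removed."""
    paired = min(len(source), len(target))
    table = str.maketrans(source[:paired], target[:paired], source[paired:])
    return text.translate(table)

print(xpath_substring("12345", 2, 3))              # 234
print(xpath_substring("12345", 0, 3))              # 12
print(xpath_normalize_space("  a \n b  "))         # a b
print(xpath_translate("bar", "abc", "ABC"))        # BAr
print(xpath_translate("--aaa--", "abc-", "ABC"))   # AAA
```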
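Boolean functions
Code:
(boolean) boolean(object)
The function returns true if:
• number is non-zero and not NaN
• node-set is not empty
• string length is non-zero
(boolean) not(object)
(true) true()
(false) false()
(boolean) lang(string)
Returns true if the language given by the xml:lang attribute of the context node (or, if it has none, of its nearest ancestor that has one) matches string.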
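The conversion rules of boolean() can be written down as a short Python sketch. The function xpath_boolean is a made-up name, and a Python list stands in for a node-set here. Note that negative numbers and the string "false" both convert to true.

```python
import math

def xpath_boolean(value):
    """Sketch of boolean(); a list stands in for a node-set."""
    if isinstance(value, bool):          # check bool before int
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    if isinstance(value, (str, list)):
        return len(value) > 0
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(xpath_boolean(-1))            # True
print(xpath_boolean(0.0))           # False
print(xpath_boolean(float("nan")))  # False
print(xpath_boolean("false"))       # True  (non-empty string)
print(xpath_boolean(""))            # False
print(xpath_boolean([]))            # False (empty node-set)
```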
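Number functions
Code:
(number) number(object?)
If the argument is omitted, object is the context node.
(number) sum(node-set)
The string-value of each node is converted to a number first.
(number) floor(number)
(number) ceiling(number)
(number) round(number)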
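Two of the number functions in a Python sketch. sum() is imitated by converting each node's text first, and round() is imitated by xpath_round (a made-up name), because XPath rounds a tie towards positive infinity while Python's built-in round() rounds a tie to the even neighbour. The sample document is invented and special values such as NaN are not handled.

```python
import math
import xml.etree.ElementTree as ET

def xpath_round(value):
    """Sketch of round(): ties go towards positive infinity."""
    return math.floor(value + 0.5)

# Invented sample document.
doc = ET.fromstring("<order><price>1.50</price><price>2.25</price></order>")

# Imitates sum(price): each string-value is converted to a number first.
total = sum(float(node.text) for node in doc.findall("price"))

print(total)              # 3.75
print(xpath_round(2.5))   # 3
print(xpath_round(-2.5))  # -2
print(round(2.5))         # 2  (Python's own rounding differs)
```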
comments/improvements welcome
there's still so much to...
Dormilich