XML, XPath & XSLT - part 2

  • Dormilich
    Recognized Expert Expert
    • Aug 2008
    • 8694

    4 XPath

    4.1 XPath basics – walk the line
    An XPath expression (Location Path) returns an object of one of the following types:
    • node-set (an unordered collection of nodes without duplicates)
    • boolean (true or false)
    • number (a floating-point number)
    • string (a sequence of UCS characters)

    A node-set may contain 0, 1 or more nodes.

    A ‘single’ expression (Location Step) consists of three parts: an axis, a node-test and zero or more conditions (predicates):
    Code:
    axis::node-test conditions
    of which only the node-test is mandatory. The node test can be expanded to
    Code:
    axis::namespace:nodename conditions
    ‘Axis’ and ‘namespace’ have default values that are used if not specified:
    • default value for ‘axis’ is “child”
    • default value for ‘namespace’ is the NULL namespace (this may be a source of confusion later)

    If you want to narrow down your result further (building a LocationPath), you can apply another ‘single’ expression to the result node-set of the first ‘single’ expression. To do so, add the second expression after the first expression, separated by a (forward) slash:
    Code:
    expression1/expression2
    You could compare that to the pattern you use when accessing files in a directory (folder/subfolder/file).

    If an expression doesn’t match anything in the document it is applied to, it returns an empty node-set. A well-formed XPath expression that matches nothing is therefore not an error.
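To make the path idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which implements only a subset of XPath. The sample document is a made-up stand-in for the tutorial's example files (its structure is an assumption), and the element name middle is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real Ex02.xml/Ex03.xml may look different.
XML = """
<addressbook>
  <entry>
    <name><first>Ann</first><last>Miller</last></name>
  </entry>
  <entry>
    <name><first>Bob</first><last>Smith</last></name>
  </entry>
</addressbook>
"""

root = ET.fromstring(XML)  # context node: the <addressbook> element

# Three location steps joined by slashes; each step uses the default child axis.
last_names = [node.text for node in root.findall("entry/name/last")]

# A path that matches nothing gives an empty result instead of raising an error.
no_match = root.findall("entry/name/middle")

print(last_names)  # ['Miller', 'Smith']
print(no_match)    # []
```

Each step is applied to the result of the step before it, which is why the path reads like a directory path.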

    4.1.1 The context node - build your home (base)
    This is the node you use as a starting point. Which node it actually is depends on the object on which you evaluate the expression (so you need a host language capable of XPath, e.g. XSL, JavaScript or others).
    In XPath itself, only one absolute starting point is defined: the root node, which you access via a single (forward) slash at the beginning of the expression:
    Code:
    /
    (In Ex02.xml the result of this expression is the root node, i.e. the document itself; its element child is addressbook.)
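The effect of the context node can be sketched with Python's xml.etree.ElementTree: the same relative expression gives different results depending on the node it is evaluated from. The sample document and its id values are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration only.
XML = """<addressbook>
  <entry id="e1"><name>Ann</name></entry>
  <entry id="e2"><name>Bob</name></entry>
</addressbook>"""

root = ET.fromstring(XML)                 # context node 1: <addressbook>
second_entry = root.findall("entry")[1]   # context node 2: the second <entry>

# Same relative expression, two different context nodes.
from_root = [n.text for n in root.findall("name")]           # addressbook has no <name> child
from_entry = [n.text for n in second_entry.findall("name")]  # the entry does

print(from_root)   # []
print(from_entry)  # ['Bob']
```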

    4.1.2 The axis - where do you want to go today?
    An axis expresses a relation between the context node and the target node(s). This is like family business.

    In XPath there are 13 axes defined:
    • child
    • descendant
    • parent
    • ancestor
    • following-sibling
    • preceding-sibling
    • following
    • preceding
    • attribute
    • namespace
    • self
    • descendant-or-self
    • ancestor-or-self

    A detailed description of the axes is given in the XPath 1.0 specification (section 2.2, “Axes”).

    Note: The ancestor, descendant, following, preceding and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap and together they contain all the nodes in the document.

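    Reverse axes are ancestor, ancestor-or-self, preceding and preceding-sibling; all others are forward axes. You need this information to count nodes in a node-set correctly: along a forward axis positions are counted in document order, along a reverse axis in reverse document order (position 1 is the node closest to the context node).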

    Example: If you have 3 sibling elements,
    Code:
    <foo/>
    <bar/>
    <item/>
    and your context node is item, then preceding-sibling::*[1] is bar and preceding-sibling::*[2] is foo.
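Python's xml.etree.ElementTree does not implement the preceding-sibling axis, so the following sketch emulates it by hand to show how counting along a reverse axis works. The wrapper element list and the helper preceding_siblings are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Three siblings inside a hypothetical <list> parent.
parent = ET.fromstring("<list><foo/><bar/><item/></list>")
context = parent.find("item")

def preceding_siblings(parent_node, node):
    """Emulate preceding-sibling::* with the nearest sibling first (reverse document order)."""
    siblings = list(parent_node)
    index = siblings.index(node)
    return siblings[:index][::-1]

preceding = preceding_siblings(parent, context)

# XPath positions start at 1, Python indexes at 0.
first = preceding[0].tag    # preceding-sibling::*[1]
second = preceding[1].tag   # preceding-sibling::*[2]

print(first, second)  # bar foo
```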
    Originally posted by Dormilich
    Here it's starting to be less descriptive, just had no time to do all the explanations, yet
    4.1.3 The node test - tell me what you eat and I tell you what you are
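    node-name – nodes with that name (e.g. entry, en:city, de)
    * – any node of the axis’s principal node type (attribute for the attribute axis, namespace for the namespace axis, element for all other axes)
    node() – any node, whatever its type
    text() – any text node
    comment() – any comment node
    processing-instruction(literal?) – any processing instruction, or only those whose name equals the literal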

    XPath’s document tree contains nodes of seven node types
    • root node
    • element node
    • text node
    • attribute node
    • namespace node
    • processing instruction node
    • comment node

    Note: The string-value of the root and element node is the concatenation of the string-values of all text node descendants of the root/element node in document order.
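The string-value rule can be sketched in Python with xml.etree.ElementTree, whose itertext() walks all text inside an element in document order. The sample element and the helper name string_value are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element.
entry = ET.fromstring(
    "<entry><name><first>Ann</first> <last>Miller</last></name><city>Berlin</city></entry>"
)

def string_value(element):
    """Concatenate all descendant text of an element in document order."""
    return "".join(element.itertext())

entry_value = string_value(entry)
name_value = string_value(entry.find("name"))

print(entry_value)  # Ann MillerBerlin
print(name_value)   # Ann Miller
```

Note that nothing is inserted between the pieces: the text of name and city simply runs together.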
    Originally posted by Dormilich
    missing example
    4.1.3.1 The node-name - who am I? (local name)
    The node-name is the element’s or attribute’s name without the namespace prefix and the colon.

    4.1.3.2 The namespace - that's not me, it's him!
    A namespace is attached to an element using a namespace prefix, which precedes the node name, separated by a colon. The namespace prefix must be declared either in the element that uses the prefix or in one of its ancestors (often the root element).

    To declare a namespace (bind a namespace URI to a namespace prefix) you attach a URI to the prefix using an xmlns attribute:
    Code:
    xmlns:prefix="namespace-URI"
    If the default namespace is declared (xmlns="default-namespace-URI") it applies to the declaring element and to all descendant elements that have no prefix, unless one of them redeclares it. /3/ Attribute nodes do not have a default namespace (i.e. they have the NULL namespace, even if the element has a namespace).
    /3/ - attribute nodes are a part of the element node and not child nodes
    Code:
    <addressbook xmlns:en="http://bytes.com/languages/english">
    	<en:city/> <!-- 'en' namespace -->
    	<city/>    <!-- NULL or default namespace -->
    </addressbook>
    Code:
    <en:city xmlns:en="http://bytes.com/languages/english"/>
    Example: expressions within Ex03.xml
    Code:
    child::entry
    ancestor::entry/following::de:*
    entry/name/last
    /descendant-or-self::en:city
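Why the NULL namespace can be confusing is easier to see in a small sketch. It uses Python's xml.etree.ElementTree and the addressbook snippet from above with some made-up text content; the prefix lang used in the query is chosen freely, because only the namespace URI has to match.

```python
import xml.etree.ElementTree as ET

# The snippet from above, with hypothetical city names added.
XML = """<addressbook xmlns:en="http://bytes.com/languages/english">
  <en:city>London</en:city>
  <city>Berlin</city>
</addressbook>"""

root = ET.fromstring(XML)

# The prefix is bound for the query itself; it need not be the document's prefix.
ns = {"lang": "http://bytes.com/languages/english"}

unprefixed = [c.text for c in root.findall("city")]        # NULL namespace only
prefixed = [c.text for c in root.findall("lang:city", ns)]  # the 'en' namespace

# Internally the element name is the pair (namespace URI, local name).
expanded_name = root[0].tag

print(unprefixed)     # ['Berlin']
print(prefixed)       # ['London']
print(expanded_name)  # {http://bytes.com/languages/english}city
```

An unprefixed name in an expression never matches an element that lives in a namespace, even if the local name is identical.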
    Originally posted by Dormilich
    maybe there's a way to make a better example
    4.1.4 Shortcuts
    Since the full notation can be rather verbose, there are shortcuts for frequently used expressions.
    Code:
    /	(document root)
    .	self::node()
    ..	parent::node()
    //	/descendant-or-self::node()/
    @	attribute::
    
    So 
    	//phone/@areacode 
    is the same as 
    	/descendant-or-self::node()/child::phone/attribute::areacode.
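A few of the shortcuts can be tried out with Python's xml.etree.ElementTree. This is only a sketch: ElementTree cannot return attribute nodes, so @areacode is used as a filter and the value is read with get(). The sample data is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
XML = """<addressbook>
  <entry>
    <name>Ann</name>
    <phone areacode="030">1234</phone>
  </entry>
  <entry>
    <name>Bob</name>
    <phone>5678</phone>
  </entry>
</addressbook>"""

root = ET.fromstring(XML)

# "//" looks at all descendants; the leading "." keeps the path relative to root.
all_phones = [p.text for p in root.findall(".//phone")]

# "@" addresses the attribute axis; here it only filters phones that have the attribute.
area_codes = [p.get("areacode") for p in root.findall(".//phone[@areacode]")]

# ".." steps from each matching phone up to its parent entry.
owners = [e.findtext("name") for e in root.findall(".//phone[@areacode]/..")]

print(all_phones)  # ['1234', '5678']
print(area_codes)  # ['030']
print(owners)      # ['Ann']
```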
    4.2 XPath advanced – rose-coloured glasses

    4.2.1 The predicate – conditional expressions - A burger without tomato, additional cheese and much mustard, please
    Sometimes you want to select a node that meets a particular condition, such as an element whose attribute has a certain value. To do so, you filter the result of your XPath expression with one or more predicates. A predicate consists of a condition expression between square brackets.
    Code:
    expression[condition]
    expression[condition1][condition2]
    where expression is a valid XPath expression (ok, what else should it be…) and condition is evaluated for every node that expression selects. Only nodes for which it evaluates to true (see Boolean functions below) are kept; if none qualifies, an empty node-set is returned.

    Conditions:
    Code:
    [expression2]
    [function]
    [expression operator value]
    [expression operator function]
    [expression operator expression]
    [function operator value]
    [function operator function]
    [function operator expression]
    Note: Of course there are also shortcuts
    Code:
    [position() = number]
    [position() = function1]
    are equal to
    Code:
    [number]
    [function1]
    (as long as function1 returns a number.)

    If expression1 returns a node-set you can select a single node by selecting its number (that’s like accessing an array). However, counting starts with 1 /2/. This is especially useful with the preceding-sibling and following-sibling axes.

    /2/ because 0 evaluates to false (imho)

    function1 (an XPath function) and expression2 must return a value that evaluates to true (i.e. not false, NaN, the empty string or the empty node-set). A numeric result is compared with position() as shown above, so 0 never matches.

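    When an XPath expression is embedded in an XML document (e.g. in an attribute of an XSLT stylesheet), < must be masked as &lt; because a literal < is only allowed in markup and in CDATA sections. > may be written literally in attribute values but is commonly masked as &gt; as well.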
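Some of the predicate forms above can be sketched with Python's xml.etree.ElementTree. Its positional predicates are more limited than real XPath, so the sketch does not chain a position with other predicates. The sample data and the helper ids are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
XML = """<addressbook>
  <entry id="a"><name>Ann</name><phone>1234</phone></entry>
  <entry id="b"><name>Bob</name></entry>
  <entry id="c"><name>Cem</name><phone>5678</phone></entry>
</addressbook>"""

root = ET.fromstring(XML)

def ids(path):
    """Return the id attribute of every entry selected by path."""
    return [e.get("id") for e in root.findall(path)]

first = ids("entry[1]")                  # [number]: counting starts with 1
last = ids("entry[last()]")              # [function]
by_attribute = ids("entry[@id='b']")     # [expression operator value]
by_child_text = ids("entry[name='Bob']")
with_phone = ids("entry[phone]")         # [expression2]: true if the node-set is not empty
chained = ids("entry[phone][@id='c']")   # two predicates applied one after the other

print(first, last, by_attribute, by_child_text, with_phone, chained)
```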

    4.2.2 XPath functions - any ideas here?
    A function will be described here as
    Code:
    (return type) function-name(argument)
    If an argument is optional or repeatable, the corresponding regular-expression quantifier is appended (? or *).

    Node-set functions
    Code:
    (number)   last()
    (number)   position()
    (number)   count(node-set)
    (node-set) id(object)
    (string)   local-name(node-set?)
    (string)   namespace-uri(node-set?)
    (string)   name(node-set?)
    String functions
    Code:
    (string)   string(object?)
    (string)   concat(string, string, string*)
    (boolean)  starts-with(string, string)
    (boolean)  contains(string, string)
    (string)   substring-before(string, string)
    (string)   substring-after(string, string)
    (string)   substring(string, number, number?)
    (number)   string-length(string?) /4/ defaults to node’s string-value
    (string)   normalize-space(string?)
    (string)   translate(string, string, string)
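Python's xml.etree.ElementTree does not offer the XPath string functions, so the following is a hand-written sketch of how four of them behave according to the XPath 1.0 definitions. The Python function names are made up for this illustration and are not part of any library.

```python
import re

def substring_before(s, sep):
    """Text before the first occurrence of sep, or "" if sep does not occur."""
    index = s.find(sep)
    return s[:index] if index >= 0 else ""

def substring_after(s, sep):
    """Text after the first occurrence of sep, or "" if sep does not occur."""
    index = s.find(sep)
    return s[index + len(sep):] if index >= 0 else ""

def normalize_space(s):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(re.findall(r"[^ \t\r\n]+", s))

def translate(s, source, target):
    """Replace characters of source by the character at the same position in
    target; characters without a counterpart in target are removed."""
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) not in table:  # the first occurrence in source wins
            table[ord(ch)] = target[i] if i < len(target) else None
    return s.translate(table)

print(substring_before("1999/04/01", "/"))  # 1999
print(substring_after("1999/04/01", "/"))   # 04/01
print(normalize_space("  a   b \n c "))     # a b c
print(translate("--aaa--", "abc-", "ABC"))  # AAA
```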
    Boolean functions
    Code:
    (boolean)  boolean(object)
    	The function returns true if:
    	•	number is neither zero nor NaN
    	•	node-set not empty
    	•	string length is non-zero
    
    
    (boolean)  not(object)
    (true)     true()
    (false)    false()
    (boolean)  lang(string)
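    Returns true if the language given by the xml:lang attribute of the context node (or, if it has none, of its nearest ancestor that has one) matches string. The comparison ignores case and any sub-language suffix starting with “-”.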
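The conversion rules of boolean() can be sketched as a small Python function. Per the XPath 1.0 definition a number is true when it is neither zero nor NaN, so negative numbers count as true. The function name xpath_boolean and the use of a Python list to stand in for a node-set are assumptions of this sketch.

```python
import math

def xpath_boolean(value):
    """Convert a Python stand-in for an XPath object to a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):
        return len(value) > 0
    if isinstance(value, list):  # node-set modelled as a list of nodes
        return len(value) > 0
    raise TypeError("unsupported XPath object type")

print(xpath_boolean(-1))            # True
print(xpath_boolean(float("nan")))  # False
print(xpath_boolean("false"))       # True, because the string is not empty
print(xpath_boolean([]))            # False
```

The string "false" is a classic trap: any non-empty string converts to true.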
    Number functions
    Code:
    (number)   number(object?)
    	If the argument is omitted, object is the context node.
    
    (number)   sum(node-set)
    	The string-value of each node is converted to a number first.
    
    (number)   floor(number)
    (number)   ceiling(number)
    (number)   round(number)
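How number(), sum() and round() behave can be sketched in Python as well. This is an emulation written for this post, not a library API: the helper names and the order document are made up, and edge cases such as negative zero are ignored.

```python
import math
import re
import xml.etree.ElementTree as ET

# XPath 1.0 number syntax: optional whitespace, optional minus, digits with optional fraction.
_NUMBER = re.compile(r"^[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*$")

def xpath_number(text):
    """Convert a string the way number() does; anything else becomes NaN."""
    return float(text) if _NUMBER.match(text) else float("nan")

def xpath_sum(nodes):
    """Convert the string-value of each node to a number, then add them up."""
    return sum(xpath_number("".join(n.itertext())) for n in nodes)

def xpath_round(x):
    """Round to the nearest integer; ties go towards positive infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))

# Hypothetical sample data.
order = ET.fromstring(
    "<order><price>1.5</price><price> 2.5 </price><price>n/a</price></order>"
)
prices = order.findall("price")

total_valid = xpath_sum(prices[:2])  # only the two numeric prices
total_all = xpath_sum(prices)        # one NaN makes the whole sum NaN

print(total_valid)                          # 4.0
print(math.isnan(total_all))                # True
print(xpath_round(2.5), xpath_round(-2.5))  # 3.0 -2.0
```

Note the difference to Python's own round(), which rounds 2.5 to 2.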
    Originally posted by Dormilich
    Now definitively the examples are missing
    comments/improvements welcome

    there's still so much to...

    Dormilich