Tree splitting/merging

  • William Ahern

    Tree splitting/merging

    I'm looking for resources on splitting and merging XML trees. Specifically,
    on methods to pare large XML documents into smaller documents which can be
    merged later.

    Off of the top of my head, I can envision unions of node sets, and unions of
    node text. But I know there's much more to the subject than that, if not
    more alternatives than greater technical detail.

    TIA,

    Bill
  • sylvain.loiseau

    #2
    Re: Tree splitting/merging

    > I'm looking for resources on splitting and merging XML trees. Specifically,
    > on methods to pare large XML documents into smaller documents which can be
    > merged later.

    I have something for a problem (perhaps) close to yours: I need to perform
    XSLT transformations on very large documents which don't fit in memory. I
    use a SAX parser with three XMLFilters (concretely, subclasses of
    org.xml.sax.helpers.XMLFilterImpl). The first class "splits" the stream (i.e.
    it throws "start document" and "end document" events) when it encounters a
    specific startElement and endElement. So the next filter receives several
    (smaller) documents, one at a time. This second filter is a
    TransformerHandler, which performs the transformation. Then it passes the
    events to a last filter, a "merger", which discards the startDocument and
    endDocument events except the very first and the very last ones.
    I was inspired by a Perl module by Barrie Slaymaker.
    (Incidentally, I noticed that there is nothing for Java as convenient as
    the XML::SAX::Pipeline Perl module.)
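
    A minimal sketch of the splitter and merger idea described above, not the
    actual code. The class names SplitFilter, MergeFilter and SplitMergeDemo,
    the element name "rec", and the event-recording handler are all assumptions
    made for illustration, and the XSLT stage in the middle is left out. The
    splitter opens a nested "document" around each matching element; the merger
    counts open documents and forwards only the outermost pair.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Announces a nested "document" around every element named splitName. */
class SplitFilter extends XMLFilterImpl {
    private final String splitName; // hypothetical: element name to split on
    private int depth = 0;          // element depth inside the current chunk

    SplitFilter(XMLReader parent, String splitName) {
        super(parent);
        this.splitName = splitName;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth == 0 && qName.equals(splitName)) {
            super.startDocument(); // open a chunk before its first element
            depth = 1;
        } else if (depth > 0) {
            depth++;               // nested element inside the current chunk
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local, qName);
        if (depth > 0 && --depth == 0) {
            super.endDocument();   // close the chunk after its last element
        }
    }
}

/** Forwards only the outermost startDocument/endDocument pair. */
class MergeFilter extends XMLFilterImpl {
    private int openDocuments = 0;

    MergeFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startDocument() throws SAXException {
        if (openDocuments++ == 0) super.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        if (--openDocuments == 0) super.endDocument();
    }
}

/** Hypothetical driver: records document and element events as strings. */
class SplitMergeDemo {
    static List<String> events(String xml, boolean merge) {
        List<String> log = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void endDocument() { log.add("endDocument"); }
            @Override public void startElement(String u, String l, String q, Attributes a) {
                log.add("<" + q + ">");
            }
            @Override public void endElement(String u, String l, String q) {
                log.add("</" + q + ">");
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader chain = new SplitFilter(reader, "rec");
            if (merge) chain = new MergeFilter(chain);
            chain.setContentHandler(recorder);
            chain.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }
}
```

    With a document containing two "rec" elements, SplitMergeDemo.events(xml,
    false) records three startDocument events (the outer one plus one per
    chunk); with the merger enabled only the outer pair remains. In a real
    pipeline the transformation stage would sit between the two filters.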

    In fact, I came to this list with a question close to this one; it's in
    a new thread.

    > Off of the top of my head, I can envision unions of node sets, and unions of
    > node text. But I know there's much more to the subject than that, if not
    > more alternatives than greater technical detail.

    What level of well-formedness does your merging problem involve? That is, do
    you only want to add nodes to existing nodes in DOM fashion (for which you
    just need the standard methods of the Node interface), or do you want to
    insert mixed content while checking for well-formedness, tag nesting, etc.?
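
    For the simpler DOM case, a rough sketch under stated assumptions:
    Document.importNode copies a node into the target document and
    Node.appendChild attaches it. The class DomMergeDemo and its helper methods
    are hypothetical names, and the sketch assumes it is acceptable to append
    the children of one root element under the other root without any
    duplicate or validity checks.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper showing a DOM-level merge with standard Node methods. */
class DomMergeDemo {
    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Appends deep copies of the children of source's root under target's root. */
    static void mergeInto(Document target, Document source) {
        Node targetRoot = target.getDocumentElement();
        Node child = source.getDocumentElement().getFirstChild();
        while (child != null) {
            // importNode copies the node, so the source document is left untouched.
            targetRoot.appendChild(target.importNode(child, true));
            child = child.getNextSibling();
        }
    }
}
```

    Because importNode makes a copy, the source document can still be used
    afterwards; moving nodes instead would call for Document.adoptNode.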

    • William Ahern

      #3
      Re: Tree splitting/merging

      sylvain.loiseau <sylvain.loiseau@wanadoo.fr> wrote:
      >> I'm looking for resources on splitting and merging XML trees. Specifically,
      >> on methods to pare large XML documents into smaller documents which can be
      >> merged later.
      >
      > I have something for a problem (perhaps) close to yours: I need to perform
      > XSLT transformations on very large documents which don't fit in memory. I
      > use a SAX parser with three XMLFilters (concretely, subclasses of
      > org.xml.sax.helpers.XMLFilterImpl). The first class "splits" the stream (i.e.
      > it throws "start document" and "end document" events) when it encounters a
      > specific startElement and endElement. So the next filter receives several
      > (smaller) documents, one at a time. This second filter is a
      > TransformerHandler, which performs the transformation. Then it passes the
      > events to a last filter, a "merger", which discards the startDocument and
      > endDocument events except the very first and the very last ones.
      > I was inspired by a Perl module by Barrie Slaymaker.
      > (Incidentally, I noticed that there is nothing for Java as convenient as
      > the XML::SAX::Pipeline Perl module.)

      Right after posting I tripped over the XPipe project (http://xpipe.sf.net/).
      XPipe associates this with the scatter/gather pattern, and they seem to have
      put a lot of thought into the issues. Specifically, they elaborate on a
      notion of a "fulcra", or the node-depth I suppose you could call it, that a
      document can be split on. Probably you've already thought this through, but
      maybe you can find more info on that site. They have code and list
      discussions you can wade through.

      - Bill


      • sylvain.loiseau

        #4
        Re: Tree splitting/merging

        Thanks, it looks very interesting.

        Sylvain

        "William Ahern" <william@wilbur.25thandClement.com> wrote in message
        news:g4ol71-0jq.ln1@wilbur.25thandClement.com...
        > sylvain.loiseau <sylvain.loiseau@wanadoo.fr> wrote:
        > >> I'm looking for resources on splitting and merging XML trees.
        > >> Specifically, on methods to pare large XML documents into smaller
        > >> documents which can be merged later.
        > >
        > > I have something for a problem (perhaps) close to yours: I need to
        > > perform XSLT transformations on very large documents which don't fit in
        > > memory. I use a SAX parser with three XMLFilters (concretely, subclasses
        > > of org.xml.sax.helpers.XMLFilterImpl). The first class "splits" the
        > > stream (i.e. it throws "start document" and "end document" events) when
        > > it encounters a specific startElement and endElement. So the next filter
        > > receives several (smaller) documents, one at a time. This second filter
        > > is a TransformerHandler, which performs the transformation. Then it
        > > passes the events to a last filter, a "merger", which discards the
        > > startDocument and endDocument events except the very first and the very
        > > last ones.
        > > I was inspired by a Perl module by Barrie Slaymaker.
        > > (Incidentally, I noticed that there is nothing for Java as convenient as
        > > the XML::SAX::Pipeline Perl module.)
        >
        > Right after posting I tripped over the XPipe project
        > (http://xpipe.sf.net/).
        > XPipe associates this with the scatter/gather pattern, and they seem to
        > have put a lot of thought into the issues. Specifically, they elaborate on
        > a notion of a "fulcra", or the node-depth I suppose you could call it,
        > that a document can be split on. Probably you've already thought this
        > through, but maybe you can find more info on that site. They have code and
        > list discussions you can wade through.
        >
        > - Bill

