serialization and circular references

  • Eric Eggermann

    serialization and circular references

    I'm having a problem with really large file sizes when serializing the
    classes that describe my little document. There are some circular references
    which result in the same object getting written to disk multiple times. Now
    I'm using just basic serialization as described in MSDN. Clearly, I need to
    stop serializing these parent references, but I do need to re-instate them
    to the proper objects when de-serializing. Can anyone help me with a
    strategy to do that?

    TIA,
    Eric Eggermann


  • Bob Powell [MVP]

    #2
    Re: serialization and circular references

    References to parents are difficult. Generally you should remove all
    references before you serialize, serialize the object tree and when you
    deserialize, replace all the references because the object obviously won't
    be the same one that was serialized.

    Such as....

    foreach (MyDooda o in myObjectTree)
        o.Parent = null;
    // serialize...

    then...

    MyObjectTree myObjectTree = (MyObjectTree)bf.Deserialize(..);
    foreach (MyDooda o in myObjectTree)
        o.Parent = myObjectTree;

    etc..

    If items in a deeply nested tree refer to other items in the tree, you'll
    need some other way of identifying and reconstructing those references,
    such as also keeping all the objects in a flat list and referring to their
    index in that list.
    --
    Bob Powell [MVP]
    C#, System.Drawing

    The November edition of Well Formed is now available.
    Learn how to create Shell Extensions in managed code.


    Answer those GDI+ questions with the GDI+ FAQ


    Read my Blog at http://bobpowelldotnet.blogspot.com
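
A minimal sketch of the flat-list idea from the post above, assuming a hypothetical `Node` type with a single cross-reference. The names `Node`, `Link`, `LinkIndex` and `FlatList` are illustrative only and do not come from the thread. Each reference is stored as an index into one flat list, so nothing that gets written out is an object reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: the cross-reference is persisted as an index
// into a flat list instead of as an object reference.
[Serializable]
public class Node
{
    public string Name;
    public int LinkIndex = -1;   // persisted: index of the linked node, -1 for none

    [NonSerialized]
    public Node Link;            // rebuilt after loading, never written out
}

public static class FlatList
{
    // Before saving: turn each live reference into an index.
    public static void Flatten(List<Node> all)
    {
        foreach (Node n in all)
            n.LinkIndex = n.Link == null ? -1 : all.IndexOf(n.Link);
    }

    // After loading: turn each index back into a live reference.
    public static void Resolve(List<Node> all)
    {
        foreach (Node n in all)
            n.Link = n.LinkIndex < 0 ? null : all[n.LinkIndex];
    }
}
```

`IndexOf` is a linear search, so `Flatten` is quadratic in the number of nodes; for a large document a dictionary from node to index would likely be a better fit.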

    • Eric Eggermann

      #3
      Re: serialization and circular references


      "Bob Powell [MVP]" <bob@_spamkiller_bobpowell.net> wrote in message
      news:u%23eTJ1zsDHA.1888@TK2MSFTNGP10.phx.gbl...

      snip
      >
      > If you have references to items in a deeply nested tree which refer to
      > other items in the tree you'll need to have some other way of identifying
      > and reconstructing the references such as keeping all the objects in a
      > flat list as-well and referring to their index in that list or something.

      Thanks Bob. Yeah, There's a tree, but it's not all that deep. Just three
      levels, not counting two collection classes. Objects only ever refer to one
      above, skipping the collection classes.

      Anyway, I've been thinking about doing the custom serialization thing, and
      since the references are all in a 'straight line' so to speak, each object
      can restore the parent properties of its field objects, provided it knows
      when to do this. How can I tell when my top level object is deserialized?

      Eric Eggermann



      • Jay B. Harlow [MVP - Outlook]

        #4
        Re: serialization and circular references

        Eric,
        > There are some circular references
        > which result in the same object getting written to disk multiple times.

        Are you writing a single graph or multiple graphs to your file? In other
        words, are you calling Formatter.Serialize once, or are you calling
        Formatter.Serialize multiple times for a given stream?

        If you call Formatter.Serialize multiple times, the serializer cannot track
        references to the same object across calls, so it will serialize that object
        multiple times. I find it's easier and better to serialize the root graph and
        call Formatter.Serialize a single time!

        > Clearly, I need to
        > stop serializing these parent references, but I do need to re-instate them

        If you have a tree, all you need to do is serialize the root of the tree and
        the entire tree will be serialized. If 50% of the nodes of the tree refer to
        a single common object, this single common object will be serialized once.

        For places where I do not want to serialize 'parent references', yet want to
        be able to re-establish them on deserialization, I implement the ISerializable
        interface and serialize an identifier that can be used to look up the 'parent
        reference' when I deserialize. Another option is implementing the
        IObjectReference interface as identified in Part 2 of the following
        articles. This is useful for objects that implement the Singleton pattern.

        Find official documentation, practical know-how, and expert guidance for builders working and troubleshooting in Microsoft products.



        Note I find all three articles invaluable when working with .NET
        serialization.
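
The identifier approach described above might look roughly like the following. This is a sketch under stated assumptions, not Jay's actual code: the `Owner` and `Item` types, the `"parentId"` and `"data"` keys, and the idea of passing an id-to-owner table through `StreamingContext.Context` are all hypothetical. It also assumes the owners already exist when the items are deserialized, for example because they live outside the serialized graph.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Hypothetical parent type that can be found again by its Id.
public class Owner
{
    public int Id;
}

[Serializable]
public class Item : ISerializable
{
    public Owner Parent;
    public string Data;

    public Item(Owner parent, string data)
    {
        Parent = parent;
        Data = data;
    }

    // Deserialization constructor: the formatter calls this with the stored values.
    protected Item(SerializationInfo info, StreamingContext context)
    {
        Data = info.GetString("data");
        int parentId = info.GetInt32("parentId");

        // Assumption: the caller supplied an id-to-owner table as the context object.
        var owners = context.Context as IDictionary<int, Owner>;
        Owner found = null;
        if (owners != null)
            owners.TryGetValue(parentId, out found);
        Parent = found;
    }

    // Write the parent's id instead of the parent object itself.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("data", Data);
        info.AddValue("parentId", Parent == null ? -1 : Parent.Id);
    }
}
```

The lookup table would have to be handed to the formatter before deserializing, for example through the formatter's `Context` property. If a parent cannot be found, `Parent` is simply left null in this sketch.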

        Hope this helps
        Jay

        • Eric Eggermann

          #5
          Re: serialization and circular references


          "Jay B. Harlow [MVP - Outlook]" <Jay_Harlow_MVP@msn.com> wrote in message
          news:elxi9Q7sDHA.700@TK2MSFTNGP11.phx.gbl...
          > Eric,
          > > There are some circular references
          > > which result in the same object getting written to disk multiple times.
          > Are you writing a single graph or multiple graphs to your file? In other
          > words are you calling Formatter.Serialize once or are you calling
          > Formatter.Serialize multiple times, for a given stream?
          >
          > If you call Formatter.Serialize multiple times, the serializer cannot track
          > references to the same object, so it will serialize that object multiple
          > times. I find it's easier & better to serialize the root graph and call
          > Formatter.Serialize a single time!
          > ...

          Thanks a lot Jay,
          I've sort of solved my problem, but your post showed me where I was
          wrong in diagnosing it. I was writing one graph, and did have all the refs
          in a straight line down the tree. I'd first noticed the problem when using
          serialization to create a deep copy of an object, which is in the center of
          my tree, and it was not the parent itself, being copied too many times, but
          all the siblings of my object, through the parent reference, which were
          pushing up the file sizes WAAAY too much, causing the copy process to take a
          very long time.

          I fixed (hacked) the problem by marking all parent refs in the tree
          non-serializable, re-instating the parent reference at the end of the clone
          method, then implementing SaveToFile, and FromFile methods in my root
          object, and then re-instating the refs again at the end of the FromFile
          method. So it is working as expected.

          When I save the whole tree, the files are still way too big, but the size
          increases in proper proportions, and not exponentially, so that problem
          belongs under another subject, and of course, I'll give tightening up the
          size a good whack before posting again.

          Thanks for the help.

          Eric



          • Jay B. Harlow [MVP - Outlook]

            #6
            Re: serialization and circular references

            Eric,
            How many nodes have the parent ref?
            > I fixed (hacked) the problem by marking all parent refs in the tree
            > non-serializable, re-instating the parent reference at the end of the
            > clone method, then implementing SaveToFile, and FromFile methods in my root
            > object, and then re-instating the refs again at the end of the FromFile
            > method. So it is working as expected.

            I would suggest you put the NonSerializedAttribute on each parent ref you do
            not want serialized. Of course, when you deserialize, these references will be
            null.

            [Serializable]
            class Node
            {
                [NonSerialized]
                SomeClass parent;

                SomeOtherClass data;
            }

            In the above, contents of the data field will be serialized with the class,
            while parent will not.
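
One way to fill those null references back in, sketched here as an assumption rather than something proposed in the thread, is to have the owning object implement `IDeserializationCallback` from `System.Runtime.Serialization`. A formatter calls `OnDeserialization` once the whole graph has been rebuilt, which also addresses the earlier question of knowing when the top-level object is ready. The `Parent` and `Child` types and their members are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[Serializable]
public class Child
{
    [NonSerialized]
    public Parent Owner;   // skipped by the formatter, so null after loading

    public string Data;
}

[Serializable]
public class Parent : IDeserializationCallback
{
    public List<Child> Children = new List<Child>();

    public void Add(Child c)
    {
        c.Owner = this;
        Children.Add(c);
    }

    // Invoked by the formatter after the entire graph has been deserialized.
    public void OnDeserialization(object sender)
    {
        foreach (Child c in Children)
            c.Owner = this;
    }
}
```

Each level of a tree can implement the callback for its own children, so no separate pass from the root is needed. The order in which callbacks fire across objects should not be relied on; this sketch only touches the object's own direct children.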

            Alternatively, I would suggest you check out the ISerializable interface, as
            I suspect that will be a 'cleaner' implementation than what you are
            currently doing.

            Hope this helps
            Jay

            • Eric Eggermann

              #7
              Re: serialization and circular references


              "Jay B. Harlow [MVP - Outlook]" <Jay_Harlow_MVP@msn.com> wrote in message
              news:u8OLn2EtDHA.560@TK2MSFTNGP11.phx.gbl...
              > Eric,
              > How many nodes have the parent ref?

              Almost all, and they point up one level. So under the root, there are 4 levels.
              I looked for ways to do away with the parent entirely, but that isn't really
              possible, without moving some routines out of the class they logically
              belong in.

              I'm going to have a good look at ISerializable, and I may have to make big
              changes to my model anyway.

              Thanks,

              Eric



              • Jay B. Harlow [MVP - Outlook]

                #8
                Re: serialization and circular references

                Eric,
                I misstated my question.

                How many node types have the parent ref?

                What I'm asking is: if you have one or two node types, implementing
                ISerializable would not be that much effort. However, if you have 50 or 60
                distinct node types, ISerializable may be more effort, even with a base class
                that has "just" the parent ref in it.

                Hope this helps
                Jay

                • Eric Eggermann

                  #9
                  Re: serialization and circular references


                  "Jay B. Harlow [MVP - Outlook]" <Jay_Harlow_MVP@msn.com> wrote in message
                  news:el0Z1%23HtDHA.628@tk2msftngp13.phx.gbl...
                  > Eric,
                  > I misstated my question.
                  >
                  > How many node types have the parent ref?
                  >

                  Still 4. Yeah, I can implement it. It's not so much. The model looks like
                  this
                  FlashSet (root)
                    CardsCollection
                      Card (ref to FlashSet)
                        Panel (ref to Card)
                          ElementsCollection (ref to Panel)
                            Element (ref to Panel)

                  Element is a base class for two other types.
                  So it's not such a big deal. Think I'll use ISerializable anyway, and then I
                  can easily see which bits are taking up the most space.
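
For a model shaped like the one above, the re-linking pass described earlier in the thread (restoring the refs at the end of a FromFile-style method) could be sketched as follows. The class names follow the post, but the members `Cards`, `Panels`, `Elements`, `Owner` and `RestoreParents` are hypothetical, and plain `List<T>` fields stand in for the custom collection classes, so the ElementsCollection back-reference is not modeled.

```csharp
using System;
using System.Collections.Generic;

[Serializable]
public class Element
{
    [NonSerialized] public Panel Owner;
}

[Serializable]
public class Panel
{
    [NonSerialized] public Card Owner;
    public List<Element> Elements = new List<Element>();
}

[Serializable]
public class Card
{
    [NonSerialized] public FlashSet Owner;
    public List<Panel> Panels = new List<Panel>();
}

[Serializable]
public class FlashSet
{
    public List<Card> Cards = new List<Card>();

    // Walk the tree once from the root and point every child at its parent.
    public void RestoreParents()
    {
        foreach (Card card in Cards)
        {
            card.Owner = this;
            foreach (Panel panel in card.Panels)
            {
                panel.Owner = card;
                foreach (Element element in panel.Elements)
                    element.Owner = panel;
            }
        }
    }
}
```

Calling `RestoreParents()` on the root right after deserializing, or at the end of a clone method, keeps all of the fix-up logic in one place.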

                  Eric

