multiset segfault

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Arthur J. O'Dwyer

    multiset segfault


    I'm seeing a bug at the moment that I can't track down. It's
    part of a moderately large program, but here is a small program
    that exhibits the bug on gcc. (The program code follows at the
    bottom of the message.)
    The symptom is this:

    % g++ --version
    g++ (GCC) 3.2.1
    [...]
    % g++ -W -Wall -pedantic test.cpp
    % ./a.out
    push_back(3402, tian).
    push_back(3401, tian).
    push_back(3405, wu)Segmentation fault
    %

    So it appears something is going wrong in multiset::inser t.
    I'm not very familiar with the STL, so I suppose I'm abusing
    multiset in some way, but for the life of me I can't figure
    out how, or how to fix it.
    (The goal of the TerminalList class is to be a container
    of Terminals, but to allow very fast access to the list of
    Terminals containing a certain 'syllable' or 'unino'. I'm
    using 'lower_bound' and 'upper_bound' to extract a vector
    of Terminals with a given syllable from myStrHash; please
    tell me if you can see a better way, but this is unrelated
    to the segfault bug, which is why I snipped the code that
    does all that from the program below.)

    Thanks much,
    -Arthur

    ==code begins==

    #include <cstdio>
    #include <cctype>
    #include <vector>
    #include <set>
    #include <string>
    using namespace std;

    struct Terminal
    {
    int unino;
    string syllable;
    int d;

    Terminal(int u, const string& y, int c):
    unino(u), syllable(y), d(c)
    {}
    };

    struct myStrPred {
    bool operator() (Terminal* a, Terminal* b)
    { return (a->syllable < b->syllable); }
    };

    struct TerminalList
    {
    void push_back(const Terminal& t);

    typedef multiset<Termin al*, myStrPred> StrHashT;
    vector<Terminal > myVector;
    StrHashT myStrHash;
    };

    void TerminalList::p ush_back(const Terminal& t)
    {
    myVector.push_b ack(t);
    Terminal* pt = &myVector.back( );

    printf("push_ba ck(%04x,%s)",
    pt->unino, pt->syllable.c_str ());
    fflush(stdout);

    myStrHash.inser t(pt);

    printf(".\n");
    }

    int main()
    {
    TerminalList characters;
    characters.push _back(Terminal( 0x3402, "tian", 1));
    characters.push _back(Terminal( 0x3401, "tian", 1));
    characters.push _back(Terminal( 0x3405, "wu", 3));
    return 0;
    }

  • Victor Bazarov

    #2
    Re: multiset segfault

    "Arthur J. O'Dwyer" <ajo@nospam.and rew.cmu.edu> wrote...[color=blue]
    > I'm seeing a bug at the moment that I can't track down. It's
    > part of a moderately large program, but here is a small program
    > that exhibits the bug on gcc. (The program code follows at the
    > bottom of the message.)
    > The symptom is this:
    >
    > % g++ --version
    > g++ (GCC) 3.2.1
    > [...]
    > % g++ -W -Wall -pedantic test.cpp
    > % ./a.out
    > push_back(3402, tian).
    > push_back(3401, tian).
    > push_back(3405, wu)Segmentation fault
    > %
    >
    > So it appears something is going wrong in multiset::inser t.
    > I'm not very familiar with the STL, so I suppose I'm abusing
    > multiset in some way, but for the life of me I can't figure
    > out how, or how to fix it.[/color]

    The only way to fix it is to insert something that doesn't change
    between calls to your 'push_back' function. I am nor really sure
    what to recommend. An index should definitely be OK, but then
    you need a mechanism to extract the values from the vector...
    [color=blue]
    > (The goal of the TerminalList class is to be a container
    > of Terminals, but to allow very fast access to the list of
    > Terminals containing a certain 'syllable' or 'unino'. I'm
    > using 'lower_bound' and 'upper_bound' to extract a vector
    > of Terminals with a given syllable from myStrHash; please
    > tell me if you can see a better way, but this is unrelated
    > to the segfault bug, which is why I snipped the code that
    > does all that from the program below.)
    >
    > Thanks much,
    > -Arthur
    >
    > ==code begins==
    >
    > #include <cstdio>
    > #include <cctype>
    > #include <vector>
    > #include <set>
    > #include <string>
    > using namespace std;
    >
    > struct Terminal
    > {
    > int unino;
    > string syllable;
    > int d;
    >
    > Terminal(int u, const string& y, int c):
    > unino(u), syllable(y), d(c)
    > {}
    > };
    >
    > struct myStrPred {
    > bool operator() (Terminal* a, Terminal* b)
    > { return (a->syllable < b->syllable); }
    > };
    >
    > struct TerminalList
    > {
    > void push_back(const Terminal& t);
    >
    > typedef multiset<Termin al*, myStrPred> StrHashT;
    > vector<Terminal > myVector;
    > StrHashT myStrHash;
    > };
    >
    > void TerminalList::p ush_back(const Terminal& t)
    > {
    > myVector.push_b ack(t);[/color]

    This invalidates all references and pointers to elements of that
    vector _if_ reallocation occurs. In your case, reallocation does
    not occur until the third element is inserted (probably).
    [color=blue]
    > Terminal* pt = &myVector.back( );[/color]

    Here you take an address to an element of the vector. That address
    _can_ and _will_ be invalidated by the next push_back.
    [color=blue]
    >
    > printf("push_ba ck(%04x,%s)",
    > pt->unino, pt->syllable.c_str ());
    > fflush(stdout);
    >
    > myStrHash.inser t(pt);[/color]

    Here you attempt to preserve that address which can and will become
    invalid next time you call this function.
    [color=blue]
    >
    > printf(".\n");
    > }[/color]

    I'd rewrite your class as

    ------------------------ Only changed parts
    struct myStrPred {
    vector<Terminal >& container;
    myStrPred(vecto r<Terminal>& c) : container(c) {}
    bool operator() (int i1, int i2)
    { return (container[i1].syllable < container[i2].syllable); }
    };

    struct TerminalList
    {
    void push_back(const Terminal& t);

    typedef multiset<int, myStrPred> StrHashT;
    vector<Terminal > myVector;
    StrHashT myStrHash;

    TerminalList() : myStrHash(myStr Pred(myVector)) {}
    };

    void TerminalList::p ush_back(const Terminal& t)
    {
    int next_ind = myVector.size() ;
    myVector.push_b ack(t);

    printf("push_ba ck(%04x,%s)",
    myVector[next_ind].unino, myVector[next_ind].syllable.c_str ());
    fflush(stdout);

    myStrHash.inser t(next_ind);

    printf(".\n");
    }
    --------------------------------------------------------
    [color=blue]
    >
    > int main()
    > {
    > TerminalList characters;
    > characters.push _back(Terminal( 0x3402, "tian", 1));
    > characters.push _back(Terminal( 0x3401, "tian", 1));
    > characters.push _back(Terminal( 0x3405, "wu", 3));
    > return 0;
    > }
    >[/color]


    Comment

    • Joaquín Mª López Muñoz

      #3
      Re: multiset segfault


      "Arthur J. O'Dwyer" ha escrito:

      [segfault stuff deleted]
      [color=blue]
      > (The goal of the TerminalList class is to be a container
      > of Terminals, but to allow very fast access to the list of
      > Terminals containing a certain 'syllable' or 'unino'. I'm
      > using 'lower_bound' and 'upper_bound' to extract a vector
      > of Terminals with a given syllable from myStrHash; please
      > tell me if you can see a better way, but this is unrelated
      > to the segfault bug, which is why I snipped the code that
      > does all that from the program below.)
      >[/color]

      I've written a library called Boost.MultiInde x, to be promptly released
      in Boost 1.32, that can help you construct this sort of hybrid
      containers. Documentation can be consulted at



      In particular, the section "A bidirectional list with fast lookup" in
      the
      tutorial shows how to construct a container that, AFAICS, can
      effectively replace your TerminalList. Maybe this is helpful to you.

      Regards,

      Joaquín M López Muñoz
      Telefónica, Investigación y Desarrollo

      Comment

      • Arthur J. O'Dwyer

        #4
        Re: multiset segfault


        On Thu, 17 Jun 2004, [iso-8859-1] Joaquín Mª López Muñoz wrote:[color=blue]
        >
        > "Arthur J. O'Dwyer" ha escrito:[color=green]
        > > (The goal of the TerminalList class is to be a container
        > > of Terminals, but to allow very fast access to the list of
        > > Terminals containing a certain 'syllable' or 'unino'.[/color]
        >
        > I've written a library called Boost.MultiInde x, to be promptly released
        > in Boost 1.32, that can help you construct this sort of hybrid
        > containers. Documentation can be consulted at
        >
        > http://boost-consulting.com/boost/li...doc/index.html
        >
        > In particular, the section "A bidirectional list with fast lookup" in
        > the tutorial shows how to construct a container that, AFAICS, can
        > effectively replace your TerminalList. Maybe this is helpful to you.[/color]

        Yes and no. ;-( The 'multi_index_co ntainer' template *is* exactly
        what I'm looking for; I certainly hope Boost becomes at least a
        /de facto/ standard for C++ library support, if not part of the
        standard itself! But as I understand it, to use 'multi_index_co ntainer'
        I would need to download and install *all* of Boost (or at least the
        large portion of it #included by multi_index_con tainer.hpp) on all
        the machines on which I intended to compile my program. And that's
        a little much, for my humble purposes. :) [Being as there are three
        of them, and one of them is a Linux system on which I have neither
        administrative privileges nor megabytes of userspace to spare.]
        Thank you for bringing it to my attention, though; I'm going to
        have to start looking at what else Boost can do that I've been doing
        by hand. Maybe one of these days it will become worth my while to
        tie all my larger programs to Boost.

        Meanwhile, I'm still wondering what's wrong with my use of
        'multiset'...

        -Arthur

        Comment

        • Joaquín Mª López Muñoz

          #5
          Re: multiset segfault


          "Arthur J. O'Dwyer" ha escrito:
          [...]
          [color=blue]
          >
          > Yes and no. ;-( The 'multi_index_co ntainer' template *is* exactly
          > what I'm looking for; I certainly hope Boost becomes at least a
          > /de facto/ standard for C++ library support, if not part of the
          > standard itself! But as I understand it, to use 'multi_index_co ntainer'
          > I would need to download and install *all* of Boost (or at least the
          > large portion of it #included by multi_index_con tainer.hpp) on all
          > the machines on which I intended to compile my program.[/color]

          Yep, that's true.
          [color=blue]
          > And that's
          > a little much, for my humble purposes. :) [Being as there are three
          > of them, and one of them is a Linux system on which I have neither
          > administrative privileges nor megabytes of userspace to spare.][/color]

          Well, there's a recurrent problem with the entry barriers to using
          Boost, the main one being the huge size of the library.
          But then again, if your main obstacle is how to convince your manager
          to use Boost as part of the common development base, you can
          put the matter like this: How long will it take you to craft your
          TerminalList solution and debug it? A week? A week of your time
          is orders of magnitude more expensive than the HD space taken
          by Boost.

          That said, there's an utility called bcp for extracting libraries
          out of Boost, dragging their dependencies along:



          I can comment on how covnenient this tool is, but you might want to
          have a look at it.

          [color=blue]
          >
          > Thank you for bringing it to my attention, though; I'm going to
          > have to start looking at what else Boost can do that I've been doing
          > by hand. Maybe one of these days it will become worth my while to
          > tie all my larger programs to Boost.
          >
          > Meanwhile, I'm still wondering what's wrong with my use of
          > 'multiset'...[/color]

          Ummm... The problem lies in that you're storing pointers to a
          *vector* inside myStrHash. But these pointers are *not* stable,
          the elements inside myVector will be relocated when its size
          grows; in your code this happens at about the second insertion.
          You have two solutions:

          1. Use an std::list insiead of an std::vector. std::list references
          are stable, i.e elements won't be relocated over time.
          2. Redefine myStrHash to use indices (numbers) instead of
          pointers.

          I'd rather use #1, though.

          Joaquín M López Muñoz
          Telefónica, Investigación y Desarrollo

          Comment

          • Arthur J. O'Dwyer

            #6
            Re: multiset segfault


            On Thu, 17 Jun 2004, [iso-8859-1] Joaquín Mª López Muñoz wrote:[color=blue]
            >
            > "Arthur J. O'Dwyer" ha escrito:
            > [re: don't want to install Boost everywhere myself][color=green]
            > >[/color][/color]
            [color=blue]
            > Well, there's a recurrent problem with the entry barriers to using
            > Boost, the main one being the huge size of the library.
            > But then again, if your main obstacle is how to convince your manager
            > to use Boost as part of the common development base, you can
            > put the matter like this: How long will it take you to craft your
            > TerminalList solution and debug it? A week? A week of your time
            > is orders of magnitude more expensive than the HD space taken
            > by Boost.[/color]

            *If* that were my main obstacle. ;) The program in question is
            just a hobby project anyway.
            [color=blue]
            > That said, there's an utility called bcp for extracting libraries
            > out of Boost, dragging their dependencies along:
            >
            > http://boost-consulting.com/boost/tools/bcp/bcp.html[/color]

            I'll take a look.

            [color=blue][color=green]
            > > Meanwhile, I'm still wondering what's wrong with my use of
            > > 'multiset'...[/color]
            >
            > Ummm... The problem lies in that you're storing pointers to a
            > *vector* inside myStrHash. But these pointers are *not* stable,[/color]

            Oh, bother! See, I knew it was something simple... :(
            [color=blue]
            > the elements inside myVector will be relocated when its size
            > grows; in your code this happens at about the second insertion.
            > You have two solutions:
            >
            > 1. Use an std::list insiead of an std::vector. std::list references
            > are stable, i.e elements won't be relocated over time.[/color]

            Aha. This would be the easiest solution, then. Thanks.
            For future reference, are there any other STL containers for
            which the elements' addresses are guaranteed to be stable?
            And I seem to recall that in a std::vector, the elements are
            guaranteed to be in contiguous memory --- but this is not true
            of std::list --- is it true of any other containers?

            I've been relying mostly on the Dinkum C++ library reference
            page and on Google for my STL information. Do you have any
            suggestions of a better STL reference for the casual C++ user?
            [Read: one that might manage to warn me about dumb mistakes like
            that one.]

            -Arthur

            Comment

            • Joaquín Mª López Muñoz

              #7
              Re: multiset segfault



              "Arthur J. O'Dwyer" ha escrito:
              [...]
              [color=blue][color=green][color=darkred]
              > > > Meanwhile, I'm still wondering what's wrong with my use of
              > > > 'multiset'...[/color]
              > >
              > > Ummm... The problem lies in that you're storing pointers to a
              > > *vector* inside myStrHash. But these pointers are *not* stable,[/color]
              >
              > Oh, bother! See, I knew it was something simple... :(
              >[color=green]
              > > the elements inside myVector will be relocated when its size
              > > grows; in your code this happens at about the second insertion.
              > > You have two solutions:
              > >
              > > 1. Use an std::list insiead of an std::vector. std::list references
              > > are stable, i.e elements won't be relocated over time.[/color]
              >
              > Aha. This would be the easiest solution, then. Thanks.
              > For future reference, are there any other STL containers for
              > which the elements' addresses are guaranteed to be stable?
              > And I seem to recall that in a std::vector, the elements are
              > guaranteed to be in contiguous memory --- but this is not true
              > of std::list --- is it true of any other containers?[/color]

              std::vector is the only container guaranteeing that elements
              are in contiguous memory.

              [color=blue]
              >
              >
              > I've been relying mostly on the Dinkum C++ library reference
              > page and on Google for my STL information. Do you have any
              > suggestions of a better STL reference for the casual C++ user?
              > [Read: one that might manage to warn me about dumb mistakes like
              > that one.]
              >[/color]

              Well,

              * C++ FAQ Lite (http://www.parashift.com/c++-faq-lite/) contains
              some sections on STL containers, and it is anyway worth reading the
              whole contents even if not specifically dealing with STL.
              * The reference at cppreference (http://www.cppreference.com/cpp_stl.html)
              is a little more readable than Dinkumware's stuff.
              * Thinking in C++ (http://www.bruceeckel.com/) is a C++ programming
              guide downloadable for free. Vol II has some stuff on STL.

              In general, if you google for "STL tutorial" you'll find many more
              links. Beware to check the date of the docs you're reading: if it
              is prior to 1998 (when the standard was published) then you'd rather
              not use it, much info will be outdated.

              HTH

              Joaquín M López Muñoz
              Telefónica, Investigación y Desarrollo

              Comment

              • Arthur J. O'Dwyer

                #8
                Re: multiset segfault


                On Thu, 17 Jun 2004, Arthur J. O'Dwyer wrote:[color=blue]
                >[color=green]
                > > 1. Use an std::list instead of an std::vector. std::list references
                > > are stable, i.e elements won't be relocated over time.[/color]
                >
                > Aha. This would be the easiest solution, then. Thanks.[/color]

                FYI: changed from std::vector to std::list in the two places I
                used pointers to elements [thus ickifying a couple of loops that
                expected random access to the underlying vectors, but...], and
                the program now works just fine![1] Thanks for your help!



                -Arthur

                [1] - Modulo the ickiness of the source code; it's still a
                very rough version. Refactoring is coming... ;)

                Comment

                • P.J. Plauger

                  #9
                  Re: multiset segfault

                  "Joaquín Mª López Muñoz" <joaquin@tid.es > wrote in message
                  news:40D1C483.4 224A7D4@tid.es. ..
                  [color=blue]
                  > * The reference at cppreference (http://www.cppreference.com/cpp_stl.html)
                  > is a little more readable than Dinkumware's stuff.[/color]

                  It has more tutorial material, to be sure, but it it also riddled
                  with inaccuracies and omissions. You'll want to supplement it
                  with a good reference.

                  P.J. Plauger
                  Dinkumware, Ltd.



                  Comment

                  • Arthur J. O'Dwyer

                    #10
                    Re: multiset segfault


                    On Thu, 17 Jun 2004, P.J. Plauger wrote:[color=blue]
                    >
                    > "Joaquín Mª López Muñoz" <joaquin@tid.es > wrote in message
                    > news:40D1C483.4 224A7D4@tid.es. ..
                    >[color=green]
                    > > * The reference at cppreference (http://www.cppreference.com/cpp_stl.html)
                    > > is a little more readable than Dinkumware's stuff.[/color]
                    >
                    > It has more tutorial material, to be sure, but it it also riddled
                    > with inaccuracies and omissions. You'll want to supplement it
                    > with a good reference.[/color]

                    Yes; that was the site where I found

                    ^^^^^^^^[color=blue]
                    > The function insert() either:
                    > * inserts val after the element at pos, and returns an iterator
                    > to that element.
                    > * inserts a range of elements from start to end.
                    > * inserts val, but only if val doesn't already exist. The return[/color]
                    ^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^ ^^^[color=blue]
                    > value is an iterator to the element inserted, and a boolean describing
                    > whether an insertion took place.[/color]

                    That was a little disconcerting. :)

                    -Arthur

                    Comment

                    • Dave Townsend

                      #11
                      Re: multiset segfault


                      This odious piece of code is your trouble....
                      [color=blue]
                      > void TerminalList::p ush_back(const Terminal& t)
                      > {
                      > myVector.push_b ack(t);
                      > Terminal* pt = &myVector.back( );
                      >
                      > printf("push_ba ck(%04x,%s)",
                      > pt->unino, pt->syllable.c_str ());
                      > fflush(stdout);
                      >
                      > myStrHash.inser t(pt);
                      >
                      > printf(".\n");[/color]

                      It is not a good idea to take addresses of objects inside a vector for the
                      simple
                      reason as you insert more objects the objects are relocated in memory, and
                      the
                      pointer values you are holding onto no longer point to those objects. When
                      you
                      do the insert(pt), the compare function you provide will eventually hit one
                      of these
                      invalid pt values and you're a gonner sunshine.

                      dave



                      "Arthur J. O'Dwyer" <ajo@nospam.and rew.cmu.edu> wrote in message
                      news:Pine.LNX.4 .58-035.04061620580 30.10554@unix43 .andrew.cmu.edu ...[color=blue]
                      >
                      > I'm seeing a bug at the moment that I can't track down. It's
                      > part of a moderately large program, but here is a small program
                      > that exhibits the bug on gcc. (The program code follows at the
                      > bottom of the message.)
                      > The symptom is this:
                      >
                      > % g++ --version
                      > g++ (GCC) 3.2.1
                      > [...]
                      > % g++ -W -Wall -pedantic test.cpp
                      > % ./a.out
                      > push_back(3402, tian).
                      > push_back(3401, tian).
                      > push_back(3405, wu)Segmentation fault
                      > %
                      >
                      > So it appears something is going wrong in multiset::inser t.
                      > I'm not very familiar with the STL, so I suppose I'm abusing
                      > multiset in some way, but for the life of me I can't figure
                      > out how, or how to fix it.
                      > (The goal of the TerminalList class is to be a container
                      > of Terminals, but to allow very fast access to the list of
                      > Terminals containing a certain 'syllable' or 'unino'. I'm
                      > using 'lower_bound' and 'upper_bound' to extract a vector
                      > of Terminals with a given syllable from myStrHash; please
                      > tell me if you can see a better way, but this is unrelated
                      > to the segfault bug, which is why I snipped the code that
                      > does all that from the program below.)
                      >
                      > Thanks much,
                      > -Arthur
                      >
                      > ==code begins==
                      >
                      > #include <cstdio>
                      > #include <cctype>
                      > #include <vector>
                      > #include <set>
                      > #include <string>
                      > using namespace std;
                      >
                      > struct Terminal
                      > {
                      > int unino;
                      > string syllable;
                      > int d;
                      >
                      > Terminal(int u, const string& y, int c):
                      > unino(u), syllable(y), d(c)
                      > {}
                      > };
                      >
                      > struct myStrPred {
                      > bool operator() (Terminal* a, Terminal* b)
                      > { return (a->syllable < b->syllable); }
                      > };
                      >
                      > struct TerminalList
                      > {
                      > void push_back(const Terminal& t);
                      >
                      > typedef multiset<Termin al*, myStrPred> StrHashT;
                      > vector<Terminal > myVector;
                      > StrHashT myStrHash;
                      > };
                      >
                      > void TerminalList::p ush_back(const Terminal& t)
                      > {
                      > myVector.push_b ack(t);
                      > Terminal* pt = &myVector.back( );
                      >
                      > printf("push_ba ck(%04x,%s)",
                      > pt->unino, pt->syllable.c_str ());
                      > fflush(stdout);
                      >
                      > myStrHash.inser t(pt);
                      >
                      > printf(".\n");
                      > }
                      >
                      > int main()
                      > {
                      > TerminalList characters;
                      > characters.push _back(Terminal( 0x3402, "tian", 1));
                      > characters.push _back(Terminal( 0x3401, "tian", 1));
                      > characters.push _back(Terminal( 0x3405, "wu", 3));
                      > return 0;
                      > }
                      >[/color]


                      Comment

                      Working...