python string comparison oddity

This topic is closed.
  • Faheem Mitha

    python string comparison oddity


    Hi everybody,

    I was wondering if anyone can explain this. My understanding is that 'is'
    checks if the object is the same. However, in that case, why this
    inconsistency for short strings? I would expect a 'False' for all three
    comparisons. This is reproducible across two different machines, so it is
    not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
    default).
    Thanks, Faheem.

    In [1]: a = '--'

    In [2]: a is '--'
    Out[2]: False

    In [4]: a = '-'

    In [5]: a is '-'
    Out[5]: True

    In [6]: a = 'foo'

    In [7]: a is 'foo'
    Out[7]: True
  • Lie

    #2
    Re: python string comparison oddity

    On Jun 19, 2:26 am, Faheem Mitha <fah...@email.unc.edu> wrote:
    > Hi everybody,
    >
    > I was wondering if anyone can explain this. My understanding is that 'is'
    > checks if the object is the same. However, in that case, why this
    > inconsistency for short strings? I would expect a 'False' for all three
    > comparisons. This is reproducible across two different machines, so it is
    > not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
    > default).
    > Thanks, Faheem.
    >
    > In [1]: a = '--'
    >
    > In [2]: a is '--'
    > Out[2]: False
    >
    > In [4]: a = '-'
    >
    > In [5]: a is '-'
    > Out[5]: True
    >
    > In [6]: a = 'foo'
    >
    > In [7]: a is 'foo'
    > Out[7]: True
    Yes, this happens because of small-object caching. When small
    integers or short strings are created, there is a possibility that they
    refer to the same object behind the scenes. Don't rely on this
    behavior.
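A minimal sketch of the practical upshot, assuming a Python 3 interpreter (the thread itself used Python 2.4.4): compare string contents with `==` and keep `is` for identity checks. The variable names are illustrative only, and the identity result is deliberately not given because it depends on the implementation.

```python
# Build two equal strings at runtime so that no literal is shared between
# them (assumption: how a given interpreter allocates them is not guaranteed).
a = "".join(["-", "-"])
b = "".join(["-", "-"])

values_equal = (a == b)   # compares contents; True for equal strings
same_object = (a is b)    # compares identity; implementation-dependent

print("a == b:", values_equal)
print("a is b:", same_object)
```

Only the `==` result is something a program should depend on.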


    • Faheem Mitha

      #3
      Re: python string comparison oddity

      On Wed, 18 Jun 2008 12:57:44 -0700 (PDT), Lie <Lie.1296@gmail.com> wrote:
      > On Jun 19, 2:26 am, Faheem Mitha <fah...@email.unc.edu> wrote:
      >> Hi everybody,
      >>
      >> I was wondering if anyone can explain this. My understanding is that 'is'
      >> checks if the object is the same. However, in that case, why this
      >> inconsistency for short strings? I would expect a 'False' for all three
      >> comparisons. This is reproducible across two different machines, so it is
      >> not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
      >> default).
      >> Thanks, Faheem.
      >>
      >> In [1]: a = '--'
      >>
      >> In [2]: a is '--'
      >> Out[2]: False
      >>
      >> In [4]: a = '-'
      >>
      >> In [5]: a is '-'
      >> Out[5]: True
      >>
      >> In [6]: a = 'foo'
      >>
      >> In [7]: a is 'foo'
      >> Out[7]: True
      >
      > Yes, this happens because of small-object caching. When small
      > integers or short strings are created, there is a possibility that they
      > refer to the same object behind the scenes. Don't rely on this
      > behavior.
      Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
      the basis of the choice is?
      Faheem.


      • Robert Kern

        #4
        Re: python string comparison oddity

        Faheem Mitha wrote:
        > On Wed, 18 Jun 2008 12:57:44 -0700 (PDT), Lie <Lie.1296@gmail.com> wrote:
        >> On Jun 19, 2:26 am, Faheem Mitha <fah...@email.unc.edu> wrote:
        >>> Hi everybody,
        >>>
        >>> I was wondering if anyone can explain this. My understanding is that 'is'
        >>> checks if the object is the same. However, in that case, why this
        >>> inconsistency for short strings? I would expect a 'False' for all three
        >>> comparisons. This is reproducible across two different machines, so it is
        >>> not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
        >>> default).
        >>> Thanks, Faheem.
        >>>
        >>> In [1]: a = '--'
        >>>
        >>> In [2]: a is '--'
        >>> Out[2]: False
        >>>
        >>> In [4]: a = '-'
        >>>
        >>> In [5]: a is '-'
        >>> Out[5]: True
        >>>
        >>> In [6]: a = 'foo'
        >>>
        >>> In [7]: a is 'foo'
        >>> Out[7]: True
        >>
        >> Yes, this happens because of small-object caching. When small
        >> integers or short strings are created, there is a possibility that they
        >> refer to the same object behind the scenes. Don't rely on this
        >> behavior.
        >
        > Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
        > the basis of the choice is?
        Shortish Python identifiers and operators, I think. Plus a handful like '\x00'.
        The source would know for sure, but alas, I am lazy.

        --
        Robert Kern

        "I have come to believe that the whole world is an enigma, a harmless enigma
        that is made terrible by our own mad attempt to interpret it as though it had
        an underlying truth."
        -- Umberto Eco


        • Lie

          #5
          Re: python string comparison oddity

          On Jun 19, 5:13 am, Faheem Mitha <fah...@email.unc.edu> wrote:
          > On Wed, 18 Jun 2008 12:57:44 -0700 (PDT), Lie <Lie.1...@gmail.com> wrote:
          >> On Jun 19, 2:26 am, Faheem Mitha <fah...@email.unc.edu> wrote:
          >>> Hi everybody,
          >>>
          >>> I was wondering if anyone can explain this. My understanding is that 'is'
          >>> checks if the object is the same. However, in that case, why this
          >>> inconsistency for short strings? I would expect a 'False' for all three
          >>> comparisons. This is reproducible across two different machines, so it is
          >>> not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
          >>> default).
          >>> Thanks, Faheem.
          >>>
          >>> In [1]: a = '--'
          >>>
          >>> In [2]: a is '--'
          >>> Out[2]: False
          >>>
          >>> In [4]: a = '-'
          >>>
          >>> In [5]: a is '-'
          >>> Out[5]: True
          >>>
          >>> In [6]: a = 'foo'
          >>>
          >>> In [7]: a is 'foo'
          >>> Out[7]: True
          >>
          >> Yes, this happens because of small-object caching. When small
          >> integers or short strings are created, there is a possibility that they
          >> refer to the same object behind the scenes. Don't rely on this
          >> behavior.
          >
          > Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
          > the basis of the choice is?
          > Faheem.
          Yes, but we've already been warned not to rely on it, since the basis
          for what is and isn't cached may be arbitrary. Personally, I wouldn't
          delve deeply into it; it isn't reliable behavior.


          • Duncan Booth

            #6
            Re: python string comparison oddity

            Faheem Mitha <faheem@email.unc.edu> wrote:
            >>> In [1]: a = '--'
            >>>
            >>> In [2]: a is '--'
            >>> Out[2]: False
            >>>
            >>> In [4]: a = '-'
            >>>
            >>> In [5]: a is '-'
            >>> Out[5]: True
            >>>
            >>> In [6]: a = 'foo'
            >>>
            >>> In [7]: a is 'foo'
            >>> Out[7]: True
            >>
            >> Yes, this happens because of small-object caching. When small
            >> integers or short strings are created, there is a possibility that they
            >> refer to the same object behind the scenes. Don't rely on this
            >> behavior.
            >
            > Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
            > the basis of the choice is?
            Also note that the behaviour you saw above changes if you put code into a
            script rather than running it interactively (the string '--' will be
            reused within a single compilation unit). So even if you understand all of
            the choices made in your particular release of Python (and they do vary
            between releases) it would be very unwise to rely on this behaviour.
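One way to poke at the script-versus-interactive difference is to compile the same assignments either together or separately. This is only a sketch for a Python 3 interpreter: the `run` helper and the snippet strings are made up for the illustration, and the identity results are left unstated on purpose because they vary between implementations and releases.

```python
def run(source):
    """Compile and execute `source` as its own compilation unit."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace

# Both literals compiled together, as they would be inside one script.
together = run("a = '--'\nb = '--'\n")
same_in_one_unit = together["a"] is together["b"]

# Each literal compiled on its own, roughly like separate interactive lines.
first = run("a = '--'\n")
second = run("b = '--'\n")
same_across_units = first["a"] is second["b"]

print("one compilation unit:", same_in_one_unit)
print("separate compilation units:", same_across_units)
```

Whatever the two lines print on a given interpreter, the strings compare equal with `==` in both cases, which is the only property worth depending on.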

            --
            Duncan Booth http://kupuguy.blogspot.com


            • Hrvoje Niksic

              #7
              Re: python string comparison oddity

              Faheem Mitha <faheem@email.unc.edu> writes:
              > Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
              > the basis of the choice is?
              Caches such as the intern dictionary/set and the one-character cache
              are specific to the implementation (and also to its version,
              etc.). In this case '-' is a 1-character string, all of which are
              cached. Python also interns strings that show up in Python source as
              literals that can be interpreted as identifiers. It also reuses
              string literals within a single expression. None of this should be
              relied on, but it's interesting to get insight into the implementation
              by examining the different cases:
              >>> '--' is '--'
              True # string repeated within an expression is simply reused
              >>> a = '--'
              >>> b = '--'
              >>> a is b
              False # not cached
              >>> a = '-'
              >>> b = '-'
              >>> a is b
              True # all 1-character strings are cached
              >>> a = 'flobozz'
              >>> b = 'flobozz'
              >>> a is b
              True # flobozz is a valid identifier, so it's cached
              >>> a = 'flo-bozz'
              >>> b = 'flo-bozz'
              >>> a is b
              False # not a valid identifier, so not interned
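If shared string objects are actually wanted, interning can be requested explicitly rather than inferred from the cases above. The sketch below assumes Python 3, where the function is `sys.intern` (in the Python 2 releases discussed in this thread it was the builtin `intern`); the variable names are illustrative.

```python
import sys

# Build equal strings at runtime; 'flo-bozz' is the non-identifier example
# from the session above.
a = "".join(["flo", "-", "bozz"])
b = "".join(["flo", "-", "bozz"])

# sys.intern returns the canonical copy from the interpreter's intern table,
# so equal strings passed through it yield the same object.
ia = sys.intern(a)
ib = sys.intern(b)

print("equal:", a == b)
print("interned copies identical:", ia is ib)
```

Even so, `==` remains the right way to compare string contents; explicit interning is mainly an optimisation for code that does many lookups on the same strings.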
