unicode memory usage

  • Gary Robinson

    unicode memory usage

    We have an application which involves storing a lot of strings in RAM. It
    would be most convenient to use Unicode strings, but I am wary of doubling
    memory usage. My fear is based on the idea that unicode strings may take two
    bytes per character in order to accommodate non-ASCII characters.

    But I don't know whether that's actually how Python strings work internally.

    So, my question: Do unicode strings in Python take substantially more memory
    than classic Python strings or not, assuming the strings are generally 99%
    ASCII characters (but not 100%)?


    --Gary

    --
    Putting http://wecanstopspam.org in your email helps it pass through
    overzealous spam filters.

    Gary Robinson
    CEO
    Transpose, LLC
    grobinson@trans pose.com
    207-942-3463





  • Martin v. Löwis

    #2
    Re: unicode memory usage

    Gary Robinson wrote:
    > But I don't know whether that's actually how Python strings work internally.

    Python Unicode objects normally use 2 bytes per character, unless Python
    is built in UCS-4 mode, in which case they use 4 bytes per character.
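
As a rough illustration of the two sizes mentioned above, the sketch below encodes a short string to fixed-width forms. It is an approximation under stated assumptions, not a look at the interpreter's internals: UTF-16-LE stands in for the 2-byte layout, UTF-32-LE for the UCS-4 layout, and Latin-1 for a classic one-byte-per-character string. The sample string is made up for the example.

```python
# Illustration only: these encodings approximate the per-character cost
# of the layouts discussed above; they are not the interpreter's internals.
text = "caf\u00e9"  # hypothetical sample: 4 characters, one of them non-ASCII

ucs2_size = len(text.encode("utf-16-le"))   # 2 bytes per character
ucs4_size = len(text.encode("utf-32-le"))   # 4 bytes per character
latin1_size = len(text.encode("latin-1"))   # 1 byte per character

print(ucs2_size, ucs4_size, latin1_size)  # prints: 8 16 4
```

Per-object overhead (header, length, and so on) is not counted here, so for very short strings the real ratio between the two string types would likely be smaller than these numbers suggest.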
    > So, my question: Do unicode strings in Python take substantially more memory
    > than classic python strings or not, assuming the strings are generally 99%
    > ASCII characters (but not 100%)?

    Yes; you can expect that for about 99% of the characters the extra
    storage consists of null bytes. Whether this is substantial depends on
    the total amount of storage that you need for string objects, compared to
    the storage needed for other things, or the storage available.

    Regards,
    Martin
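
To put a number on the null bytes for mostly-ASCII text, here is a minimal sketch that counts them in fixed-width encodings. The helper name `null_byte_fraction` and the sample string are hypothetical, and UTF-16-LE / UTF-32-LE are used only as stand-ins for the 2-byte and 4-byte layouts.

```python
def null_byte_fraction(text, encoding):
    """Return the share of bytes equal to zero after encoding `text`."""
    data = text.encode(encoding)
    return data.count(0) / len(data)

# Hypothetical sample: 99 ASCII characters plus one non-ASCII character
# (the euro sign, U+20AC, whose two UTF-16 bytes are both non-zero).
sample = "a" * 99 + "\u20ac"

frac16 = null_byte_fraction(sample, "utf-16-le")  # 99 null bytes out of 200
frac32 = null_byte_fraction(sample, "utf-32-le")  # 299 null bytes out of 400

print(frac16, frac32)
```

Counted by bytes rather than by characters, each ASCII character contributes one null byte out of two in the 2-byte layout and three out of four in the 4-byte layout, so the padding is roughly half or three quarters of the character storage respectively.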
