very large dictionary

  • Simon Strobl

    #16
    Re: very large dictionary

    > Have you considered that the operating system imposes per-process limits
    > on memory usage? You say that your server has 128 GB of memory, but that
    > doesn't mean the OS will make anything like that available.

    According to our system administrator, I can use all of the 128G.
    I thought it would be practical not to create the dictionary from a text
    file each time I needed it, i.e. I thought loading the .pyc file should
    be faster. Yet, Python failed to create a .pyc file.

    > Probably a good example of premature optimization.

    Well, as I was using Python, I did not expect to have to care that much
    about the language's internal affairs. I thought I could simply do the
    same thing no matter how large my files get. In other words, I thought
    Python was really scalable.

    > Out of curiosity, how long does it take to create it from a text file?

    I do not remember this exactly, but I think it was not much more than
    an hour.
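    A hedged sketch of one way to avoid rebuilding the dict on every run
    without relying on .pyc caching: persist it once with the standard-library
    shelve module. The file name and entries below are made up for
    illustration; the real mapping would be filled in from the text file.

```python
import shelve

# Build the mapping once (in the real case it would be parsed from the
# large text file); the file name and entries here are illustrative.
with shelve.open("frequencies.db") as db:
    db["hello"] = 12
    db["world"] = 7

# A later run reopens the shelf instead of re-parsing the text file;
# entries are fetched lazily from disk, so the whole mapping never has
# to pass through the .pyc machinery or sit in memory at once.
with shelve.open("frequencies.db") as db:
    count = db["hello"]
print(count)
```

    Because shelve keeps the data on disk and loads values on demand, it also
    sidesteps holding everything in RAM at the same time.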




    • Gabriel Genellina

      #17
      Re: very large dictionary

      On Mon, 04 Aug 2008 11:02:16 -0300, Simon Strobl <Simon.Strobl@gmail.com>
      wrote:

      > I created a Python file that contained the dictionary. The size of
      > this file was 6.8 GB. I thought it would be practical not to create
      > the dictionary from a text file each time I needed it, i.e. I thought
      > loading the .pyc file should be faster. Yet, Python failed to create
      > a .pyc file.

      Looks like the marshal format (used to create the .pyc file) can't handle
      sizes that big, and that limitation will stay for a while.

      So follow any of the previous suggestions and store your dictionary as
      data, not code.
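      A minimal sketch of that suggestion using the standard pickle module;
      the tiny dictionary and file name below stand in for the real 6.8 GB
      mapping.

```python
import pickle

frequencies = {"the": 1500, "of": 900}  # stand-in for the real mapping

# Serialize the dict as data; pickle does not go through the marshal-based
# .pyc machinery, so it is not subject to that compile-time limitation.
with open("frequencies.pkl", "wb") as f:
    pickle.dump(frequencies, f, protocol=pickle.HIGHEST_PROTOCOL)

# A later run reloads it without any source-file compilation step.
with open("frequencies.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded == frequencies)
```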

      --
      Gabriel Genellina


      • Steven D'Aprano

        #18
        Re: very large dictionary

        On Tue, 05 Aug 2008 01:20:08 -0700, Simon Strobl wrote:

        >>> I thought it would be practical not to create the dictionary from
        >>> a text file each time I needed it, i.e. I thought loading the .pyc
        >>> file should be faster. Yet, Python failed to create a .pyc file.
        >>
        >> Probably a good example of premature optimization.
        >
        > Well, as I was using Python, I did not expect to have to care that
        > much about the language's internal affairs. I thought I could simply
        > do the same thing no matter how large my files get. In other words,
        > I thought Python was really scalable.

        Yeah, it really is a pain when abstractions leak.

        From Joel Spolsky's essay "The Law of Leaky Abstractions":

        > There's a key piece of magic in the engineering of the Internet which
        > you rely on every single day. It happens in the TCP protocol, one of
        > the fundamental building blocks of the Internet. TCP...


        >> Out of curiosity, how long does it take to create it from a text
        >> file?
        >
        > I do not remember this exactly, but I think it was not much more
        > than an hour.

        Hmmm... longer than I expected. Perhaps not as premature as I thought.
        Have you tried the performance of the pickle and marshal modules?
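        A rough template for that comparison, with toy-sized data; the numbers
        only illustrate the method, not the real workload.

```python
import marshal
import pickle
import time

# Toy stand-in for the real dictionary; rerun with the actual data to get
# meaningful timings.
data = {str(i): i for i in range(100_000)}

start = time.perf_counter()
pickled = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
pickle_secs = time.perf_counter() - start

start = time.perf_counter()
marshalled = marshal.dumps(data)
marshal_secs = time.perf_counter() - start

# Both formats round-trip back to an equal dict; compare pickle_secs and
# marshal_secs on the real data before choosing one.
restored = pickle.loads(pickled)
print(restored == data)
```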



        --
        Steven


        • Terry Reedy

          #19
          Re: very large dictionary



          Simon Strobl wrote:

          > Well, as I was using Python, I did not expect to have to care that
          > much about the language's internal affairs. I thought I could
          > simply do the same thing no matter how large my files get. In
          > other words, I thought Python was really scalable.
          Python the language is indefinitely scalable. Finite implementations
          are not. CPython is a C program compiled to a system executable. Most
          OSes run executables with a fairly limited call stack space.

          CPython programs are, when possible, cached as .pyc files. The
          existence and format of .pyc's is an internal affair of the CPython
          implementation. They are most definitely not a language requirement or
          language feature.

          Have you tried feeding multi-gigabyte source files to other
          compilers? Most, if not all, could be broken by the 'right'
          big-enough code.

          tjr


          • Bruno Desthuilliers

            #20
            Re: very large dictionary

            Simon Strobl wrote:
            (snip)

            > I would prefer to be able to use the same type of scripts with
            > data of all sizes, though.

            Since computers have limited RAM, this will remain a wish. You
            obviously can't expect to deal with terabytes of data the way you
            do with a 1 kB text file.


            • Jake Anderson

              #21
              Re: very large dictionary

              Bruno Desthuilliers wrote:

              > Simon Strobl wrote:
              > (snip)
              >
              >> I would prefer to be able to use the same type of scripts
              >> with data of all sizes, though.
              >
              > Since computers have limited RAM, this will remain a wish. You
              > obviously can't expect to deal with terabytes of data the way
              > you do with a 1 kB text file.

              You can; you just start off handling the multi-GB case and
              you're set. Databases are really easy; I often use them for
              manipulating pretty small amounts of data because it's just an
              easy way to group and join, etc.
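              For example, a sketch with the standard sqlite3 module; the
              table name and word counts are made up for illustration.

```python
import sqlite3

# In-memory database for illustration; pointing connect() at a file path
# instead is how a multi-gigabyte word list would stay out of RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE freq (word TEXT PRIMARY KEY, count INTEGER)")
conn.executemany(
    "INSERT INTO freq VALUES (?, ?)",
    [("the", 1500), ("of", 900), ("very", 120)],
)

# Grouping and aggregation come for free, which is the convenience being
# described even for small data sets.
total = conn.execute("SELECT SUM(count) FROM freq").fetchone()[0]
print(total)
```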

