Python Data Utils

  • Jesse Aldridge

    Python Data Utils

    In an effort to experiment with open source, I put a couple of my
    utility files up <a href="http://github.com/jessald/python_data_utils/tree/master">here</a>.
    What do you think?
  • Gabriel Genellina

    #2
    Re: Python Data Utils

    On Sun, 06 Apr 2008 01:43:29 -0300, Jesse Aldridge
    <JesseAldridge@gmail.com> wrote:
    > In an effort to experiment with open source, I put a couple of my
    > utility files up <a href="http://github.com/jessald/python_data_utils/tree/master">here</a>.
    > What do you think?

    Some names are a bit obscure - "universify"?
    Docstrings would help too, and blank lines, and in general following the
    PEP8 style guide.
    find_string is a much slower version of the find method of string objects;
    the same goes for find_string_last, contains and others.
    And I don't see what you gain from things like:

        def count( s, sub ):
            return s.count( sub )

    It's slower and harder to read (because one has to *know* what S.count
    does).
    Other functions may be useful but without even a docstring it's hard to
    tell what they do.
    delete_string, as a function, looks like it should delete some string, not
    return a character; I'd use a string constant DELETE_CHAR, or just DEL,
    its name in ASCII.
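
A minimal sketch of the built-in equivalents mentioned above. The mapping of find_string_last to str.rfind is an assumption based on the name alone, and the sample string is made up for illustration.

```python
# Built-in str operations that cover what thin wrappers would otherwise provide.
s = "spam and eggs and spam"   # sample data, not from the original modules

first = s.find("spam")     # index of the first occurrence, -1 if absent
last = s.rfind("spam")     # index of the last occurrence (assumed match for find_string_last)
has_eggs = "eggs" in s     # membership test instead of a contains() wrapper
n = s.count("spam")        # number of non-overlapping occurrences

# A named constant instead of a function that returns a character.
DEL = "\x7f"               # ASCII DEL, code point 127
```

Readers already know these methods, so calling them directly avoids both the extra function call and the need to look up what a wrapper does.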

    In general, None should be compared using `is` instead of `==`, and
    instead of `type(x) is type(0)` or `type(x) == type(0)` I'd use
    `isinstance(x, int)` (unless you use Python 2.1 or older, int, float, str,
    list... are types themselves)
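
A short sketch of why `is None` and `isinstance` are the safer checks. The names describe, AlwaysEqual and MyInt are hypothetical and exist only for this illustration.

```python
def describe(x):
    """Hypothetical helper that shows the two recommended checks."""
    if x is None:             # identity test; cannot be fooled by a custom __eq__
        return "none"
    if isinstance(x, int):    # also accepts subclasses of int
        return "int"
    return "other"


class AlwaysEqual:
    """Compares equal to everything, so `== None` gives a misleading answer."""
    def __eq__(self, other):
        return True


class MyInt(int):
    """An int subclass: type(x) is type(0) rejects it, isinstance accepts it."""
```

With these definitions, `AlwaysEqual() == None` evaluates to True even though the object is not None, and `type(MyInt(3)) is type(0)` is False even though the value is an integer.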

    Files.py is similar - a lot of more or less common things with a different
    name, and a few wheels reinvented :)

    Don't feel bad, but I would not use those modules because there is no net
    gain, and even a loss in legibility. If you develop your code alone,
    that's fine, you know what you wrote and can use it whenever you please.
    But for others to use it, it means that they have to learn new ways to say
    the same old thing.

    --
    Gabriel Genellina


    • Konstantin Veretennicov

      #3
      Re: Python Data Utils

      On Sun, Apr 6, 2008 at 7:43 AM, Jesse Aldridge <JesseAldridge@gmail.com> wrote:
      > In an effort to experiment with open source, I put a couple of my
      > utility files up <a href="http://github.com/jessald/python_data_utils/tree/master">here</a>.
      > What do you think?

      Would you search for, install, learn and use these modules if *someone
      else* created them?

      --
      kv


      • Jesse Aldridge

        #4
        Re: Python Data Utils

        Thanks for the detailed feedback. I made a lot of modifications based
        on your advice. Mind taking another look?

        > Some names are a bit obscure - "universify"?
        > Docstrings would help too, and blank lines

        I changed the name of universify and added docstrings to every
        function.

        > ...PEP8

        I made a few changes in this direction, feel free to take it the rest
        of the way ;)

        > find_string is a much slower version of the find method of string objects,

        Got rid of find_string, and contains. What are the others?

        > And I don't see what you gain from things like:
        > def count( s, sub ):
        >     return s.count( sub )

        Yeah, got rid of that stuff too. I ported these files from Java a
        while ago, so there was a bit of junk like this lying around.

        > delete_string, as a function, looks like it should delete some string, not
        > return a character; I'd use a string constant DELETE_CHAR, or just DEL,
        > its name in ASCII.

        Got rid of that too :)

        > In general, None should be compared using `is` instead of `==`, and
        > instead of `type(x) is type(0)` or `type(x) == type(0)` I'd use
        > `isinstance(x, int)` (unless you use Python 2.1 or older, int, float, str,
        > list... are types themselves)

        Changed.

        So, yeah, hopefully things are better now.

        Soon developers will flock from all over the world to build this into
        the greatest data manipulation library the world has ever seen! ...or
        not...

        I'm tired. Making code for other people is too much work :)


        • Jesse Aldridge

          #5
          Re: Python Data Utils

          On Apr 6, 6:14 am, "Konstantin Veretennicov" <kveretenni...@gmail.com>
          wrote:
          > On Sun, Apr 6, 2008 at 7:43 AM, Jesse Aldridge <JesseAldri...@gmail.com> wrote:
          > > In an effort to experiment with open source, I put a couple of my
          > > utility files up <a href="http://github.com/jessald/python_data_utils/tree/master">here</a>.
          > > What do you think?
          >
          > Would you search for, install, learn and use these modules if *someone
          > else* created them?
          >
          > --
          > kv

          Yes, I would. I searched a bit for a library that offered similar
          functionality. I didn't find anything. Maybe I'm just looking in the
          wrong place. Any suggestions?


          • Jesse Aldridge

            #6
            Re: Python Data Utils

            > Docstrings go *after* the def statement.

            Fixed.

            > changing "( " to "(" and " )" to ")".

            Changed.
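
For reference, a small sketch of the two conventions being discussed: the docstring is the first statement inside the function body, and there are no spaces just inside the parentheses. The function shout is a hypothetical example, not one of the functions in S.py.

```python
def shout(text):
    """Return text upper-cased with a trailing exclamation mark."""
    # The string literal above is the first statement in the body,
    # so Python stores it as shout.__doc__ and help(shout) displays it.
    return text.upper() + "!"
```

A string or comment placed before the `def` line is not attached to the function, which is why the placement matters.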


            I attempted to take out everything that could be trivially implemented
            with the standard library.
            This has left me with... 4 functions in S.py. One of them is used
            internally, and the others aren't terribly awesome :\ But I think the
            ones that remain are at least a bit useful :)

            > The penny drops :-)

            yeah, yeah

            > Not in all places ... look at the ends_with function. BTW, this should
            > be named something like "fuzzy_ends_with".

            fixed

            > fuzzy_match(None, None) should return False.

            changed

            > 2. make_fuzzy function: first two statements should read "s =
            > s.replace(.....)" instead of "s.replace(.....)".

            fixed
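
A rough sketch of the two fixes just mentioned. The function names come from the thread, but the bodies are hypothetical: which characters the real make_fuzzy replaces is not shown here, so the "_" and "-" replacements and the lower-casing are assumptions for illustration only.

```python
def make_fuzzy(s):
    """Hypothetical normaliser; the real make_fuzzy may do something different."""
    # str is immutable, so replace() returns a new string. A bare
    # `s.replace("_", " ")` would discard the result and leave s unchanged.
    s = s.replace("_", " ")
    s = s.replace("-", " ")
    return s.lower()


def fuzzy_match(a, b):
    """Compare two strings loosely; a missing value never matches anything."""
    if a is None or b is None:
        return False
    return make_fuzzy(a) == make_fuzzy(b)
```

Returning False for `fuzzy_match(None, None)` treats two missing values as "no match" rather than as equal input, which is the behaviour requested in the quoted review.
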

            > 3. Fuzzy matching functions are specialised to an application; I can't
            > imagine that anyone would be particularly interested in those that you
            > provide.

            I think it's useful in many cases. I use it all the time. It helps
            guard against annoying input errors.

            > A basic string normalisation-before-comparison function would
            > usefully include replacing multiple internal whitespace characters by
            > a single space.

            I added this functionality.

            > 5. Casual inspection of your indentation function gave the impression
            > that it was stuffed

            Fixed

            Thanks for the feedback.


            • John Machin

              #7
              Re: Python Data Utils

              On Apr 7, 4:22 pm, Jesse Aldridge <JesseAldri...@gmail.com> wrote:
              > > changing "( " to "(" and " )" to ")".
              >
              > Changed.

              But then you introduced more.

              > I attempted to take out everything that could be trivially implemented
              > with the standard library.
              > This has left me with... 4 functions in S.py. One of them is used
              > internally, and the others aren't terribly awesome :\ But I think the
              > ones that remain are at least a bit useful :)

              If you want to look at stuff that can't be implemented trivially using
              str/unicode methods, and is more than a bit useful, google for
              mxTextTools.

              > > A basic string normalisation-before-comparison function would
              > > usefully include replacing multiple internal whitespace characters by
              > > a single space.
              >
              > I added this functionality.

              Not quite. I said "whitespace", not "space".

              The following is the standard Python idiom for removing leading and
              trailing whitespace and replacing one or more whitespace characters
              with a single space:

              def normalise_whitespace(s):
                  return ' '.join(s.split())

              If your data is obtained by web scraping, you may find some people use
              '\xA0' aka NBSP to pad out fields. The above code will get rid of
              these if s is unicode; if s is str, you need to chuck
              a .replace('\xA0', ' ') in there somewhere.

              HTH,
              John
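
The post above is written for Python 2, where str is a byte string. A sketch of the same idea in Python 3 terms, where str is Unicode and bytes plays the role of the old str. The name normalise_whitespace_bytes is hypothetical, and the byte version assumes an encoding such as Latin-1 in which NBSP is the single byte 0xA0.

```python
def normalise_whitespace(s):
    """Strip leading/trailing whitespace and collapse runs to one space.

    For a Python 3 str, split() with no arguments treats every Unicode
    whitespace character, including NBSP (U+00A0), as a separator.
    """
    return " ".join(s.split())


def normalise_whitespace_bytes(b):
    """Same idea for bytes (hypothetical helper).

    bytes.split() only recognises ASCII whitespace, so the NBSP byte has
    to be replaced first. Assumes 0xA0 really is NBSP in the data's encoding.
    """
    return b" ".join(b.replace(b"\xa0", b" ").split())
```

If the encoding of scraped data is known, decoding it to str first and using the one-line version is usually the simpler route.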
