difflib.ndiff broken?

  • Humpdydum

    difflib.ndiff broken?

    Can anyone try the following in their python interpreter?

    These give correct output:
    >>> print list(ndiff(['saving2 <<A'],['saving <<a>>']))
    ['- saving2 <<A', '? - ^\n', '+ saving <<a>>', '? ^^^\n']
    >>> print list(ndiff(['saving2 <<AA'],['saving <<a>>']))
    ['- saving2 <<AA', '? - ^^\n', '+ saving <<a>>', '? ^^^\n']
    >>> print list(ndiff(['saving2 <<A'],['saving <<aa>>']))
    ['- saving2 <<A', '? - ^\n', '+ saving <<aa>>', '? ^^^^\n']
    >>> print list(ndiff(['saving <<A'],['saving <<aa>>']))
    ['- saving <<A', '? ^\n', '+ saving <<aa>>', '? ^^^^\n']

    Now try the very slight variations:
    >>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
    ['- saving2 <<AA', '+ saving <<aa>>']
    >>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
    ['- saving2 <<AA', '+ saving <<aa>>']

    This can't be right... or is it? Where are the '? ...' lines? It does this
    for both Python 2.3.2 on Windows 2000 and Python 2.3.3 on SGI. If it's
    correct, how come???

    Oliver


  • Tim Peters

    #2
    Re: difflib.ndiff broken?

    [Humpdydum]
    > Can anyone try the following in their python interpreter?
    >
    > These give correct output:
    >
    > >>> print list(ndiff(['saving2 <<A'],['saving <<a>>']))
    > ['- saving2 <<A', '? - ^\n', '+ saving <<a>>', '? ^^^\n']
    > >>> print list(ndiff(['saving2 <<AA'],['saving <<a>>']))
    > ['- saving2 <<AA', '? - ^^\n', '+ saving <<a>>', '? ^^^\n']
    > >>> print list(ndiff(['saving2 <<A'],['saving <<aa>>']))
    > ['- saving2 <<A', '? - ^\n', '+ saving <<aa>>', '? ^^^^\n']
    > >>> print list(ndiff(['saving <<A'],['saving <<aa>>']))
    > ['- saving <<A', '? ^\n', '+ saving <<aa>>', '? ^^^^\n']
    >
    > Now try the very slight variations:
    >
    > >>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
    > ['- saving2 <<AA', '+ saving <<aa>>']
    > >>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
    > ['- saving2 <<AA', '+ saving <<aa>>']
    >
    > This can't be right... or is it? Where are the '? ...' lines? It does this
    > for both Python 2.3.2 on Windows 2000 and Python 2.3.3 on SGI. If it's
    > correct, how come???

    ndiff produces intraline difference marking if and only if it thinks
    the inputs are "reasonably close". The cutoff between "reasonably
    close" and "not reasonably close" is necessarily heuristic. '?' lines
    are more irritating than helpful when they have a lot of markup in
    them, so it certainly wasn't intended that '?' lines *always* be
    produced. The '+' and '-' lines contain all the information about how
    to change one sequence into another; the '?' lines are fluff (albeit
    sometimes helpful fluff -- that's why they're (sometimes) there).
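
To illustrate the point that the '-' and '+' lines carry all the information, here is a minimal sketch in Python 3 syntax (the thread itself ran Python 2.3). It uses `difflib.restore`, which rebuilds either input from an ndiff delta. The variable names are made up for the example, and the inputs are the strings from the first post with a newline added.

```python
import difflib

old = ['saving2 <<A\n']
new = ['saving <<a>>\n']

delta = list(difflib.ndiff(old, new))

# Drop the '? ' hint lines; only '  ', '- ' and '+ ' lines remain.
no_hints = [line for line in delta if not line.startswith('? ')]

# restore() rebuilds sequence 1 or sequence 2 from a delta.
# It ignores '? ' lines, so it works with or without them.
rebuilt_old = list(difflib.restore(no_hints, 1))
rebuilt_new = list(difflib.restore(no_hints, 2))

print(rebuilt_old == old, rebuilt_new == new)
```

Both inputs come back unchanged even though the hint lines were thrown away, which is the sense in which they are optional.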

    Concretely, ndiff produces intraline marking iff two lines have a
    similarity ratio of at least 0.75. In your first examples, the lines
    do:
    >>> import difflib
    >>> m = difflib.SequenceMatcher()
    >>> m.set_seqs('saving2 <<A', 'saving <<a>>')
    >>> print m.ratio()
    0.782608695652

    In your last examples, the lines don't:
    >>> m.set_seqs('saving2 <<AA', 'saving <<aa>>')
    >>> print m.ratio()
    0.72

    Internally, 0.75 is the default value of FancyReplacer's optional
    minimal_cutoff argument.
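
The relationship between the ratio and the '?' lines can be checked side by side. The following is only a sketch in Python 3 syntax; `has_hint_lines` is a hypothetical helper written for this example, not part of difflib. Note that ndiff applies a character-junk filter internally, so in general its ratio can differ from the plain `SequenceMatcher(None, a, b)` ratio computed here; for these two pairs the plain ratios match the figures quoted above.

```python
import difflib

def has_hint_lines(a, b):
    """Return True if ndiff emits any '? ' line for this pair of lines."""
    return any(line.startswith('? ') for line in difflib.ndiff([a], [b]))

pairs = [
    ('saving2 <<A', 'saving <<a>>'),    # ratio above 0.75 in the post
    ('saving2 <<AA', 'saving <<aa>>'),  # ratio below 0.75 in the post
]

results = []
for a, b in pairs:
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    results.append((round(ratio, 4), has_hint_lines(a, b)))
    print(results[-1])
```

The first pair should report hint lines and the second should not, matching the outputs in the original question.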


    • Humpdydum

      #3
      Re: difflib.ndiff broken?

      OK, forget it, sorry it was my mistake: it wasn't obvious from the difflib
      docs, but it appears that ndiff points out the sub-line differences (lines
      that start with ?) only if it was able to figure out operations that could
      be applied to substrings on the line. Though often such operations are
      obvious by looking at the strings being compared, ndiff doesn't always find
      them, and so marks the whole line as + or -.

      Anyone know of web site that explains ndiff output? I couldn't figure out a
      good set of search terms in google, didn't get anything useful. Thanks,

      Oliver

      "Humpdydum" <oliver.schoenborn@utoronto.ca> wrote in message
      news:cd6r1o$50e$1@nrc-news.nrc.ca...
      > Can anyone try the following in their python interpreter?
      >
      > These give correct output:
      >
      > >>> print list(ndiff(['saving2 <<A'],['saving <<a>>']))
      > ['- saving2 <<A', '? - ^\n', '+ saving <<a>>', '? ^^^\n']
      > >>> print list(ndiff(['saving2 <<AA'],['saving <<a>>']))
      > ['- saving2 <<AA', '? - ^^\n', '+ saving <<a>>', '? ^^^\n']
      > >>> print list(ndiff(['saving2 <<A'],['saving <<aa>>']))
      > ['- saving2 <<A', '? - ^\n', '+ saving <<aa>>', '? ^^^^\n']
      > >>> print list(ndiff(['saving <<A'],['saving <<aa>>']))
      > ['- saving <<A', '? ^\n', '+ saving <<aa>>', '? ^^^^\n']
      >
      > Now try the very slight variations:
      >
      > >>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
      > ['- saving2 <<AA', '+ saving <<aa>>']
      > >>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
      > ['- saving2 <<AA', '+ saving <<aa>>']
      >
      > This can't be right... or is it? Where are the '? ...' lines?



      • Tim Peters

        #4
        Re: difflib.ndiff broken?

        [Humpdydum]
        > OK, forget it, sorry it was my mistake:

        I didn't see a mistake, just a question.

        > it wasn't obvious from the difflib docs, but it appears that ndiff points out the
        > sub-line differences (lines that start with ?) only if it was able to figure out
        > operations that could be applied to substrings on the line. Though often such
        > operations are obvious by looking at the strings being compared,

        They can be for a program but often aren't for people. That's why
        ndiff produces '?' lines when it thinks they might help. This is a
        heuristic -- a guess. Sometimes it's not the same guess you'd make.
        There's always a sequence of operations that can be applied to change
        any line into any other line, but *usually* they're uninteresting.
        '?' lines attempt to point out "minor edits".

        > ndiff doesn't always find them, and so marks the whole line as + or -.

        It marks two input lines that differ with - and + regardless of
        whether it produces two ? lines too.

        > Anyone know of web site that explains ndiff output? I couldn't figure out a
        > good set of search terms in google, didn't get anything useful. Thanks,

        ndiff is unique to Python, and you have the source code for it.
        Because '?' lines are fluff, precise docs for them would be
        counterproductive. They're meant to guide the eye to minor intraline
        differences, and that's all.

        If a ? line appears, there are always two of them, interleaved between
        a -+ pair, in this pattern:

        -
        ?
        +
        ?

        Each ? line implicitly refers to the line immediately above it. Four
        meaningful characters appear in ? lines. A caret (^) means the
        character immediately above it was replaced, in going from the - to
        the + line. "-" means the character immediately above it was deleted;
        '+' means it was inserted; and a blank means the character immediately
        above it is the same in both (- and +) lines. A '-' can appear only
        in the ? line following a - line, and a '+' can appear only in the ?
        line following a + line, because we're picturing the edits needed to
        change the - line into the + line.
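
The reading rules above can be turned into a small script. This is a rough sketch in Python 3 syntax, not anything difflib provides: `explain_hints` is a hypothetical helper that pairs each '? ' line with the line directly above it and lists every marked column together with the character it points at.

```python
import difflib

def explain_hints(diff_lines):
    """For each '? ' line, return (prefix_of_line_above, marks), where
    marks is a list of (column, marker, character_above) tuples."""
    notes = []
    previous = None
    for line in diff_lines:
        if line.startswith('? ') and previous is not None:
            marks = [
                (col, mark, previous[col])
                for col, mark in enumerate(line.rstrip('\n'))
                # Skip the two-character '? ' prefix; keep only real markers.
                if col >= 2 and col < len(previous) and mark in '^-+'
            ]
            notes.append((previous[:2], marks))
        previous = line
    return notes

diff = list(difflib.ndiff(['saving2 <<A'], ['saving <<a>>']))
notes = explain_hints(diff)
for prefix, marks in notes:
    print(repr(prefix), marks)
```

For this pair, the hint under the '-' line flags the deleted '2' and the replaced 'A', and the hint under the '+' line flags the three replacement characters 'a>>'.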
