word shifts

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • dave

    word shifts

    Hello,

    I made a function that takes a word list (one word per line, text file)
    and searches for all the words in the list that are 'shifts' of
    eachother. 'abc' shifted 1 is 'bcd'

    Please take a look and tell me if this is a viable solution.

    def shift(word, amt):
    ans = ''
    for letter in word:
    ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
    return ans

    def fileshift(x):
    fin = open(x)
    d = {}
    for line in fin:
    d[line.strip()] = [1]
    for i in range(1, 26):
    ite = shift(line.stri p(), i)
    if ite in d:
    print ite


    Any tips/suggestions/critiques greatly appreciated.. I'm trying to
    teach myself Python (and still beginning) and would love any helpful
    info.

    thanks!

  • Gabriel Genellina

    #2
    Re: word shifts

    En Sun, 04 May 2008 02:17:07 -0300, dave <squareswallowe r@invalid.comes cribió:
    Hello,
    >
    I made a function that takes a word list (one word per line, text file)
    and searches for all the words in the list that are 'shifts' of
    eachother. 'abc' shifted 1 is 'bcd'
    >
    Please take a look and tell me if this is a viable solution.
    >
    def shift(word, amt):
    ans = ''
    for letter in word:
    ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
    return ans
    >
    def fileshift(x):
    fin = open(x)
    d = {}
    for line in fin:
    d[line.strip()] = [1]
    for i in range(1, 26):
    ite = shift(line.stri p(), i)
    if ite in d:
    print ite
    >
    >
    Any tips/suggestions/critiques greatly appreciated.. I'm trying to
    teach myself Python (and still beginning) and would love any helpful
    info.
    First, looking at the code, you're evaluating line.strip() a lot of times; I'd avoid it.
    Looks like you're using a dictionary just for the keys - the set type is more adequate here (see http://docs.python.org/lib/types-set.html ). In any case, I'd use a value like None instead of [1]
    But I'd use a different algorithm. Instead of generating all posible shifts for a given word, I'd substract the newly read word from each previous words (here, "substract two words" means substract the character code for corresponding positions, modulo 26). Shifted words, when substracted, have the same number on all positions.

    --
    Gabriel Genellina

    Comment

    • George Sakkis

      #3
      Re: word shifts

      On May 4, 2:04 am, "Gabriel Genellina" <gagsl-...@yahoo.com.a rwrote:
      En Sun, 04 May 2008 02:17:07 -0300, dave <squareswallo.. .@invalid.comes cribió:
      >
      >
      >
      Hello,
      >
      I made a function that takes a word list (one word per line, text file)
      and searches for all the words in the list that are 'shifts' of
      eachother.  'abc' shifted 1 is 'bcd'
      >
      Please take a look and tell me if this is a viable solution.
      >
       def shift(word, amt):
         ans = ''
         for letter in word:
                 ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
         return ans
      >
      def fileshift(x):
         fin = open(x)
         d = {}
         for line in fin:
                 d[line.strip()] = [1]
                 for i in range(1, 26):
                         ite = shift(line.stri p(), i)
                         if ite in d:
                                 print ite
      >
      Any tips/suggestions/critiques greatly appreciated.. I'm trying to
      teach myself Python (and still beginning) and would love any helpful
      info.
      >
      First, looking at the code, you're evaluating line.strip() a lot of times;I'd avoid it.
      Looks like you're using a dictionary just for the keys - the set type is more adequate here (seehttp://docs.python.org/lib/types-set.html). In any case, I'd use a value like None instead of [1]
      But I'd use a different algorithm. Instead of generating all posible shifts for a given word, I'd substract the newly read word from each previous words (here, "substract two words" means substract the character code for corresponding positions, modulo 26). Shifted words, when substracted, have the same number on all positions.

      A faster algorithm is to create a 'key' for each word, defined as the
      tuple of ord differences (modulo 26) of consecutive characters. E.g.
      the key of 'acf' is (2,3); 'c' is 2 positions after 'a' and 'f' 3
      positions after 'c'. Shifted words (and only these) have the same key.
      Here's a straightforward implementation that generates all the lists
      of equally-shifted words:

      from collections import defaultdict

      def iter_shifted(wo rds):
      key2shifted = defaultdict(lis t)
      for word in words:
      ords = map(ord,word)
      key = tuple((ords[i]-ords[i-1]) % 26
      for i in xrange(1,len(or ds)))
      key2shifted[key].append(word)
      for shifted in key2shifted.ite rvalues():
      if len(shifted) 1:
      yield shifted

      if __name__ == '__main__':
      words = 'abc bef jas cba cde zab azy hkl'.split()
      for shifted in iter_shifted(wo rds):
      print shifted

      # *** output ***
      #['bef', 'hkl']
      #['abc', 'cde', 'zab']
      #['cba', 'azy']

      Comment

      • Peter Otten

        #4
        Re: word shifts

        dave wrote:
        I made a function that takes a word list (one word per line, text file)
        and searches for all the words in the list that are 'shifts' of
        eachother.  'abc' shifted 1 is 'bcd'
        >
        Please take a look and tell me if this is a viable solution.
        >
         def shift(word, amt):
                ans = ''
                for letter in word:
                        ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 +
        ord('a'))
                return ans
        >
        def fileshift(x):
                fin = open(x)
                d = {}
                for line in fin:
                        d[line.strip()] = [1]
                        for i in range(1, 26):
                                ite = shift(line.stri p(), i)
                                if ite in d:
                                        print ite
        >
        >
        Any tips/suggestions/critiques greatly appreciated.. I'm trying to
        teach myself Python (and still beginning) and would love any helpful
        info.
        My ideas:

        Move the open() call out of fileshift() and have it working with arbitrary
        iterables.
        Store a normalized version of the word in the dictionary. if every word in
        the dict is guaranteed to start with an "a" you can reduce the number of
        tests to one.
        If you have many words you can build a fast shift() function based on
        str.translate() .

        Invoke the following script both with and without a filename argument:

        from string import ascii_lowercase as chars, maketrans

        tables = [maketrans(chars , chars[i:] + chars[:i]) for i in range(26)]

        def shift(word, n):
        return word.translate( tables[n%26])

        def normalize(word) :
        a = word[0]
        return shift(word, ord("a") - ord(a))

        def findshifted(wor ds):
        d = {}
        for word in words:
        d.setdefault(no rmalize(word), []).append(word)
        return d


        if __name__ == "__main__":
        import sys
        args = sys.argv[1:]
        if args:
        words = (line.strip() for line in open(args[0]))
        else:
        import random
        sample = """\
        alpha
        beta
        gamma
        delta
        """.split()

        words = sample + [shift(random.ch oice(sample), random.randrang e(26))
        for _ in range(10)]

        items = findshifted(wor ds).items()
        items.sort()
        for k, v in items:
        print k, "-->", v

        Peter

        Comment

        • Arnaud Delobelle

          #5
          Re: word shifts

          dave <squareswallowe r@invalid.comwr ites:
          Hello,
          >
          I made a function that takes a word list (one word per line, text
          file) and searches for all the words in the list that are 'shifts' of
          eachother. 'abc' shifted 1 is 'bcd'
          >
          Please take a look and tell me if this is a viable solution.
          >
          def shift(word, amt):
          ans = ''
          for letter in word:
          ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
          return ans
          In Python, if you want to build a string from lots of parts you can
          use ''.join(parts). I think it is considered more efficient.

          >
          def fileshift(x):
          fin = open(x)
          d = {}
          for line in fin:
          d[line.strip()] = [1]
          for i in range(1, 26):
          ite = shift(line.stri p(), i)
          if ite in d:
          print ite
          >

          You could write:

          line = line.strip()

          after your for line in fin: statement, rather than strip the line twice.

          Let me suggest a different solution base on the str.translate() method

          from string import lowercase, maketrans

          shifted_lowerca se = lowercase[1:] +lowercase[0]
          table = string.maketran s(lowercase, shifted_lowerca se)

          def shift1(line):
          return line.translate( table)

          def fileshift(path) :
          lines = set()
          for line in open(path):
          lines.add(line)
          for i in xrange(25):
          line = shift1(line)
          if line in lines:
          print line

          (untested)

          --
          Arnaud

          Comment

          • bearophileHUGS@lycos.com

            #6
            Re: word shifts

            George Sakkis:
            A faster algorithm is to create a 'key' for each word, defined as the
            tuple of ord differences (modulo 26) of consecutive characters.
            Very nice solution, it uses the same strategy used to find anagrams,
            where keys are
            "".join(sorted( word))
            Such general strategy to look for a possible invariant key for the
            subsets of the required solutions is quite useful.

            Bye,
            bearophile

            Comment

            • Arnaud Delobelle

              #7
              Re: word shifts

              bearophileHUGS@ lycos.com writes:
              George Sakkis:
              >A faster algorithm is to create a 'key' for each word, defined as the
              >tuple of ord differences (modulo 26) of consecutive characters.
              >
              Very nice solution, it uses the same strategy used to find anagrams,
              where keys are
              "".join(sorted( word))
              Such general strategy to look for a possible invariant key for the
              subsets of the required solutions is quite useful.
              Or even:

              def get_key(word):
              initial = ord(word[0])
              return tuple((ord(x) - initial) % 26 for x in word[1:])

              --
              Arnaud

              Comment

              • Gabriel Genellina

                #8
                Re: word shifts

                En Sun, 04 May 2008 03:35:05 -0300, George Sakkis <george.sakkis@ gmail.comescrib ió:
                On May 4, 2:04 am, "Gabriel Genellina" <gagsl-...@yahoo.com.a rwrote:
                >En Sun, 04 May 2008 02:17:07 -0300, dave <squareswallo.. .@invalid.comes cribió:
                >>
                I made a function that takes a word list (one word per line, text file)
                and searches for all the words in the list that are 'shifts' of
                eachother.  'abc' shifted 1 is 'bcd'
                >>
                >But I'd use a different algorithm. Instead of generating all posible shifts for a given word, I'd substract the newly read word from each previous words (here, "substract two words" means substract the character code for corresponding positions, modulo 26). Shifted words, when substracted, have the same number on all positions.
                >
                A faster algorithm is to create a 'key' for each word, defined as the
                tuple of ord differences (modulo 26) of consecutive characters. E.g.
                the key of 'acf' is (2,3); 'c' is 2 positions after 'a' and 'f' 3
                positions after 'c'. Shifted words (and only these) have the same key.
                Much better! I like it.

                --
                Gabriel Genellina

                Comment

                • dave

                  #9
                  Re: word shifts

                  On 2008-05-04 01:10:40 -0600, Arnaud Delobelle <arnodel@google mail.comsaid:
                  dave <squareswallowe r@invalid.comwr ites:
                  >
                  >Hello,
                  >>
                  >I made a function that takes a word list (one word per line, text
                  >file) and searches for all the words in the list that are 'shifts' of
                  >eachother. 'abc' shifted 1 is 'bcd'
                  >>
                  >Please take a look and tell me if this is a viable solution.
                  >>
                  >def shift(word, amt):
                  > ans = ''
                  > for letter in word:
                  > ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
                  > return ans
                  >
                  In Python, if you want to build a string from lots of parts you can
                  use ''.join(parts). I think it is considered more efficient.

                  what would be the best way to write a "ans = ans + chr" into a
                  ''.join(parts) ??


                  Comment

                  • George Sakkis

                    #10
                    Re: word shifts

                    On May 5, 11:02 pm, dave <squareswallo.. .@yahoo.comwrot e:
                    On 2008-05-04 01:10:40 -0600, Arnaud Delobelle <arno...@google mail.comsaid:
                    >
                    >
                    >
                    dave <squareswallo.. .@invalid.comwr ites:
                    >
                    Hello,
                    >
                    I made a function that takes a word list (one word per line, text
                    file) and searches for all the words in the list that are 'shifts' of
                    eachother. 'abc' shifted 1 is 'bcd'
                    >
                    Please take a look and tell me if this is a viable solution.
                    >
                    def shift(word, amt):
                    ans = ''
                    for letter in word:
                    ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
                    return ans
                    >
                    In Python, if you want to build a string from lots of parts you can
                    use ''.join(parts). I think it is considered more efficient.
                    >
                    what would be the best way to write a "ans = ans + chr" into a
                    ''.join(parts) ??
                    Well if you do it once, that would be simply "ans += chr". Arnaud was
                    referring to the case where you do it in a loop, like in your snippet.
                    A direct translation to use ''.join would be:

                    ans = ''.join(chr((or d(letter) - ord('a') + amt) % 26 + ord('a'))
                    for letter in word)

                    Of course it is simpler and more efficient if you factor out of the
                    loop the subexpressions that don't need to be recomputed:

                    ord_a = ord('a')
                    shift = ord_a - amt
                    ans = ''.join(chr((or d(letter) - shift) % 26 + ord_a)
                    for letter in word)

                    George

                    Comment

                    • Arnaud Delobelle

                      #11
                      Re: word shifts

                      George Sakkis <george.sakkis@ gmail.comwrites :
                      On May 5, 11:02 pm, dave <squareswallo.. .@yahoo.comwrot e:
                      >On 2008-05-04 01:10:40 -0600, Arnaud Delobelle <arno...@google mail.comsaid:
                      >>
                      >>
                      >>
                      dave <squareswallo.. .@invalid.comwr ites:
                      >>
                      >Hello,
                      >>
                      >I made a function that takes a word list (one word per line, text
                      >file) and searches for all the words in the list that are 'shifts' of
                      >eachother. 'abc' shifted 1 is 'bcd'
                      >>
                      >Please take a look and tell me if this is a viable solution.
                      >>
                      >def shift(word, amt):
                      > ans = ''
                      > for letter in word:
                      > ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
                      > return ans
                      >>
                      In Python, if you want to build a string from lots of parts you can
                      use ''.join(parts). I think it is considered more efficient.
                      >>
                      >what would be the best way to write a "ans = ans + chr" into a
                      >''.join(part s) ??
                      >
                      Well if you do it once, that would be simply "ans += chr". Arnaud was
                      referring to the case where you do it in a loop, like in your snippet.
                      A direct translation to use ''.join would be:
                      >
                      ans = ''.join(chr((or d(letter) - ord('a') + amt) % 26 + ord('a'))
                      for letter in word)
                      >
                      Of course it is simpler and more efficient if you factor out of the
                      loop the subexpressions that don't need to be recomputed:
                      >
                      ord_a = ord('a')
                      shift = ord_a - amt
                      ans = ''.join(chr((or d(letter) - shift) % 26 + ord_a)
                      for letter in word)
                      >
                      George
                      Or keep the same layout as your original effort, but build a list of
                      chars instead of a string, then join them all at the end.

                      def shift(word, amt):
                      shifted = ''
                      ord_a = ord('a')
                      shift = ord_a - amt
                      for letter in word:
                      shifted.append( chr((ord(letter ) - shift) % 26 + ord_a))
                      return ''.join(shifted )

                      (Which exactly the same as George's snippet, except that the generator
                      expression is unwound)

                      --
                      Arnaud

                      Comment

                      Working...