word shifts

**Gabriel Genellina** · Jun 27 '08, 04:23 PM

Re: word shifts

En Sun, 04 May 2008 02:17:07 -0300, dave <squareswallowe r@invalid.comes cribió:

Hello,
>
I made a function that takes a word list (one word per line, text file)
and searches for all the words in the list that are 'shifts' of
eachother. 'abc' shifted 1 is 'bcd'
>
Please take a look and tell me if this is a viable solution.
>
def shift(word, amt):
ans = ''
for letter in word:
ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
return ans
>
def fileshift(x):
fin = open(x)
d = {}
for line in fin:
d[line.strip()] = [1]
for i in range(1, 26):
ite = shift(line.stri p(), i)
if ite in d:
print ite
>
>
Any tips/suggestions/critiques greatly appreciated.. I'm trying to
teach myself Python (and still beginning) and would love any helpful
info.

First, looking at the code, you're evaluating line.strip() a lot of times; I'd avoid it.
Looks like you're using a dictionary just for the keys - the set type is more adequate here (see http://docs.python.org/lib/types-set.html ). In any case, I'd use a value like None instead of [1]
But I'd use a different algorithm. Instead of generating all posible shifts for a given word, I'd substract the newly read word from each previous words (here, "substract two words" means substract the character code for corresponding positions, modulo 26). Shifted words, when substracted, have the same number on all positions.

--
Gabriel Genellina

**George Sakkis** · Jun 27 '08, 04:23 PM

Re: word shifts

On May 4, 2:04 am, "Gabriel Genellina" <gagsl-...@yahoo.com.a rwrote:

En Sun, 04 May 2008 02:17:07 -0300, dave <squareswallo.. .@invalid.comes cribió:
>
>
>

Hello,

>

I made a function that takes a word list (one word per line, text file)
and searches for all the words in the list that are 'shifts' of
eachother. 'abc' shifted 1 is 'bcd'

>

Please take a look and tell me if this is a viable solution.

>

def shift(word, amt):
ans = ''
for letter in word:
ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
return ans

>

def fileshift(x):
fin = open(x)
d = {}
for line in fin:
d[line.strip()] = [1]
for i in range(1, 26):
ite = shift(line.stri p(), i)
if ite in d:
print ite

>

Any tips/suggestions/critiques greatly appreciated.. I'm trying to
teach myself Python (and still beginning) and would love any helpful
info.

>
First, looking at the code, you're evaluating line.strip() a lot of times;I'd avoid it.
Looks like you're using a dictionary just for the keys - the set type is more adequate here (seehttp://docs.python.org/lib/types-set.html). In any case, I'd use a value like None instead of [1]
But I'd use a different algorithm. Instead of generating all posible shifts for a given word, I'd substract the newly read word from each previous words (here, "substract two words" means substract the character code for corresponding positions, modulo 26). Shifted words, when substracted, have the same number on all positions.

A faster algorithm is to create a 'key' for each word, defined as the
tuple of ord differences (modulo 26) of consecutive characters. E.g.
the key of 'acf' is (2,3); 'c' is 2 positions after 'a' and 'f' 3
positions after 'c'. Shifted words (and only these) have the same key.
Here's a straightforward implementation that generates all the lists
of equally-shifted words:

from collections import defaultdict

def iter_shifted(wo rds):
key2shifted = defaultdict(lis t)
for word in words:
ords = map(ord,word)
key = tuple((ords[i]-ords[i-1]) % 26
for i in xrange(1,len(or ds)))
key2shifted[key].append(word)
for shifted in key2shifted.ite rvalues():
if len(shifted) 1:
yield shifted

if __name__ == '__main__':
words = 'abc bef jas cba cde zab azy hkl'.split()
for shifted in iter_shifted(wo rds):
print shifted

# *** output ***
#['bef', 'hkl']
#['abc', 'cde', 'zab']
#['cba', 'azy']

**Peter Otten** · Jun 27 '08, 04:23 PM

Re: word shifts

dave wrote:

I made a function that takes a word list (one word per line, text file)
and searches for all the words in the list that are 'shifts' of
eachother. Â 'abc' shifted 1 is 'bcd'
>
Please take a look and tell me if this is a viable solution.
>
Â def shift(word, amt):
Â Â Â Â Â Â Â Â ans = ''
Â Â Â Â Â Â Â Â for letter in word:
Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 +

ord('a'))

Â Â Â Â Â Â Â Â return ans
>
def fileshift(x):
Â Â Â Â Â Â Â Â fin = open(x)
Â Â Â Â Â Â Â Â d = {}
Â Â Â Â Â Â Â Â for line in fin:
Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â d[line.strip()] = [1]
Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â for i in range(1, 26):
Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â ite = shift(line.stri p(), i)
Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â if ite in d:
Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â print ite
>
>
Any tips/suggestions/critiques greatly appreciated.. I'm trying to
teach myself Python (and still beginning) and would love any helpful
info.

My ideas:

Move the open() call out of fileshift() and have it working with arbitrary
iterables.
Store a normalized version of the word in the dictionary. if every word in
the dict is guaranteed to start with an "a" you can reduce the number of
tests to one.
If you have many words you can build a fast shift() function based on
str.translate() .

Invoke the following script both with and without a filename argument:

from string import ascii_lowercase as chars, maketrans

tables = [maketrans(chars , chars[i:] + chars[:i]) for i in range(26)]

def shift(word, n):
return word.translate( tables[n%26])

def normalize(word) :
a = word[0]
return shift(word, ord("a") - ord(a))

def findshifted(wor ds):
d = {}
for word in words:
d.setdefault(no rmalize(word), []).append(word)
return d

if __name__ == "__main__":
import sys
args = sys.argv[1:]
if args:
words = (line.strip() for line in open(args[0]))
else:
import random
sample = """\
alpha
beta
gamma
delta
""".split()

words = sample + [shift(random.ch oice(sample), random.randrang e(26))
for _ in range(10)]

items = findshifted(wor ds).items()
items.sort()
for k, v in items:
print k, "-->", v

Peter

**Arnaud Delobelle** · Jun 27 '08, 04:23 PM

Re: word shifts

dave <squareswallowe r@invalid.comwr ites:

Hello,
>
I made a function that takes a word list (one word per line, text
file) and searches for all the words in the list that are 'shifts' of
eachother. 'abc' shifted 1 is 'bcd'
>
Please take a look and tell me if this is a viable solution.
>
def shift(word, amt):
ans = ''
for letter in word:
ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
return ans

In Python, if you want to build a string from lots of parts you can
use ''.join(parts). I think it is considered more efficient.

>
def fileshift(x):
fin = open(x)
d = {}
for line in fin:
d[line.strip()] = [1]
for i in range(1, 26):
ite = shift(line.stri p(), i)
if ite in d:
print ite
>

You could write:

line = line.strip()

after your for line in fin: statement, rather than strip the line twice.

Let me suggest a different solution base on the str.translate() method

from string import lowercase, maketrans

shifted_lowerca se = lowercase[1:] +lowercase[0]
table = string.maketran s(lowercase, shifted_lowerca se)

def shift1(line):
return line.translate( table)

def fileshift(path) :
lines = set()
for line in open(path):
lines.add(line)
for i in xrange(25):
line = shift1(line)
if line in lines:
print line

(untested)

--
Arnaud

**bearophileHUGS@lycos.com** · Jun 27 '08, 04:23 PM

Re: word shifts

George Sakkis:

A faster algorithm is to create a 'key' for each word, defined as the
tuple of ord differences (modulo 26) of consecutive characters.

Very nice solution, it uses the same strategy used to find anagrams,
where keys are
"".join(sorted( word))
Such general strategy to look for a possible invariant key for the
subsets of the required solutions is quite useful.

Bye,
bearophile

**Arnaud Delobelle** · Jun 27 '08, 04:23 PM

Re: word shifts

bearophileHUGS@ lycos.com writes:

George Sakkis:

>A faster algorithm is to create a 'key' for each word, defined as the
>tuple of ord differences (modulo 26) of consecutive characters.

>
Very nice solution, it uses the same strategy used to find anagrams,
where keys are
"".join(sorted( word))
Such general strategy to look for a possible invariant key for the
subsets of the required solutions is quite useful.

Or even:

def get_key(word):
initial = ord(word[0])
return tuple((ord(x) - initial) % 26 for x in word[1:])

--
Arnaud

**Gabriel Genellina** · Jun 27 '08, 04:23 PM

Re: word shifts

En Sun, 04 May 2008 03:35:05 -0300, George Sakkis <george.sakkis@ gmail.comescrib ió:

On May 4, 2:04 am, "Gabriel Genellina" <gagsl-...@yahoo.com.a rwrote:

>En Sun, 04 May 2008 02:17:07 -0300, dave <squareswallo.. .@invalid.comes cribió:
>>

I made a function that takes a word list (one word per line, text file)
and searches for all the words in the list that are 'shifts' of
eachother. 'abc' shifted 1 is 'bcd'

>>
>But I'd use a different algorithm. Instead of generating all posible shifts for a given word, I'd substract the newly read word from each previous words (here, "substract two words" means substract the character code for corresponding positions, modulo 26). Shifted words, when substracted, have the same number on all positions.

>
A faster algorithm is to create a 'key' for each word, defined as the
tuple of ord differences (modulo 26) of consecutive characters. E.g.
the key of 'acf' is (2,3); 'c' is 2 positions after 'a' and 'f' 3
positions after 'c'. Shifted words (and only these) have the same key.

Much better! I like it.

--
Gabriel Genellina

**dave** · Jun 27 '08, 04:23 PM

Re: word shifts

On 2008-05-04 01:10:40 -0600, Arnaud Delobelle <arnodel@google mail.comsaid:

dave <squareswallowe r@invalid.comwr ites:
>

>Hello,
>>
>I made a function that takes a word list (one word per line, text
>file) and searches for all the words in the list that are 'shifts' of
>eachother. 'abc' shifted 1 is 'bcd'
>>
>Please take a look and tell me if this is a viable solution.
>>
>def shift(word, amt):
> ans = ''
> for letter in word:
> ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
> return ans

>
In Python, if you want to build a string from lots of parts you can
use ''.join(parts). I think it is considered more efficient.

what would be the best way to write a "ans = ans + chr" into a
''.join(parts) ??

**George Sakkis** · Jun 27 '08, 04:23 PM

Re: word shifts

On May 5, 11:02 pm, dave <squareswallo.. .@yahoo.comwrot e:

On 2008-05-04 01:10:40 -0600, Arnaud Delobelle <arno...@google mail.comsaid:
>
>
>

dave <squareswallo.. .@invalid.comwr ites:

>

Hello,

>

I made a function that takes a word list (one word per line, text
file) and searches for all the words in the list that are 'shifts' of
eachother. 'abc' shifted 1 is 'bcd'

>

Please take a look and tell me if this is a viable solution.

>

def shift(word, amt):
ans = ''
for letter in word:
ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
return ans

>

In Python, if you want to build a string from lots of parts you can
use ''.join(parts). I think it is considered more efficient.

>
what would be the best way to write a "ans = ans + chr" into a
''.join(parts) ??

Well if you do it once, that would be simply "ans += chr". Arnaud was
referring to the case where you do it in a loop, like in your snippet.
A direct translation to use ''.join would be:

ans = ''.join(chr((or d(letter) - ord('a') + amt) % 26 + ord('a'))
for letter in word)

Of course it is simpler and more efficient if you factor out of the
loop the subexpressions that don't need to be recomputed:

ord_a = ord('a')
shift = ord_a - amt
ans = ''.join(chr((or d(letter) - shift) % 26 + ord_a)
for letter in word)

George

**Arnaud Delobelle** · Jun 27 '08, 04:23 PM

Re: word shifts

George Sakkis <george.sakkis@ gmail.comwrites :

On May 5, 11:02 pm, dave <squareswallo.. .@yahoo.comwrot e:

>On 2008-05-04 01:10:40 -0600, Arnaud Delobelle <arno...@google mail.comsaid:
>>
>>
>>

dave <squareswallo.. .@invalid.comwr ites:

>>

>Hello,

>>

>I made a function that takes a word list (one word per line, text
>file) and searches for all the words in the list that are 'shifts' of
>eachother. 'abc' shifted 1 is 'bcd'

>>

>Please take a look and tell me if this is a viable solution.

>>

>def shift(word, amt):
> ans = ''
> for letter in word:
> ans = ans + chr((ord(letter ) - ord('a') + amt) % 26 + ord('a'))
> return ans

>>

In Python, if you want to build a string from lots of parts you can
use ''.join(parts). I think it is considered more efficient.

>>
>what would be the best way to write a "ans = ans + chr" into a
>''.join(part s) ??

>
Well if you do it once, that would be simply "ans += chr". Arnaud was
referring to the case where you do it in a loop, like in your snippet.
A direct translation to use ''.join would be:
>
ans = ''.join(chr((or d(letter) - ord('a') + amt) % 26 + ord('a'))
for letter in word)
>
Of course it is simpler and more efficient if you factor out of the
loop the subexpressions that don't need to be recomputed:
>
ord_a = ord('a')
shift = ord_a - amt
ans = ''.join(chr((or d(letter) - shift) % 26 + ord_a)
for letter in word)
>
George

Or keep the same layout as your original effort, but build a list of
chars instead of a string, then join them all at the end.

def shift(word, amt):
shifted = ''
ord_a = ord('a')
shift = ord_a - amt
for letter in word:
shifted.append( chr((ord(letter ) - shift) % 26 + ord_a))
return ''.join(shifted )

(Which exactly the same as George's snippet, except that the generator
expression is unwound)

--
Arnaud

word shifts

word shifts

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment