File to dict

**Chris** · Dec 7 '07, 11:55 AM

Re: File to dict

On Dec 7, 1:31 pm, mrk...@gmail.com wrote:

Hello everyone,
>
I have written this small utility function for transforming legacy
file to Python dict:
>
    def lookupdmo(domain):
        lines = open('/etc/virtual/domainowners','r').readlines()
        lines = [ [y.lstrip().rstrip() for y in x.split(':')] for x in lines]
        lines = [ x for x in lines if len(x) == 2 ]
        d = dict()
        for line in lines:
            d[line[0]]=line[1]
        return d[domain]
>
The /etc/virtual/domainowners file contains double-colon separated
entries:
domain1.tld: owner1
domain2.tld: own2
domain3.another: somebody
...
>
Now, the above lookupdmo function works. However, it's rather tedious
to transform files into dicts this way and I have quite a lot of such
files to transform (like custom 'passwd' files for virtual email
accounts etc).
>
Is there any more clever / more pythonic way of parsing files like
this? Say, I would like to transform a file containing entries like
the following into a list of lists with doublecolon treated as
separators, i.e. this:
>
tm:$1$aaaa$bbbb:1010:6::/home/owner1/imap/domain1.tld/tm:/sbin/nologin
>
would get transformed into this:
>
[ ['tm', '$1$aaaa$bbbb', '1010', '6', '', '/home/owner1/imap/domain1.tld/tm', '/sbin/nologin'], [...], [...] ]

For the first one, you are parsing the entire file every time you want
to look up just one domain. If the data is reused several times while
your code runs, consider storing it so that each query is just a simple
lookup, e.g.:

    _domain_dict = dict()
    def generate_dict(input_file):
        finput = open(input_file, 'rb')
        global _domain_dict
        for each_line in enumerate(finput):
            line = each_line.strip().split(':')
            if len(line)==2: _domain_dict[line[0]] = line[1]

        finput.close()

    def domain_lookup(domain_name):
        global _domain_dict
        try:
            return _domain_dict[domain_name]
        except KeyError:
            return 'Unknown.Domain'

Your second parsing example would be a simple case of:

    finput = open('input_file.ext', 'rb')
    results_list = []
    for each_line in finput:
        results_list.append( each_line.strip().split(':') )
    finput.close()

**Duncan Booth** · Dec 7 '07, 12:05 PM

Re: File to dict

mrkafk@gmail.com wrote:

    def lookupdmo(domain):
        lines = open('/etc/virtual/domainowners','r').readlines()
        lines = [ [y.lstrip().rstrip() for y in x.split(':')] for x in lines]
        lines = [ x for x in lines if len(x) == 2 ]
        d = dict()
        for line in lines:
            d[line[0]]=line[1]
        return d[domain]

Just some minor points without changing the basis of what you have done
here:

Don't bother with 'readlines', file objects are directly iterable.
Why are you calling both lstrip and rstrip? The strip method strips
whitespace from both ends for you.

It is usually a good idea with code like this to limit the split method to
a single split in case there is more than one colon on the line: i.e.
x.split(':',1)
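
As a small illustration of why the split limit matters, here is a sketch using a made-up line whose value itself contains a colon (the sample data is an assumption, not taken from the real file):

```python
# Hypothetical line whose value part contains a second colon.
line = "domain1.tld: owner1:backup\n"

# An unlimited split yields three fields, so a len(x) == 2 check
# would silently drop this line.
all_fields = line.strip().split(':')

# Limiting to one split keeps everything after the first colon together.
key, value = line.strip().split(':', 1)
```

With the limit in place, `key` is the text before the first colon and `value` is everything after it, still unstripped.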

When you have a sequence whose elements are sequences with two elements
(which is what you have here), you can construct a dict directly from the
sequence.
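
A minimal sketch of that direct construction, with made-up pairs standing in for the parsed file contents:

```python
# Hypothetical pairs, as they might look after splitting and stripping.
pairs = [["domain1.tld", "owner1"], ["domain2.tld", "own2"]]

# dict() accepts any iterable of two-element sequences,
# so no explicit loop is needed.
owners = dict(pairs)
```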

But why do you construct a dict from that input data simply to throw it
away? If you only want 1 domain from the file just pick it out of the list.
If you want to do multiple lookups build the dict once and keep it around.

So something like the following (untested code):

    from __future__ import with_statement

    def loaddomainowners():
        with open('/etc/virtual/domainowners','r') as infile:
            pairs = [ line.split(':', 1) for line in infile if ':' in line ]
            pairs = [ (domain.strip(), owner.strip())
                      for (domain,owner) in pairs ]
        return dict(pairs)

    DOMAINOWNERS = loaddomainowners()

    def lookupdmo(domain):
        return DOMAINOWNERS[domain]

**Matt Nordhoff** · Dec 7 '07, 12:15 PM

Re: File to dict

Chris wrote:

For the first one, you are parsing the entire file every time you want
to look up just one domain. If the data is reused several times while
your code runs, consider storing it so that each query is just a simple
lookup, e.g.:
>
    _domain_dict = dict()
    def generate_dict(input_file):
        finput = open(input_file, 'rb')
        global _domain_dict
        for each_line in enumerate(finput):
            line = each_line.strip().split(':')
            if len(line)==2: _domain_dict[line[0]] = line[1]

        finput.close()

    def domain_lookup(domain_name):
        global _domain_dict
        try:
            return _domain_dict[domain_name]
        except KeyError:
            return 'Unknown.Domain'

What about this?

    _domain_dict = dict()
    def generate_dict(input_file):
        global _domain_dict
        # If it's already been run, do nothing. You might want to change
        # this.
        if _domain_dict:
            return
        fh = open(input_file, 'rb')
        try:
            for line in fh:
                line = line.strip().split(':', 1)
                if len(line) == 2:
                    _domain_dict[line[0]] = line[1]
        finally:
            fh.close()

    def domain_lookup(domain_name):
        return _domain_dict.get(domain_name)

I changed generate_dict to do nothing if it's already been run. (You
might want it to run again with a fresh dict, or throw an error or
something.)

I removed enumerate() because it's unnecessary (and wrong -- you were
trying to split a tuple of (index, line)).

I also changed the split to only split once, like Duncan Booth suggested.

The try-finally is to ensure that the file is closed if an exception is
thrown for some reason.

domain_lookup doesn't need to declare _domain_dict as global because
it's not assigning to it. .get() returns None if the key doesn't exist,
so now the function returns None. You might want to use a different
value or throw an exception (use _domain_dict[domain_name] and not catch
the KeyError if it doesn't exist, perhaps).
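
The three behaviours mentioned here can be put side by side in a short sketch. The function names and the sample dict are hypothetical, chosen only for illustration:

```python
_domain_dict = {"domain1.tld": "owner1"}  # made-up sample data

def lookup_or_none(name):
    # .get() with no second argument returns None for a missing key.
    return _domain_dict.get(name)

def lookup_or_default(name, default="Unknown.Domain"):
    # .get() with a second argument returns that value instead.
    return _domain_dict.get(name, default)

def lookup_strict(name):
    # Plain indexing raises KeyError for a missing key.
    return _domain_dict[name]
```

None of these needs a `global` statement, since they only read the module-level name and never rebind it.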

Other than that, I just reformatted it and renamed variables, because I
do that. :-P

**Matt Nordhoff** · Dec 7 '07, 12:15 PM

Re: File to dict

Duncan Booth wrote:

Just some minor points without changing the basis of what you have done
here:
>
Don't bother with 'readlines', file objects are directly iterable.
Why are you calling both lstrip and rstrip? The strip method strips
whitespace from both ends for you.
>
It is usually a good idea with code like this to limit the split method to
a single split in case there is more than one colon on the line: i.e.
x.split(':',1)
>
When you have a sequence whose elements are sequences with two elements
(which is what you have here), you can construct a dict directly from the
sequence.
>
But why do you construct a dict from that input data simply to throw it
away? If you only want 1 domain from the file just pick it out of the list.
If you want to do multiple lookups build the dict once and keep it around.
>
So something like the following (untested code):
>
    from __future__ import with_statement

    def loaddomainowners():
        with open('/etc/virtual/domainowners','r') as infile:
            pairs = [ line.split(':', 1) for line in infile if ':' in line ]
            pairs = [ (domain.strip(), owner.strip())
                      for (domain,owner) in pairs ]
        return dict(pairs)

    DOMAINOWNERS = loaddomainowners()

    def lookupdmo(domain):
        return DOMAINOWNERS[domain]

Using two list comprehensions means you construct two lists, which sucks
if it's a large file.

Also, you could pass the list comprehension (or better yet a generator
expression) directly to dict() without saving it to a variable:

    with open('/etc/virtual/domainowners','r') as fh:
        return dict(line.strip().split(':', 1) for line in fh)

(Argh, that doesn't .strip() the key and value, which means it won't
work, but it's so simple and elegant and I'm tired enough that I'm not
going to add that. :-P Just use another genexp. Makes for a line
complicated enough that it could be turned into a for loop, though.)
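
One way the "another genexp" idea might look, as a sketch rather than a tested drop-in: the function name `load_owners` and the sample lines are assumptions, and the function takes any iterable of lines so that an open file handle can be passed in.

```python
def load_owners(lines):
    """Build a dict from an iterable of 'key: value' lines."""
    # Inner generator: split each line once, skipping lines without a colon.
    pairs = (line.split(':', 1) for line in lines if ':' in line)
    # Outer generator: strip whitespace from both halves; no list is built.
    return dict((key.strip(), value.strip()) for key, value in pairs)

# Made-up sample input, including one malformed line.
sample = ["domain1.tld: owner1\n", "garbage line\n", "domain3.another : somebody\n"]
owners = load_owners(sample)
```

Because both stages are generator expressions, the lines are consumed one at a time and no intermediate list is kept in memory.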

**Chris** · Dec 7 '07, 12:25 PM

Re: File to dict

Ta Matt, wasn't paying attention to what I typed. :)
And didn't know that about .get() and not having to declare the
global.
Thanks for my mandatory new thing for the day ;)

**Bruno Desthuilliers** · Dec 7 '07, 12:25 PM

Re: File to dict

mrkafk@gmail.com wrote:

Hello everyone,

(snip)

Say, I would like to transform a file containing entries like
the following into a list of lists with doublecolon treated as
separators, i.e. this:
>
tm:$1$aaaa$bbbb:1010:6::/home/owner1/imap/domain1.tld/tm:/sbin/nologin
>
would get transformed into this:
>
[ ['tm', '$1$aaaa$bbbb', '1010', '6', '', '/home/owner1/imap/domain1.tld/tm', '/sbin/nologin'], [...], [...] ]

The csv module is your friend.
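
A minimal sketch of what that could look like for the passwd-style line above, assuming Python 3 and using `io.StringIO` in place of a real file so the example is self-contained:

```python
import csv
import io

# Made-up sample standing in for a passwd-style file on disk.
sample = io.StringIO(
    "tm:$1$aaaa$bbbb:1010:6::/home/owner1/imap/domain1.tld/tm:/sbin/nologin\n"
)

# delimiter=':' makes the reader split on colons; QUOTE_NONE treats any
# quote characters in the data as ordinary text.
reader = csv.reader(sample, delimiter=':', quoting=csv.QUOTE_NONE)
rows = list(reader)
```

Each row comes back as a list of strings, with the empty field between the two adjacent colons preserved as `''`.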

**Matt Nordhoff** · Dec 7 '07, 12:25 PM

Re: File to dict

Chris wrote:

Ta Matt, wasn't paying attention to what I typed. :)
And didn't know that about .get() and not having to declare the
global.
Thanks for my mandatory new thing for the day ;)

:-)

**mrkafk@gmail.com** · Dec 7 '07, 12:45 PM

Re: File to dict

Duncan Booth wrote:

Just some minor points without changing the basis of what you have done
here:

All good points, thanks. Phew, there's nothing like peer review for
your code...

But why do you construct a dict from that input data simply to throw it
away?

Because comparing strings for equality in a loop is writing C in
Python, and that's exactly what I'm trying to unlearn.

The proper way to do it is to produce a dictionary and look up a value
using a key.

>If you only want 1 domain from the file just pick it out of the list.

    for item in list:
        if item == 'searched.domain':
            return item...

Yuck.

    with open('/etc/virtual/domainowners','r') as infile:
        pairs = [ line.split(':', 1) for line in infile if ':' in line ]

Didn't think about doing it this way. Good point. Thx

**mrkafk@gmail.com** · Dec 7 '07, 12:55 PM

Re: File to dict

The csv module is your friend.

(slapping forehead) why the Holy Grail didn't I think about this? That
should be much simpler than using SimpleParse or SPARK.

Thx Bruno & everyone.

**Marc 'BlackJack' Rintsch** · Dec 7 '07, 01:05 PM

Re: File to dict

On Fri, 07 Dec 2007 04:44:25 -0800, mrkafk wrote:

Duncan Booth wrote:

>But why do you construct a dict from that input data simply to throw it
>away?

>
Because comparing strings for equality in a loop is writing C in
Python, and that's exactly what I'm trying to unlearn.
>
The proper way to do it is to produce a dictionary and look up a value
using a key.
>

>>If you only want 1 domain from the file just pick it out of the list.

>
    for item in list:
        if item == 'searched.domain':
            return item...
>
Yuck.

I guess Duncan's point wasn't the construction of the dictionary but the
throw it away part. If you don't keep it, the loop above is even more
efficient than building a dictionary with *all* lines of the file, just to
pick one value afterwards.

Ciao,
Marc 'BlackJack' Rintsch
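
For the one-off lookup case described above, a single pass that stops at the first match could be sketched like this. The name `find_owner` and the sample lines are hypothetical:

```python
def find_owner(lines, domain):
    """Return the owner for `domain`, or None, stopping at the first match."""
    for line in lines:
        # partition() splits on the first colon only; sep is '' if none found.
        key, sep, value = line.partition(':')
        if sep and key.strip() == domain:
            return value.strip()
    return None

# Made-up sample input; an open file handle could be passed instead.
sample = ["domain1.tld: owner1\n", "domain2.tld: own2\n"]
owner = find_owner(sample, "domain2.tld")
```

Nothing is stored beyond the current line, and lines after the match are never read.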

**Bruno Desthuilliers** · Dec 7 '07, 01:15 PM

Re: File to dict

mrkafk@gmail.com wrote:

>

>The csv module is your friend.

>
(slapping forehead) why the Holy Grail didn't I think about this?

If that can make you feel better, a few years ago, I spent two days
writing my own (SquaredWheel(tm) of course) csv reader/writer... before
realizing there was such a thing as the csv module :-/

Should have known better...

**mrkafk@gmail.com** · Dec 7 '07, 01:55 PM

Re: File to dict

I guess Duncan's point wasn't the construction of the dictionary but the
throw it away part. If you don't keep it, the loop above is even more
efficient than building a dictionary with *all* lines of the file, just to
pick one value afterwards.

Sure, but I have two options here, neither of them nice: either "write C
in Python" or do it in an inefficient and still elaborate way.

Anyway, I found my nirvana at last:

    >>> def shelper(line):
    ...     return x.replace(' ','').strip('\n').split(':',1)
    ...
    >>> ownerslist = [ shelper(x)[1] for x in it if len(shelper(x)) == 2 and shelper(x)[0] == domain ]
    >>> ownerslist
    ['da2']

Python rulez. :-)

**mrkafk@gmail.com** · Dec 7 '07, 02:05 PM

Re: File to dict

    >>> def shelper(line):
    ...     return x.replace(' ','').strip('\n').split(':',1)

Argh, typo, should be def shelper(x) of course.

**Neil Cerutti** · Dec 7 '07, 02:45 PM

Re: File to dict

On 2007-12-07, Duncan Booth <duncan.booth@invalid.invalid> wrote:

    from __future__ import with_statement

    def loaddomainowners():
        with open('/etc/virtual/domainowners','r') as infile:

I've been thinking I have to use contextlib.closing for
auto-closing files. Is that not so?

--
Neil Cerutti
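
On that question, a short sketch of the difference: file objects support the `with` statement directly, while `contextlib.closing` is meant for objects that only offer a `close()` method. The `Resource` class and the temporary file below are made up for illustration.

```python
import contextlib
import os
import tempfile

# Write a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("domain1.tld: owner1\n")

# File objects are context managers themselves: no wrapper is needed.
with open(path) as infile:
    first_line = infile.readline()
file_closed = infile.closed  # closed automatically when the block exits

class Resource(object):
    """Hypothetical object with close() but no __enter__/__exit__."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

# contextlib.closing adapts such an object for use in a with statement.
with contextlib.closing(Resource()) as res:
    pass

os.remove(path)
```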
