Is there a better way of reading the file?

**psbasha** · Mar 21 '07, 09:30 PM

Originally posted by psbasha

Code:

Sample1.txt
 
Sample.txt
Pnt      100100123.      0.      0.
Pnt      200200035.      0.      0.
Pnt      3040000010.     0.      0.
Pnt      4000000015.     0.      0.
Pnt      5005000020.     0.      0.
Pnt      600008000.      5.      0.
Pnt      700000005.      5.      0.
Pnt      8000900010.     5.      0.
Pnt      9000900015.     5.      0.

Code:

Sample2.txt
Pnt    *         3280311       0          1.36567432E+03 -3.71226532E+02
*         2.01031464E+02       0
Pnt	 *         3280502       0          1.25433850E+03 -1.42613068E+02
*         1.80202667E+02       0
Pnt	 *         3280503       0          1.27057288E+03 -1.75843582E+02
*         1.84236084E+02       0
Pnt    *         3280504       0          1.28286145E+03 -2.01004501E+02
*         1.87218460E+02       0

Code:

Sample3.txt
Pnt*     10260209                       1156.26599      313.992828
*       155.018463
Pnt*     10270106                       1097.15002      250.676315
*       140.789337
Pnt*     10270107                       1115.47864      271.83374
*       144.698837

I am getting inconsistent input data from different software packages, but I have to write generic Python code that can read any of the input data formats shown in the examples above.

**psbasha** · Mar 21 '07, 09:48 PM

Could anybody help me resolve this issue of handling generic format data?

Thanks in advance
PSB

**bvdet** · Mar 22 '07, 12:04 AM

I think we have taken care of Sample1, have we not? Can you explain Sample2 and Sample3 format? Is the point data really on two separate lines? What is the significance of the asterisk? Why are there zeros mixed in with the numbers in scientific notation? Help us help you.

**psbasha** · Mar 22 '07, 12:34 AM

a) "I think we have taken care of Sample1, have we not?"

Yes.

b) "Can you explain Sample2 and Sample3 format?"

This format is somewhat different from Sample1. The X, Y, Z coordinates are not written on a single line; they are split across two lines. Each string or number occupies a 16-character field, and the maximum line length is 79 characters.

c) "Is the point data really on two separate lines?"

Yes.

d) "What is the significance of the asterisk?"

The "*" on the second line may be used to mark a continuation of the fields.

e) "Why are there zeros mixed in with the numbers in scientific notation?"

Pnt * 3280504 0 1.28286145E+03 -2.01004501E+02
* 1.87218460E+02 0

I don't need these zeros at present. They are also IDs, which may refer to some number later.

Thanks in advance
PSB
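
The two-line layout described above can also be read by slicing at fixed columns rather than by pattern matching. This is only a minimal sketch: it assumes an 8-character tag field followed by 16-character data fields, and the helper names `split_fields` and `read_point` are made up for illustration. The test lines are built with explicit widths so the column positions are unambiguous.

```python
def split_fields(line, first=8, width=16):
    """Return (tag, fields) for one fixed-width line.

    Assumes an 8-character tag field followed by 16-character data fields.
    """
    line = line.rstrip('\n')
    tag = ''.join(line[:first].split())   # 'Pnt    *' -> 'Pnt*'
    fields = [line[i:i + width].strip()
              for i in range(first, len(line), width)]
    return tag, fields


def read_point(first_line, continuation):
    """Combine a 'Pnt*' line and its '*' continuation into (id, [x, y, z])."""
    _, head = split_fields(first_line)
    _, tail = split_fields(continuation)
    point_id = int(head[0])
    # head[1] is the extra ID field (blank or 0 in the samples); skipped here.
    xyz = [float(head[2]), float(head[3]), float(tail[0])]
    return point_id, xyz


# Build the two lines with explicit widths (hypothetical test data that
# mirrors the Sample2 record quoted above).
line1 = '%-8s%16s%16s%16s%16s' % ('Pnt    *', '3280504', '0',
                                  '1.28286145E+03', '-2.01004501E+02')
line2 = '%-8s%16s%16s' % ('*', '1.87218460E+02', '0')

point_id, xyz = read_point(line1, line2)
print(point_id, xyz)
```

Because the fields are addressed by position, a blank second field (as in Sample3) does not shift the coordinates. A line shorter than expected would raise an `IndexError`, so real files may need a length check.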

**bvdet** · Mar 22 '07, 03:05 PM

Here's one way of adding the data in this format to your point dictionary:

Code:

>>> import re
>>> patt = re.compile(r'''\d+\.\d+E\+\d+|
... -\d+\.\d+E\+\d+|
... -\d+\.\d+E-\d+|
... \d+\.\d+E-\d+|
... \d+\.\d+|
... -\d+\.\d+|
... \d+''', re.X
... )
>>> patt
<_sre.SRE_Pattern object at 0x00DE68D0>
>>> s = 'Pnt    *         3280311       0          +1.36567432E+03 -3.71226532E+02'
>>> re.findall(patt,s)
['3280311', '0', '1.36567432E+03', '-3.71226532E+02']
>>> dd = {}
>>> lst = re.findall(patt,s)
>>> dd[int(lst[0])] = [float(i) for i in lst[1:] if i != '0']
>>> dd
{3280311: [1365.6743200000001, -371.22653200000002]}
>>> s1 = '*       155.018463'
>>> lst1 = re.findall(patt,s1)
>>> dd[int(lst[0])] = dd[int(lst[0])]+[float(i) for i in lst1 if i != '0']
>>> dd
{3280311: [1365.6743200000001, -371.22653200000002, 155.018463]}
>>>

You can add an elif for the word 'pnt' in combination with '*'. Whoever designed the output for this data ought to be ..............
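
The separate alternatives for each sign and exponent combination can probably be folded into one expression with optional parts. A minimal sketch, not a drop-in replacement: the name `NUMBER` is made up here, and unlike the pattern above it keeps a leading "+" as part of the match.

```python
import re

# Optional sign, then digits with an optional fraction (or a bare fraction),
# then an optional exponent such as E+03 or e-2.
NUMBER = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?')

s = 'Pnt    *         3280311       0          1.36567432E+03 -3.71226532E+02'
tokens = NUMBER.findall(s)
print(tokens)
```

This form also picks up values written as `10.` or `.002952`, which the longer pattern would need extra alternatives for.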

**psbasha** · Mar 23 '07, 06:11 AM

Hi BV,

Is there any simpler approach available? It looks like we have to format the values in order to read them.

Thanks
PSB

**bvdet** · Mar 23 '07, 09:56 AM

The code I showed you works. I guess you could do splits, strips, slices, etc., but I don't think it would be simpler. After incorporating that code into the other code I showed you, you should get output like this:

Code:

>>> Point dictionary:
30400000 = [10.0, 0.0, 0.0]
40000000 = [15.0, 0.0, 0.0]
2 = [2, 5.0, 0.0, 0.0]
3 = [3, 10.0, 0.0, 0.0]
4 = [4, 15.0, 0.0, 0.0]
5 = [5, 20.0, 0.0, 0.0]
6 = [6, 0.0, 5.0, 0.0]
1 = [1, 0.0, 0.0, 0.0]
8 = [8, 10.0, 5.0, 0.0]
9 = [9, 15.0, 5.0, 0.0]
10270106 = [1097.15002, 250.67631499999999, 140.78933699999999]
10270107 = [1115.47864, 271.83373999999998, 144.698837]
10010012 = [3.0, 0.0, 0.0]
60000800 = [0.0, 5.0, 0.0]
20020003 = [5.0, 0.0, 0.0]
10260209 = [1156.2659900000001, 313.99282799999997, 155.018463]
80009000 = [10.0, 5.0, 0.0]
7 = [7, 5.0, 5.0, 0.0]
3280311 = [1365.6743200000001, -371.22653200000002, 201.031464]
50050000 = [20.0, 0.0, 0.0]
90009000 = [15.0, 5.0, 0.0]
70000000 = [5.0, 5.0, 0.0]
3280502 = [1254.3385000000001, -142.613068, 180.20266699999999]
3280503 = [1270.5728799999999, -175.843582, 184.23608400000001]
3280504 = [1282.8614500000001, -201.004501, 187.21845999999999]

Wire dictionary:
10000000 = [10000000, 20000000, 70000000]
20000000 = [20000000, 30000000, 80000000]
30000000 = [30000000, 40000000, 90000000]
10000071 = [10000101, 20000022, 70000000, 60000055]
40000000 = [40000000, 50000000]
30000088 = [30000208, 40000002, 90005000, 80003000]
20000092 = [20000105, 30000004, 80004000, 71111167]
40000094 = [40000304, 50000071, 90000600]

from data like this:

Code:

Rect    1000007110000101200000227000000060000055
Rect    2000009220000105300000048000400071111167
Rect    3000008830000208400000029000500080003000
Tria     40000094400003045000007190000600
Pnt      100100123.      0.      0.
Pnt      200200035.      0.      0.
Pnt      3040000010.     0.      0.
Pnt      4000000015.     0.      0.
Pnt      5005000020.     0.      0.
Pnt      600008000.      5.      0.
Pnt      700000005.      5.      0.
Pnt      8000900010.     5.      0.
Pnt      9000900015.     5.      0.
Pnt      100100123.      0.      0.
Pnt      200200035.      0.      0.
Pnt      3040000010.     0.      0.
Pnt      4000000015.     0.      0.
Pnt      5005000020.     0.      0.
Pnt      600008000.      5.      0.
Pnt      700000005.      5.      0.
Pnt      8000900010.     5.      0.
Pnt      9000900015.     5.      0.
Rect    100000001000000020000000700000006
Rect    200000002000000030000000800000007
Rect    300000003000000040000000900000008
Tria    4000000040000000500000009
Pnt     1       0.      0.      0.
Pnt     2       5.      0.      0.
Pnt     3       10.     0.      0.
Pnt     4       15.     0.      0.
Pnt     5       20.     0.      0.
Pnt     6       0.      5.      0.
Pnt     7       5.      5.      0.
Pnt     8       10.     5.      0.
Pnt     9       15.     5.      0.




Pnt    *         3280311       0          1.36567432E+03 -3.71226532E+02
*         2.01031464E+02       0
Pnt	 *         3280502       0          1.25433850E+03 -1.42613068E+02
*         1.80202667E+02       0
Pnt	 *         3280503       0          1.27057288E+03 -1.75843582E+02
*         1.84236084E+02       0
Pnt    *         3280504       0          1.28286145E+03 -2.01004501E+02
*         1.87218460E+02       0

Pnt*     10260209                       1156.26599      313.992828
*       155.018463
Pnt*     10270106                       1097.15002      250.676315
*       140.789337
Pnt*     10270107                       1115.47864      271.83374
*       144.698837

The data files were not formatted in the best manner for reading.
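
For the packed lines such as the first Rect record above, where the IDs run together with no separating spaces, slicing at a fixed width is one option. A minimal sketch, assuming every field (including the tag) is exactly 8 characters wide; `split_small_fields` is a made-up helper name.

```python
def split_small_fields(line, width=8):
    """Split a packed line into a tag and fixed-width integer fields."""
    line = line.rstrip()
    tag = line[:width].strip()
    fields = [int(line[i:i + width]) for i in range(width, len(line), width)]
    return tag, fields


# Pad the tag explicitly so the column positions are unambiguous.
line = 'Rect'.ljust(8) + '1000007110000101200000227000000060000055'
tag, ids = split_small_fields(line)
wire_id, point_ids = ids[0], ids[1:]
print(tag, wire_id, point_ids)
```

This only works when the column widths in the file really are fixed; a blank field would make `int()` raise a `ValueError`.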

**bvdet** · Mar 24 '07, 03:06 AM

Maybe this will be easier to follow:

Code:

def read_file_data(f):
    ptDict = {}
    wireDict = {}
    fList = open(f).readlines()
    
    in_pnt = False
    patt = re.compile(r'''\d+\.\d+E\+\d+|           # engineering notation ++
                          -\d+\.\d+E\+\d+|          # engineering notation -+
                          -\d+\.\d+E-\d+|           # engineering notation --
                          \d+\.\d+E-\d+|            # engineering notation +-
                          \d+\.\d+|                 # positive float format
                          -\d+\.\d+|                # negative float format
                          \d+                       # positive integer
                          ''', re.X
                      )
    
    for line in fList:
        lineList = [x.lower().strip() for x in line.strip().split(' ', 1) if x != '']

**psbasha** · Dec 25 '07, 06:15 AM

Code:

Sample.txt
$$$$$
START
COLOR RED
LINETYPE SOLID
END
$$$$$$$
PLine    1        6      1.5     9.375   .001    .001
$ Line Details
Line*    1               1                1              2
*        .002952         .992547         .121827
$
Rect     2        1       2       3       7       6
Rect     3        1       3       4       8       7
PRect*   4               11              15              16
*        10              11              0.3
Rect*    4               1               5               6
*        10              11              0.
Othr*    1               1               5               6
*        10              11              0.              0.
*        10              11              0.              1.0
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Tria     5        1       7       2       11
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Point    1               0.0     0.0     0.0
Point    2               1.0     0.0     0.0
Point    3               2.0     0.0     0.0
Point    4               3.0     0.0     0.0
Point    5               0.0     1.0     0.0
Point    6               1.0     1.0     0.0
Point    7               2.0     1.0     0.0
Point    8               4.0     1.0     0.0
Point*   9                              0.0             2.0
*          0.0
Point  *3280504         0               1.28286145E+03  1.28286145E+03
*       -2.01004501E+02
$
END

Code:

Sample.py

    
def read_file_data(strFile):
    f = open(strFile,'r')
    pointID = 0
    curvetID = 0
    pointIDDict = {}        
    pointList = []             
    coordList = []
    attrdict=[]
    
    curveIDDict = {}
    curveOneDimIDDict = {}
    curveTwoDimIDPointIDDict = {}        
    largeFieldFlag = False
    
    curveCardLargeFieldFlag = False
    bTriaFlag = False
    bRectFlag = False
    bOnlyPointCoord = True
    b1DCurveFlag = False
    propDict={}

    strTemp = f.readlines()
    for line in strTemp:
        if(line.startswith('Point') or line.startswith('Point*') or line.startswith('Point  *') or line.startswith('*') and bOnlyPointCoord):
            
            if (line.startswith('Point') and (line[:8].strip().isalpha())):
                pointID = int(line[8:16])                        
                coordList.append((float(line[24:32])))
                coordList.append((float(line[32:40])))                
                coordList.append((float(line[40:48])))
                largeFieldFlag = False
            elif (line.startswith('Point*') or line.startswith('Point  *')):
                pointID = int(line[8:24])                        
                coordList.append((float(line[40:56])))
                coordList.append((float(line[56:72])))
                largeFieldFlag = True
                bOnlyPointCoord = True
            elif (line.startswith('*') and largeFieldFlag):                  
                coordList.append((float(line[8:24])))
                largeFieldFlag = False                    
            if ( pointID and largeFieldFlag == False):
                pointIDDict[pointID]=coordList                    
                pointID =0   
                coordList = []
                
            bOnlyPointCoord = True
        elif (line.startswith('Rect') or line.startswith('Tria') or  \
              line.startswith('Line') and line[:8].strip().isalpha() or \
              line.startswith('Rect*') or line.startswith('Tria*') or\
              line.startswith('Line*')or line.startswith('*')):
              
            if (line.startswith('Rect  ') or \
                line.startswith('Line') and line[:8].strip().isalpha() ):

                curvetID = int(line[8:16])                        
                pointList.append((int(line[24:32])))
                pointList.append((int(line[32:40])))
                b1DCurveFlag = True
                
                if (line[:4]=='Tria'or line[:4]=='Rect'):                        
                    pointList.append((int(line[40:48])))
                    b1DCurveFlag = False
                            
                    if (line[:4]=='Rect' ):
                        pointList.append((int(line[48:56])))
                        
                curveCardLargeFieldFlag = False
                    
            elif   (line.startswith('Rect*') or line.startswith('Tria*') or \
                    line.startswith('Line*')):
                curvetID = int(line[8:24])                        
                pointList.append((int(line[40:56])))
                pointList.append((int(line[56:72])))
                curveCardLargeFieldFlag = True
                bOnlyPointCoord = False
                b1DCurveFlag = True
                if line.startswith('Rect*') :
                    bRectFlag = True
                    bTriaFlag = False
                elif line.startswith('Tria*'):
                    bTriaFlag = True
                    bRectFlag = False                        
                
            elif line.startswith('*') and curveCardLargeFieldFlag:                    
                if (bTriaFlag or bRectFlag):
                    pointList.append((int(line[8:24])))
                    b1DCurveFlag = False                                
                    if bRectFlag:
                        pointList.append((int(line[24:40])))
                        
                bTriaFlag = False
                bRectFlag = False
                        
                curveCardLargeFieldFlag = False
                        
            if ( curvetID and curveCardLargeFieldFlag == False):                    
                # Map ElementID and Node ID's of that element
                curveIDDict[curvetID]=pointList
                if b1DCurveFlag:
                    curveOneDimIDDict[curvetID]= pointList
                    b1DCurveFlag = False
                else:
                    curveTwoDimIDPointIDDict[curvetID]= pointList
                    b1DCurveFlag = False                    

                curveCardLargeFieldFlag = False
                bOnlyPointCoord = False
                curvetID = 0                    
                pointList = []          
    
    f.close()

    #Node
    #For all Nodes
    print pointIDDict

    print curveIDDict

    print  curveOneDimIDDict

    print curveTwoDimIDPointIDDict  

    
if __name__ == '__main__':
    read_file_data("C:\\ReadFile\\SampleData.txt")

Above is the sample text file and the sample code for reading it. I would like to avoid using flags and defining so many variables. Is it possible to use regular expressions to reduce the amount of code?

Thanks
PSB

**psbasha** · Dec 25 '07, 06:18 AM

In some scenarios I have to read the following data in the file:

PLine 1 6 1.5 9.375 .001 .001
PRect* 4 11 15 16
* 10 11 0.3
Othr* 1 1 5 6
* 10 11 0. 0.
* 10 11 0. 1.0

In some scenarios the Point data will be defined as below:

Point *3280505 0 1.28286145+03 1.28286145-03
* -2.01004501+02

1.28286145+03 is the same as 1.28286145E+03
1.28286145-03 is the same as 1.28286145E-03

How can I handle the above scenarios while reading the file?

Thanks
PSB

**psbasha** · Dec 25 '07, 06:27 AM

PLine 1 6 1.5 9.375 .001 .001
PRect* 4 11 15 16
* 10 11 0.3
Othr* 1 1 5 6
* 10 11 0. 0.
* 10 11 0. 1.0

I have not written code for the above card lines to store the properties of the curves.

In some cases the Point coordinates are represented as shown below

Point *3280505 0 1.28286145+03 1.28286145-03
* -2.01004501+02

1.28286145+03 is the same as 1.28286145E+03
1.28286145-03 is the same as 1.28286145E-03

Can anybody suggest how to store and print the data?

Thanks
PSB

**psbasha** · Dec 25 '07, 05:34 PM

Any suggestions on the above queries?

**psbasha** · Dec 27 '07, 05:31 AM

Hi BV,

Any suggestions on the above code?

Thanks
PSB

**bvdet** · Dec 27 '07, 07:33 PM

Try this:

Code:

import re

def convert_data(s):
    for func in (int, float):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

pattnum = re.compile(r'''
    \d+\.\d+E\+\d+|     # engineering notation ++
    -\d+\.\d+E\+\d+|    # engineering notation -+
    -\d+\.\d+E-\d+|     # engineering notation --
    \d+\.\d+E-\d+|      # engineering notation +-
    \d+\.\d+|           # positive float format
    -\d+\.\d+|          # negative float format
    \d+\.|              # positive float format
    -\d+\.|             # negative float format
    \.\d+|              # positive float format
    -\.\d+|             # negative float format
    \d+                 # positive integer
    ''', re.X
    )

def parseData(fn, *kargs):
    fileList = [item.strip() for item in open(fn).readlines()\
                if not item.startswith('$')]
    pattkey = re.compile('|'.join([r'\b(%s)' % item for item in kargs]))
    '''
    print pattkey
    print pattkey.pattern
    '''
    # create dictionary with keys from kargs
    masterDict = dict(zip(kargs, [[] for _ in kargs]))
    inData = False
    for line in fileList:
        if inData and line.startswith('*'):
            data.extend(re.findall(pattnum, line))
        elif inData and not line.startswith('*'):
            masterDict[m.group(0)].append([convert_data(item)\
                                           for item in data])
            inData = False
            m = pattkey.match(line)
            if m:
                # m.group(0) is the current keyword
                if '*' in line.split()[0]:
                    inData = True
                    data = re.findall(pattnum, line)
                else:
                    data = re.findall(pattnum, line)
                    masterDict[m.group(0)].append([convert_data(item)\
                                                   for item in data])
        else:
            m = pattkey.match(line)
            if m:
                # m.group(0) is the current keyword
                if '*' in line.split()[0]:
                    inData = True
                    data = re.findall(pattnum, line)
                else:
                    data = re.findall(pattnum, line)
                    masterDict[m.group(0)].append([convert_data(item)\
                                                   for item in data])
    return masterDict

fn = 'H:\\TEMP\\temsys\\sample_points8.txt'
keywords = ['Point', 'Othr', 'Rect', 'PRect', 'PLine', 'Line', 'Tria']
dd = parseData(fn, *keywords)
for key in dd:
    print key
    for item in dd[key]:
        print ' %s' % item

Output:

Code:

>>> Point
 [1, 0.0, 0.0, 0.0]
 [2, 1.0, 0.0, 0.0]
 [3, 2.0, 0.0, 0.0]
 [4, 3.0, 0.0, 0.0]
 [5, 0.0, 1.0, 0.0]
 [6, 1.0, 1.0, 0.0]
 [7, 2.0, 1.0, 0.0]
 [8, 4.0, 1.0, 0.0]
 [9, 0.0, 2.0, 0.0]
 [3280504, 0, 1282.8614500000001, 1282.8614500000001]
PLine
 [1, 6, 1.5, 9.375, 0.001, 0.001]
Tria
 [5, 1, 7, 2, 11]
PRect
 [4, 11, 15, 16, 10, 11, 0.29999999999999999]
Line
 [1, 1, 1, 2, 0.0029520000000000002, 0.99254699999999996, 0.121827]
Rect
 [2, 1, 2, 3, 7, 6]
 [3, 1, 3, 4, 8, 7]
 [4, 1, 5, 6, 10, 11, 0.0]
Othr
 [1, 1, 5, 6, 10, 11, 0.0, 0.0, 10, 11, 0.0, 1.0]
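
The flag handling can probably be avoided altogether by first grouping each keyword line with its "*" continuation lines and only then splitting into values. A minimal sketch under some assumptions: the fields are separated by whitespace (so it does not cover the packed small-field lines), and `iter_records` and `tokenize` are made-up names, not part of the code above.

```python
def iter_records(lines):
    """Yield one list of physical lines per logical record."""
    record = []
    for raw in lines:
        line = raw.rstrip()
        if not line or line.startswith('$'):
            continue                  # skip blank and comment lines
        if line.startswith('*') and record:
            record.append(line)       # continuation of the current record
            continue
        if record:
            yield record
        record = [line]
    if record:
        yield record


def tokenize(record):
    """Join a record's lines, drop the '*' markers, split on whitespace."""
    tokens = ' '.join(record).replace('*', ' ').split()
    return tokens[0], tokens[1:]


sample = [
    '$ Line Details',
    'Line*    1               1                1              2',
    '*        .002952         .992547         .121827',
    '$',
    'Rect     2        1       2       3       7       6',
    'Point  *3280504         0               1.28286145E+03  1.28286145E+03',
    '*       -2.01004501E+02',
]
records = [tokenize(r) for r in iter_records(sample)]
for keyword, values in records:
    print(keyword, values)
```

Lines such as START or COLOR RED come out as records too, so the caller would still filter on the keywords of interest before converting the values.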

**bvdet** · Dec 28 '07, 02:14 AM

I made a few modifications so it would work properly. It probably needs some more work, but I will leave it up to you. Let us know how it turns out.

Code:

import re

def convert_data(s):
    for func in (int, float):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

pattnum = re.compile(r'''
    -\d+\.\d+E\+\d+|    # engineering notation -+
    \d+\.\d+E\+\d+|     # engineering notation ++
    -\d+\.\d+E-\d+|     # engineering notation --
    \d+\.\d+E-\d+|      # engineering notation +-
    -\d+\.\d+|          # negative float format
    \d+\.\d+|           # positive float format
    -\d+\.|             # negative float format
    \d+\.|              # positive float format
    -\.\d+|             # negative float format
    \.\d+|              # positive float format
    \d+                 # positive integer
    ''', re.X
    )

def parseData(fn, *kargs):
    fileList = [item.strip() for item in open(fn).readlines()\
                if not item.startswith('$')]
    pattkey = re.compile('|'.join([r'\b(%s)' % item for item in kargs]))
    '''
    print pattkey
    print pattkey.pattern
    '''
    # create dictionary with keys from kargs
    masterDict = dict(zip(kargs, [[] for _ in kargs]))
    inData = False
    for line in fileList:
        if inData and line.startswith('*'):
            data.extend(re.findall(pattnum, line))
        elif inData and not line.startswith('*'):
            masterDict[m.group(0)].append([convert_data(item)\
                                           for item in data])
            inData = False
            m = pattkey.match(line)
            if m:
                # m.group(0) is the current keyword
                if '*' in line:
                    inData = True
                    data = re.findall(pattnum, line)
                else:
                    data = re.findall(pattnum, line)
                    masterDict[m.group(0)].append([convert_data(item)\
                                                   for item in data])
        else:
            m = pattkey.match(line)
            if m:
                # m.group(0) is the current keyword
                if '*' in line:
                    inData = True
                    data = re.findall(pattnum, line)
                else:
                    data = re.findall(pattnum, line)
                    masterDict[m.group(0)].append([convert_data(item)\
                                                   for item in data])
    return masterDict

fn = 'sample.txt'
keywords = ['Point', 'Othr', 'Rect', 'PRect', 'PLine', 'Line', 'Tria']
dd = parseData(fn, *keywords)
for key in dd:
    print key
    for item in dd[key]:
        print ' %s' % item
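
The exponent-without-E form asked about earlier (1.28286145+03 meaning 1.28286145E+03) is not covered by the number pattern in this code, which would split such a value into two tokens. One possible way to handle it is to normalize each whitespace-separated token before converting it. A minimal sketch: `to_float` and `_NO_E` are made-up names, and it assumes a sign that directly follows the mantissa always starts an exponent.

```python
import re

# Mantissa (optional sign, digits, optional fraction) followed directly by a
# signed exponent with no 'E', e.g. '1.28286145+03' or '-2.01004501+02'.
_NO_E = re.compile(r'^([-+]?(?:\d+\.?\d*|\.\d+))([-+]\d+)$')


def to_float(token):
    """Convert a token to float, inserting the missing 'E' when needed."""
    token = token.strip()
    m = _NO_E.match(token)
    if m:
        token = m.group(1) + 'E' + m.group(2)
    return float(token)


line = '1.28286145+03  1.28286145-03  -2.01004501+02  1.28286145E+03'
values = [to_float(t) for t in line.split()]
print(values)
```

Tokens that already contain an E, or that are plain numbers, pass through unchanged. An integer-looking token such as `10-3` would also be read as an exponent form, so this should only be applied to coordinate fields.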
