File Parsing

  • psbasha
    Contributor
    • Feb 2007
    • 440

    #16
    Originally posted by bvdet
    BTW, your script fails on your data because all comment lines must begin with '$'. It fails on the first line of data.
    Code:
    SampleInputData
    
    $$$$Header$$$$$$$$$$$$
    $$$$Parameter$$$$$$$$$
     
    /Parameter_range/ 1 1
     
    /Flag1/ 1
    /Flag2/ 1
    /DummyFlag1/ 1
     
    /STOP/ Line and Circle
     
    $$$$
     
    /LineThick/ 0.1 $$$Line Thickness
     
    $$$$
     
    /Top1/ 10 $$Value1
    /Top2/ 11 $$Value2
     
     $$$
    /Bot1/  20 $$Comment
    /Bot2/ 30 $$Comment
    /Bot4/ 40 $$Comment
     
    $$
    /TOl1/ -0.05
    /TOl2/ 0.01
     
    $$$$$$Line IDs$$$$$$$$
     
    /NOT/  10 11 12 1
    /NOT/  10 11 12 2
    /Ok/   11 12 1  3
     
    /MAT/ $$
    1 $Begin
    100.    40. 30.  2.0   0 ****22 ksdas
    2
    200.    40. 60.  2.0   0 ****22 ksdas
    3
    600.    40. 30.  5.0   0 ****22 ksdas
    4
    500.    40. 70.  2.0   0 ****22 ksdas
    0 $End
    2 ***Values $Begin  
    1000.  .1
    2000.  .2
    3000.  .3
    4000.  .6
       0.  .0 $End
     
    3 ***Values $Begin  
    3000.  .1
    5000.  .2
    6000.  .3
    7000.  .6
       0.  .0 $End
    0 $End
     
    2 ***Values $Begin  
    1000.  .1
    2000.  .2
    3000.  .3
    4000.  .6
       0.  .0 $End
     
    3 ***Values $Begin  
    13000.  .1
    45000.  .2
    56000.  .3
    87000.  .6
        0.  .0 $End
    0 $End
    2 $Begin
        2.0 .00
        2.0 .210
        3.0 .235
        0.  .0 $End
    3 $Begin
        2.0 .00
        2.0 .210
        3.0 .235
        0.  .0 $End
    0 $End
    /4*ALL/ $ ***
     11       1       1       1     69716.   1000
     11       1       1       5     76296.   1000
     31       1       1       6     74926.   1000
     31       1       1       7     74653.   1000
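bvdet's remark about comment lines can be illustrated with a small filtering step. This is a minimal sketch and not the thread's strip_comments() helper. The name clean_lines() is hypothetical, and the sketch assumes two things: a line whose first non-blank character is '$' is a comment, and everything after the first '$' on a data line is a trailing comment. Note that 'SampleInputData' still passes the filter, which is exactly why it breaks the parser.

```python
def clean_lines(lines):
    """Yield the data lines of a file, dropping blank lines, full-line '$'
    comments and trailing '$' comments (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        # Blank or whitespace-only lines, and lines that start with '$'
        # (after stripping, so " $$$" counts too), are skipped.
        if not line or line.startswith('$'):
            continue
        # Assumption: everything after the first '$' is a trailing comment,
        # e.g. "/LineThick/ 0.1 $$$Line Thickness" -> "/LineThick/ 0.1".
        line = line.split('$', 1)[0].rstrip()
        if line:
            yield line


sample = [
    "SampleInputData\n",
    "$$$$Header$$$$$$$$$$$$\n",
    "   \n",
    " $$$\n",
    "/LineThick/ 0.1 $$$Line Thickness\n",
    "/MAT/ $$\n",
]
print(list(clean_lines(sample)))
# ['SampleInputData', '/LineThick/ 0.1', '/MAT/']
```

With a real file you would call it as clean_lines(open(fn)). Any line that is not data, such as the 'SampleInputData' title, has to be prefixed with '$' for this scheme to work.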


    • psbasha
      Contributor
      • Feb 2007
      • 440

      #17
      Originally posted by bvdet
      >>> Ok = [11, 12, 1, 3]
      MAT = [[1], [100, 40, 30, 2.0, 0], [2], [200, 40, 60, 2.0, 0], [3], [600, 40, 30, 5.0, 0], [4], [500, 40, 70, 2.0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [3000, 1], [5000, 2], [6000, 3], [7000, 6], [0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [13000, 1], [45000, 2], [56000, 3], [87000, 6], [0, 0], [0], [2], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [3], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [0], [4, 'ALL'], [11, 1, 1, 1, 69716, 1000], [11, 1, 1, 5, 76296, 1000], [31, 1, 1, 6, 74926, 1000], [31, 1, 1, 7, 74653, 1000]]
      LineThick = [0.10000000000000001]
      TOl2 = [0.01]
      TOl1 = [0.050000000000000003]
      STOP = ['Line', 'and', 'Circle']
      Top2 = [11]
      Top1 = [10]
      Bot4 = [40]
      Bot1 = [20]
      NOT = [10, 11, 12, 1, 10, 11, 12, 2]
      Flag2 = [1]
      Flag1 = [1]
      Parameter_range = [1, 1]
      Bot2 = [30]
      DummyFlag1 = [1]
      >>>
      I understand that this is not your final solution. Maybe you can come up with a way to parse the 'MAT' data.
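For the first /MAT/ section, which alternates an ID line with one line of properties until a lone 0, a loop that reads two lines at a time avoids most of the flags. This is a sketch under assumptions: the lines have already been stripped of '$' comments, fields after the first non-numeric token (such as '****22 ksdas') are ignored, and to_number(), leading_numbers() and parse_mat_header() are hypothetical names.

```python
def to_number(token):
    """'10' -> 10, '100.' -> 100.0, '.1' -> 0.1; raises ValueError otherwise."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def leading_numbers(line):
    """Numeric fields at the start of a line, stopping at the first field
    that is not a number (such as '****22')."""
    values = []
    for token in line.split():
        try:
            values.append(to_number(token))
        except ValueError:
            break
    return values


def parse_mat_header(lines):
    """Read ID line / property line pairs until a lone 0.

    Returns (dict of ID -> properties, index of the line after the 0).
    """
    mat = {}
    i = 0
    while i < len(lines):
        fields = leading_numbers(lines[i])
        if fields == [0]:  # the lone 0 ends the header section
            return mat, i + 1
        mat[fields[0]] = leading_numbers(lines[i + 1])
        i += 2
    return mat, i


header = ["1", "100.    40. 30.  2.0   0 ****22 ksdas",
          "2", "200.    40. 60.  2.0   0 ****22 ksdas",
          "0", "2 ***Values"]
mat, next_index = parse_mat_header(header)
print(mat)         # {1: [100.0, 40.0, 30.0, 2.0, 0], 2: [200.0, 40.0, 60.0, 2.0, 0]}
print(next_index)  # 5
```

The returned index tells the caller where the curve sections begin, so the next stage can continue from there. The sketch assumes well-formed input; an ID line without a following property line would raise IndexError.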

      Code:
      Description
      >>> 
      ------------------------------------------------------------------
      Ok = [11, 12, 1, 3]  # This should be [[11, 12, 1, 3]], since there can be one or more such lines
      
      MAT = [[1], [100, 40, 30, 2.0, 0], [2], [200, 40, 60, 2.0, 0], [3], [600, 40, 30, 5.0, 0], [4], [500, 40, 70, 2.0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [3000, 1], [5000, 2], [6000, 3], [7000, 6], [0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [13000, 1], [45000, 2], [56000, 3], [87000, 6], [0, 0], [0], [2], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [3], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [0], [4, 'ALL'], [11, 1, 1, 1, 69716, 1000], [11, 1, 1, 5, 76296, 1000], [31, 1, 1, 6, 74926, 1000], [31, 1, 1, 7, 74653, 1000]]
      
      # MAT should contain only these values:
      MAT = {1: [100., 40., 30., 2.0, 0], 2: [200., 40., 60., 2.0, 0], 3: [600., 40., 30., 5.0, 0], 4: [500., 40., 70., 2.0, 0]}
      -------------------------------------------------------------------------------
      # The other blocks should be:
      {2: [[1000., 0.1], [2000., .2], [3000., 0.3], [4000., 0.6]], 3: [[3000.0, 0.1], [5000., 0.2], [6000., .4], [7000., .6]]}
      
      {2: [[1000., 0.1], [2000., .5], [3000., 0.3], [4000., 0.6]], 3: [[3000.0, 0.9], [5000., 0.2], [6000., .4], [7000., .6]]}
      
      {2: [[0.00, 2.0], [.210, 2.0], [0.235, 3.0]], 3: [[0.00, 2.0], [.2110, 2.0], [0.2135, 3.0]]}
      
      ALL = [[ 11,1,1,1,69716.,1000],[ 11,1,1,5,76296.,1000],[ 31,1,1,6,74926.,1000],[ 31, 1,1,7,74653.,1000]]
      
      -------------------------------------------------------------------------------
      LineThick = [0.10000000000000001]
      TOl2 = [0.01]
      TOl1 = [0.050000000000000003]
      ----------------------------------------------------------
      STOP = ['Line', 'and', 'Circle']
      # It should be STOP = 'Line and Circle'
      ----------------------------------------------------------
      Top2 = [11]
      Top1 = [10]
      Bot4 = [40]
      Bot1 = [20]
      ----------------------------------------------------------
      NOT = [10, 11, 12, 1, 10, 11, 12, 2]
      
      # It should be stored as:
      NOT = [[ 10,11,12,1],[10,11,12,2]]
      
      ----------------------------------------------------------
      Flag2 = [1]
      Flag1 = [1]
      Parameter_range = [1, 1]
      Bot2 = [30]
      DummyFlag1 = [1]
      The above describes how some of the variables should be stored. Where a keyword has only a single value, it need not be stored as a list.

      Please help me fix the code so that it produces the data as described above.
      Last edited by psbasha; Jan 6 '08, 05:24 AM. Reason: Only kept the output
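The shapes described above (one sub-list per /NOT/ or /Ok/ line, STOP as a single string, bare values for single-item keys) can be produced in one post-processing step. A minimal sketch, assuming the parser keeps one token list per keyword line instead of concatenating them; shape_values() and its keyword sets are hypothetical and would need to match the real file format.

```python
def shape_values(key, rows, repeatable=('NOT', 'Ok'), text_keys=('STOP',)):
    """Reshape the token rows collected for one keyword (hypothetical helper).

    rows holds one list of tokens per /key/ line in the file.
    """
    if key in text_keys:
        # /STOP/ Line and Circle -> 'Line and Circle'
        return ' '.join(str(token) for row in rows for token in row)
    if key in repeatable:
        # keywords that may occur on several lines keep one sub-list per line
        return [list(row) for row in rows]
    values = [token for row in rows for token in row]
    # a single value is returned on its own rather than as a one-item list
    return values[0] if len(values) == 1 else values


print(shape_values('NOT', [[10, 11, 12, 1], [10, 11, 12, 2]]))  # [[10, 11, 12, 1], [10, 11, 12, 2]]
print(shape_values('STOP', [['Line', 'and', 'Circle']]))         # Line and Circle
print(shape_values('Top1', [[10]]))                              # 10
```

Collecting the rows could be as simple as rows.setdefault(key, []).append(tokens) inside the keyword branch; the two keyword tuples are only defaults to adjust.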


      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #18
        Originally posted by psbasha

        Please help me fix the code so that it produces the data as described above.
        What you are saying is that the parsing must be customized for certain keywords. The code as is will parse all the data in a consistent manner. You must make an effort to solve this problem yourself. Post back your solution and we can try to help you from there.
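One common way to customize the parsing for certain keywords is a dispatch table: a dictionary mapping each special keyword to its own handler, with a default for everything else. A sketch, assuming keyword lines look like '/KEY/ values'; KEY_LINE, HANDLERS, numbers() and parse_keyword_lines() are hypothetical names. The default handler converts each field with int() or float(), so a value such as -0.05 keeps its sign.

```python
import re

# "/KEY/ rest of line"; the key characters follow the key_patt used in this thread.
KEY_LINE = re.compile(r'^/([A-Za-z_\-0-9]+)/\s*(.*)$')


def numbers(text):
    """Default handler: convert each field to int, or to float if that fails."""
    values = []
    for token in text.split():
        try:
            values.append(int(token))
        except ValueError:
            values.append(float(token))
    return values


# Keyword-specific handlers; any keyword not listed falls back to numbers().
HANDLERS = {
    'STOP': lambda text: text.strip(),  # keep "Line and Circle" as one string
}


def parse_keyword_lines(lines):
    """Map each keyword to a list holding one parsed value per /KEY/ line."""
    result = {}
    for line in lines:
        m = KEY_LINE.match(line)
        if not m:
            continue  # block data such as the /MAT/ rows is handled elsewhere
        key, rest = m.group(1), m.group(2)
        handler = HANDLERS.get(key, numbers)
        result.setdefault(key, []).append(handler(rest))
    return result


lines = ["/Top1/ 10", "/TOl1/ -0.05", "/STOP/ Line and Circle",
         "/NOT/  10 11 12 1", "/NOT/  10 11 12 2", "1"]
parsed = parse_keyword_lines(lines)
print(parsed['NOT'])   # [[10, 11, 12, 1], [10, 11, 12, 2]]
print(parsed['STOP'])  # ['Line and Circle']
```

A keyword whose values are not numeric needs its own entry in HANDLERS; otherwise numbers() raises ValueError on the first non-numeric field.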


        • psbasha
          Contributor
          • Feb 2007
          • 440

          #19
          Originally posted by bvdet
          What you are saying is that the parsing must be customized for certain keywords. The code as is will parse all the data in a consistent manner. You must make an effort to solve this problem yourself. Post back your solution and we can try to help you from there.
          Code:
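          SampleCode
          import re

          # helper functions such as strip_comments() and convertType() are
          # assumed to be defined earlier in this thread
          key_patt = re.compile(r'/([A-Za-z_\-0-9]+)/')
          # optional minus sign, then "10", "2.0", "100." or ".1"; otherwise a word
          data_patt = re.compile(r'-?(?:\d+\.\d*|\.\d+|\d+)|\w+')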
          def parse_data(fn):
              key = None
              bMFlag = False
              iCount = 0
              dataList = []
              dd = {}
              
              matDataDict = {}
              matCDict = {}
              matTDict = {}
              comFactDict = {}
              bMCFlag = False
              bMTFlag = False
              bMatDataBlockFlag = False
              bDataFlag = False
              otherFList =[]
              bmatStartFlag = True
              bmatEndFlag = False
              dataListList = []
              
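              # test the stripped line, so whitespace-only lines and indented
              # comment lines such as " $$$" are skipped as well
              lineList = [strip_comments(line.strip()) for line in open(fn)
                          if line.strip() and not line.strip().startswith('$')]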
                          
              for line in lineList:
                  m = key_patt.search(line)
                  if m:
                      key = m.group(1)
                      line1 = line[m.end():]  # text after the closing '/' of the key
                      if key == 'NOT':
                          dataList = [convertType(item) for item in \
                                          data_patt.findall(line1)]
                          dataListList.append(dataList)                
                      else:
                          if data_patt.search(line1):                
                              # extend the values already stored for this key
                              dd.setdefault(key, []).extend(
                                  [convertType(item) for item in data_patt.findall(line1)])
                          else:
                              dd[key] = []
                              bMFlag = True
                              bMatDataBlockFlag = True
                  else:
                      if 'ALL' in line:
                          bDataFlag = True
                          bMatDataBlockFlag = False
                      elif bDataFlag:
                          # every non-empty line after /4*ALL/ is a row of the ALL
                          # table; store its fields as numbers
                          if line:
                              otherFList.append([convertType(item) for item in line.split()])
                      elif  bMatDataBlockFlag:                               
                          # a lone "0" closes a section; "0.  .0" only closes one curve
                          if line.startswith('0') and not line.startswith('0.  .0'):
                              bMFlag = False
                              # each lone 0 advances: header -> matC section
                              # -> matT section -> comFact section
                              if bMCFlag:
                                  bMCFlag = False
                                  bMTFlag = True
                              elif bMTFlag:
                                  bMTFlag = False
                                  iCount = 0
                              else:
                                  bMCFlag = True
                                  iCount = 0
                          else:
                              if bMFlag:
                                  m1 = data_patt.search(line)
                                  if m1:
                                      if bmatStartFlag:
                                          dataList = []
                                          list1 = [convertType(n) for n in \
                                                      data_patt.findall(line)]
                                          matID = list1[0]
                                          bmatStartFlag = False
                                      else:
                                          dataList = [convertType(n) for n in \
                                                      data_patt.findall(line)]
                                          matDataDict[matID] = dataList
                                          dataList = []
                                          bmatStartFlag = True                                
                              else:
                                  # the three curve sections share one layout: an ID line,
                                  # value pairs, then a "0.  .0" line closing the curve
                                  if bMCFlag:
                                      target, swap = matCDict, False
                                  elif bMTFlag:
                                      target, swap = matTDict, False
                                  else:
                                      # comFactDict stores the two columns in reverse order
                                      target, swap = comFactDict, True
                                  line1 = line.split()
                                  if iCount == 0:
                                      dataList = []
                                      matID = int(line1[0])
                                      iCount = iCount + 1
                                  elif not line.startswith('0.  .0'):
                                      pair = [float(line1[0]), float(line1[1])]
                                      if swap:
                                          pair.reverse()
                                      dataList.append(pair)
                                  else:
                                      target[matID] = dataList
                                      iCount = 0
          
              dd['NOT'] =dataListList
              print 'matDataDict',matDataDict                            
              print 'matCDict',matCDict
              print 'matTDict',matTDict
              print 'comFactDict',comFactDict
              print 'otherFList', otherFList
              return dd
          Please find my solution above. Let me know whether this can be done in a more concise and better way.

          Thanks
          PSB
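On the question of a more concise approach: the three curve sections (matCDict, matTDict and comFactDict) share one layout, so a single loop can fill one dictionary per section and the section flags disappear. A sketch, assuming comment-stripped lines that follow the material header block, that a '0.  .0' pair (both values zero) always closes a curve, and that a lone '0' closes a section. parse_curve_sections() and its swapped parameter are hypothetical; swapped mirrors the reversed columns in the comFactDict branch above.

```python
def parse_curve_sections(lines, swapped=(2,)):
    """Return a list with one {id: [[a, b], ...]} dict per curve section.

    Sections whose position appears in `swapped` store each pair as
    [second column, first column].
    """
    sections = []
    current = {}
    curve_id = None  # None: the next line starts a new curve (or ends the section)
    for line in lines:
        fields = line.split()
        if curve_id is None:
            if fields == ['0']:
                # a lone 0 closes the current section
                sections.append(current)
                current = {}
            else:
                # ID line such as "2 ***Values": only the first field matters
                curve_id = int(fields[0])
                current[curve_id] = []
            continue
        a, b = float(fields[0]), float(fields[1])
        if a == 0.0 and b == 0.0:
            curve_id = None  # "0.  .0" closes the curve
        elif len(sections) in swapped:
            current[curve_id].append([b, a])
        else:
            current[curve_id].append([a, b])
    if current:
        sections.append(current)  # tolerate a missing final "0"
    return sections


curve_lines = ["2 ***Values", "1000.  .1", "2000.  .2", "0.  .0",
               "3 ***Values", "3000.  .1", "0.  .0", "0",
               "2", "2.0 .00", "2.0 .210", "0.  .0", "0"]
matC, comFact = parse_curve_sections(curve_lines, swapped=(1,))
print(matC)     # {2: [[1000.0, 0.1], [2000.0, 0.2]], 3: [[3000.0, 0.1]]}
print(comFact)  # {2: [[0.0, 2.0], [0.21, 2.0]]}
```

With the full three-section sample, the default swapped=(2,) treats the third section as the one with reversed columns, matching the posted code.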


          • psbasha
            Contributor
            • Feb 2007
            • 440

            #20
            Code:
            Output
            matDataDict {1: [100, 40, 30, 2.0, 0], 2: [200, 40, 60, 2.0, 0], 3: [600, 40, 30, 5.0, 0], 4: [500, 40, 70, 2.0, 0]}
            matCDict {2: [[1000.0, 0.10000000000000001], [2000.0, 0.20000000000000001], [3000.0, 0.29999999999999999], [4000.0, 0.59999999999999998]], 3: [[3000.0, 0.10000000000000001], [5000.0, 0.20000000000000001], [6000.0, 0.40000000000000002], [7000.0, 0.59999999999999998]]}
            matTDict {2: [[1000.0, 0.10000000000000001], [2000.0, 0.5], [3000.0, 0.29999999999999999], [4000.0, 0.59999999999999998]], 3: [[13000.0, 0.90000000000000002], [45000.0, 0.20000000000000001], [56000.0, 0.29999999999999999], [87000.0, 0.59999999999999998]]}
            comFactDict {2: [[0.0, 2.0], [0.20999999999999999, 2.0], [0.23499999999999999, 3.0]], 3: [[0.0, 2.0], [0.21099999999999999, 2.0], [0.23150000000000001, 3.0]]}
            ,otherFList [['11', '1', '1', '1', '69716.', '1000'], ['11', '1', '1', '5', '76296.', '1000'], ['31', '1', '1', '6', '74926.', '1000'], ['31', '1', '1', '7', '74653.', '1000']]
            Ok = [11, 12, 1, 3]
            MAT = []
            LineThick = [0.10000000000000001]
            TOl2 = [0.01]
            TOl1 = [0.050000000000000003]
            STOP = ['Line', 'and', 'Circle']
            Top2 = [11]
            Top1 = [10]
            Bot4 = [40]
            Bot1 = [20]
            NOT = [[10, 11, 12, 1], [10, 11, 12, 2]]
            Flag2 = [1]
            Flag1 = [1]
            Parameter_range = [1, 1]
            Bot2 = [30]
            DummyFlag1 = [1]


            • psbasha
              Contributor
              • Feb 2007
              • 440

              #21
              BV,

              Your suggestion is required.

              Thanks
              PSB


              • bvdet
                Recognized Expert Specialist
                • Oct 2006
                • 2851

                #22
                Originally posted by psbasha
                BV,

                Your suggestion is required.

                Thanks
                PSB
                I don't have time to do it right now, as work deadlines are approaching. I will try to look at it later.
