Getting different Regular Expression results

  • psbasha
    Contributor
    • Feb 2007
    • 440

    Hi,

    Could anybody help me fix this problem? The same regular expression gives me different results on two similar strings.

    Code:
    Sample1
    import re
    strLine = ' 1 THRU 20    '
    sList = re.findall(r'\d+ THRU \d+|\d+', strLine)
    print(sList)
    O/P is:
    ['1 THRU 20'] (this is correct)

    Code:
    Sample2
    import re
    strLine = '  8001  THRU 10828  '
    sList = re.findall(r'\d+ THRU \d+|\d+', strLine)
    print(sList)
    O/P is:
    ['8001', '10828'] (this differs from the result above)

    The expected output is:

    ['8001 THRU 10828']


    Thanks
    PSB
  • bartonc
    Recognized Expert Expert
    • Sep 2006
    • 6478

    #2
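    Code:
    Sample2
    import re
    strLine = '  8001  THRU 10828  '
    sList = re.findall(r'\d+ THRU \d+|\d+', strLine)
    print(sList)
    Your regular expression doesn't take into account the two spaces between '8001' and 'THRU'. The alternative '\d+ THRU \d+' requires exactly one space on each side of THRU, so it fails here and findall falls back to matching the bare numbers with '\d+'.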
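A quick way to confirm this diagnosis is to test only the THRU alternative, once as written and once with ' +' (one or more spaces). The names STRICT and RELAXED are illustrative, and the code is written for Python 3, where print is a function:

```python
import re

# The THRU alternative from the question: exactly one space on each side.
STRICT = re.compile(r'\d+ THRU \d+')
# Relaxed variant: ' +' accepts one or more spaces on each side.
RELAXED = re.compile(r'\d+ +THRU +\d+')

text = '  8001  THRU 10828  '
print(STRICT.findall(text))   # [] because of the double space before THRU
print(RELAXED.findall(text))  # ['8001  THRU 10828']
```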


    • psbasha
      Contributor
      • Feb 2007
      • 440

      #3
      Thanks Barton,

      How do I fix this problem?

      PSB


      • bartonc
        Recognized Expert Expert
        • Sep 2006
        • 6478

        #4
        You are welcome.

        Regular expressions are a good tool, well worth the time to learn.


        • bvdet
          Recognized Expert Specialist
          • Oct 2006
          • 2851

          #5
          Code:
          >>> import re
          >>> strLine = '  8001  THRU 10828  '
          >>> sList = re.findall(r'\d+.+THRU.+\d+|\d+', strLine)
          >>> sList
          ['8001  THRU 10828']
          >>>


          • psbasha
            Contributor
            • Feb 2007
            • 440

            #6
            Hi BV,

            Thanks for the reply.

            I have an issue when the string is in this format. How can I fix it?

            Code:
            Sample1
            import re
            strLine = '1 = 11001 THRU 11848'
            sList = re.findall(r'\d+ THRU \d+|\d+', strLine)
            print(sList)
            O/P is: ['1', '11001 THRU 11848']


            Code:
            Sample2
            import re
            strLine = '1 = 11001 THRU 11848'
            sList = re.findall(r'\d+.+THRU.+\d+|\d+', strLine)
            print(sList)

            O/P is: ['1 = 11001 THRU 11848']

            I would like the output to be ['1', '11001 THRU 11848'].

            For the string '14 = 8001 THRU 10828 ', the output should be ['14', '8001 THRU 10828'].

            -PSB


            • bvdet
              Recognized Expert Specialist
              • Oct 2006
              • 2851

              #7
              Code:
              >>> strLine = '1 = 11001 THRU 11848'
              >>> re.findall(r'\d+ +THRU +\d+|\d+', strLine)
              ['1', '11001 THRU 11848']
              >>> re.findall(r'\d+\s+THRU\s+\d+|\d+', strLine)
              ['1', '11001 THRU 11848']
              >>>
              Notice what happens when I rearrange the expression a bit:
              Code:
              >>> re.findall(r'\d+|\d+\s+THRU\s+\d+', strLine)
              ['1', '11001', '11848']
              >>>
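Python's re module tries the alternatives of '|' from left to right at each position and keeps the first one that succeeds, not the longest. Because a bare '\d+' can always match wherever a range could start, putting it first means the THRU alternative is never reached. A small side-by-side sketch on a string from this thread (Python 3; the constant names are illustrative):

```python
import re

LONG_FIRST = r'\d+\s+THRU\s+\d+|\d+'   # range alternative is tried first
SHORT_FIRST = r'\d+|\d+\s+THRU\s+\d+'  # bare number is tried first and always wins

text = '1 = 11001 THRU 11848'
print(re.findall(LONG_FIRST, text))   # ['1', '11001 THRU 11848']
print(re.findall(SHORT_FIRST, text))  # ['1', '11001', '11848']
```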


              • ghostdog74
                Recognized Expert Contributor
                • Apr 2006
                • 511

                #8
                You could easily have gotten your results with split():
                Code:
                >>> '14 = 8001 THRU 10828'.split(" = ")
                ['14', '8001 THRU 10828']
                >>>
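If the spacing in the data varies, the pieces produced by split() can also be normalized. A minimal sketch, assuming each line has the form '<id> = <value>' with a single '='; the function name split_set_line is hypothetical:

```python
def split_set_line(line):
    """Split '<id> = <value>' on the first '=' and tidy the whitespace in each part."""
    # str.split() with no argument splits on any run of whitespace and drops
    # leading/trailing blanks, so ' '.join(...) leaves single spaces only.
    return [' '.join(part.split()) for part in line.split('=', 1)]

print(split_set_line('14 = 8001 THRU 10828 '))  # ['14', '8001 THRU 10828']
print(split_set_line('1 = 11001 THRU 11848'))   # ['1', '11001 THRU 11848']
```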


                • psbasha
                  Contributor
                  • Feb 2007
                  • 440

                  #9
                  Hi BV,

                  Thanks for the reply.

                  I still have a problem reading the string data. In my earlier post, "Reading and writing a text file", I am reading a different SETS file format. When I use this regular expression pattern, the data from that earlier file can no longer be read.

                  Could you please refer to my post "Reading and writing a text file" and let me know the exact pattern to use?

                  -PSB


                  • bvdet
                    Recognized Expert Specialist
                    • Oct 2006
                    • 2851

                    #10
                    PSB - See my last post in the referenced thread. I posted the code for parsing the data as you had described. The only difference between that data and the problem in this thread is the extra spaces around the keyword "THRU". Is that correct? If so, all you have to do is modify the function getThruData(s) slightly, as follows:
                    Old code:
                    Code:
                    sList = re.findall(r'\d+ THRU \d+|\d+', s)
                    New code:
                    Code:
                    sList = re.findall(r'\d+ +THRU +\d+|\d+', s)
                    The added '+' characters allow a match when one or more spaces occur on either side of "THRU". You really need to study that code to understand how it works.
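For illustration only, here is a hypothetical stand-in for getThruData(s). The real function lives in the other thread and is not reproduced here, so this sketch just applies the new pattern and then collapses the spaces inside each match, which yields output in the form PSB asked for, such as ['14', '8001 THRU 10828']:

```python
import re

# ' +' on each side of THRU allows one or more spaces there.
THRU_PATTERN = re.compile(r'\d+ +THRU +\d+|\d+')

def get_thru_data(s):
    """Hypothetical stand-in for getThruData(s): find ids and THRU ranges in s."""
    # Collapse extra spaces inside each match, e.g. '8001  THRU' -> '8001 THRU'.
    return [' '.join(m.split()) for m in THRU_PATTERN.findall(s)]

for s in ['  8001  THRU 10828  ', '1 = 11001 THRU 11848', '14 = 8001 THRU 10828 ']:
    print(get_thru_data(s))
```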


                    • psbasha
                      Contributor
                      • Feb 2007
                      • 440

                      #11
                      Hi BV,

                      Thanks, it is working fine.

                      How should we decide on the pattern when using regular expressions?

                      Should we go with trial and error, or should we first collect all the possible input formats and then design the pattern?

                      -PSB


                      • bvdet
                        Recognized Expert Specialist
                        • Oct 2006
                        • 2851

                        #12
                        Ideally the data should be in a strict format designed for ease of parsing. In my experience, it is best to determine the different formats first and then decide how to parse them.
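One way to apply this advice is to collect the known input formats as test cases and check each candidate pattern against all of them at once. The sample strings and patterns are the ones from this thread; the CASES table and the check helper are illustrative assumptions. Note that the raw findall output keeps the original spacing:

```python
import re

# Known input formats collected from this thread, with the findall output we expect.
CASES = {
    ' 1 THRU 20    ': ['1 THRU 20'],
    '  8001  THRU 10828  ': ['8001  THRU 10828'],
    '1 = 11001 THRU 11848': ['1', '11001 THRU 11848'],
}

def check(pattern):
    """Return the inputs for which pattern gives an unexpected result."""
    return [text for text, expected in CASES.items()
            if re.findall(pattern, text) != expected]

print(check(r'\d+ THRU \d+|\d+'))    # ['  8001  THRU 10828  ']
print(check(r'\d+.+THRU.+\d+|\d+'))  # ['1 = 11001 THRU 11848']
print(check(r'\d+ +THRU +\d+|\d+'))  # []
```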


                        • ghostdog74
                          Recognized Expert Contributor
                          • Apr 2006
                          • 511

                          #13
                          You have to sit down and think through all the different formats your input file may contain, and then construct an expression that fits every possible scenario. Regular expressions are a powerful tool, but they can also confuse new users. Overusing them makes debugging and enhancing your programs difficult, and it takes practice to really understand how they work. As the saying goes:
                          Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. (Jamie Zawinski)
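On the debugging point: Python's re.VERBOSE flag ignores whitespace in the pattern and allows '#' comments, so a pattern can be laid out and annotated line by line. A sketch of the final pattern from this thread written that way (the name SET_ITEM is illustrative):

```python
import re

SET_ITEM = re.compile(r"""
    \d+ \s+ THRU \s+ \d+   # a range such as '8001  THRU 10828' (tried first)
  | \d+                    # otherwise a single number such as '14'
""", re.VERBOSE)

print(SET_ITEM.findall('1 = 11001 THRU 11848'))  # ['1', '11001 THRU 11848']
```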
