Perl pattern matching

  • pkn876
    New Member
    • May 2010
    • 6

    Perl pattern matching

    I have a string of characters, for example: fhjhejfherhfehkehkeh

    I want to retain the first "h" and delete all the other "h" characters. How do I do it?
  • numberwhun
    Recognized Expert Moderator Specialist
    • May 2007
    • 3467

    #2
    What have you tried thus far? We would like to see which road you are going down instead of just giving you the code. It's a better way to learn. Then we can guide you.

    Regards,

    Jeff


    • pkn876
      New Member
      • May 2010
      • 6

      #3
      To start with, I am a newbie in Perl.
      I know how to delete the first "h" character and keep the rest as it is, using ^.
      But I wanted to know how to do the inverse of this, i.e. keep the first "h" character and remove the remaining "h" characters.


      • chaarmann
        Recognized Expert Contributor
        • Nov 2007
        • 785

        #4
        regex: s/(.*h[^h]*)h/$1/g
        Algorithm: match up to an "h" that has another "h" somewhere before it, then copy everything except that later "h".
        Repeat this substitution until nothing is replaced any more.
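A minimal sketch of the looping approach, assuming the input is a single line. The helper name keep_first_h_loop is made up for illustration. Each pass drops one later "h", and the loop ends when the substitution stops matching:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first "h", remove every later one.
sub keep_first_h_loop {
    my ($s) = @_;
    # Each pass removes one "h" that has another "h" before it;
    # s/// returns false once no such "h" is left, which ends the loop.
    1 while $s =~ s/(.*h[^h]*)h/$1/;
    return $s;
}

print keep_first_h_loop('fhjhejfherhfehkehkeh'), "\n";   # fhjejferfekeke
```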

        There is also another way: a single regex with a lookaround assertion that needs no looping (strictly speaking a lookbehind, since it looks backward from the current 'h'). If there is an 'h' to the left of it, delete the current one; otherwise don't delete it (that is, the regex does not match).
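One possible single-pass version is sketched below. It is an adaptation rather than the exact regex hinted at above: Perl expects a lookbehind of bounded length, so this sketch uses a one-character lookbehind and captures the run of non-"h" characters in between. The helper name keep_first_h_single is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one global substitution, no loop.
sub keep_first_h_single {
    my ($s) = @_;
    # (?<=h)  the match must start directly after an "h"
    # ([^h]*) keep the non-"h" characters that follow
    # h       the later "h" that gets dropped
    $s =~ s/(?<=h)([^h]*)h/$1/g;
    return $s;
}

print keep_first_h_single('fhjhejfherhfehkehkeh'), "\n";   # fhjejferfekeke
```

The first "h" is never removed because nothing before it satisfies the lookbehind.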

        Also a third way:
        step 1: replace every run of h with a double one: h+ --> hh
        step 2: replace the first 'hh' with a single one: ^(.*?h)h --> $1
        step 3: replace all remaining hh with nothing: hh -->
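The three steps can be sketched as follows. Step 2 is written with a non-greedy `.*?` so that it hits the first "hh"; a greedy `.*` would hit the last one instead. The helper name keep_first_h_three_steps is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the three substitutions in order.
sub keep_first_h_three_steps {
    my ($s) = @_;
    $s =~ s/h+/hh/g;        # step 1: every run of "h" becomes exactly "hh"
    $s =~ s/^(.*?h)h/$1/;   # step 2: the first "hh" becomes a single "h"
    $s =~ s/hh//g;          # step 3: every remaining "hh" disappears
    return $s;
}

print keep_first_h_three_steps('fhjhejfherhfehkehkeh'), "\n";   # fhjejferfekeke
```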

        Also a fourth way:
        step 1: find first h and store its position.
        step 2: use regex after this position, to replace all h with nothing .
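A sketch of this position-based variant, using Perl's built-in index and substr. The helper name keep_first_h_index is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split at the first "h", clean only the tail.
sub keep_first_h_index {
    my ($s) = @_;
    my $pos = index($s, 'h');             # step 1: position of the first "h"
    return $s if $pos < 0;                # no "h" at all: nothing to do
    my $head = substr($s, 0, $pos + 1);   # everything up to and including it
    my $tail = substr($s, $pos + 1);      # everything after it
    $tail =~ s/h//g;                      # step 2: remove every later "h"
    return $head . $tail;
}

print keep_first_h_index('fhjhejfherhfehkehkeh'), "\n";   # fhjejferfekeke
```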

        Fifth way:
        ...

        There are many other ways that come to mind, but I am tired of writing them all down. Just give a hint about which one you would like.

        Most likely you want to know how to do the regex with the lookaround assertion (the shortest code), right?
        Here is an example of how lookbehind and lookahead are used, so you can learn and do it yourself (if you still have problems, come back).

        This regex:
        (?<!\.)(\d+?)(?=(\d{3})+\D) Replace all with: $1,
        didn't work as expected. Why? (It should format a number inside a text like "distance 1234567.1234567 meter" into "distance 1,234,567.1234567 meter".)

        Try to correct it yourself, so you learn a lot.
        Solution:
        I corrected it to this one and it worked:
        (?<!\.\d{0,100})(\d+?)(?=(\d{3})+($|\D)) Replace all with: $1,

        Is this also possible without the restriction of 100 maximum characters after the dot?
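One way to avoid the 100-character limit in Perl is sketched below. It swaps the technique: instead of a long lookbehind, it matches each whole digit run that does not start after a dot or another digit, and hands that run to a helper. The names commify and format_numbers are made up for this example, and the sketch assumes plain decimal numbers in the text.

```perl
use strict;
use warnings;

# Hypothetical helper: insert thousands separators into a string of digits.
sub commify {
    my ($digits) = @_;
    my $rev = reverse $digits;         # work from the right-hand end
    $rev =~ s/(\d{3})(?=\d)/$1,/g;     # comma after each full group of three
    return scalar reverse $rev;
}

# Hypothetical helper: format the integer part of numbers inside a text.
sub format_numbers {
    my ($text) = @_;
    # (?<![.\d]) a digit run must not start after "." or after another
    # digit, so the fractional part is never picked up.
    $text =~ s/(?<![.\d])(\d+)/commify($1)/ge;
    return $text;
}

print format_numbers('distance 1234567.1234567 meter'), "\n";
# distance 1,234,567.1234567 meter
```

The lookbehind here is a single character wide, so the number of digits after the dot does not matter.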


        • pkn876
          New Member
          • May 2010
          • 6

          #5
          Can you please give some pointers on the basics of regexes with negative lookahead? It would really help.

          My main application of Perl is pattern matching, so a tutorial on this would also help.


          • numberwhun
            Recognized Expert Moderator Specialist
            • May 2007
            • 3467

            #6
            @pkn876 There are plenty of Perl Regular Expression tutorials out there. All you have to do is search Google and you will find them.

            You say you're a newbie and want to do negative lookaheads. You really need to start practicing with regexes and working with them. They take practice to understand. To get to and understand lookaheads and lookbehinds, you will need a good understanding of how regexes work.

            I would start with the link I provided above.

            Regards,

            Jeff


            • RonB
              Recognized Expert Contributor
              • Jun 2009
              • 589

              #7
              Mastering Regular Expressions


              • pkn876
                New Member
                • May 2010
                • 6

                #8
                I started by writing some simple programs to understand the lookahead and lookbehind concepts.

                I basically have a file with a lot of lines of characters, and I loop through each line doing some pattern matching.
                Code:
                while (<IN1>) {
                    chomp($_);
                    if (/cat(?=\s)/) {
                        s/cat/dog/;
                    }
                }
                For example:
                catacatbcatccatdagduefgvdcat sfjjgja

                The code is written such that the output should have been:
                catacatbcatccatdagduefgvddog sfjjgja

                But the output I am seeing is:

                dogacatbcatccatdagduefgvdcat sfjjgja

                It is replacing the first cat.

                Is anything wrong with this piece of code?
                Last edited by numberwhun; May 24 '10, 02:44 AM. Reason: Please use CODE TAGS!!!


                • RonB
                  Recognized Expert Contributor
                  • Jun 2009
                  • 589

                  #9
                  Code:
                  use strict;
                  use warnings;
                  use 5.010;
                  
                  my $str = 'catacatbcatccatdagduefgvdcat sfjjgja';
                  
                  if ($str =~ s/cat(?=\s)/dog/) {
                      say $str;
                  }
                  else {
                      say "no match";
                  }


                  • pkn876
                    New Member
                    • May 2010
                    • 6

                    #10
                    I tried it in vi with a simple regular expression, and it says "pattern not found".
                    My file has:
                    catacatbcatccatdagduefgvdcat sfjjgja

                    I used:

                    :s/cat(?=\s)/dog/

                    Getting the following error:

                    E486: Pattern not found:cat(?=\s)


                    • RonB
                      Recognized Expert Contributor
                      • Jun 2009
                      • 589

                      #11
                      Please post a short but complete script that demonstrates the problem.

                      Here's my example that uses your sample data.

                      Code:
                      D:\perl>type test.pl
                      #!/usr/bin/perl
                      
                      use strict;
                      use warnings;
                      use 5.010;
                      
                      my $str = 'catacatbcatccatdagduefgvdcat sfjjgja';
                      
                      if ($str =~ s/cat(?=\s)/dog/) {
                          say $str;
                      }
                      else {
                          say "no match";
                      }
                      D:\perl>test.pl
                      catacatbcatccatdagduefgvddog sfjjgja


                      • pkn876
                        New Member
                        • May 2010
                        • 6

                        #12
                        This is the code.
                        The input file passed as the argument contains the string in $str.
                        Code:
                        #!/usr/bin/perl -w
                        
                        my $testlist = $ARGV[0];
                        open(IN, "$testlist") || die "cannot open test list:$testlist";
                        open(OUT, ">$outfile") || die "cannot open output file:$outfile";
                        
                        
                        while(<IN>) {
                                chomp($_);
                                if (/cat(?=\s)/){
                                s/cat/dog/;}
                        
                        printf OUT <<EOF
                        $_
                        EOF
                        ;
                        }
                        close(IN);
                        close(OUT);
                        Last edited by numberwhun; May 24 '10, 02:45 AM. Reason: Please use CODE TAGS!!!


                        • RonB
                          Recognized Expert Contributor
                          • Jun 2009
                          • 589

                          #13
                          I fail to see why you think using 2 different regexes would accomplish your goal.

                          Did you try the regex in my example?
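To make the difference concrete, here is a small side-by-side sketch using the sample string from this thread. The variable names are chosen for the example. With two separate regexes, the test only confirms that some "cat" is followed by whitespace; the plain s/cat/dog/ then starts from the beginning of the string and replaces the first "cat" it finds.

```perl
use strict;
use warnings;

my $line = 'catacatbcatccatdagduefgvdcat sfjjgja';

# Two regexes: the lookahead is only in the test, not in the substitution.
my $two_step = $line;
if ($two_step =~ /cat(?=\s)/) {
    $two_step =~ s/cat/dog/;       # replaces the FIRST "cat" in the line
}

# One regex: the lookahead is part of the substitution itself.
my $one_step = $line;
$one_step =~ s/cat(?=\s)/dog/;     # replaces the "cat" before the space

print "$two_step\n";   # dogacatbcatccatdagduefgvdcat sfjjgja
print "$one_step\n";   # catacatbcatccatdagduefgvddog sfjjgja
```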


                          • numberwhun
                            Recognized Expert Moderator Specialist
                            • May 2007
                            • 3467

                            #14
                            @pkn876 Please learn to use CODE TAGS!!

                            They are required around any and all code that you post into the forums. I have replaced it in your posts here, but in the future you need to use them. This is your only warning.

                            Regards,

                            Jeff
