How to display the count of HTML tags by tag name using a regular expression?

  • rampdv
    New Member
    • Feb 2013
    • 7

    How can I display the count of HTML tags by tag name using a regular expression?

    Code:
    <html>
    <head>
    </head>
    <h1>ghfghfgh</h1>
    <body>
    ghjhjg hghjgjk
    fghfjh hjkhkjl
    <b>jhjkhjk</b><b>uyyjhyu</b>
    <br/><br/>
    <p>this is paragraph</p>
    <p>this is paragraph1</p><p>this is paragraph2</p>
    </body>
    </html>
    I am trying the code below: first I read the HTML file, then I apply the regular expression below. The problem is that it does not retrieve a tag when that tag appears twice on the same line.
    Code:
    while(<html>) {
    if( $_ =~/<(\w+)\/?.*?>/gi) {
    push @data,$1;
    }
    }
    Can you please suggest how to retrieve tags that appear more than once on the same line?
    Last edited by acoder; Feb 14 '13, 02:02 PM. Reason: Please use [code] tags when posting code
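In the code above, `if` evaluates the `/g` match only once per line, so only the first tag on each line is pushed. Looping with `while` lets `m//g` resume from where the previous match ended and visit every tag. A minimal sketch, assuming each tag is written on a single line, that closing tags such as `</b>` should be ignored, and that no attribute value contains `>`; `tags_on_line` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return every opening or self-closing tag name on one line.
sub tags_on_line {
    my ($line) = @_;
    my @names;
    # In scalar context m//g resumes from pos() on each call, so a while
    # loop visits every match; an if statement would stop after the first.
    # Closing tags never match because \w cannot match '/'.
    while ($line =~ /<(\w+)[^>]*>/g) {
        push @names, $1;
    }
    return @names;
}

my @tags = tags_on_line('<b>jhjkhjk</b><b>uyyjhyu</b>');
print "@tags\n";    # prints "b b"
```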
  • Rabbit
    Recognized Expert MVP
    • Jan 2007
    • 12517

    #2
    Your question title and the question in your post are different. What are you actually looking for?


    • rampdv
      New Member
      • Feb 2013
      • 7

      #3
      I would like to count how many times each tag occurs in the HTML file. The code above retrieves each tag and stores it in an array, because then it is easy to count how many times each tag name occurs in the array. Please help me with the problem above. The array is not mandatory, so please provide a regular expression that reads each tag so I can display how many times each tag occurs in the HTML file.

      Please provide a regular expression for the question above; my solution is only my own idea. Just provide your solution, but using a regular expression is mandatory. Don't use any predefined module for this.
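Since the array is optional, one possible approach is to match in list context and feed the captured names straight into a hash, using no modules. A minimal sketch, assuming the whole HTML text is already in one string, that only opening and self-closing tags are counted, and that no comment, script, or attribute value contains `>`; `count_tags` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: map each tag name to the number of times it opens.
sub count_tags {
    my ($html) = @_;
    my %count;
    # A list-context m//g returns every captured tag name at once;
    # closing tags such as </p> never match because \w cannot match '/'.
    $count{$_}++ for $html =~ /<(\w+)[^>]*>/g;
    return \%count;
}

my $html  = '<p>one</p><p>two</p><br/><b>x</b>';
my $count = count_tags($html);
for my $tag (sort keys %$count) {
    print "$tag occurs $count->{$tag} times\n";
}
```

Tag names are counted exactly as written, so `<P>` and `<p>` would end up as separate keys unless the captured name is passed through `lc` first.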


      • Rabbit
        Recognized Expert MVP
        • Jan 2007
        • 12517

        #4
        A single regular expression can't give you a count of matches by tag type. The algorithm you will need to implement is along these lines:

        1) You will need one regular expression to return all tags.
        2) Dedupe the matches.
        3) Loop through the deduplicated matches.
        4) Run a regular expression looking for just that tag.
        5) Return the count of matches.
        6) Go to 3.
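The steps above could be sketched in Perl roughly as follows. This is a minimal sketch, assuming the HTML is held in one string and that only opening and self-closing tags are counted; `count_by_tag` is a hypothetical helper name. Step 1 collects all tag names with one regex, step 2 dedupes them with a hash, and steps 3 to 6 run one tag-specific regex per distinct name and count its matches.

```perl
use strict;
use warnings;

# Hypothetical helper following the numbered steps above.
sub count_by_tag {
    my ($html) = @_;

    # 1) One regular expression returns all opening or self-closing tag names.
    my @all = $html =~ /<(\w+)[^>]*>/g;

    # 2) Dedupe the matches with a "seen" hash, keeping first-seen order.
    my %seen;
    my @unique = grep { !$seen{$_}++ } @all;

    my %count;
    # 3) Loop through the deduplicated names.
    for my $tag (@unique) {
        # 4) Run a regular expression looking for just that tag;
        #    \Q...\E escapes the name and \b stops <b> from matching <br>.
        # 5) Assigning the list-context matches to () yields the match count.
        $count{$tag} = () = $html =~ /<\Q$tag\E\b[^>]*>/g;
    }   # 6) The loop moves on to the next tag.
    return \%count;
}

my $html  = '<b>x</b><br/><b>y</b><p>z</p>';
my $count = count_by_tag($html);
print "$_ occurs $count->{$_} times\n" for sort keys %$count;
```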


        • rampdv
          New Member
          • Feb 2013
          • 7

          #5
          Hi, the code below works for counting the tags by HTML tag name. I tested it with the HTML file below. If you have any doubts, please post your HTML file here and I will provide a regular expression based on your HTML code.

          Code:
          <html>
          <h1>hi this is ramanjaneyulu</h1>
          <b>this is bold text</b>
          <br/>
          <br/>
          <body>this is the body of the html</body><body> this is another body</body>
          <b>this is bold again </b><b> this is another bold </b>
          <head>this is head</head>
          </html>
          
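          Below is the program:

          Code:
          use strict;
          use warnings;

          open(my $htm, '<', 'checktag.html') or die "Cannot open checktag.html: $!";
          my @data;
          while (my $line = <$htm>) {
              # A while loop with /g visits every tag on the line, not just the first.
              # Closing tags such as </b> are skipped because \w cannot match '/'.
              while ($line =~ /<(\w+)[^>]*>/g) {
                  push @data, $1;
              }
          }
          close($htm);

          # Count every tag name in @data; ++ on a missing key starts from 0.
          my %hash;
          $hash{$_}++ for @data;

          foreach (sort keys %hash) {
              print "$_ occurs $hash{$_} times\n";
          }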
          Last edited by Rabbit; Feb 26 '13, 06:43 PM. Reason: Please use code tags when posting code.
