string match/discard issue

  • jacob600
    New Member
    • Feb 2008
    • 2

    string match/discard issue

    Hello,

    I am very new to Perl. I have been trying to modify an existing script that is used to read tcpdump files and then eventually generate Top Talker stats from it. If my tcpdump file from System-A looks like this, it works fine:
    Code:
    02:02:28.578439 IP 10.56.252.32.ssh > 10.56.252.66.3722: tcp 116
    02:02:28.544248 IP 10.56.252.66.3722 > 10.56.252.32.ssh: tcp 0
    02:02:28.550995 IP 10.69.252.2.hsrp > 224.0.0.2.hsrp: UDP, length 20

    On System-B and newer systems, the tcpdump output is attaching two extra columns as shown here:
    Code:
    12:50:23.671341 10.62.58.112.ssh > 10.62.119.87.1397: tcp 68 (DF) [tos 0x10] 
    12:50:23.672143 10.62.119.87.1397 > 10.62.58.112.ssh: tcp 0 (DF) tail-type  255 len 255 f5 type 255 len 255
    12:50:23.696168 0:50:56:47:7c:64 Broadcast 74: 
    12:50:23.696404 127.2.0.1.4401 > 127.2.0.2.32838: tcp 264 (DF)
    These two extra fields are the (DF) and another optional field. I need to find out how to modify my script so that it reads everything it normally does, but once it hits that first "(" it ignores/drops the rest. Here is my current code snippet that breaks on the newer tcpdump files. The line beginning "if ($rest =~ /^IP\s+(\d+\.\" is the line in question.

    [CODE=perl]
    open(INFILE,"$infile");
    while (<INFILE>) {
      $line=$_;
      chop($line);
      if ($line=~ /^(\d+)\:(\d+)\:(\d+)\.\d+\s+(.*)/) {
        $newhour=$1; $min=$2; $sec=$3; $rest=$4;
        # check time; if hour goes DOWN, must be a new day
        if ($hour > $newhour) { $newhour=$newhour+24; }
        $hour=$newhour;
        if ($time eq 0) { $inittime = 60*$hour+$min; }
        $time=int((60*$hour+$min)/$timeblock);
        #new event
        # look for IP addresses
        if ($rest =~ /^IP\s+(\d+\.\d+\.\d+\.\d+)\.(\S+)\s+\>\s+(\d+\.\d+\.\d+\.\d+)\.(\S+)\:\s+(\S+)\s+(\S+)/) {
          $srcip=$1;
          $srcport=$2;
          $dstip=$3;
          $dstport=$4;
          $proto=$5;

          # find replies
          if ($con{"$proto,$dstip,$srcip,$srcport"} > 0) {
            if (($dstport =~ /^\d+$/) && ($dstport > 1023)) {
              if (($srcport =~ /\D/) || ($dstport < 1150)) {
                # is probably reply packet; switch src & dst
                #print "$srcport, $dstport\n" unless ($srcport =~ /netbios/);
                $temp=$srcport;
                $srcport=$dstport;
                $dstport=$temp;
                $temp=$srcip;
                $srcip=$dstip;
                $dstip=$temp;
              }
            }
          }
          #cleaning up formatting
    [/CODE]

    Thank you,
    Last edited by eWish; Feb 28 '08, 12:02 AM. Reason: Fixed Code Tags
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    I'm confused: is the code you posted meant to parse the old format or the new format? Your code looks for 'IP', but there is no 'IP' in the new format.

    Keep in mind that parentheses are used to capture patterns in memory in a regexp, so if you have to match literal parentheses in the search string you must escape them:

    Code:
    /\( foo\) /;
    If you want to match and capture the parentheses:

    Code:
    /(\( foo\) )/;
    print $1;
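
Applied to the tcpdump output in question, the same idea might look like the sketch below. It is only an illustration: the sample line is copied from the System-B output above, and the variable names ($line, $flag) are made up for the example.

```perl
use strict;
use warnings;

# Sample line copied from the System-B output quoted in the question.
my $line = '12:50:23.671341 10.62.58.112.ssh > 10.62.119.87.1397: tcp 68 (DF) [tos 0x10]';

# The escaped parentheses match a literal "(" and ")".
# The unescaped pair between them captures the flag name into $1.
my $flag;
if ($line =~ /\((\w+)\)/) {
    $flag = $1;
}
print "flag: $flag\n" if defined $flag;   # prints "flag: DF"
```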


    • nithinpes
      Recognized Expert Contributor
      • Dec 2007
      • 410

      #3
      I can see that 'IP' is missing from the tcpdump output on System-B. Make the 'IP' pattern optional (with ?) in your search. Other than that, the regex you have used will meet your objective.
      Code:
      if ($rest =~ /^(IP\s+)?(\d+\.\d+\.\d+\.\d+)\.(\S+)\s+\>\s+(\d+\.\d+\.\d+\.\d+)\.(\S+)\:\s+(\S+)\s+(\S+)/)  {
                  $srcip=$2;
                  $srcport=$3;
                  $dstip=$4;
                  $dstport=$5;
                  $proto=$6;
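
A self-contained way to check the optional-'IP' idea against both formats is sketched below. It is a variation, not the original script: parse_rest() is a hypothetical helper, and it uses a non-capturing group (?:IP\s+)? so the captures keep their original numbers ($1 to $5) instead of shifting by one. The sample strings are the part of each tcpdump line after the timestamp, taken from the first post.

```perl
use strict;
use warnings;

# parse_rest() is a hypothetical helper, not part of the original script.
# (?:IP\s+)? makes the "IP" prefix optional without capturing it,
# so $1..$5 mean the same thing for both tcpdump formats.
sub parse_rest {
    my ($rest) = @_;
    if ($rest =~ /^(?:IP\s+)?(\d+\.\d+\.\d+\.\d+)\.(\S+)\s+>\s+(\d+\.\d+\.\d+\.\d+)\.(\S+):\s+(\S+)\s+(\S+)/) {
        my %parsed = (
            srcip   => $1,
            srcport => $2,
            dstip   => $3,
            dstport => $4,
            proto   => $5,
        );
        return \%parsed;
    }
    return;   # no match, e.g. the "Broadcast" line
}

my @samples = (
    'IP 10.56.252.32.ssh > 10.56.252.66.3722: tcp 116',              # System-A format
    '10.62.58.112.ssh > 10.62.119.87.1397: tcp 68 (DF) [tos 0x10]',  # System-B format
    '0:50:56:47:7c:64 Broadcast 74:',                                # not IP traffic
);

for my $rest (@samples) {
    my $p = parse_rest($rest);
    if ($p) {
        printf "%s.%s > %s.%s %s\n",
            $p->{srcip}, $p->{srcport}, $p->{dstip}, $p->{dstport}, $p->{proto};
    } else {
        print "skipped: $rest\n";
    }
}
```

With the regex anchored at the start, lines that do not begin with an IPv4 address (such as the Broadcast line) simply fail to match and can be skipped.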
      If you deliberately want to ignore everything after the first "(", modifying your first regex as below would do that:
      Code:
      if ($line=~ /(\d+)\:(\d+)\:(\d+)\.\d+\s+([^(]*)/) {  ## match anything, not "("
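
To see what the [^(]* capture actually keeps, here is a small standalone sketch run against one of the System-B lines from the first post. Two details are assumptions added for the example rather than part of the suggestion above: the pattern is anchored with ^, and the trailing whitespace left in front of the "(" is trimmed.

```perl
use strict;
use warnings;

# Sample line copied from the System-B output quoted in the question.
my $line = '12:50:23.672143 10.62.119.87.1397 > 10.62.58.112.ssh: tcp 0 (DF) tail-type  255 len 255 f5 type 255 len 255';

my ($hour, $min, $sec, $rest);
if ($line =~ /^(\d+):(\d+):(\d+)\.\d+\s+([^(]*)/) {
    ($hour, $min, $sec, $rest) = ($1, $2, $3, $4);
    # [^(]* stops right before the first "(", so $rest still ends
    # with the space that preceded it; trim that off.
    $rest =~ s/\s+$//;
}
printf "%s:%s:%s rest='%s'\n", $hour, $min, $sec, $rest;
```

Everything from "(DF)" onward is dropped, and lines without any "(" are captured whole, so the old System-A format is unaffected.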


      • jacob600
        New Member
        • Feb 2008
        • 2

        #4
        I actually didn't even notice that the word "IP" was missing on the new captures. Either way I appreciate the help.

        Thank you
