Q: Analyse data and provide a report - Arrays?

  • Troll

    Q: Analyse data and provide a report - Arrays?

    Hi,

    I need to write a script which reads some data and reports the findings.
    Just to give you an idea the structure is similar to the following.

    Data input example:

    HEADING 1
    **********
    ColumnA ColumnB ColumnC ColumnD ColumnE
    Pete Male Marketing Single 40
    Kate Female Marketing Married 30
    John Male Sales Married 38
    Pete Male Sales Single 52
    John Male Sales Single 24


    HEADING 2
    **********
    ColumnF ColumnG ColumnH ColumnI
    whatever
    whatever
    whatever
    whatever


    Report Output example:
    # of Pete's =
    # of Males =
    # of Salespeople =
    # of Singles =
    # of over 35s =


    Since this is the first time I'm even writing such a script I would
    appreciate some pointers.
    1) Do I use arrays or associative arrays for this? Why or why not?
    2) Is it possible for someone to give me a code example of counting how many
    Singles we have?
    3) What happens when I have read all the data under HEADING 1 and need to
    move onto HEADING 2?
    That is, how do I accomplish the jump from what I think is one loop onto the
    next?

    I imagine that there will be many more posts following this one so there's
    no need to get into too much detail. Some guidance would be nice as I will
    need to utilise Google and my references for the rest.

    Thanks in advance.





  • Troll

    #2
    Re: Q: Analyse data and provide a report - Arrays?

    Ga Mu,
    Great stuff - thanks very much. :)

    The headings differentiate blocks of data so once we count everything under
    HEADING 1 we move onto HEADING 2 then HEADING 3 etc.

    Does this help a bit?




    "Ga Mu" <NgamuthO@SPcomcast.netAM> wrote in message
    news:xl44b.306860$Ho3.43264@sccrnsc03...
    > Troll wrote:
    >
    > > 1) Do I use arrays or associative arrays for this? Why or why not?
    >
    > Use hashes (aka associative arrays) because they work so well for
    > counting occurrences of words. A hash element that does not exist yet
    > is undef, and ++ treats undef as zero, so no initialization is needed.
    > Assuming you have already declared the hash %names ('my %names;') and
    > we are in your parsing loop and have extracted the person's name into
    > $name, all you need is:
    >
    > $names{$name}++; # increment the count for this name.
    >
    > > 2) Is it possible for someone to give me a code example of counting how many
    > > Singles we have?
    >
    > You could count everything with hashes.
    >
    > Prior to your parsing loop:
    >
    > my (%names, %sexes, %depts, %m_statuses, %ages);
    >
    > Within your parsing loop:
    >
    > # extract four words and a number into scalars:
    > my ($name, $sex, $dept, $m_status, $age) =
    >     /^(\w+) (\w+) (\w+) (\w+) (\d+)$/;
    >
    > # increment counts for each:
    > $names{$name}++;
    > $sexes{$sex}++;
    > $depts{$dept}++;
    > $m_statuses{$m_status}++;
    > $ages{$age}++;
    >
    > After your parsing loop:
    >
    > $names{'Pete'} gives the number of Petes.
    > $sexes{'Male'} gives the number of Males.
    > $depts{'Sales'} gives the number of sales people.
    > $m_statuses{'Single'} gives the number of single people.
    > $ages{'25'} gives the number of 25 year-olds.
    >
    > To print a list of all names and the number of occurrences of each:
    >
    > foreach my $key (keys %names) {
    >     print "$key: $names{$key}\n";
    > }
    >
    > With your sample data this will output something like:
    >
    > John: 2
    > Pete: 2
    > Kate: 1
    >
    > The order is arbitrary, but the list can be sorted by either name or
    > count. Do a 'perldoc -f' for 'keys' and 'sort'.
    >
    > > 3) What happens when I have read all the data under HEADING 1 and need to
    > > move onto HEADING 2?
    > > That is, how do I accomplish the jump from what I think is one loop onto the
    > > next?
    >
    > Can't answer that, as you don't provide enough detail. What is the
    > significance of the headings? Would the results be the same if the
    > headings were completely ignored or do the headings signify some
    > distinction between blocks of data?
    >
    > Greg
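
To tie the quoted pieces together, here is a minimal sketch that runs the hash counting on the five sample rows from the first post. The @lines array stands in for the real input, and summing %ages to get the "over 35" figure is an assumption about how that line of the report should be computed; the thread itself does not say.

```perl
use strict;
use warnings;

# Sample rows copied from the HEADING 1 block in the first post.
my @lines = (
    'Pete Male Marketing Single 40',
    'Kate Female Marketing Married 30',
    'John Male Sales Married 38',
    'Pete Male Sales Single 52',
    'John Male Sales Single 24',
);

my (%names, %sexes, %depts, %m_statuses, %ages);

for my $line (@lines) {
    # Skip any line that is not four words followed by a number.
    my ($name, $sex, $dept, $m_status, $age) =
        $line =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/
        or next;

    $names{$name}++;
    $sexes{$sex}++;
    $depts{$dept}++;
    $m_statuses{$m_status}++;
    $ages{$age}++;
}

# Assumption: "over 35" means age > 35, summed across the %ages buckets.
my $over_35 = 0;
$over_35 += $ages{$_} for grep { $_ > 35 } keys %ages;

# "// 0" (Perl 5.10+) prints 0 for a value that never occurred.
printf "# of Pete's = %d\n",      $names{Pete}        // 0;
printf "# of Males = %d\n",       $sexes{Male}        // 0;
printf "# of Salespeople = %d\n", $depts{Sales}       // 0;
printf "# of Singles = %d\n",     $m_statuses{Single} // 0;
printf "# of over 35s = %d\n",    $over_35;

# All names, highest count first, ties broken alphabetically.
for my $name (sort { $names{$b} <=> $names{$a} || $a cmp $b } keys %names) {
    print "$name: $names{$name}\n";
}
```

Sorting the keys is what makes the final listing repeatable; without the sort, the order of keys %names can differ from run to run.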



    • Troll

      #3
      Re: Q: Analyse data and provide a report - Arrays?

      Ga Mu,

      Pls disregard last post.

      With regard to the jump between HEADINGS, will it be enough to do something
      like:
      while (<>)
      ....
      if (/HEADING 1/ .. /HEADING 2/) {
      # line falls between HEADING 1 and HEADING 2 in the text, inclusive.
      # then do the string extraction
      # then increment stuff
      elsif (/HEADING 2/ .. /HEADING 3/) {
      # line falls between HEADING 2 and HEADING 3 in the text, inclusive.
      # then do the string extraction
      # then increment stuff
      etc?

      I quite like the code example you provided - actually found a similar one in
      http://www.oreilly.com/catalog/perlw...pter/ch08.html
      Up until now I was under the impression that I would have to use split - can
      you elaborate why you chose a different approach?

      One other task I have to do is similar to:
      > If a line contains Single in the column then get the single person's name.
      I sort of came up with:
      foreach $m_statuses{'Single'}
      print $names{$name}

      but that's probably totally wrong. Can you advise?

      Thanks again.

      • Ga Mu

        #4
        Re: Q: Analyse data and provide a report - Arrays?

        Troll wrote:
        > Ga Mu,
        >
        > Pls disregard last post.
        >
        > With regard to the jump between HEADINGS, will it be enough to do something
        > like:
        > while (<>)
        > ...
        > if (/HEADING 1/ .. /HEADING 2/) {
        > # line falls between HEADING 1 and HEADING 2 in the text, inclusive.
        > # then do the string extraction
        > # then increment stuff
        > elsif (/HEADING 2/ .. /HEADING 3/) {
        > # line falls between HEADING 2 and HEADING 3 in the text, inclusive.
        > # then do the string extraction
        > # then increment stuff
        > etc?

        I am unclear as to the distinction between blocks. Is there a separate
        group of totals for each heading, or is everything totalled up together?
        If the latter, then simply ignore the headings. If the former, then you
        could parse out the heading name and use a multidimensional hash, i.e.,
        replace this:

        $names{$name}++;

        with this:

        $names{$heading}{$name}++;
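
A minimal sketch of the per-heading idea, assuming every block has the same column layout and that the heading is the single word after "HEADING". The input rows and the starting value 'none' are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: two blocks that share the same columns.
my @lines = (
    'HEADING 1',
    'Pete Male Marketing Single 40',
    'John Male Sales Married 38',
    'HEADING 2',
    'Pete Male Sales Single 52',
);

my %names;                # $names{$heading}{$name} holds a count
my $heading = 'none';     # rows seen before any heading land here

for my $line (@lines) {
    if ($line =~ /^HEADING (\S+)/) {
        $heading = $1;    # remember which block we are in
        next;
    }
    my ($name) = $line =~ /^(\w+) / or next;
    $names{$heading}{$name}++;    # inner hash is created on first use
}

# Walk the outer hash, then the inner hash stored under each heading.
for my $h (sort keys %names) {
    for my $n (sort keys %{ $names{$h} }) {
        print "HEADING $h: $n = $names{$h}{$n}\n";
    }
}
```

The inner hashes do not need to be declared; Perl creates them the first time $names{$heading}{$name} is incremented.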

        > I quite like the code example you provided - actually found a similar one in
        > http://www.oreilly.com/catalog/perlw...pter/ch08.html
        > Up until now I was under the impression that I would have to use split - can
        > you elaborate why you chose a different approach?

        Either method produces the same results on well-formed lines. If you
        plan on incorporating error checking, m// allows you to define a format
        precisely, e.g., four words and a number, whereas split simply breaks a
        string up into a list. Use whichever method makes you happy.
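
To make the difference concrete, here is a small sketch comparing the two on one good row and one malformed row. The malformed row and the @accepted / @rejected arrays are invented for this illustration.

```perl
use strict;
use warnings;

my $good = 'Kate Female Marketing Married 30';
my $bad  = 'Kate Female Marketing 30';    # hypothetical row with a field missing

# split cuts on whitespace and does not care what the fields contain,
# so the short row is returned without complaint as four fields.
my @fields = split ' ', $bad;
print scalar(@fields), " fields from split\n";

# A match only succeeds when the whole line has the expected shape.
my (@accepted, @rejected);
for my $line ($good, $bad) {
    if (my ($name, $sex, $dept, $m_status, $age) =
            $line =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/) {
        push @accepted, "$name ($age)";
    } else {
        push @rejected, $line;
    }
}

print "accepted: @accepted\n";
print "rejected: $_\n" for @rejected;
```

With split you would have to add your own check on the number of fields (and on their contents) to catch the second row.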

        > One other task I have to do is similar to:
        >
        > > If a line contains Single in the column then get the single person's name.
        >
        > I sort of came up with:
        > foreach $m_statuses{'Single'}
        > print $names{$name}
        >
        > but that's probably totally wrong. Can you advise?

        Yes, it is totally wrong. $m_statuses{'Single'} is a scalar. It is the
        count of lines where the marital status is 'Single'. Your foreach loop
        above would produce a syntax error. Although it is not what you're
        after, a valid foreach loop could look like this:

        foreach my $m_status ( keys %m_statuses ) {
            #
            # $m_status will be 'Single' for one iteration of the loop and
            # 'Married' for the other. (Unless your data has more statuses...)
            #
        }

        Perhaps a more meaningful foreach loop would look like this:

        foreach my $age ( keys %ages ) {
            #
            # For each iteration, $age will be one of the ages that was found
            # in the data -->> IN NO PARTICULAR ORDER <<-- unless you sort it.
            #
        }

        To do what you propose, i.e., print the name of all single people, you
        would have to include the logic for that in the parsing loop:

        # extract four words and a number into scalars:
        my ($name, $sex, $dept, $m_status, $age) =
            /^(\w+) (\w+) (\w+) (\w+) (\d+)$/;

        # increment counts for each:
        $names{$name}++;
        $sexes{$sex}++;
        $depts{$dept}++;
        $m_statuses{$m_status}++;
        $ages{$age}++;

        # take special actions (Perl needs either braces after if (...)
        # or the statement-modifier form used here):
        print "$name is single.\n" if $m_status eq 'Single';
        print "$name is over the hill!\n" if $age >= 40;


        Hope this helps!

        Greg



        • Troll

          #5
          Re: Q: Analyse data and provide a report - Arrays?

          Thanks again !

          1)
          Sorry for being too vague. With regard to the HEADINGS they separate blocks
          of data. But because the column names will be different [data is different]
          then I'm not quite sure I could use:
          $names{$heading}{$name}++;

          So I'm looking at creating separate my () definitions for each HEADING and
          just wanted to confirm how to jump out of one HEADING loop and start with
          the next.

          For example, under HEADING 1 we have these columns:
          Name, Sex, Dept, M_Status, Age

          and under HEADING 2 we have:
          Address, Phone#, Mobile#, Salary

          So at the beginning of the script I would have
          my (%names, %sexes, %depts, %m_statuses, %ages);
          my (%addresses, %phones, %mobiles, %salaries);
          #then I have my while (<>) and parsing here
          #I have my output at the end

          Is that a little clearer?


          2)
          With my last question regarding the printing of the names of single people,
          if we include a print statement in the parsing loop would that give us
          something like:
          Pete is single.
          John is single.
          while the parsing is still running?

          What I'm after is hopefully feeding that output into something else
          [@array?] which can then print a list of the names [line by line] at the end
          of the script, something like:
          #this is the output structure
          Number of Petes =
          Number of Males =
          Singles are:
          Pete
          John
          Number of Salespeople =


          Does this make sense?

          Thanks Greg.

          • Ga Mu

            #6
            Re: Q: Analyse data and provide a report - Arrays?

            Troll wrote:
            > Thanks again !
            >
            > 1)
            > Sorry for being too vague. With regard to the HEADINGS they separate blocks
            > of data. But because the column names will be different [data is different]
            > then I'm not quite sure I could use:
            > $names{$heading}{$name}++;
            >
            > So I'm looking at creating separate my () definitions for each HEADING and
            > just wanted to confirm how to jump out of one HEADING loop and start with
            > the next.
            >
            > For example, under HEADING 1 we have these columns:
            > Name, Sex, Dept, M_Status, Age
            >
            > and under HEADING 2 we have:
            > Address, Phone#, Mobile#, Salary
            >
            > So at the beginning of the script I would have
            > my (%names, %sexes, %depts, %m_statuses, %ages)
            > my (%addresses, %phones, %mobiles, %salaries)
            > #then I have my while (<>) and parsing here
            > #I have my output at the end
            >
            > Is that a little more clearer?

            Yes. Much clearer. There are a couple of different ways you could do
            this. One is to use a single loop that reads through the file and uses
            a state variable (e.g., $heading) to keep track of where you are in the
            parsing process. The other is to have a separate loop for each heading.
            Again, six of one, half a dozen of another. It's more a matter of
            preference than anything else.

            An example of the first approach:

            my $heading  = 'initial';
            my $fin_name = '/usr/local/blah/blah/blah';
            # 'or' rather than '||', which would bind to $fin_name instead of open
            open(FIN, '<', $fin_name) or die "Can't open $fin_name: $!\n";

            while (<FIN>) {

                # check for a new heading
                # I am assuming single word heading names
                if ( /HEADING (\S+)/ ) {

                    $heading = $1; # set $heading equal to word extracted above

                # take appropriate action based on the heading we are under

                } elsif ( $heading eq 'NAMES' ) {

                    my ( $name, $sex, $dept, $m_status, $age ) =
                        /(\w+) (\w+) (\w+) (\w+) (\d+)/;

                    # update counts, append to lists, etc...

                } elsif ( $heading eq 'ADDRESSES' ) {

                    # I am assuming the address field is padded to 30 characters
                    # here (.{30} is any 30 characters; \. would match literal dots):
                    my ( $address, $phone, $mobile, $salary ) =
                        /(.{30}) (\S+) (\S+) (\d+)/;

                    # update counts, append to lists, etc...

                }

            }

            close FIN;
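
For something that can be run as it stands, here is a rough variation on the state-variable loop that reads from an in-memory string instead of a file and looks up a handler for the current heading in a hash. The heading names NAMES and ADDRESSES follow the example above; the pipe-separated address row, the %handler table and the sample data are assumptions made purely for this sketch.

```perl
use strict;
use warnings;

# Hypothetical data, built with join so the sketch needs no input file.
my $data = join "\n",
    'HEADING NAMES',
    '**********',
    'Name Sex Dept M_Status Age',
    'Pete Male Marketing Single 40',
    'Kate Female Marketing Married 30',
    '',
    'HEADING ADDRESSES',
    '**********',
    'Address Phone Mobile Salary',
    '12 High St|5551234|04001234|50000',
    '';

my (%names, %salaries);

# One handler per heading. Each returns quietly on a line it cannot
# parse, which also disposes of underlines, column titles and blanks.
my %handler = (
    NAMES => sub {
        my ($line) = @_;
        my ($name) = $line =~ /^(\w+) \w+ \w+ \w+ \d+$/ or return;
        $names{$name}++;
    },
    ADDRESSES => sub {
        my ($line) = @_;
        my ($salary) = $line =~ /^[^|]+\|\d+\|\d+\|(\d+)$/ or return;
        $salaries{$salary}++;
    },
);

open my $fh, '<', \$data or die "Can't open in-memory data: $!\n";

my $heading = 'initial';            # the state variable
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^HEADING (\S+)/) {
        $heading = $1;              # switch state on every heading line
        next;
    }
    $handler{$heading}->($line) if exists $handler{$heading};
}
close $fh;

print "names: ",    join(', ', sort keys %names),    "\n";
print "salaries: ", join(', ', sort keys %salaries), "\n";
```

Adding a third heading then means adding one more entry to %handler rather than another elsif branch.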


            And the second approach:

            my $fin_name = '/usr/local/blah/blah/blah';
            open(FIN, '<', $fin_name) or die "Can't open $fin_name: $!\n";

            # scan for first heading
            # (a line is only stored in $_ when <FIN> is the sole condition
            # of the while, so the heading test moves inside the loop)
            while ( <FIN> ) {
                last if /HEADING NAMES/;
            }

            # parse the names, etc...
            while ( <FIN> ) {
                last if /HEADING ADDRESSES/;

                my ( $name, $sex, $dept, $m_status, $age ) =
                    /(\w+) (\w+) (\w+) (\w+) (\d+)/;

                # update counts, append to lists, etc...

            }

            # parse the addresses, etc...
            # for brevity, I am assuming only two headings
            while ( <FIN> ) {

                my ( $address, $phone, $mobile, $salary ) =
                    /(.{30}) (\S+) (\S+) (\d+)/;

                # update counts, append to lists, etc...

            }

            close FIN;
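
A runnable sketch of the separate-loops approach, again on an in-memory string. Each loop ends with last as soon as it sees the heading that starts the next block, and the following loop carries on from that point in the same filehandle. The sample rows and the pipe-separated address format are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical data; the trailing '' gives the last row its newline.
my $data = join "\n",
    'junk before the first heading',
    'HEADING NAMES',
    'Pete Male Marketing Single 40',
    'John Male Sales Married 38',
    'HEADING ADDRESSES',
    '12 High St|5551234|04001234|50000',
    '';

open my $fh, '<', \$data or die "Can't open in-memory data: $!\n";

# Loop 1: discard everything up to and including the first heading.
while (my $line = <$fh>) {
    last if $line =~ /^HEADING NAMES/;
}

# Loop 2: name rows, until the next heading ends the block.
my @people;
while (my $line = <$fh>) {
    last if $line =~ /^HEADING ADDRESSES/;
    my ($name) = $line =~ /^(\w+) \w+ \w+ \w+ \d+$/ or next;
    push @people, $name;
}

# Loop 3: whatever is left belongs to the last block.
my @salaries;
while (my $line = <$fh>) {
    my ($salary) = $line =~ /\|(\d+)$/ or next;
    push @salaries, $salary;
}
close $fh;

print "people: @people\n";
print "salaries: @salaries\n";
```

This form depends on the headings arriving in a known order, whereas the single loop with a state variable copes with them in any order.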

            > 2)
            > With my last question regarding the printing of the names of single people,
            > if we include a print statement in the parsing loop would that give us
            > something like:
            > Pete is single.
            > John is single.
            > while the parsing is still running?

            Yes.

            > What I'm after is hopefully feeding that output into something else
            > [@array?] which can then print a list of the names [line by line] at the end
            > of the script, something like:
            > #this is the output structure
            > Number of Petes =
            > Number of Males =
            > Singles are:
            > Pete
            > John
            > Number of Salespeople =
            >
            >
            > Does this make sense?

            Yes. It would be easy to create a list/array of, e.g., single people.
            Prior to the loop, declare the array. Within the loop, test each person
            for being single. If they are, push them onto the list:

            # prior to your parsing loop, declare array @singles:

            my @singles;

            # within your parsing loop, after parsing out name, status, etc.:

            push @singles, $name if $m_status eq 'Single';

            # after loop, to print the list of singles:

            print "Single persons:\n";
            foreach my $single_person ( @singles ) {
                print " $single_person\n";
            }
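
Putting the array together with the counting hashes, a sketch of the report layout asked for above could look like this. It runs on the five sample rows from the first post. The %seen hash is an addition of mine, on the assumption that a name should be listed only once even when two rows share it; drop it if every matching row should be listed.

```perl
use strict;
use warnings;

# Sample rows copied from the HEADING 1 block in the first post.
my @lines = (
    'Pete Male Marketing Single 40',
    'Kate Female Marketing Married 30',
    'John Male Sales Married 38',
    'Pete Male Sales Single 52',
    'John Male Sales Single 24',
);

my (%names, %sexes, %depts);
my @singles;    # names of single people, in order of first appearance
my %seen;       # assumption: list each single name only once

for my $line (@lines) {
    my ($name, $sex, $dept, $m_status, $age) =
        $line =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/
        or next;

    $names{$name}++;
    $sexes{$sex}++;
    $depts{$dept}++;

    # Nothing is printed here; the name is only remembered for later.
    push @singles, $name if $m_status eq 'Single' && !$seen{$name}++;
}

# The report is produced after the loop, in whatever order is wanted.
print "Number of Petes = $names{Pete}\n";
print "Number of Males = $sexes{Male}\n";
print "Singles are:\n";
print "$_\n" for @singles;
print "Number of Salespeople = $depts{Sales}\n";
```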


            Greg


            • Troll

              #7
              Re: Q: Analyse data and provide a report - Arrays?

              Wow. I don't know how you get the time to respond to my queries in such
              detail. It is greatly appreciated.
              I just came back from work and it's like 2:30 am so I'll crash out soon and
              have a closer read tomorrow [especially of the HEADINGS part].

              With the push @array stuff I actually got to this today in my readings. I
              saw an example of appending an array onto another array with a push and I
              was wondering if we could just substitute a $variable for one of the arrays.
              I'm glad you confirmed this. :)

              I was also wondering if doing this at the beginning of the script:

              my (%names, %sexes, %depts, %m_statuses, %ages);   # declaring things locally

              would be considered bad practice. I thought that one should declare things
              as my ( ) if one is using things within a loop so as not to impact anything
              external to the loop. But if one uses variables/arrays both within and
              outside the loops, should we then still declare stuff as my ( )?
              Maybe I'm just confused about my ( )...

              Greg, if you could possibly keep an eye on this thread for the next few days
              I would be very much in your debt. Your help has been invaluable so far in
              allowing me to visualise quite a few things.

              Thanks very much.


              "Ga Mu" <NgamuthO@SPcom cast.netAM> wrote in message
              news:uRJ4b.1475 42$2x.41412@rwc rnsc52.ops.asp. att.net...[color=blue]
              > Troll wrote:[color=green]
              > > Thanks again !
              > >
              > > 1)
              > > Sorry for being too vague. With regard to the HEADINGS they separate[/color][/color]
              blocks[color=blue][color=green]
              > > of data. But because the column names will be different [data is[/color][/color]
              different][color=blue][color=green]
              > > then I'm not quite sure I could use:
              > > $names{$heading }{$name}++;
              > >
              > > So I'm looking at creating separate my () definitions for each HEADING[/color][/color]
              and[color=blue][color=green]
              > > just wanted to confirm how to jump out of one HEADING loop and start[/color][/color]
              with[color=blue][color=green]
              > > the next.
              > >
              > > For example, under HEADING 1 we have these columns:
              > > Name, Sex, Dept, M_Status, Age
              > >
              > > and under HEADING 2we have:
              > > Address, Phone#, Mobile#, Salary
              > >
              > > So at the beginning of the script I would have
              > > my (%names, %sexes, %depts, %m_statuses, %ages)
              > > my (%addresses, %phones, %mobiles, %salaries)
              > > #then I have my while (<>) and parsing here
              > > #I have my output at the end
              > >
              > > Is that a little more clearer?[/color]
              >
              > Yes. Much clearer. There are a couple of different ways you could do
              > this. One is to use a single loop that reads through the file and uses
              > a state variable (e.g., $heading) to keep track of where you are in the
              > parsing process. The other is to have a separate loop for each heading.
              > Again, six of one, half a dozen of another. It's more a matter of
              > preference than anything else.
              >
              > An example of the first approach:
              >
              > my $heading = 'initial';
              > my $fin_name = '/usr/local/blah/blah/blah';
              > open FIN,$fin_name || die "Can't open $fin_name\n";
              >
              > while (<FIN>) {
              >
              > # check for a new heading
              > # I am assuming single word heading names
              > if ( /HEADING (\S+)/ {
              >
              > $heading = $1; # set $heading equal to word extracted above
              >
              > # take appropriate action based on the heading we are under
              >
              > } elsif ( $heading eq 'NAMES' ) {
              >
              > ( $name, $sex, $dept, $m_status, $age ) =
              > /(\w+) (\w+) (\w+) (\w+) (\d+)/;
              >
              > # update counts, append to lists, etc...
              >
              > } elsif ( $heading eq 'ADDRESSES' ) {
              >
              > # I am assuming the address field is limited to 30 characters
              > # here:
              > ( $address,$phone , $mobile, $salary ) =
              > /(\.{30}) (\S+) (\S+) (\d+)/;
              >
              > # update counts, append to lists, etc...
              >
              > }
              >
              > }
              >
              >
              > And the second approach:
              >
              > my $heading = 'initial';
              > my $fin_name = '/usr/local/blah/blah/blah';
              > open FIN,$fin_name || die "Can't open $fin_name\n";
              >
              > # scan for first heading
              > while ( <FIN> && ! /HEADING NAMES/ );
              >
              > # parse the names, etc...
              > while ( <FIN> && ! /HEADING ADDRESSES/ ) {
              >
              > ( $name, $sex, $dept, $m_status, $age ) =
              > /(\w+) (\w+) (\w+) (\w+) (\d+)/;
              >
              > # update counts, append to lists, etc...
              >
              >
              > # parse the addresses, etc...
              > # for brevity , I am assuming only two headings
              > while ( <FIN> ) {
              >
              > ( $address,$phone , $mobile, $salary ) =
              > /(\.{30}) (\S+) (\S+) (\d+)/;
              >
              > # update counts, append to lists, etc...
              >
              > }
              >[color=green]
              > >
              > >
              > > 2)
              > > With my last question regarding the printing of the names of single[/color][/color]
              people,[color=blue][color=green]
              > > if we include a print statement in the parsing loop would that give us
              > > something like:
              > > Pete is single.
              > > John is single.
              > > while the parsing is still running?[/color]
              >
              > Yes.
              >[color=green]
              > >
              > > What I'm after is hopefully feeding that output into something else
              > > [@array?] which can then print a list of the names [line by line] at the[/color][/color]
              end[color=blue][color=green]
              > > of the script, something like:
              > > #this is the output structure
              > > Number of Petes =
              > > Number of Males =
              > > Singles are:
              > > Pete
              > > John
              > > Number of Salespeople =
              > >
              > >
              > > Does this make sense?
              > >[/color]
              >
              > Yes. It would be easy to create a list/array of, e.g., single people.
              > Prior to the loop, declare the array. Within the loop, test each person
              > for being single. If they are, push them onto the list:
              >
              > # prior to your parsing loop, declare array @singles:
              >
              > my @singles;
              >
              > # within your parsing loop, after parsing out name, status, etc.:
              >
              > push @singles, $name if $m_status eq 'Single';
              >
              > # after loop, to print the list of singles:
              >
              > print "Single persons:\n";
              > foreach my $single_person ( @singles ) { print " $single_person\n"; }
              >
              >
              > Greg
              >[/color]


              Comment

              • Troll

                #8
                Re: Q: Analyse data and provide a report - Arrays?

                Now time for some stupid Qs:

                Let's say that the data I have is in a file called employees.
                How can I call this file so that I can parse it?

                1) Can I do:
                @HRdata = `cat employees`;
                while (<@HRdata>) {


                2) With regard to the HEADING sections, the script has to be able to
                recognise the different sections by the following rules:
                                          # there's a blank line before each heading
                HEADING 1                 # this is the name of the heading - a string with a special character and a blank space as part of it
                ColumnA ColumnB ColumnC   # these are the column names - strings which can also include a blank space if they have 2 or more words
                *******                   # a sort of an underlining pattern

                I guess this is to make sure that one does not include any silly heading
                data as part of the arrays created and the parsing only takes place on
                'real' data. Can you pls advise? Or do you need more info? I'm more in
                favour of creating separate 'if' loops due to my 'newbie' status. I'll get
                lost otherwise...

                Thanks.




                Comment

                • Troll

                  #9
                  Re: Q: Analyse data and provide a report - Arrays?

                  I'm getting heaps of the following errors when I run my script:
                  Use of uninitialized value in hash element at ...

                  The beginning of my script looks like:
                  my(%names, %sexes, %depts);
                  %names = ("name" => "0");
                  %sexes = ("sex" => "0");
                  %depts = ("dept" => "0");

                  $names = '0';
                  $sexes = '0';
                  $depts = '0';
                  $name = '0';
                  $sex = '0';
                  $dept = '0';

                  while (<>)
                  #and the parsing loop here...


                  The hash errors relate to only these 3 lines which are part of the parsing
                  loop:
                  $names{$name}++;
                  $sexes{$sex}++;
                  $depts{$dept}++;


                  Can you run over the variable declarations/initializations for me as I'm not
                  sure I'm doing this right?
                  Thanks.



                  Comment

                  • Ga Mu

                    #10
                    Re: Q: Analyse data and provide a report - Arrays?

                    Troll wrote:[color=blue]
                    > Greg,
                    > I decided to give you a glimpse at the code itself so as to make it clearer.
                    > Just be aware that the variable/array names have changed but the general
                    > idea is the same.
                    > The hash errors refer to the variables in the increment section.
                    >
                    > #!/usr/bin/perl -w
                    >
                    > open(NET, "netstat|") || die ("Cannot run netstat: $!");
                    >
                    > my(%UDP4localaddresses, %UDP4remoteaddresses, %UDP4states);
                    >
                    > $UDP4localaddress = '0';
                    > $UDP4remoteaddress = '0';
                    > $UDP4state = '0';
                    >[/color]

                    Why are you doing this (above)? This initializes three variables to
                    zero, but they have nothing to do with the three variables of the same
                    name in the while loop, because the loop declares its own with my().
                    [color=blue]
                    > $UDP4localaddresses = '0';
                    > $UDP4remoteaddresses = '0';
                    > $UDP4states = '0';
                    >[/color]

                    Why are you doing this (above)? This is initializing three scalars to
                    zero. These three scalars have the same name, but have nothing else to
                    do with the hashes of the same name.
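
To make the point about names concrete, here is a minimal sketch. The variable names are hypothetical and only for illustration. It shows that a scalar and a hash can share a name and still be separate variables, and that a my() inside a loop creates a new variable rather than reusing the outer one.

```perl
use strict;
use warnings;

# Hypothetical names, for illustration only.
my $states = 'a plain scalar';    # the scalar $states
my %states = ( LISTEN => 3 );     # the hash %states, a separate variable

$states{ESTABLISHED} = 5;         # an element of %states; $states is untouched

my $label = 'outer';
for my $line ( 'one', 'two' ) {
    my $label = "inner $line";    # a new variable that hides the outer $label
}

print "$states\n";                             # the scalar is unchanged
print join( ',', sort keys %states ), "\n";    # keys of the hash
print "$label\n";                              # still the outer value
```

So assigning to the scalar does nothing to the hash, and assigning before the loop does nothing to a variable the loop declares for itself.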
                    [color=blue]
                    > $UDP4localaddresses{$UDP4localaddress} = '0';
                    > $UDP4remoteaddresses{$UDP4remoteaddress} = '0';
                    > $UDP4states{$UDP4state} = '0';
                    >[/color]

                    Hash elements do not need to be initialized. A missing element is
                    undefined, and ++ treats it as zero without complaint. That is what
                    makes hashes perfect for counting occurrences of unknown words,
                    numbers, etc. And even if you had to initialize them, what you are
                    initializing here is $UDP4localaddresses{0}, because
                    $UDP4localaddress holds '0' at that point.
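
A minimal sketch of counting with a hash and no initialization at all. The names are taken from the sample data earlier in the thread; the variable names are my own.

```perl
use strict;
use warnings;

my %count;                                  # starts empty; nothing to initialize
my @seen = qw(Pete Kate John Pete John);    # names from the sample data

# A missing element is undef, and ++ treats undef as 0 without a warning.
$count{$_}++ for @seen;

for my $name ( sort keys %count ) {
    print "$name = $count{$name}\n";
}
```

The keys appear in the hash the first time each name is seen, which is why nothing has to be set up in advance.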
                    [color=blue]
                    > while (<NET>) {
                    > my($UDP4localaddress, $UDP4remoteaddress, $UDP4state)=
                    > /(\s+) (\s+) (\s+)$/;
                    >
                    > #increments start here
                    > $UDP4localaddresses{$UDP4localaddress}++;
                    > $UDP4remoteaddresses{$UDP4remoteaddress}++;
                    > $UDP4states{$UDP4state}++;[/color]

                    If the increments above are failing, it is probably because your m// is
                    failing and one or more of the keys (variable inside the {}) are
                    undefined. Try putting a print statement before the increments and
                    print each of the variables you are extracting, then play with the
                    regular expression until you get values for ALL of them.
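
One way to do that check, as a sketch only: test whether the match succeeded before touching the hashes, and set aside the lines that did not match so you can look at them. The sample lines are made up and are not real netstat output, and the short variable names are mine. Note also that \s matches whitespace while \S matches non-whitespace, so the pattern in the quoted script was probably meant to use \S+.

```perl
use strict;
use warnings;

my %states;       # count per state
my @unmatched;    # lines the pattern did not match

# Made-up sample lines, not real netstat output.
my @lines = (
    "hostA.1234   hostB.80   ESTABLISHED\n",
    "Active\n",
    "\n",
);

for my $line (@lines) {
    # In a condition, the list assignment yields the number of captured
    # fields, which is 0 when the match fails.
    if ( my ( $local, $remote, $state ) =
        $line =~ /(\S+)\s+(\S+)\s+(\S+)\s*$/ )
    {
        $states{$state}++;
    }
    else {
        push @unmatched, $line;    # keep it for inspection
    }
}

print "matched states: ", join( ',', sort keys %states ), "\n";
print "unmatched lines: ", scalar(@unmatched), "\n";
```

With this shape the increments only ever see defined keys, so the "uninitialized value in hash element" warning cannot come from a failed match.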
                    [color=blue]
                    > }
                    >
                    > #here comes the output
                    >
                    >
                    > Can you pls criticise my futile attempt to get this going? As one can see,
                    > I'm not that clear on initializations ...
                    >
                    >[/color]

                    Comment

                    • Ga Mu

                      #11
                      Re: Q: Analyse data and provide a report - Arrays?

                      Troll wrote:
                      [color=blue]
                      > Now time for some stupid Qs:
                      >
                      > Let's say that the data I have is in a file called employees.
                      > How can I call this file so that I can parse it?
                      >
                      > 1) Can I do:
                      > @HRdata = `cat employees`;
                      > while (<@HRdata>) {[/color]

                      The above is considered bad practice, especially if the file is large.
                      Why read the entire file into memory when you can read, process, and
                      discard a line at a time..? To open and read a file:

                      open (FIN, '<employees') || die "blah blah blah...";

                      while (<FIN>) {


                      }
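
A slightly fuller sketch of the same idea, using the three-argument form of open with a lexical filehandle and including $! in the error message. The sub name count_data_lines is hypothetical. As a side note, angle brackets around anything other than a filehandle or a simple scalar are treated by Perl as a filename glob, so while (<@HRdata>) would not step through the lines of the array anyway.

```perl
use strict;
use warnings;

# Read a file one line at a time and return the number of non-blank lines.
# (count_data_lines is a made-up name for this example.)
sub count_data_lines {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Can't open $path: $!";
    my $n = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        $n++;
    }
    close $fh;
    return $n;
}
```

You would call it as count_data_lines('employees'). Only one line is held in memory at a time, however large the file is.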
                      [color=blue]
                      >
                      >
                      > 2) With regard to the HEADING sections, the script has to be able to
                      > recognise the different sections by the following rules:
                      > # there's a blank line
                      > before each heading
                      > HEADING 1 # this is the name of the heading -
                      > this is a string with a special character and a blank space as part of it
                      > ColumnA ColumnB ColumnC # these are the column names - these are
                      > strings which also can inlude a blank space if they have 2 or more words
                      > ******* # a sort of an underlining
                      > pattern
                      >[/color]

                      while (<FIN>) {

                          if ( /^$/ ) {

                              # this is a blank line, don't do anything

                          } elsif ( /HEADING (.+)/ ) {

                              # this is a heading, with the heading name in $1

                          } elsif ( (($name, $sex, $status, $age) = /(\w+) (\w+) (\w+) (\d+)/) == 4 ) {

                              # this line contains three words and a number, do whatever
                              # (I'm not really sure if this will work. My Linux box is
                              # down and I have no way of testing.)

                          }

                      } # end of while(<FIN>)
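
Putting the pieces together, here is a sketch of the single-loop approach run against the sample data from the first post. It assumes the five-column layout shown there (name, sex, department, status, age) with single spaces between fields, and it reads from an in-memory string so it can be run as is. The variable names are my own choice.

```perl
use strict;
use warnings;

my $data = <<'END';

HEADING 1
**********
ColumnA ColumnB ColumnC ColumnD ColumnE
Pete Male Marketing Single 40
Kate Female Marketing Married 30
John Male Sales Married 38
Pete Male Sales Single 52
John Male Sales Single 24

HEADING 2
**********
ColumnF ColumnG ColumnH ColumnI
whatever
END

my ( %names, %sexes, %depts, @singles );
my $over35  = 0;
my $heading = '';

open( my $fh, '<', \$data ) or die "Can't read sample data: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^\s*$/ || $line =~ /^\*+$/ ) {
        next;                  # blank line or underline
    }
    elsif ( $line =~ /^HEADING (.+)/ ) {
        $heading = $1;         # remember which block we are in
    }
    elsif ( $heading eq '1' ) {
        # Anything that is not four words and a number (such as the
        # column-name line) is skipped.
        my ( $name, $sex, $dept, $status, $age ) =
            $line =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/
            or next;
        $names{$name}++;
        $sexes{$sex}++;
        $depts{$dept}++;
        push @singles, $name if $status eq 'Single';
        $over35++ if $age > 35;
    }
}
close $fh;

printf "# of Petes = %d\n",       $names{Pete}  || 0;
printf "# of Males = %d\n",       $sexes{Male}  || 0;
printf "# of Salespeople = %d\n", $depts{Sales} || 0;
printf "# of over 35s = %d\n",    $over35;
print  "Singles are:\n";
print  "  $_\n" for @singles;
```

Lines under HEADING 2 are ignored here; another elsif on $heading would handle them with their own pattern and their own hashes.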
                      [color=blue]
                      > I guess this is to make sure that one does not include any silly heading
                      > data as part of the arrays created and the parsing only takes place on
                      > 'real' data. Can you pls advise? Or do you need more info? I'm more in
                      > favour of creating separate 'if' loops due to my 'newbie' status. I'll get
                      > lost otherwise...
                      >[/color]

                      "if loops"...? How does one make an if loop?
                      [color=blue]
                      > Thanks.
                      >
                      >
                      >
                      > "Troll" <abuse@microsof t.com> wrote in message
                      > news:uRK4b.77094$bo1.13700@news-server.bigpond.net.au...
                      >[color=green]
                      >>Wow. I don't know how you get the time to respond to my queries in such
                      >>detail. It is greatly appreciated.
                      >>I just came back from work and it's like 2:30 am so I'll crash out soon and[/color]
                      >[color=green]
                      >>have a closer read tomorrow [especially of the HEADINGS part].
                      >>
                      >>With the push @array stuff I actually got to this today in my readings. I
                      >>saw an example of appending an array onto another array with a push and I
                      >>was wondering if we could just substitute a $variable for one of the arrays.[/color]
                      >[color=green]
                      >>I'm glad you confirmed this. :)
                      >>
                      >>I was also wondering if doing this at the beginning of the script:
                      >>
                      >>my (%names, %sexes, %depts, %m_statuses, %ages) # declaring things
                      >>locally
                      >>
                      >>would be considered bad practice. I thought that one should declare things
                      >>as my ( ) if one is using things within a loop so as not to impact anything[/color]
                      >[color=green]
                      >>external to the loop. But if one uses variables/arrays both within and
                      >>outside the loops, should we then still declare stuff as my ( )?
                      >>Maybe I'm just confused about my ( )...
                      >>
                      >>Greg, if you could possibly keep an eye on this thread for the next few days[/color]
                      >[color=green]
                      >>I would be very much in your debt. Your help has been invaluable so far in
                      >>allowing me to visualise quite a few things.
                      >>
                      >>Thanks very much.
                      >>
                      >>
                      >>"Ga Mu" <NgamuthO@SPcom cast.netAM> wrote in message
                      >>news:uRJ4b.147542$2x.41412@rwcrnsc52.ops.asp.att.net...
                      >>[color=darkred]
                      >>>Troll wrote:
                      >>>
                      >>>>Thanks again !
                      >>>>
                      >>>>1)
                      >>>>Sorry for being too vague. With regard to the HEADINGS they separate blocks[/color]
                      >>[color=darkred]
                      >>>>of data. But because the column names will be different [data is different][/color]
                      >>[color=darkred]
                      >>>>then I'm not quite sure I could use:
                      >>>>$names{$heading}{$name}++;
                      >>>>
                      >>>>So I'm looking at creating separate my () definitions for each HEADING and[/color]
                      >>[color=darkred]
                      >>>>just wanted to confirm how to jump out of one HEADING loop and start with[/color]
                      >>[color=darkred]
                      >>>>the next.
                      >>>>
                      >>>>For example, under HEADING 1 we have these columns:
                      >>>>Name, Sex, Dept, M_Status, Age
                      >>>>
                      >>>>and under HEADING 2 we have:
                      >>>>Address, Phone#, Mobile#, Salary
                      >>>>
                      >>>>So at the beginning of the script I would have
                      >>>>my (%names, %sexes, %depts, %m_statuses, %ages)
                      >>>>my (%addresses, %phones, %mobiles, %salaries)
                      >>>>#then I have my while (<>) and parsing here
                      >>>>#I have my output at the end
                      >>>>
                      >>>>Is that a little clearer?
                      >>>
                      >>>Yes. Much clearer. There are a couple of different ways you could do
                      >>>this. One is to use a single loop that reads through the file and uses
                      >>>a state variable (e.g., $heading) to keep track of where you are in the
                      >>>parsing process. The other is to have a separate loop for each heading.
                      >>> Again, six of one, half a dozen of another. It's more a matter of
                      >>>preference than anything else.
                      >>>
                      >>>An example of the first approach:
                      >>>
                      >>>my $heading = 'initial';
                      >>>my $fin_name = '/usr/local/blah/blah/blah';
                      >>>open(FIN, $fin_name) || die "Can't open $fin_name: $!\n";
                      >>>
                      >>>while (<FIN>) {
                      >>>
                      >>> # check for a new heading
                      >>> # I am assuming single word heading names
                      >>> if ( /HEADING (\S+)/ ) {
                      >>>
                      >>> $heading = $1; # set $heading equal to word extracted above
                      >>>
                      >>> # take appropriate action based on the heading we are under
                      >>>
                      >>> } elsif ( $heading eq 'NAMES' ) {
                      >>>
                      >>> ( $name, $sex, $dept, $m_status, $age ) =
                      >>> /(\w+) (\w+) (\w+) (\w+) (\d+)/;
                      >>>
                      >>> # update counts, append to lists, etc...
                      >>>
                      >>> } elsif ( $heading eq 'ADDRESSES' ) {
                      >>>
                      >>> # I am assuming the address field is limited to 30 characters
                      >>> # here:
                      >>> ( $address, $phone, $mobile, $salary ) =
                      >>> /(.{30}) (\S+) (\S+) (\d+)/;
                      >>>
                      >>> # update counts, append to lists, etc...
                      >>>
                      >>> }
                      >>>
                      >>>}
                      >>>
                      >>>
                      >>>And the second approach:
                      >>>
                      >>>my $heading = 'initial';
                      >>>my $fin_name = '/usr/local/blah/blah/blah';
                      >>>open(FIN, $fin_name) || die "Can't open $fin_name: $!\n";
                      >>>
                      >>># scan for first heading
                      >>># (only a bare while (<FIN>) assigns to $_, so assign explicitly here)
                      >>>while ( defined( $_ = <FIN> ) && ! /HEADING NAMES/ ) { }
                      >>>
                      >>># parse the names, etc...
                      >>>while ( defined( $_ = <FIN> ) && ! /HEADING ADDRESSES/ ) {
                      >>>
                      >>> ( $name, $sex, $dept, $m_status, $age ) =
                      >>> /(\w+) (\w+) (\w+) (\w+) (\d+)/;
                      >>>
                      >>> # update counts, append to lists, etc...
                      >>>
                      >>>}
                      >>>
                      >>># parse the addresses, etc...
                      >>># for brevity, I am assuming only two headings
                      >>>while ( <FIN> ) {
                      >>>
                      >>> ( $address, $phone, $mobile, $salary ) =
                      >>> /(.{30}) (\S+) (\S+) (\d+)/;
                      >>>
                      >>> # update counts, append to lists, etc...
                      >>>
                      >>>}
                      >>>
                      >>>
                      >>>>
                      >>>>2)
                      >>>>With my last question regarding the printing of the names of single people,[/color]
                      >>[color=darkred]
                      >>>>if we include a print statement in the parsing loop would that give us
                      >>>>something like:
                      >>>>Pete is single.
                      >>>>John is single.
                      >>>>while the parsing is still running?
                      >>>
                      >>>Yes.
                      >>>
                      >>>
                      >>>>What I'm after is hopefully feeding that output into something else
                      >>>>[@array?] which can then print a list of the names [line by line] at the end[/color][/color]
                      >[color=green]
                      >>[color=darkred]
                      >>>>of the script, something like:
                      >>>>#this is the output structure
                      >>>>Number of Petes =
                      >>>>Number of Males =
                      >>>>Singles are:
                      >>>>Pete
                      >>>>John
                      >>>>Number of Salespeople =
                      >>>>
                      >>>>
                      >>>>Does this make sense?
                      >>>>
                      >>>
                      >>>Yes. It would be easy to create a list/array of, e.g., single people.
                      >>>Prior to the loop, declare the array. Within the loop, test each person
                      >>>for being single. If they are, push them onto the list:
                      >>>
                      >>># prior to your parsing loop, declare array @singles:
                      >>>
                      >>>my @singles;
                      >>>
                      >>># within your parsing loop, after parsing out name, status, etc.:
                      >>>
                      >>>push @singles, $name if $m_status eq 'Single';
                      >>>
                      >>># after loop, to print the list of singles:
                      >>>
                      >>>print "Single persons:\n";
                      >>>foreach my $single_person ( @singles ) { print " $single_person\n"; }
                      >>>
                      >>>
                      >>>Greg
                      >>>[/color]
                      >>
                      >>[/color]
                      >
                      >[/color]
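In case it helps, the "collect the singles, print them after the loop" idea from the quoted text can be sketched as below. This is only an illustration under the assumption that each data row is four words followed by a number; the sub name `collect_singles` is hypothetical and the rows are passed in as a list so the example runs without an input file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names of everyone whose status is 'Single'.
sub collect_singles {
    my @lines = @_;
    my @singles;                       # declared before the loop
    for (@lines) {
        my ( $name, $sex, $dept, $m_status, $age ) =
            /^(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\d+)\s*$/
            or next;                   # skip lines that are not data rows
        push @singles, $name if $m_status eq 'Single';
    }
    return @singles;
}

my @singles = collect_singles(
    "Pete Male Marketing Single 40\n",
    "Kate Female Marketing Married 30\n",
    "John Male Sales Single 24\n",
);

# after the loop, print the list one name per line
print "Single persons:\n";
print "  $_\n" for @singles;
```

An array is used here because the order and the individual names matter; for plain counts a hash keyed on the status would be the more natural fit.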


                      • Troll

                        #12
                        Re: Q: Analyse data and provide a report - Arrays?

                        Thanks again :)

                        Will I get these errors:
                        Use of uninitialized value in print at ./netstat.pl line 16, <NET> line 1.
                        Use of uninitialized value in print at ./netstat.pl line 17, <NET> line 1.
                        Use of uninitialized value in print at ./netstat.pl line 18, <NET> line 1.
                        ....etc

                        if an undefined value is passed, for example, to $UDP4localaddress?
                        Because if that's the case then all I need to do is to make sure that
                        whatever I'm passing as part of the m()// is correctly split and defined as
                        a string, digit, word etc, yes?


                        "Ga Mu" <NgamuthO@SPcom cast.netAM> wrote in message
                        news:SM25b.251663$cF.79266@rwcrnsc53...[color=blue]
                        > Troll wrote:[color=green]
                        > > Greg,
                        > > I decided to give you a glimpse at the code itself so as to make it clearer.[/color][/color]
                        [color=blue][color=green]
                        > > Just be aware that the variable/array names have changed but the general
                        > > idea is the same.
                        > > The hash errors refer to the variables in the increment section.
                        > >
                        > > #!/usr/bin/perl -w
                        > >
                        > > open(NET, "netstat|") || die ("Cannot run netstat: $!");
                        > >
                        > > my(%UDP4localaddresses, %UDP4remoteaddresses, %UDP4states);
                        > >
                        > > $UDP4localaddress = '0';
                        > > $UDP4remoteaddress = '0';
                        > > $UDP4state = '0';
                        > >[/color]
                        >
                        > Why are you doing this (above)? This is initializing three variables to
                        > zero. These three variables have nothing to do with the three variables
                        > of the same name in the while loop.
                        >[color=green]
                        > > $UDP4localaddresses = '0';
                        > > $UDP4remoteaddresses = '0';
                        > > $UDP4states = '0';
                        > >[/color]
                        >
                        > Why are you doing this (above)? This is initializing three scalars to
                        > zero. These three scalars have the same name, but have nothing else to
                        > do with the hashes of the same name.
                        >[color=green]
                        > > $UDP4localaddresses{$UDP4localaddress} = '0';
                        > > $UDP4remoteaddresses{$UDP4remoteaddress} = '0';
                        > > $UDP4states{$UDP4state} = '0';
                        > >[/color]
                        >
                        > Hash entries do not need to be initialized: the first ++ on a missing key
                        > treats its undefined value as zero. That is what makes hashes perfect for
                        > counting occurrences of unknown words, numbers, etc. And even if you had to
                        > initialize them, you are initializing $UDP4localaddresses{0} to zero.
                        >[color=green]
                        > > while (<NET>) {
                        > > my($UDP4localaddress, $UDP4remoteaddress, $UDP4state)=
                        > > /(\s+) (\s+) (\s+)$/;
                        > >
                        > > #increments start here
                        > > $UDP4localaddresses{$UDP4localaddress}++;
                        > > $UDP4remoteaddresses{$UDP4remoteaddress}++;
                        > > $UDP4states{$UDP4state}++;[/color]
                        >
                        > If the increments above are failing, it is probably because your m// is
                        > failing and one or more of the keys (variable inside the {}) are
                        > undefined. Try putting a print statement before the increments and
                        > print each of the variables you are extracting, then play with the
                        > regular expression until you get values for ALL of them.
                        >[color=green]
                        > > }
                        > >
                        > > #here comes the output
                        > >
                        > >
                        > > Can you pls criticise my futile attempt to get this going? As one can see,[/color][/color]
                        [color=blue][color=green]
                        > > I'm not that clear on initializations ...
                        > >
                        > >[/color]
                        >[/color]
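A rough sketch of the counting pattern discussed in the quoted reply, for anyone following along. It assumes each interesting line ends in three whitespace-separated fields; the sample lines are made up and are not real netstat output, and the sub name `tally_fields` is hypothetical. Note the use of `\S+` (non-whitespace) rather than `\s+` (whitespace) in the pattern, and that lines which do not match are skipped so no undefined value is ever used as a hash key.

```perl
use strict;
use warnings;

# Hypothetical helper: tallies the last three whitespace-separated fields of each line.
sub tally_fields {
    my @lines = @_;
    my ( %locals, %remotes, %states );    # empty hashes, no pre-initialization needed
    for (@lines) {
        # skip any line that does not end in three fields
        my ( $local, $remote, $state ) = /(\S+)\s+(\S+)\s+(\S+)\s*$/
            or next;
        $locals{$local}++;      # ++ treats a missing entry as 0, without a warning
        $remotes{$remote}++;
        $states{$state}++;
    }
    return ( \%locals, \%remotes, \%states );
}

# Made-up input, for illustration only.
my ( $locals, $remotes, $states ) = tally_fields(
    "10.0.0.5.1022 10.0.0.9.80 ESTABLISHED\n",
    "10.0.0.5.1023 10.0.0.9.80 ESTABLISHED\n",
    "\n",
    "10.0.0.5.1024 10.0.0.7.22 TIME_WAIT\n",
);
print "$_ = $states->{$_}\n" for sort keys %$states;
```

Skipping non-matching lines with `or next` is one way to avoid the "uninitialized value" warnings; printing the captured values while tuning the pattern, as suggested in the quoted reply, is still the quickest way to find out why a line fails to match.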



                        • John Bokma

                          #13
                          Re: Q: Analyse data and provide a report - Arrays?

                          Ga Mu wrote:

                          [color=blue]
                          > while (<FIN>) {
                          >
                          > if ( /^$/ ) {
                          >
                          > # this is a blank line, don't do anything[/color]


                          next if /^\s*$/; # skip blank lines (or consisting of white space
                          # only)
                          [color=blue]
                          > } elsif ( /HEADING (.+)/ ) {
                          >
                          > # this is a heading, with the heading name in $1[/color]


                          if (/ .....) {

                          # this is a heading
                          next;
                          }
                          [color=blue]
                          > } elsif ( (($name, $sex, $status, $age) = /(\S+) (\S+) (\S+) (\d+)/) ==[/color]


                          if (......) {

                          # bla bla
                          next;
                          }

                          next skips the rest of the loop body and moves on to the next iteration of the while loop.
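Put together, the `next` style might look something like the sketch below. The patterns are assumptions based on the sample data earlier in the thread (headings start with the word HEADING, data rows end in a number), and the sub name `scan_lines` is made up; it only counts what it sees so the control flow stays visible.

```perl
use strict;
use warnings;

# Hypothetical helper: counts headings and data rows using early `next` exits.
sub scan_lines {
    my @lines = @_;
    my ( $headings, $rows ) = ( 0, 0 );
    for (@lines) {
        next if /^\s*$/;             # blank or whitespace-only line

        if ( /^HEADING\s+\S/ ) {
            $headings++;
            next;                    # done with this line
        }

        if ( /^\S.*\s\d+\s*$/ ) {    # text followed by a trailing number
            $rows++;
            next;
        }

        # anything else (column names, ***** underlines) is ignored
    }
    return ( $headings, $rows );
}

my ( $headings, $rows ) = scan_lines(
    "HEADING 1\n",
    "**********\n",
    "ColumnA ColumnB ColumnC ColumnD ColumnE\n",
    "Pete Male Marketing Single 40\n",
    "Kate Female Marketing Married 30\n",
    "\n",
);
print "headings: $headings, data rows: $rows\n";
```

The order of the tests matters here: a heading line such as "HEADING 1" also ends in a number, so it has to be caught before the data-row test.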

                          --
                          Kind regards, feel free to mail: mail(at)johnbokma.com (or reply)
                          virtual home: http://johnbokma.com/ ICQ: 218175426
                          John web site hints: http://johnbokma.com/websitedesign/


                          • Ga Mu

                            #14
                            Re: Q: Analyse data and provide a report - Arrays?

                            Troll wrote:
                            [color=blue]
                            > Thanks again :)
                            >
                            > Will I get these errors:
                            > Use of uninitialized value in print at ./netstat.pl line 16, <NET> line 1.
                            > Use of uninitialized value in print at ./netstat.pl line 17, <NET> line 1.
                            > Use of uninitialized value in print at ./netstat.pl line 18, <NET> line 1.
                            > ...etc
                            >
                            > if an undefined value is passed, for example, to $UDP4localaddress?
                            > Because if that's the case then all I need to do is to make sure that
                            > whatever I'm passing as part of the m()// is correctly split and defined as
                            > a string, digit, word etc, yes?
                            >[/color]

                            Exactly. Experiment with your re in the m// until you get values.



                            • Ga Mu

                              #15
                              Re: Q: Analyse data and provide a report - Arrays?

                              Troll wrote:
                              [color=blue]
                              > Thanks again. No reading files into memory from now on [unless necessary] :)
                              >
                              > The data will actually be read from stdin in the form of
                              > $ netstat | netstat.pl
                              > or
                              > $ netstat.pl < netstat
                              >
                              > Will something like this suffice?
                              > #!/usr/bin/perl -w
                              > while (<STDIN>) {[/color]

                              The empty <> reads from any files named on the command line, or from STDIN if there are none, so all you need is:

                              while (<>) {

                              }
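A small sketch of how `<>` picks its input, in case it is useful. The sub name `count_nonblank` is made up; it simply counts non-blank lines. With no file names it falls back to STDIN, so the same script body works for both `netstat | netstat.pl` and `netstat.pl < netstat`.

```perl
use strict;
use warnings;

# Hypothetical helper: counts non-blank lines read through the diamond operator.
# Called with file names it reads those files; called with none it reads STDIN.
sub count_nonblank {
    local @ARGV = @_;      # <> consults @ARGV to decide where to read from
    my $count = 0;
    while (<>) {
        next if /^\s*$/;   # skip blank or whitespace-only lines
        $count++;
    }
    return $count;
}

# Typical use inside a script (left commented out so nothing waits on STDIN here):
# print count_nonblank(@ARGV), " non-blank lines\n";
```

Localizing `@ARGV` inside the sub is just a convenience for the example; in a short script a plain `while (<>) { ... }` at the top level does the same job.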
                              [color=blue][color=green]
                              >>"if loops"...? How does one make an if loop?[/color]
                              >
                              > What I meant here is that I'll create 4 separate 'if' sections [with their
                              > own elsif branches], one for each HEADING section [there are 4 of them].
                              > So I think I meant 'if' statements...is that better or I am still confusing
                              > my terminology?[/color]

                              Makes more sense...
