Query about using split...URGENT

  • uc_sk

    Hello all,
    I am a newbie to Perl. If I have a file with data of the form

    abcd 4 {1,2,3} 3
    lmn- 3 {12,18,19,22} 4

    then I can read each line as:
    my ($list, $listTotal, $set, $noElements) = split / /;

    But what if I have a dataset of the form:

    abcd 4 {1,2,3} 3
    >{1,2}
    >{3}

    lmn- 3 {12,18,19,22} 4
    >{12}
    >{18,19}
    >{19,22}

    i.e. I have more than two kinds of delimiters. How should I read it?

    Please help. I am stuck here and cannot go ahead with my work.
    Any input will be appreciated. Thanks a lot in advance.

    ~uc_sk
  • Gunnar Hjalmarsson

    #2
    Re: Query about using split...URGENT

    uc_sk wrote:
    > If i have a file with data of form
    >
    > abcd 4 {1,2,3} 3
    > lmn- 3 {12,18,19,22} 4
    >
    > then i can read them as...
    > ($list $listTotal $set $noElements) = split / /
    >
    > But if i have a dataset of the form:
    >
    > abcd 4 {1,2,3} 3
    >>{1,2}
    >>{3}
    >
    > lmn- 3 {12,18,19,22} 4
    >>{12}
    >>{18,19}
    >>{19,22}
    >
    > i.e. I have more than two kinds of delimiters, then
    > how should i read it.

    Is

    split /\s+/

    what you are after?
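To illustrate the difference being hinted at, here is a small sketch comparing `split / /`, `split /\s+/` and Perl's special `split ' '` on a line with a leading space and a doubled space (the input line is made up for the demonstration):

```perl
use strict;
use warnings;

# Made-up input: a leading space and two spaces between the first fields.
my $line = " abcd  4 {1,2,3} 3";

# / / splits on every single space, so each extra space yields an empty field.
my @single = split / /, $line;   # ('', 'abcd', '', '4', '{1,2,3}', '3')

# /\s+/ treats a run of whitespace as one separator, but a leading
# separator still produces an empty first field.
my @runs = split /\s+/, $line;   # ('', 'abcd', '4', '{1,2,3}', '3')

# ' ' is a special case: like /\s+/, but leading whitespace is ignored.
my @words = split ' ', $line;    # ('abcd', '4', '{1,2,3}', '3')

printf "%d %d %d\n", scalar @single, scalar @runs, scalar @words;
```

For whitespace-delimited data, `split ' '` (which is also what a bare `split` with no arguments does on `$_`) is usually the safest choice.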

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl


    • uc_sk

      #3
      Re: Query about using split...URGENT

      Hello Gunnar,
      I didn't understand what you meant by "what you are after?"
      Basically, I am trying to read variables/fields using the delimiters,
      and instead of using the whole expression I just want to use maybe one
      or two fields. Let's say that in my output file I want to print just
      $list and $set, or any other two variables.
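One common way to keep only some of the fields is a list slice on the result of `split`. A minimal sketch using the sample lines from the question (read from an in-memory string here instead of a real file):

```perl
use strict;
use warnings;

# The simple one-line-per-record data from the original question.
my $data = "abcd 4 {1,2,3} 3\nlmn- 3 {12,18,19,22} 4\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @out;
while (my $line = <$fh>) {
    # The list slice keeps only field 0 ($list) and field 2 ($set).
    my ($list, $set) = (split ' ', $line)[0, 2];
    push @out, "$list $set";
}
close $fh;

print "$_\n" for @out;
```

Changing the indices in `[0, 2]` selects any other pair of fields.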

      Thanks
      ~uc_sk

      Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<5L9ac.54624$mU6.228337@newsb.telia.net>...
      > Is
      >
      > split /\s+/
      >
      > what you are after?


      • Gunnar Hjalmarsson

        #4
        Re: Query about using split...URGENT

        [ Please do not top post! ]

        uc_sk wrote:
        > Gunnar Hjalmarsson wrote:
        >> Is
        >>
        >> split /\s+/
        >>
        >> what you are after?
        >
        > I didnt get what do you mean by "what you are after?" Basically i
        > am trying read variables/fields by teh delimiters and instead of
        > using the whole expression, i just want to use may be 1 or two
        > fields....let say in my output file i want to print just the $list
        > and $set or anyother two variables.

        I meant that I wasn't sure which parts are field data and which are
        separators. Accordingly, my suggestion above was a guess.

        Since I'm still not sure, this is another guess:

        split /[\s>]+/
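Applied to the second dataset, that guess could look like the sketch below. It assumes, as the sample suggests, that records are separated by blank lines, and reads one whole record at a time by setting `$/` to the empty string (paragraph mode). The hash keys are illustrative names only:

```perl
use strict;
use warnings;

# The second dataset from the question, with a blank line between records.
my $data = <<'END';
abcd 4 {1,2,3} 3
>{1,2}
>{3}

lmn- 3 {12,18,19,22} 4
>{12}
>{18,19}
>{19,22}
END

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @records;
{
    local $/ = '';    # paragraph mode: each read returns one whole record
    while (my $rec = <$fh>) {
        # Any run of whitespace and/or '>' counts as one separator.
        my ($list, $listTotal, $set, $noElements, @subsets) = split /[\s>]+/, $rec;
        push @records, { list => $list, set => $set, subsets => \@subsets };
    }
}
close $fh;

for my $r (@records) {
    print "$r->{list} $r->{set}: @{ $r->{subsets} }\n";
}
```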

        --
        Gunnar Hjalmarsson
        Email: http://www.gunnar.cc/cgi-bin/contact.pl


        • uc_sk

          #5
          Re: Query about using split...URGENT

          Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<cNfac.54650$mU6.228175@newsb.telia.net>...
          > I meant that I wasn't sure of what's the field data and which the
          > separators are. Accordingly my suggestion above was a guess.
          >
          > Since I'm still not sure, this is another guess:
          >
          > split /[\s>]+/
          -------------------------------------------

          Hi,
          For my second set of data:

          abcd 4 {1,2,3} 3
          >{1,2}
          >{3}

          lmn- 3 {12,18,19,22} 4
          >{12}
          >{18,19}
          >{19,22}

          In the first line the delimiter is a space, and I could split the
          fields in that line, but I am unable to access the fields in the
          following lines, or even to separate them from the first line.

          Thanks for all your help. Waiting for your reply!
          ~uc_sk


          • Gunnar Hjalmarsson

            #6
            Re: Query about using split...URGENT

            uc_sk wrote:
            > For my second set of data:
            >
            > abcd 4 {1,2,3} 3
            > >{1,2}
            > >{3}
            >
            > lmn- 3 {12,18,19,22} 4
            > >{12}
            > >{18,19}
            > >{19,22}
            >
            > in the first line the delimiters are "space", and I could split the
            > fields in this line but the fields in the second line, I am unable
            > to access them or even split them from the first line or otherwise.

            Maybe I'm stupid, but I still don't understand what you mean. Please
            post a *short* but *complete* program that people can copy and run,
            and that illustrates what it is you are trying to do.

            --
            Gunnar Hjalmarsson
            Email: http://www.gunnar.cc/cgi-bin/contact.pl


            • uc_sk

              #7
              Re: Query about using split...URGENT

              Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<qykac.54680$mU6.228684@newsb.telia.net>...
              > Maybe I'm stupid, but I still don't understand what you mean. Please
              > post a *short* but *complete* program that people can copy and run,
              > and that illustrates what it is you are trying to do.

              Hi Gunnar,
              I am extremely sorry for confusing you, but I am in big trouble
              right now. Maybe my problem is easy, but because I am new to Perl, I
              am making it complicated. Anyway...

              FILE1 looks like:

              fhhh--bf-h-gcb 10 {30,31,} 2
              ^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
              (1,2,3,4,7,8,10,12,13,14,) 10 {23,27,28,30,31,36,} 6

              fggg-ca-g--g-b 9 {24,27,36,} 3
              ^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
              (1,2,3,4,6,7,9,12,14,) 9 {17,22,24,27,36,38,} 6

              I have given just two expressions; each has 9 fields separated by
              spaces. So let's say the fields are named F1, F2, F3, F4, F5, F6,
              F7, F8, F9.
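Because each FILE1 record spans three lines but is still purely whitespace-separated, reading one blank-line-separated record at a time and splitting on whitespace should give exactly the nine fields. A sketch, assuming the records really are separated by blank lines (the data is the FILE1 sample above, read from a string rather than the real file):

```perl
use strict;
use warnings;

my $file1 = <<'END';
fhhh--bf-h-gcb 10 {30,31,} 2
^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
(1,2,3,4,7,8,10,12,13,14,) 10 {23,27,28,30,31,36,} 6

fggg-ca-g--g-b 9 {24,27,36,} 3
^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
(1,2,3,4,6,7,9,12,14,) 9 {17,22,24,27,36,38,} 6
END

open my $fh, '<', \$file1 or die "Cannot open in-memory data: $!";
my @file1;
{
    local $/ = '';    # paragraph mode: one record per read
    while (my $rec = <$fh>) {
        my %f;
        # Newlines count as whitespace, so the three lines yield F1..F9.
        @f{ qw(F1 F2 F3 F4 F5 F6 F7 F8 F9) } = split ' ', $rec;
        push @file1, \%f;
    }
}
close $fh;

print "$_->{F1} => $_->{F5}\n" for @file1;
```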

              FILE2 looks like:

              ^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
              (1,2,3,4,7,8,10,12,13,14,) 10
              *{23(5.3,7.7),27(8.8,8.4),28(5.8,6.8),30(5.0,6.8),31(7.2,9.7),36(8.8,5.8),}
              6
              > {23,28,30}
              > {27}
              > {31}
              > {36}

              ^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
              (1,2,3,4,6,7,9,12,14,) 9
              *{17(1.9,6.8),22(5.1,7.4),24(5.5,8.7),27(8.8,8.4),36(8.8,5.8),38(8.2,3.9),}
              6
              > {17}
              > {22,24}
              > {27}
              > {36,38}

              In this file we again have two expressions, but some fields differ
              from the ones in FILE1, and some fields from FILE1 are not in FILE2
              at all. Let's say the fields are F5, F6, F7, F88, F9 (these are the
              same as in FILE1 except F88, which differs from F8 of FILE1 because
              I have added something to it). These 5 fields are separated by
              spaces, but FILE2 has more fields on the following lines, separated
              by "next line tab"; let's call them F10.

              What I want in my output file is to compare F5 from both files and,
              if they are the same, concatenate the records. So basically I want
              the output as:

              F1 F2 F3 F4
              F5 F6 F7 F8 F9
              F10

              Since F10 can span 2 lines, 3 lines or maybe 10, I thought I would
              store all of them in an array and then, when writing the output,
              print them one by one on separate lines. My problem is how to get
              to the next line, i.e. the 2nd line of expression 1 in FILE2.
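If you prefer to keep reading FILE2 one line at a time, one way to "go to the next line" is to let the loop carry a reference to the record currently being filled: ordinary lines add to F5..F9, and lines starting with `>` are pushed onto that record's F10 array. This is a line-oriented alternative to reading whole records at once, sketched under assumptions: the layout matches the FILE2 sample, and a non-`>` line that follows `>` lines starts the next record. The data is read from a string here:

```perl
use strict;
use warnings;

my $file2 = <<'END';
^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
(1,2,3,4,7,8,10,12,13,14,) 10
*{23(5.3,7.7),27(8.8,8.4),28(5.8,6.8),30(5.0,6.8),31(7.2,9.7),36(8.8,5.8),}
6
> {23,28,30}
> {27}
> {31}
> {36}

^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
(1,2,3,4,6,7,9,12,14,) 9
*{17(1.9,6.8),22(5.1,7.4),24(5.5,8.7),27(8.8,8.4),36(8.8,5.8),38(8.2,3.9),}
6
> {17}
> {22,24}
> {27}
> {36,38}
END

open my $fh, '<', \$file2 or die "Cannot open in-memory data: $!";
my @records;
my $cur;    # the record currently being filled
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank separator lines
    if ($line =~ /^\s*>\s*(.*)$/) {
        # A '>' line: one more F10 entry for the current record.
        die "F10 line before any record\n" unless $cur;
        push @{ $cur->{F10} }, $1;
        next;
    }
    # A field line starts a new record if there is none yet, or if the
    # current one has already collected its F10 lines.
    if (!$cur || @{ $cur->{F10} }) {
        $cur = { fields => [], F10 => [] };
        push @records, $cur;
    }
    push @{ $cur->{fields} }, split ' ', $line;
}
close $fh;

for my $r (@records) {
    print join(' ', @{ $r->{fields} }), "\n";
    print ">$_\n" for @{ $r->{F10} };
}
```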

              I hope what I wrote above makes some sense and that I am not
              confusing you. I have tried so many ways that everything is mixed
              up in my mind. Apologies for the confusion; I would really
              appreciate your or anybody's help to get me out of this problem.

              Thanks a ton
              ~uc_sk


              • Gunnar Hjalmarsson

                #8
                Re: Query about using split...URGENT

                I still don't see any code. Anyway, this might be a start as regards
                your FILE2:


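                #!/usr/bin/perl
                use strict;
                use warnings;

                my @file2;

                open my $fh, '<', 'FILE2.txt' or die "Cannot open FILE2.txt: $!";
                {
                    local $/ = '';    # paragraph mode: one record per blank-line-separated block
                    while (<$fh>) {
                        my %tmp;
                        # F5..F9 are the first five whitespace-separated fields;
                        # everything after them (the F10 lines) stays in one string.
                        @tmp{ qw/F5 F6 F7 F8 F9 F10/ } = split /\s+/, $_, 6;
                        # Turn the remaining lines into a list of F10 entries.
                        $tmp{F10} = [ split /\n/, $tmp{F10} ];
                        push @file2, \%tmp;
                    }
                }
                close $fh;

                for my $rec (1 .. @file2) {
                    print "Record $rec\n";
                    for ( qw/F5 F6 F7 F8 F9/ ) {
                        print "$_: $file2[$rec-1]{$_}\n";
                    }
                    print 'F10: ', ( join '; ', @{ $file2[$rec-1]{F10} } ), "\n";
                    print "\n";
                }

                __END__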


                That outputs:

                Record 1
                F5: ^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
                F6: (1,2,3,4,7,8,10,12,13,14,)
                F7: 10
                F8: *{23(5.3,7.7),27(8.8,8.4),28(5.8,6.8),30(5.0,6.8),31(7.2,9.7),36(8.8,5.8),}
                F9: 6
                F10: > {23,28,30}; > {27}; > {31}; > {36}

                Record 2
                F5: ^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
                F6: (1,2,3,4,6,7,9,12,14,)
                F7: 9
                F8: *{17(1.9,6.8),22(5.1,7.4),24(5.5,8.7),27(8.8,8.4),36(8.8,5.8),38(8.2,3.9),}
                F9: 6
                F10: > {17}; > {22,24}; > {27}; > {36,38}


                HTH

                --
                Gunnar Hjalmarsson
                Email: http://www.gunnar.cc/cgi-bin/contact.pl


                • Joe Smith

                  #9
                  Re: Query about using split...URGENT

                  uc_sk wrote:

                  > FILE1 looks like:
                  >
                  > fhhh--bf-h-gcb 10 {30,31,} 2
                  > ^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
                  > (1,2,3,4,7,8,10,12,13,14,) 10 {23,27,28,30,31,36,} 6
                  >
                  > fggg-ca-g--g-b 9 {24,27,36,} 3
                  > ^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
                  > (1,2,3,4,6,7,9,12,14,) 9 {17,22,24,27,36,38,} 6

                  That looks like records separated by blank lines.
                  If you use $/=""; perl will read the input in paragraph mode.
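A tiny demonstration of paragraph mode on made-up input: runs of blank lines count as a single separator, each read returns one whole block of lines, and `chomp` (while `$/` is still `''`) strips all trailing newlines from a record:

```perl
use strict;
use warnings;

# Made-up input: two records separated by three blank lines.
my $text = "rec1 line1\nrec1 line2\n\n\n\nrec2 line1\n";
open my $fh, '<', \$text or die "Cannot open in-memory data: $!";

my @paras = do {
    local $/ = '';    # paragraph mode, restored when the block exits
    my @recs = <$fh>; # list context: read all records at once
    chomp @recs;      # in paragraph mode chomp removes every trailing newline
    @recs;
};
close $fh;

print scalar(@paras), " records\n";
print "[$_]\n" for @paras;
```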

                  > I have given just 2 expressions, each has 9 fields and separated by
                  > "space". So lets say the name of the fields are F1, F2, F3, F4, F5,
                  > F6, F7, F8, F9.

                  { # Start of block for %info and $/
                  my %info;
                  local $/ = ''; # Set input separator for paragraph mode
                  open IN,'<',"file1" or die "Cannot read file1 - $!\n";
                  while (<IN>) { # Read until blank line
                      ($F1,$F2,$F3,$F4,$F5,$F6,$F7,$F8,$F9) = split;
                      $info{$F5} = [ $F1,$F2,$F3,$F4,$F6,$F7,$F8,$F9 ]; # Save array in a hash
                  }; close IN;

                  > FILE2 looks like:
                  >
                  > ^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
                  > (1,2,3,4,7,8,10,12,13,14,) 10
                  > *{23(5.3,7.7),27(8.8,8.4),28(5.8,6.8),30(5.0,6.8),31(7.2,9.7),36(8.8,5.8),}
                  > 6
                  >
                  >>{23,28,30}
                  >>{27}
                  >>{31}
                  >>{36}
                  >
                  >
                  > ^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
                  > (1,2,3,4,6,7,9,12,14,) 9
                  > *{17(1.9,6.8),22(5.1,7.4),24(5.5,8.7),27(8.8,8.4),36(8.8,5.8),38(8.2,3.9),}
                  > 6
                  >
                  >>{17}
                  >>{22,24}
                  >>{27}
                  >>{36,38}
                  >
                  >
                  > In this file we again have 2 expressions.... but some fields are
                  > different from the ones in FILE1 and some fields which were in FILE1
                  > are not even in FILE2.....so lets say the fields are F5, F6, F7, F88,
                  > F9 (these fields are the same as the ones in FILE1 except F88, which
                  > is different from F8 of FILE1 as i have added something in it). Ok, so
                  > these 5 fields are separated by "space" but there are more fields in
                  > FILE2 which are in teh next line and are separated by "next line tab",
                  > lets name them as F10

                  open IN,'<',"file2" or die "Cannot read file2 - $!\n";
                  while (<IN>) { # Read until blank line
                      ($F5,$F6_,$F7_,$F8_,$F9_) = split;
                      $F10 = <IN>; # Next paragraph is F10
                      $F10 =~ s/\s*>//gs; # Make it look better (remove \n and '>')

                  > So what i want in my output file is that i want to compare F5 from
                  > both the files and see if they are same, then concatanate them.....SO
                  > BASICALLY I WANT THE OUTPUT AS:
                  >
                  > F1 F2 F3 F4
                  > F5 F6 F7 F8 F9
                  > F10

                      if (defined $info{$F5}) {
                          ($F1,$F2,$F3,$F4,$F6,$F7,$F8,$F9) = @{$info{$F5}}; # Get array from hash
                          warn "mismatch on F6" if $F6_ ne $F6;
                          warn "mismatch on F7" if $F7_ ne $F7; # $F8_ and $F8 are different
                          warn "mismatch on F9" if $F9_ ne $F9;
                          print "$F1 $F2 $F3 $F4\n$F5 $F6 $F7 $F8 $F9\n>$F10\n\n";
                      }
                  }; close IN;
                  } # End of block for %info and $/
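For comparison, here is a sketch of the same join-on-F5 idea written under `use strict`, with lexical variables instead of globals. Unlike the version above, it assumes (as the FILE2 reader earlier in the thread does) that the `>` lines belong to the same paragraph as F5..F9, and it prints FILE2's F8 (the F88 described above). The data is one record from each sample, and `read_paragraphs` is a hypothetical helper, not a built-in:

```perl
use strict;
use warnings;

my $file1 = <<'END';
fhhh--bf-h-gcb 10 {30,31,} 2
^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
(1,2,3,4,7,8,10,12,13,14,) 10 {23,27,28,30,31,36,} 6
END

my $file2 = <<'END';
^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
(1,2,3,4,7,8,10,12,13,14,) 10
*{23(5.3,7.7),27(8.8,8.4),28(5.8,6.8),30(5.0,6.8),31(7.2,9.7),36(8.8,5.8),}
6
> {23,28,30}
> {27}
> {31}
> {36}
END

# Hypothetical helper: split a string into blank-line-separated records.
sub read_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
    local $/ = '';    # paragraph mode
    my @recs = <$fh>;
    close $fh;
    return @recs;
}

# Index FILE1 records by F5, the field shared with FILE2.
my %info;
for my $rec (read_paragraphs($file1)) {
    my @f = split ' ', $rec;    # F1 .. F9
    $info{ $f[4] } = \@f;
}

my @output;
for my $rec (read_paragraphs($file2)) {
    # F5 F6 F7 F8 F9, plus the remaining F10 lines kept as one string.
    my ($f5, $f6, $f7, $f8, $f9, $rest) = split ' ', $rec, 6;
    my $r1 = $info{$f5} or next;    # skip if FILE1 has no record with this F5
    my @f10 = map { /^\s*>\s*(.*)$/ ? $1 : () }
              split /\n/, (defined $rest ? $rest : '');
    push @output, join "\n",
        join(' ', @{$r1}[0 .. 3]),  # F1 F2 F3 F4 from FILE1
        "$f5 $f6 $f7 $f8 $f9",      # F5 .. F9, with F8 taken from FILE2
        (map { '>' . $_ } @f10);    # one F10 entry per line
}

print "$_\n\n" for @output;
```

To print FILE1's F8 instead, as the code above does, use `$r1->[7]` in place of `$f8`.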

                  > I HOPE WHATEVER I WROTE ABOVE MAKES SOME SENSE AND I AM NOT MAKING YOU
                  > CONFUSE.....i have tried so many ways, that everthing is mixed up in
                  > my mind.

                  If the records are indeed separated by blank lines, then setting perl's
                  special variable $/ to "" makes it a lot easier, as shown above.

                  From the command line, use
                  perldoc perlvar
                  to discover other special perl variables. They are there to make your
                  life easier.
                  -Joe
