Query about using split...URGENT

**Gunnar Hjalmarsson** · Jul 19 '05, 05:01 AM

Re: Query about using split...URGENT

uc_sk wrote:
> If i have a file with data of form
>
> abcd 4 {1,2,3} 3
> lmn- 3 {12,18,19,22} 4
>
> then i can read them as...
> ($list, $listTotal, $set, $noElements) = split / /;
>
> But if i have a dataset of the form:
>
> abcd 4 {1,2,3} 3
> >{1,2}
> >{3}
>
> lmn- 3 {12,18,19,22} 4
> >{12}
> >{18,19}
> >{19,22}
>
> i.e. I have more than two kinds of delimiters, then
> how should i read it.

Is

split /\s+/

what you are after?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
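
For readers comparing the two patterns, here is a minimal sketch (not from the thread) of how `split / /`, `split /\s+/` and `split ' '` treat the same line. The sample line, with its leading and doubled spaces, and the helper name `split_variants` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the fields each split variant yields for a line.
sub split_variants {
    my ($line) = @_;
    my %fields = (
        single_space => [ split / /,   $line ],  # every single space is a boundary
        whitespace   => [ split /\s+/, $line ],  # runs of whitespace collapse
        awk_mode     => [ split ' ',   $line ],  # like /\s+/, but leading whitespace is skipped
    );
    return \%fields;
}

# Made-up sample: one leading space and a doubled space after "abcd".
my $result = split_variants(" abcd  4 {1,2,3} 3");
for my $name (sort keys %$result) {
    printf "%-12s %d fields: %s\n", $name, scalar @{ $result->{$name} },
        join '|', @{ $result->{$name} };
}
```

On this sample the three variants yield 6, 5 and 4 fields respectively; the extra fields are empty strings produced by the leading and doubled spaces.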

**uc_sk** · Jul 19 '05, 05:01 AM

Re: Query about using split...URGENT

Hello Gunnar
I didn't get what you mean by "what you are after?"
Basically i am trying to read variables/fields by the delimiters and
instead of using the whole expression, i just want to use maybe 1 or
two fields....let's say in my output file i want to print just the $list
and $set or any other two variables.

Thanks
~uc_sk

Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<5L9ac.54624$mU6.228337@newsb.telia.net>...
> uc_sk wrote:
> > If i have a file with data of form
> >
> > abcd 4 {1,2,3} 3
> > lmn- 3 {12,18,19,22} 4
> >
> > then i can read them as...
> > ($list, $listTotal, $set, $noElements) = split / /;
> >
> > But if i have a dataset of the form:
> >
> > abcd 4 {1,2,3} 3
> > >{1,2}
> > >{3}
> >
> > lmn- 3 {12,18,19,22} 4
> > >{12}
> > >{18,19}
> > >{19,22}
> >
> > i.e. I have more than two kinds of delimiters, then
> > how should i read it.
>
> Is
>
> split /\s+/
>
> what you are after?

**Gunnar Hjalmarsson** · Jul 19 '05, 05:01 AM

Re: Query about using split...URGENT

[ Please do not top post! ]

uc_sk wrote:
> Gunnar Hjalmarsson wrote:
>> uc_sk wrote:
>>> If i have a file with data of form
>>>
>>> abcd 4 {1,2,3} 3
>>> lmn- 3 {12,18,19,22} 4
>>>
>>> then i can read them as...
>>> ($list, $listTotal, $set, $noElements) = split / /;
>>>
>>> But if i have a dataset of the form:
>>>
>>> abcd 4 {1,2,3} 3
>>> >{1,2}
>>> >{3}
>>>
>>> lmn- 3 {12,18,19,22} 4
>>> >{12}
>>> >{18,19}
>>> >{19,22}
>>>
>>> i.e. I have more than two kinds of delimiters, then
>>> how should i read it.
>>
>> Is
>>
>> split /\s+/
>>
>> what you are after?
>
> I didn't get what you mean by "what you are after?" Basically i
> am trying to read variables/fields by the delimiters and instead of
> using the whole expression, i just want to use maybe 1 or two
> fields....let's say in my output file i want to print just the $list
> and $set or any other two variables.

I meant that I wasn't sure of what's the field data and which the
separators are. Accordingly my suggestion above was a guess.

Since I'm still not sure, this is another guess:

split /[\s>]+/

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
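
As an illustration of that second guess, the sketch below applies `split /[\s>]+/` to one record from the sample data. It assumes the whole record (header line plus its `>` lines) is already in one string, for example after reading in paragraph mode; the helper name `split_record` is hypothetical.

```perl
use strict;
use warnings;

# One record from the thread's second data set, held in a single string.
my $record = "abcd 4 {1,2,3} 3\n>{1,2}\n>{3}\n";

# Hypothetical helper: split on runs of whitespace and/or '>' characters.
sub split_record {
    my ($text) = @_;
    return split /[\s>]+/, $text;
}

# The first four fields come from the header line; the rest are the '>' lines.
my ($list, $listTotal, $set, $noElements, @subsets) = split_record($record);
print "list=$list set=$set subsets=@subsets\n";
```

If the file is read line by line instead, a line that begins with `>` yields an empty first field, because the separator matches at the very start of the string.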

**uc_sk** · Jul 19 '05, 05:01 AM

Re: Query about using split...URGENT

Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<cNfac.54650$mU6.228175@newsb.telia.net>...
> [ Please do not top post! ]
>
> uc_sk wrote:
> > Gunnar Hjalmarsson wrote:
> >> uc_sk wrote:
> >>> If i have a file with data of form
> >>>
> >>> abcd 4 {1,2,3} 3
> >>> lmn- 3 {12,18,19,22} 4
> >>>
> >>> then i can read them as...
> >>> ($list, $listTotal, $set, $noElements) = split / /;
> >>>
> >>> But if i have a dataset of the form:
> >>>
> >>> abcd 4 {1,2,3} 3
> >>> >{1,2}
> >>> >{3}
> >>>
> >>> lmn- 3 {12,18,19,22} 4
> >>> >{12}
> >>> >{18,19}
> >>> >{19,22}
> >>>
> >>> i.e. I have more than two kinds of delimiters, then
> >>> how should i read it.
> >>
> >> Is
> >>
> >> split /\s+/
> >>
> >> what you are after?
> >
> > I didn't get what you mean by "what you are after?" Basically i
> > am trying to read variables/fields by the delimiters and instead of
> > using the whole expression, i just want to use maybe 1 or two
> > fields....let's say in my output file i want to print just the $list
> > and $set or any other two variables.
>
> I meant that I wasn't sure of what's the field data and which the
> separators are. Accordingly my suggestion above was a guess.
>
> Since I'm still not sure, this is another guess:
>
> split /[\s>]+/
-------------------------------------------

Hi,
For my second set of data:

abcd 4 {1,2,3} 3
>{1,2}
>{3}

lmn- 3 {12,18,19,22} 4
>{12}
>{18,19}
>{19,22}

in the first line the delimiters are "space", and I could split the
fields in this line but the fields in the second line, I am unable to
access them or even split them from the first line or otherwise.

Thanks for all your help...waiting for reply!
~uc_sk

**Gunnar Hjalmarsson** · Jul 19 '05, 05:01 AM

Re: Query about using split...URGENT

uc_sk wrote:
> For my second set of data:
>
> abcd 4 {1,2,3} 3
> >{1,2}
> >{3}
>
> lmn- 3 {12,18,19,22} 4
> >{12}
> >{18,19}
> >{19,22}
>
> in the first line the delimiters are "space", and I could split the
> fields in this line but the fields in the second line, I am unable
> to access them or even split them from the first line or otherwise.

Maybe I'm stupid, but I still don't understand what you mean. Please
post a *short* but *complete* program that people can copy and run,
and that illustrates what it is you are trying to do.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

**uc_sk** · Jul 19 '05, 05:01 AM

Re: Query about using split...URGENT

Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<qykac.54680$mU6.228684@newsb.telia.net>...
> uc_sk wrote:
> > For my second set of data:
> >
> > abcd 4 {1,2,3} 3
> > >{1,2}
> > >{3}
> >
> > lmn- 3 {12,18,19,22} 4
> > >{12}
> > >{18,19}
> > >{19,22}
> >
> > in the first line the delimiters are "space", and I could split the
> > fields in this line but the fields in the second line, I am unable
> > to access them or even split them from the first line or otherwise.
>
> Maybe I'm stupid, but I still don't understand what you mean. Please
> post a *short* but *complete* program that people can copy and run,
> and that illustrates what it is you are trying to do.

Hi Gunnar
I am extremely sorry for confusing you....but I am in big trouble
right now. Maybe my problem is easy but because i am new to Perl, I
am making it complicated.... Anyways.....

FILE1 looks like:

fhhh--bf-h-gcb 10 {30,31,} 2
^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
(1,2,3,4,7,8,10,12,13,14,) 10 {23,27,28,30,31,36,} 6

fggg-ca-g--g-b 9 {24,27,36,} 3
^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
(1,2,3,4,6,7,9,12,14,) 9 {17,22,24,27,36,38,} 6

I have given just 2 expressions, each has 9 fields and separated by
"space". So lets say the name of the fields are F1, F2, F3, F4, F5,
F6, F7, F8, F9.

FILE2 looks like:

^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
(1,2,3,4,7,8,10,12,13,14,) 10
*{23(5.3,7.7),27(8.8,8.4),28(5.8,6.8),30(5.0,6.8),31(7.2,9.7),36(8.8,5.8),}
6
> {23,28,30}
> {27}
> {31}
> {36}

^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
(1,2,3,4,6,7,9,12,14,) 9
*{17(1.9,6.8),22(5.1,7.4),24(5.5,8.7),27(8.8,8.4),36(8.8,5.8),38(8.2,3.9),}
6
> {17}
> {22,24}
> {27}
> {36,38}

In this file we again have 2 expressions.... but some fields are
different from the ones in FILE1 and some fields which were in FILE1
are not even in FILE2.....so lets say the fields are F5, F6, F7, F88,
F9 (these fields are the same as the ones in FILE1 except F88, which
is different from F8 of FILE1 as i have added something in it). Ok, so
these 5 fields are separated by "space" but there are more fields in
FILE2 which are in the next line and are separated by "next line tab",
lets name them as F10

So what i want in my output file is that i want to compare F5 from
both the files and see if they are same, then concatenate them.....SO
BASICALLY I WANT THE OUTPUT AS:

F1 F2 F3 F4
F5 F6 F7 F8 F9
F10

since F10 can be either of 2 lines or 3 lines or maybe 10, i
thought that i will store all of them in an array and then, while
outputting, i can read them one by one and print on separate lines....but
my problem is how to go to the next line, i.e. the 2nd line of expression 1
in FILE2.

I HOPE WHATEVER I WROTE ABOVE MAKES SOME SENSE AND I AM NOT MAKING YOU
CONFUSED.....i have tried so many ways that everything is mixed up in
my mind. Apologies for the confusion and would really appreciate your
or anybody's help to take me out of this problem.

Thanks a ton
~uc_sk

**Gunnar Hjalmarsson** · Jul 19 '05, 05:01 AM

Re: Query about using split...URGENT

I still don't see any code. Anyway, this might be a start as regards
your FILE2:

#!/usr/bin/perl
use strict;
use warnings;

my @file2;

open FH, 'FILE2.txt' or die $!;
{
    local $/ = '';    # paragraph mode: records are separated by blank lines
    while (<FH>) {
        my %tmp;
        # LIMIT 6: everything after the fifth field stays together in F10
        @tmp{ qw/F5 F6 F7 F8 F9 F10/ } = split /\s+/, $_, 6;
        $tmp{F10} = [ split /\n/, $tmp{F10} ];
        push @file2, \%tmp;
    }
}
close FH;

for my $rec (1..@file2) {
    print "Record $rec\n";
    for ( qw/F5 F6 F7 F8 F9/ ) {
        print "$_: $file2[$rec-1]{$_}\n";
    }
    print 'F10: ', ( join '; ', @{ $file2[$rec-1]{F10} } ), "\n";
    print "\n";
}

__END__

That outputs:

Record 1
F5: ^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
F6: (1,2,3,4,7,8,10,12,13,14,)
F7: 10
F8: *{23(5.3,7.7),27(8.8,8.4),28(5.8,6.8),30(5.0,6.8),31(7.2,9.7),36(8.8,5.8),}
F9: 6
F10: > {23,28,30}; > {27}; > {31}; > {36}

Record 2
F5: ^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
F6: (1,2,3,4,6,7,9,12,14,)
F7: 9
F8: *{17(1.9,6.8),22(5.1,7.4),24(5.5,8.7),27(8.8,8.4),36(8.8,5.8),38(8.2,3.9),}
F9: 6
F10: > {17}; > {22,24}; > {27}; > {36,38}

HTH

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
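
For anyone who wants to try the paragraph-mode idea without creating FILE2.txt, here is a rough sketch of the same technique using a lexical filehandle opened on an in-memory string. The helper name `read_records` and the shortened sample data are made up for illustration; only the field names follow the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated records from a filehandle.
# Each record becomes a hash with five scalar fields and a list of F10 lines.
sub read_records {
    my ($fh) = @_;
    local $/ = '';                          # paragraph mode
    my @records;
    while ( my $para = <$fh> ) {
        my %rec;
        # A LIMIT of 6 keeps everything after the fifth field together in F10.
        @rec{qw/F5 F6 F7 F8 F9 F10/} = split /\s+/, $para, 6;
        $rec{F10} = [ split /\n/, defined $rec{F10} ? $rec{F10} : '' ];
        push @records, \%rec;
    }
    return @records;
}

# Shortened, made-up sample data in the FILE2 layout.
my $data = <<'END';
^[efg]{1}[gh]{3}
(1,2,3,) 3
*{23(5.3,7.7),27(8.8,8.4),}
2
> {23}
> {27}

^[efg]{1}[fg]{3}
(1,2,) 2
*{17(1.9,6.8),}
1
> {17}
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = read_records($fh);
close $fh;

printf "%s has %d F10 line(s)\n", $_->{F5}, scalar @{ $_->{F10} } for @records;
```

Passing the filehandle into a sub keeps the `local $/` change confined to the reading loop, so the rest of the program still reads line by line.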

**Joe Smith** · Jul 19 '05, 05:01 AM

Re: Query about using split...URGENT

uc_sk wrote:

> FILE1 looks like:
>
> fhhh--bf-h-gcb 10 {30,31,} 2
> ^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
> (1,2,3,4,7,8,10,12,13,14,) 10 {23,27,28,30,31,36,} 6
>
> fggg-ca-g--g-b 9 {24,27,36,} 3
> ^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
> (1,2,3,4,6,7,9,12,14,) 9 {17,22,24,27,36,38,} 6

That looks like records separated by blank lines.
If you use $/=""; perl will read the input in paragraph mode.

> I have given just 2 expressions, each has 9 fields and separated by
> "space". So lets say the name of the fields are F1, F2, F3, F4, F5,
> F6, F7, F8, F9.

{ # Start of block for %info and $/
    my %info;
    local $/ = ''; # Set input separator for paragraph mode
    open IN,'<',"file1" or die "Cannot read file1 - $!\n";
    while (<IN>) { # Read until blank line
        ($F1,$F2,$F3,$F4,$F5,$F6,$F7,$F8,$F9) = split;
        $info{$F5} = [ $F1,$F2,$F3,$F4,$F6,$F7,$F8,$F9 ]; # Save array in a hash
    }; close IN;

> FILE2 looks like:
>
> ^[efg]{1}[gh]{3}.{2}[abc]{1}[efg]{1}.{1}[gh]{1}.{1}[fg]{1}[bcd]{1}[abc]{1}
> (1,2,3,4,7,8,10,12,13,14,) 10
> *{23(5.3,7.7),27(8.8,8.4),28(5.8,6.8),30(5.0,6.8),31(7.2,9.7),36(8.8,5.8),}
> 6
>
>>{23,28,30}
>>{27}
>>{31}
>>{36}
>
>
> ^[efg]{1}[fg]{3}.{1}[bcd]{1}[ab]{1}.{1}[fg]{1}.{2}[fg]{1}.{1}[abc]{1}
> (1,2,3,4,6,7,9,12,14,) 9
> *{17(1.9,6.8),22(5.1,7.4),24(5.5,8.7),27(8.8,8.4),36(8.8,5.8),38(8.2,3.9),}
> 6
>
>>{17}
>>{22,24}
>>{27}
>>{36,38}
>
>
> In this file we again have 2 expressions.... but some fields are
> different from the ones in FILE1 and some fields which were in FILE1
> are not even in FILE2.....so lets say the fields are F5, F6, F7, F88,
> F9 (these fields are the same as the ones in FILE1 except F88, which
> is different from F8 of FILE1 as i have added something in it). Ok, so
> these 5 fields are separated by "space" but there are more fields in
> FILE2 which are in the next line and are separated by "next line tab",
> lets name them as F10

    open IN,'<',"file2" or die "Cannot read file2 - $!\n";
    while (<IN>) { # Read until blank line
        ($F5,$F6_,$F7_,$F8_,$F9_) = split;
        $F10 = <IN>; # Next paragraph is F10
        $F10 =~ s/\s*>//gs; # Make it look better (remove \n and '>')

> So what i want in my output file is that i want to compare F5 from
> both the files and see if they are same, then concatenate them.....SO
> BASICALLY I WANT THE OUTPUT AS:
>
> F1 F2 F3 F4
> F5 F6 F7 F8 F9
> F10

        if (defined $info{$F5}) {
            ($F1,$F2,$F3,$F4,$F6,$F7,$F8,$F9) = @{$info{$F5}}; # Get array from hash
            warn "mismatch on F6" if $F6_ ne $F6;
            warn "mismatch on F7" if $F7_ ne $F7; # $F8_ and $F8 are different
            warn "mismatch on F9" if $F9_ ne $F9;
            print "$F1 $F2 $F3 $F4\n$F5 $F6 $F7 $F8 $F9\n>$F10\n\n";
        }
    }; close IN;
} # End of block for %info and $/

> I HOPE WHATEVER I WROTE ABOVE MAKES SOME SENSE AND I AM NOT MAKING YOU
> CONFUSED.....i have tried so many ways that everything is mixed up in
> my mind.

If the records are indeed separated by blank lines, then setting perl's
special variable $/ to "" makes it a lot easier, as shown above.

From the command line, use
perldoc perlvar
to discover other special perl variables. They are there to make your
life easier.
-Joe
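
The hash lookup on F5 described above can also be tried in a `use strict` form without any files on disk. The following is only a sketch under stated assumptions: the helper names `paragraphs` and `merge_on_f5` are hypothetical, the sample data is made up and shortened, the `>` lines are assumed to sit in the same paragraph as the five FILE2 fields, and the second output line uses FILE2's own field values.

```perl
use strict;
use warnings;

# Hypothetical helper: read paragraph-mode records from a string and
# return one array ref of whitespace-separated fields per record.
sub paragraphs {
    my ($text, $limit) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '';                              # paragraph mode
    my @recs;
    while ( my $para = <$fh> ) {
        $para =~ s/\s+\z//;                     # drop trailing newlines
        push @recs, [ split ' ', $para, $limit ];
    }
    close $fh;
    return @recs;
}

# Hypothetical helper: join FILE1 and FILE2 records on their shared F5 field.
sub merge_on_f5 {
    my ($file1, $file2) = @_;
    my %info;                                   # F5 => [F1..F4]
    for my $r ( paragraphs($file1, 0) ) {
        my ($f1, $f2, $f3, $f4, $f5) = @$r;
        $info{$f5} = [ $f1, $f2, $f3, $f4 ];
    }
    my @out;
    for my $r ( paragraphs($file2, 6) ) {       # sixth field keeps the F10 lines
        my ($f5, $f6, $f7, $f8, $f9, $f10) = @$r;
        next unless exists $info{$f5};          # no FILE1 record with this F5
        push @out, join "\n", "@{ $info{$f5} }", "$f5 $f6 $f7 $f8 $f9",
            defined $f10 ? $f10 : ();
    }
    return @out;
}

# Made-up, shortened sample data in the layout described in the thread.
my $file1 = "k1 10 {30,31,} 2\n^ab{1} (1,2,) 2 {23,27,} 2\n";
my $file2 = "^ab{1}\n(1,2,) 2\n*{23(5.3,7.7),27(8.8,8.4),}\n2\n> {23}\n> {27}\n";

print "$_\n\n" for merge_on_f5($file1, $file2);
```

Keying the hash on F5 means FILE1 is read once and each FILE2 record is matched with a single lookup, whatever order the records appear in.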
