grep or a simple script?

**ashitpro** · Apr 9 '08, 09:32 AM

Check the script below:

Code:

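for a_res in `awk '{print $1 ":" $5}' file1`
do
        for b_res in `awk '{print $1 ":" $5}' file2`
        do
                # compare the "col1:col5" keys as strings; the ":" keeps
                # e.g. "ab"+"c" from matching "a"+"bc"
                if [ "$a_res" = "$b_res" ]
                then
                        echo "match found"
                fi
        done
done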

Here $1 and $5 are the column numbers that you want to match.
You can change them according to your requirements.
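
A side note on the nested loop above: it re-runs awk over file2 once for every line of file1. A minimal single-pass sketch, assuming whitespace-separated columns and that columns 1 and 5 together form the key (the function name `match_columns` and the sample data are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print the lines of the second file whose columns 1 and 5
# both equal columns 1 and 5 of some line in the first file.
match_columns() {
    awk 'NR == FNR { seen[$1, $5] = 1; next }   # first file: remember the keys
         ($1, $5) in seen                       # second file: print matching lines
    ' "$1" "$2"
}

# Made-up sample data, for illustration only.
tmpdir=$(mktemp -d)
printf '%s\n' 'a 1 2 3 x' 'b 1 2 3 y' > "$tmpdir/file1"
printf '%s\n' 'a 9 9 9 x' 'c 9 9 9 z' > "$tmpdir/file2"

matches=$(match_columns "$tmpdir/file1" "$tmpdir/file2")
echo "$matches"
rm -rf "$tmpdir"
```

The `NR == FNR` test assumes the first file is not empty; with an empty file1 it would treat file2 as the key file.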

**netrom** · Apr 9 '08, 09:40 AM

Thanks... and what if not every record has the same column layout? Is it possible to extract those 2 fields from certain positions in the first file and look them up in the second file?

**ashitpro** · Apr 9 '08, 10:05 AM

How would you extract 'those 2 fields' from the first file?
I mean, there must be some logic.

In the code above we are sure that the 1st and 5th fields are the ones to check.
Now, if your column layout is something like this:

1st record: 5 columns
2nd record: 3 columns
3rd record: 7 columns... etc.

on what basis would we pick the fields from each record?

**netrom** · Apr 9 '08, 10:52 AM

It's like this, file#1:

[line 1]124323432423 423423423423423 423443243243432 432424
[line 2]432432432 4234234 434324324 434342343242342 3423

Extract only certain positions, for example 1-6 and then 20-27, that is, 2 strings from each line of this file.

So from line 1: string1=124323 and string2=2342342
and THEN look up these 2 strings in file#2, that is, check whether there is any line in file#2 that contains both string1 and string2 from file#1.

I'm sorry for being unclear - I'm really an amateur and can't explain it in programming terms :-)

thanks for all your help!

**ashitpro** · Apr 9 '08, 11:28 AM

Check this out.
Here we extract two fields (positions 1-6 and 20-27) from each line and look for their occurrence in the other file.
We are using the grep command, so just make sure that file2 is always present, otherwise... bang!

Code:

while IFS= read -r line
do
        f1=""
        f2=""
        for (( i = 1 ; i <= 6 ; i++ ))
        do
                p=`echo "$line" | cut -c $i`
                f1=$f1$p
        done

        echo "First word is :$f1"

        for (( i = 20 ; i <= 27 ; i++ ))
        do
                p=`echo "$line" | cut -c $i`
                f2=$f2$p
        done

        echo "second word is :$f2"

        res=`grep "$f1.*$f2" file2`

        # a non-empty result means grep found at least one line
        if [ -n "$res" ]
        then
                echo "match found...for line:$line"
        fi

done < file1

**netrom** · Apr 9 '08, 12:41 PM

When I pasted in the code and ran it, it said:

0403-057 Syntax error at line 5 : `(' is not expected.

Can this be fixed please?
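
The `for (( ... ))` loop is an extension found in bash and ksh93 rather than part of POSIX sh, so an error like the one above is what an older shell would be expected to report at that line. A minimal sketch that avoids the construct, relying only on `cut -c` accepting a range (the helper name `substr_range` is made up; the sample line is netrom's line 1 as posted above):

```shell
#!/bin/sh
# Hypothetical helper: print characters FROM..TO of a string.
# cut -c takes a range, so no per-character loop is needed.
substr_range() {
    printf '%s\n' "$1" | cut -c "$2-$3"
}

line='124323432423 423423423423423 423443243243432 432424'
f1=$(substr_range "$line" 1 6)
f2=$(substr_range "$line" 20 27)
echo "First word is :$f1"
echo "second word is :$f2"
```

Quoting `"$1"` inside the helper keeps any runs of spaces in the line intact, which matters when the fields are defined by character position.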

**prn** · Apr 9 '08, 12:50 PM

Given: file1=

Code:

124323432423 423423423423423423443243243432432424
432432432 4234234 789789789 58769507-67986785765

and file2=

Code:

[line 1]The quick brown fox jumps over the lazy dog.
[line 2]124323432423 423423423423423423443243243432432424
[line 3]Jackdaws love my big sphinx of quartz.
[line 4]Pack my box with five dozen liquor jugs.

How about something like:

Code:

#! /bin/bash

PATFILE="file1"
TESTFILE="file2"

cat $PATFILE | while read LINE
do
        STR1=`echo $LINE | cut -c1-6`
        STR2=`echo $LINE | cut -c21-27`
        echo "str1 is $STR1     str2 is $STR2"
        RESULT=`grep "$STR1.*$STR2" $TESTFILE`
        echo "result is $RESULT"
done

with the result:

Code:

[prn@deimos ~]$ netrom.sh
str1 is 124323  str2 is 2342342
result is [line 2]124323432423 423423423423423423443243243432432424
str1 is 432432  str2 is 9789789
result is

You can then modify that for the results you actually want.

HTH,
Paul

**ashitpro** · Apr 9 '08, 12:58 PM

This one is great.
By the way, are you using bash or something else?

**netrom** · Apr 9 '08, 02:01 PM

Thanks a lot for your help. I just tried the latest example. Since the output of every line gets into $LINE, the positions are now changed and I had to change the from-to ranges in the cut command. The results were not correct though: it somehow changed the look of each line, perhaps caused by spaces, slashes and various other characters.

Is there another way, such as skipping the line-by-line read and cutting directly with the cut command instead, and then finding string1 and string2 in file2?

Thanks again.
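
A likely cause of the shifted positions, offered as a guess since the real data is not shown: an unquoted `$LINE` is split into words by the shell and `echo` rejoins them with single spaces, and a plain `read` also strips leading blanks and interprets backslashes. A small sketch with made-up records:

```shell
#!/bin/sh
# Made-up record with runs of spaces, to show how positions can shift.
record='AB   CD  EF'

# Unquoted: the shell splits the value into words and echo rejoins them
# with single spaces, so every later character moves left.
unquoted=$(echo $record | cut -c6-7)

# Quoted: every character stays where it was.
quoted=$(printf '%s\n' "$record" | cut -c6-7)

echo "unquoted: [$unquoted]"
echo "quoted:   [$quoted]"

# Reading lines: IFS= keeps leading blanks, -r keeps backslashes literal.
kept=$(printf '%s\n' '  a\b' | while IFS= read -r LINE; do
    printf '%s\n' "$LINE" | cut -c1-5
done)
printf 'kept:     [%s]\n' "$kept"
```

With quoting in place, the from-to ranges given to `cut` refer to the same columns as in the original file.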

**prn** · Apr 9 '08, 02:14 PM

Ashitpro: Yes, my example uses bash, but it ought to work with the Bourne shell or Korn shell too.

Netrom: I don't understand at all what you mean by "the output of every line gets into $LINE". I sort of gather that you must mean the character positions in the lines of file1 are not constant. Is that it?

If that is what you mean, then is there some other way of determining what strings you are looking for? There are lots of ways to isolate substrings from a larger string, but neither I nor the script can read your mind. There absolutely must be some way to recognize the strings you need. The algorithm can be quite complex, but there must be one. I'm going to assume that the strings in your example are not the real data here. Can you post real data? Or if the real data is confidential (and that would not be at all surprising), perhaps you could post somewhat sanitized data? It is just impossible to suggest a way to extract the strings without some clue about what the strings look like.

Best Regards,
Paul

**ghostdog74** · Apr 9 '08, 02:33 PM

Originally posted by netrom

Any suggestions please? Thanks.

Code:

# more file
123232 3232 2323 2323123123213 trterert
# more file1
123232 3232 2323 2323123123213 XXXXXXXX
123232 3232 2323 2323123123213 trterert
# ./test.sh
123232 or trterert not in line 1
123232 3232 2323 2323123123213 trterert
# cat test.sh
#!/bin/sh

awk 'FNR==NR{ a[FNR]=$0;next }
{
 for( i in a ){
    if  ( ( a[i] ~ $1) && ( a[i] ~ $NF ) ) {
        print $0
    }else {
        print $1 " or "$NF " not in line "FNR
    }
 }
}
' file1 file
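
A caveat about the `~` operator used above: it treats its right-hand side as a regular expression, so characters such as `.` or `*` in the data act as wildcards. A minimal sketch with made-up values, comparing it to awk's `index()`, which looks for a literal substring:

```shell
#!/bin/sh
# Made-up example: as a regex, "1.3" also matches "123";
# index() compares literally and does not.
regex_hit=$(echo '123' | awk -v pat='1.3' '$0 ~ pat { print "yes" }')
literal_hit=$(echo '123' | awk -v pat='1.3' 'index($0, pat) { print "yes" }')

echo "regex match:   [$regex_hit]"
echo "literal match: [$literal_hit]"
```

For purely numeric strings the two behave the same, so this only matters if the real data contains punctuation.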

**netrom** · Apr 9 '08, 02:46 PM

OK, Paul, I see what you mean, so I've uploaded 2 test files, each with 5 records/lines, at: http://thetechiebuddy.com/hm/

The filenames are file1_test and file2_test. I've amended the records, so they're not real data. What I'd like:

Search file1_test line by line and get string1 (length=16, positions 63-78) and string2 (length=6, positions 183-188).

...then

Search for those 2 strings in each line of file2_test, e.g. string1 is 1234567890123456 and string2 is 998877.

That is, check whether file2_test has any line that contains both strings. At this point it doesn't matter at what positions the strings are found in file2_test; the aim is to find the 2 strings together in one or more lines, if there are any, of course.

The output may then say: string1 and string2 found in file2_test, or better, string1 and string2 NOT found (anywhere) in file2_test.

Many thanks Paul! You're very helpful.

**netrom** · Apr 9 '08, 02:47 PM

Will try this as well. Many thanks!

**prn** · Apr 9 '08, 05:50 PM

Hi Netrom,

OK. I've downloaded those two files and made the corresponding changes in the script. It might have helped if you had used an example where there was at least one match, though.

Here's a script:

Code:

#! /bin/bash

PATFILE="file1_test"
TESTFILE="file2_test"

cat $PATFILE | while read LINE
do
        STR1=`echo $LINE | cut -c63-78`
        STR2=`echo $LINE | cut -c183-188`
        echo "str1 is $STR1     str2 is $STR2"
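        # quote the pattern so the shell cannot split or glob-expand it
        RESULT=`grep "$STR1.*$STR2" $TESTFILE`
        # quote $RESULT: a matching line will usually contain spaces
        if [ -n "$RESULT" ]; then
                echo "$STR1 and $STR2 both found in \"$RESULT\""
        else
                echo "$STR1 and $STR2 not found in \"$TESTFILE\""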
        fi
        echo
done

and its output is:

[prn@deimos ~]$ netrom.sh
str1 is 7020800034800000 str2 is 348000
7020800034800000 and 348000 not found in "file2_test"

str1 is 7295700034800000 str2 is 348000
7295700034800000 and 348000 not found in "file2_test"

str1 is 9851500034800000 str2 is 348000
9851500034800000 and 348000 not found in "file2_test"

str1 is 3840000034800000 str2 is 348000
3840000034800000 and 348000 not found in "file2_test"

str1 is 0760100034800000 str2 is 348000
0760100034800000 and 348000 not found in "file2_test"

Is this what you wanted?

Best Regards,
Paul
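
One thing to keep in mind with a `grep "$STR1.*$STR2"` pattern is that it only matches when string1 appears before string2 on the line, whereas the request says the positions in file2_test do not matter. A minimal awk sketch that accepts either order and takes the substrings straight from each raw line; the helper name `find_pairs`, its argument order and the sample data are all made up for illustration, and this is not the script that produced the output above:

```shell
#!/bin/sh
# Hypothetical helper: for each line of PATFILE, take two fixed-position
# substrings and report whether some line of TESTFILE contains both.
# Usage: find_pairs PATFILE TESTFILE START1 LEN1 START2 LEN2
find_pairs() {
    awk -v s1="$3" -v l1="$4" -v s2="$5" -v l2="$6" '
        FILENAME == ARGV[1] { hay[++n] = $0; next }   # first operand: lines to search
        {
            a = substr($0, s1, l1)                    # e.g. 63,16 for positions 63-78
            b = substr($0, s2, l2)                    # e.g. 183,6 for positions 183-188
            found = 0
            for (i = 1; i <= n; i++)
                if (index(hay[i], a) && index(hay[i], b)) { found = 1; break }
            print a " and " b " " (found ? "found" : "NOT found")
        }
    ' "$2" "$1"
}

# Made-up data with short positions (1-3 and 8-9) so the result is easy to check.
tmpdir=$(mktemp -d)
printf '%s\n' 'abc----xy' 'def----zz' > "$tmpdir/pat"
printf '%s\n' 'xy comes before abc here' 'def only' > "$tmpdir/hay"

report=$(find_pairs "$tmpdir/pat" "$tmpdir/hay" 1 3 8 2)
echo "$report"
rm -rf "$tmpdir"
```

For the files discussed here the call would presumably be `find_pairs file1_test file2_test 63 16 183 6`. Because `index()` does a literal search, characters that are special in regular expressions need no escaping.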
