LinuxQuestions.org


hamijami 05-30-2012 05:29 AM

Finding the Big and Exact Match of Sample Data
 
Hi all,

I have data in two files, and I need to check each value in file1 for an exact or "big" match among the values in file2, writing the results to a new file3.

file1  file2  file3 (required results)
7890   8900   No output
7891   7890   7891
894    7909   No output
8058   89     8058
8059   792    8059


Kindly suggest a way to solve this.

Thanks

grail 05-30-2012 05:56 AM

Sounds good ... what have you attempted in an effort to solve the problem?

hamijami 05-30-2012 06:27 AM

I have used a nested loop with awk:
Code:

while read file1
do
    while read file2
    do
        awk -F \| '{if('$file2'~"'$file1'")print"$file2"}' smc >> match
    done<file2
done<file1

danielbmartin 05-30-2012 06:35 AM

Quote:

Originally Posted by hamijami (Post 4691067)
I have data in two files, and I need to check each value in file1 for an exact or "big" match among the values in file2, writing the results to a new file3.

This sounds like an interesting problem. I know what an "exact match" is but don't recognize the term "big match." Is there such a thing as a "small match?" What is the distinction?

Daniel B. Martin

hamijami 05-30-2012 06:47 AM

Quote:

Originally Posted by danielbmartin (Post 4691095)
This sounds like an interesting problem. I know what an "exact match" is but don't recognize the term "big match." Is there such a thing as a "small match?" What is the distinction?

Daniel B. Martin

By "big match" I mean, for example, that 89 is a big match for 894: treat 89 as 89x, where x can be any digit from 0-9, just as LIKE is used in SQL.

Other examples of a big match:

9 can be considered a big match for 999.
990 can be considered a big match for 99094, etc.

grail 05-30-2012 07:30 AM

Well you have lost me then :( I thought "big match" was your way of saying one number bigger than the other.

As it is not, would you please explain further as I am not seeing how any of your examples would explain your first example, for instance:
Code:

8058        89 8058
How is 89 any sort of match in 8058??

hamijami 05-30-2012 07:38 AM

Quote:

Originally Posted by grail (Post 4691131)
Well you have lost me then :( I thought "big match" was your way of saying one number bigger than the other.

As it is not, would you please explain further as I am not seeing how any of your examples would explain your first example, for instance:
Code:

8058        89 8058
How is 89 any sort of match in 8058??

My friend, you have to look at the whole of file2. 89 in file2 is a big match for 894 (file1), hence no output. 8058 has neither an exact nor a big match in file2, hence it is listed in the output file.

file1  file2  file3 (required results)
7890   8900   No output
7891   7890   7891
894    7909   No output
8058   89     8058
8059   792    8059

danielbmartin 05-30-2012 07:47 AM

Quote:

Originally Posted by hamijami (Post 4691140)
... 89 in file2 is big match for 894(file1) ...

May we say that "big match" is the same as "partial match?" Does a "big match" always start with the left-most character? If so, 89 is a "big match" for 894 but 94 is not a "big match" for 894.

Please elaborate your criteria for matches and give more examples.

Daniel B. Martin

hamijami 05-30-2012 07:52 AM

Quote:

Originally Posted by danielbmartin (Post 4691147)
May we say that "big match" is the same as "partial match?" Does a "big match" always start with the left-most character? If so, 89 is a "big match" for 894 but 94 is not a "big match" for 894.

Please elaborate your criteria for matches and give more examples.

Daniel B. Martin

Yes, a big match is the same as a partial match, and yes, a big match always starts with the left-most character. So 89 is a big match for 894, but 94 is not a big match for 894.

More examples:

1 is a big match for 1234534
2345 is a big match for 23450000
990 is a big match for 9903
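
The rule as clarified here (a big match is a partial match anchored at the left-most character) can be sketched as a small shell test. The function name `is_big_match` is made up for illustration and does not come from the thread.

```shell
# is_big_match PREFIX VALUE
# Succeeds (exit status 0) when VALUE starts with PREFIX. An exact
# match also succeeds, since a string is a prefix of itself.
# Hypothetical helper name, used only for this illustration.
is_big_match() {
    case "$2" in
        "$1"*) return 0 ;;   # quoted prefix is literal, * covers the rest
        *)     return 1 ;;
    esac
}

is_big_match 89 894 && echo "89 is a big match for 894"
is_big_match 94 894 || echo "94 is not a big match for 894"
```

Note that an empty PREFIX would match every value, so blank lines in the input would need to be filtered out first.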

danielbmartin 05-30-2012 08:42 AM

Quote:

Originally Posted by hamijami (Post 4691153)
Yes Big Match is same as partial match and yes big match always starts with left most character, and yes 89 is a big match for 894 but 94 is not big match for 894.

Okay, we have a better understanding of terminology. Now, going back to your original post, you gave these examples...
Code:

file1  file2  file3
7890   8900   No output
7891   7890   7891
894    7909   No output
8058   89     8058
8059   792    8059

Regarding the last line: I do not see how 8059 is a Big Match for 792.

Daniel B. Martin

hamijami 05-30-2012 08:48 AM

Quote:

Originally Posted by danielbmartin (Post 4691203)
Okay, we have a better understanding of terminology. Now, going back to your original post, you gave these examples...
Code:

file1  file2  file3
7890   8900   No output
7891   7890   7891
894    7909   No output
8058   89     8058
8059   792    8059

Regarding the last line: I do not see how 8059 is a Big Match for 792.

Daniel B. Martin

8059 has neither an exact nor a big match in file2, hence it is listed in the output file. The output file contains all the numbers that have neither an exact nor a big match.

danielbmartin 05-30-2012 08:57 AM

Quote:

Originally Posted by hamijami (Post 4691207)
8059 has neither an exact nor a big match in file2, hence it is listed in the output file. The output file contains all the numbers that have neither an exact nor a big match.

I had the idea that each line in file1 was compared to the corresponding line in file2. Is this wrong? Is every line in file1 compared to every line in file2?

Daniel B. Martin

hamijami 05-30-2012 08:58 AM

Quote:

Originally Posted by danielbmartin (Post 4691214)
I had the idea that each line in file1 was compared to the corresponding line in file2. Is this wrong? Is every line in file1 compared to every line in file2?

Daniel B. Martin

Is every line in file1 compared to every line in file2? Yes, that's why I used a nested loop :)
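
A minimal sketch of such a nested loop in plain shell, assuming one value per line in each file and the rule stated above (keep a file1 value only when no file2 value is an exact or left-anchored partial match for it). The scratch directory and the variable names are made up for illustration; the data is the small sample from the first post.

```shell
# Work in a scratch directory with the small sample from the first post.
tmp=$(mktemp -d)
printf '%s\n' 7890 7891 894 8058 8059 > "$tmp/file1"
printf '%s\n' 8900 7890 7909 89 792   > "$tmp/file2"

# Outer loop: each file1 value. Inner loop: each file2 value.
while read -r value; do
    matched=no
    while read -r prefix; do
        [ -n "$prefix" ] || continue            # skip blank lines in file2
        case "$value" in
            "$prefix"*) matched=yes; break ;;   # exact or big match
        esac
    done < "$tmp/file2"
    # Keep only the values that nothing in file2 matched.
    if [ "$matched" = no ]; then
        printf '%s\n' "$value"
    fi
done < "$tmp/file1" > "$tmp/file3"

cat "$tmp/file3"
```

With this sample it writes 7891, 8058 and 8059 to file3, the same values listed in the first post. The inner loop rereads file2 for every file1 line, so this is only practical for small files.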

danielbmartin 05-30-2012 09:18 AM

Quote:

Originally Posted by hamijami (Post 4691216)
Is every line in file1 compared to every line in file2? Yes, that's why I used a nested loop :)

Please provide larger samples of your three files. Be sure to include cases where exact matches occur. Then we will have a better understanding of all possible cases and maybe have fun writing code.

Daniel B. Martin

grail 05-30-2012 09:22 AM

Still not clear to me so I will ask another question:

The format you have shown, i.e. file1 followed by file2 and the output file3, has nothing to do with the lines being shown as they are?

What we are saying is that any match between an item in file2 and a line in file1 gives "No output" (is this an empty line or actually these words?), whereas if none of the items in file2 match the line in file1, the file1 value is output.

Does that sound correct?

hamijami 05-30-2012 09:34 AM

Quote:

Originally Posted by danielbmartin (Post 4691226)
Please provide larger samples of your three files. Be sure to include cases where exact matches occur. Then we will have a better understanding of all possible cases and maybe have fun writing code.

Daniel B. Martin

Please note that file3 is supposed to be the output file. However, for the sake of a sample, please find the data below:

file1   file2   file3
8120    81      No output
8124    846     No output
8140    84259   No output
8308    1234    No output
8347    833     8347
84199   555     84199
84200   8308    No output
84228   842     No output
84249   8449    No output
84250   84200   No output
84258   8766    No output
84259   23456   No output
8435    98765   8435
8437    84240   8437
8449    789     No output
84588   7654    84588
84589   345     84589

hamijami 05-30-2012 09:36 AM

Quote:

Originally Posted by grail (Post 4691232)
Still not clear to me so I will ask another question:

The format you have shown, i.e. file1 followed by file2 and the output file3, has nothing to do with the lines being shown as they are?

What we are saying is that any match between an item in file2 and a line in file1 gives "No output" (is this an empty line or actually these words?), whereas if none of the items in file2 match the line in file1, the file1 value is output.

Does that sound correct?

yes, correct

hamijami 05-30-2012 09:38 AM

Quote:

Originally Posted by grail (Post 4691232)
Still not clear to me so I will ask another question:

The format you have shown, i.e. file1 followed by file2 and the output file3, has nothing to do with the lines being shown as they are?

What we are saying is that any match between an item in file2 and a line in file1 gives "No output" (is this an empty line or actually these words?), whereas if none of the items in file2 match the line in file1, the file1 value is output.

Does that sound correct?

As for "No output" or an empty line, whichever suits your style... :)

danielbmartin 05-30-2012 10:02 AM

Quote:

Originally Posted by hamijami (Post 4691238)
Please note that file3 is supposed to be the output file. However, for the sake of a sample, please find the data below:

file1   file2   file3
8120    81      No output
8124    846     No output
8140    84259   No output
8308    1234    No output
8347    833     8347
84199   555     84199
84200   8308    No output
84228   842     No output
84249   8449    No output
84250   84200   No output
84258   8766    No output
84259   23456   No output
8435    98765   8435
8437    84240   8437
8449    789     No output
84588   7654    84588
84589   345     84589

Having studied this larger sample I reach this conclusion: One of us doesn't understand the problem to be solved. With humility, I will assume that I am the dimwit and now bow out of this thread.

Daniel B. Martin

grail 05-30-2012 10:15 AM

Maybe something like:
Code:

awk 'FNR==NR{a[$1];next}{for(i in a)if($1 ~ "^"i){print;b=1;break}if(! b)print "no output";b=0}' file2 file1 > file3
If I understood correctly.
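
As written, the one-liner prints a file1 line when some file2 entry is a prefix of it and prints "no output" otherwise. A sketch of the opposite selection, following the rule confirmed earlier in the thread (list only the file1 values with no exact or big match in file2), is below. It uses index() instead of a regex so the file2 entries are compared as plain strings; the scratch directory, the array name and the shortened sample are assumptions made for illustration.

```shell
tmp=$(mktemp -d)
# A shortened sample taken from the larger data set posted above.
printf '%s\n' 8120 8308 8347 84199 84200 8449 84588 > "$tmp/file1"
printf '%s\n' 81 833 8308 842 8449 84259            > "$tmp/file2"

awk '
    FNR == NR { prefixes[$1]; next }      # first file (file2): remember each entry
    {
        for (p in prefixes)
            if (index($1, p) == 1)        # p sits at the very start of $1
                next                      # exact or big match: skip this line
        print                             # no entry matched: keep the file1 value
    }
' "$tmp/file2" "$tmp/file1" > "$tmp/file3"

cat "$tmp/file3"
```

For this shortened sample file3 ends up holding 8347, 84199 and 84588, which agrees with the file3 column posted above for those rows. The FNR == NR idiom assumes file2 is not empty.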

hamijami 05-31-2012 02:07 AM

Quote:

Originally Posted by grail (Post 4691261)
Maybe something like:
Code:

awk 'FNR==NR{a[$1];next}{for(i in a)if($1 ~ "^"i){print;b=1;break}if(! b)print "no output";b=0}' file2 file1 > file3
If I understood correctly.

Thanks Grail, this is exactly the solution I needed... I really appreciate your effort :)


All times are GMT -5.