LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (http://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   compare two files using Awk!! (http://www.linuxquestions.org/questions/linux-newbie-8/compare-two-files-using-awk-669742/)

visitnag 09-13-2008 03:39 PM

compare two files using Awk!!
 
I have to files. In the first file I have a unique emp.id as first field and in the second file I have the same emp.id as second field. I want join (or match) these two files on the basis of emp.id (i.e., first file with second file.) and print the matched lines with all the fields of both the files.

Thank you in advance.

Mr. C. 09-13-2008 05:47 PM

Show some examples of what you want.

Please be more clear about the questions you ask. Many of your questions seem rather lazy, with minimal though or effort. You'll get more if you spend more time formulating "intelligent" questions.

See: http://www.catb.org/~esr/faqs/smart-questions.html

jschiwal 09-13-2008 05:56 PM

This sounds like the source of these files are tables from a relational database such as mysql. The emp.id field in the first file sounds like emp is the employee table, with "id" being the primary key; and in the second file, emp.id is a foreign key.
example:

If that is the case, why not use a database client (such as the mysql command) and enter the sql SELECT command to return the information from the database containing the information you want. To paraphrase the evil Russian General from the movie Golden Eye, "Use the database! That's what it's for."

example:
Code:

SELECT tags.title, album.id, album.name
FROM tags INNER JOIN album
ON album.id = tags.album
LIMIT 10;
+-----------------------+-----+----------------------------------------------------+
| title                | id  | name                                              |
+-----------------------+-----+----------------------------------------------------+
| LQ Podcast - 01.24.07 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Podcast - 05.09.08 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Radio Episode #3  | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Podcast - 10.29.07 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Podcast - 04.20.06 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Podcast - 06.22.06 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Podcast - 05.18.07 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Podcast - 10.22.06 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Podcast - 01.17.08 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
| LQ Podcast - 11.09.05 | 339 | LQ Radio - Open Talk...About Open Source and Linux |
+-----------------------+-----+----------------------------------------------------+

---

Generally a lot depends on the particular of your two files. Is there a one-to-one relationship between the emp.id's in the two file? Is every id in one file present in the other and vice versa? If so, then you could sort each file and input the results to the paste command.

Code:

head table1 table2
==> table1 <==
1      A
2      AA
3      AAA
4      Aaberg
5      Aaberg's
6      Aachen
7      Aachen's
8      aah
9      aahed
10      aahing

==> table2 <==
zinked  1
zinkenite      2
zinkenites      3
zinkenite's    4
zinkes  5
zinkiferous    6
zinkification  7
zinkified      8
zinkifies      9
zinkify 10

Look at the output of the paste command before using cut to select the column(s) you want. Make sure that the ids match up.
Code:

paste <(sort -n -k1 table1) <(sort -n -k2 table2) | head
1      A      zinked  1
2      AA      zinkenite      2
3      AAA    zinkenites      3
4      Aaberg  zinkenite's    4
5      Aaberg's        zinkes  5
6      Aachen  zinkiferous    6
7      Aachen's        zinkification  7
8      aah    zinkified      8
9      aahed  zinkifies      9
10      aahing  zinkify 10

Since the ids match up, it's ok to use cut.
Code:

paste table1 <(sort -n -k2 table2 | cut -d'    ' -f1) >table3
The sort, paste and cut commands are all provided by coreutils. Look though the coreutils manual. I downloaded the source for this package and for gawk and ran "./configure && makepdf" to produce a print worthy pdf document. The GAWK: Effective AWK Programming book is excellent. The coreutils package supplies 101 commands and 98 manpages. Having a printed version of the coreutils info manual is a great resource.

At least browse through the commands & the brief topics in the index page by entering "info coreutils".

---

To do what you want to do in a single awk script, you may need to use "readline" to read from table2 while processing a line from table1.

If the same one-to-one conditions exist, then you can use paste to assemble a long line from each table. Then in awk, you can check that the two fields containing the id match up and if they do, print out the fields you want.

visitnag 09-15-2008 12:42 PM

file1.

emp.id name quantity amount
11111 xyx 25 5500
22222 sss 55 7500
33333 wes 120 8898


file2.

add1 add2 add3 empid pin
rd1 blah blah 11111 2324
rd2 blah blah 22222 5565
rd4 blah blah 33333 6548

I want to match above files on emp.id fields using awk.

Thank you all.


All times are GMT -5. The time now is 12:33 PM.