compare columns of uneven length
I have a text file with 2 columns. Column A has 69,000 rows. Column B has 49,000 rows.
Column A has our complete product list Column B has product list from Manufacturer 1 There are only certain/some rows which are common between 2 columns. and also, column B is not a subset of column A. Column A has extra entries and so does column B. I need to know, which rows from Column B, are common with Column A which rows from Column B are not common with Column A. Essentially I want to know from this list, how many of our products are from my manufacturer 1, how many does my manufacturer has which we dont carry. How would I acheive this? My natural approach to solving this kind of obstacle is to reach for MS excel and use its lookup function, but its not working... Its taking forever and hanging up. since the file is so huge and probably my excel skills are really bad. How can I do this from command line? I am looking for awk command if possible instead of sed since I am trying to pick up its syntax and usage etc. My thought process is, sort column A,B, for every row in A, lookup and output based on condition. Dont know if I am on the right track. Any help would be great. Thanks in advance. |
Hi,
Have a look at the comm command. Examples: Code:
$ cat prodlist Both lists do need to be sorted for this to work. Hope this helps. |
awesome, I will look into it. Thanks!
|
You're welcome :)
|
Quote:
Code:
sort -u prodlist > prodlist1 Code:
sort -u manlist > manlist1 Code:
comm -12 prodlist manlist | wc -l I guess what I wanted to ask was, how would I combine it so that I avoid the intermediae step of saving a sorted file separetely. Like I want to do something like this but the output is obviously wrong... Code:
sort -u prodlist1 manlist1 | comm -13 prodlist manlist |
|
Quote:
|
You're welcome! :)
|
All times are GMT -5. The time now is 10:30 AM. |