LinuxQuestions.org


maddyfreaks 05-02-2014 08:46 PM

Help with a script (tried but no luck)
 
Hi, I have 2 files and I want to perform a minus (set difference) between file2 and file1.

File2 has all the data and file1 has only some of it.
All I want is 2 things:
1. Generate the difference between file2 and file1
2. Add single quotes at the start and end of column 1 of file1, using either awk or sed

I tried using a while loop but had no luck with the output.
Both files have 2 columns.

Your help is appreciated.

file1

NAME State
------ ------
Casey MD
David FL
Edison NJ
John CA
Juliet CA
Maddy VA

file2
NAME State
------ ------
Austin AZ
Bretea PA
Casey MD
David FL
Edison NJ
Jermey NC
John CA
Juliet CA
Justin NC
Maddy VA
Michael FL
Pascal CA
Rick TX
Robin NY
Scott CA
Slovas MD

Ser Olmy 05-02-2014 09:33 PM

Do you have to know which file the unique lines belong to? If not, the following procedure should do the trick:
  1. Create a combined file with all the entries minus the heading (tail -n +3 will omit the first two lines)
  2. Sort the combined file with sort
  3. Pipe the resulting file through uniq -u
How does the quote thing fit into this? Is that a separate, unrelated task?
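
The three steps above can be sketched as follows. This is only an illustration on a shortened copy of the sample data; the scratch directory and the variable name unique_lines are made up for the example. Note that it reports lines unique to either file and assumes no line is repeated within a single file.

```shell
# Shortened sample data in a scratch directory (layout follows the first post).
workdir=$(mktemp -d)
printf 'NAME State\n------ ------\nCasey MD\nDavid FL\n' > "$workdir/file1"
printf 'NAME State\n------ ------\nAustin AZ\nCasey MD\nDavid FL\nRick TX\n' > "$workdir/file2"

# 1. Combine both files, skipping the two header lines of each.
# 2. Sort so that identical lines end up next to each other.
# 3. Keep only the lines that occur exactly once.
unique_lines=$( { tail -n +3 "$workdir/file1"; tail -n +3 "$workdir/file2"; } | sort | uniq -u )
printf '%s\n' "$unique_lines"

rm -rf "$workdir"
```

With this input the result is the two lines "Austin AZ" and "Rick TX".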

maddyfreaks 05-02-2014 10:08 PM

Thanks for your reply.

I just pulled the sample data. My real files have 100 - 200 lines; file1 has only a small subset of the data and file2 has more.

So I want to pull what's not in file1 and print both columns.

Yes, the single-quote thing is separate. Assuming we get the difference from above as below:
John
Casey

then this is what I need:
'John',
'Casey',
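
One possible way to get that quoting, sketched with both sed and awk. The input is fed in with printf purely for illustration, and the variable names are hypothetical. In sed, & stands for the whole matched line; in awk, \047 is the octal escape for a single quote.

```shell
# sed: wrap each whole line in single quotes and append a comma.
quoted_sed=$(printf 'John\nCasey\n' | sed "s/.*/'&',/")
printf '%s\n' "$quoted_sed"

# awk: quote only column 1 of two-column input.
quoted_awk=$(printf 'John CA\nCasey MD\n' | awk '{ print "\047" $1 "\047," }')
printf '%s\n' "$quoted_awk"
```

Either command can be placed at the end of a pipeline that produces the difference.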

allend 05-02-2014 11:48 PM

Code:

grep -vFf file1 file2 | sed "s/\(^.*\) \(.*$\)/\'\1\' \'\2\'/"
produces
Quote:

'Austin' 'AZ'
'Bretea' 'PA'
'Jermey' 'NC'
'Justin' 'NC'
'Michael' 'FL'
'Pascal' 'CA'
'Rick' 'TX'
'Robin' 'NY'
'Scott' 'CA'
'Slovas' 'MD'

maddyfreaks 05-03-2014 10:56 AM

Can someone please help me with the difference, i.e. how to pull only the entries that are missing from file1?

I appreciate your help.

pan64 05-03-2014 11:04 AM

What about grep -vFf file1 file2 from post #4? Have you tried it? What do you mean by 'Appreciate your help'?

maddyfreaks 05-03-2014 11:09 AM

Tried that, no luck.

Please see below:

$ cat f1
Casey MD
David FL
Edison NJ

$ cat f2
Austin AZ
Bretea PA
Casey MD
David FL
Edison NJ
Jermey NC
John CA
Juliet CA

$ grep -vFf f1 f2
Austin AZ
Bretea PA
Casey MD
David FL
Edison NJ
Jermey NC
John CA
Juliet CA

The grep output should not print Casey/David/Edison.

As I said, all I need is to print the data which is not in file1.

pan64 05-03-2014 11:16 AM

it definitely works:
Code:

/tmp$ grep -vFf file1 file2
Austin AZ
Bretea PA
Jermey NC
Justin NC
Michael FL
Pascal CA
Rick TX
Robin NY
Scott CA
Slovas MD
/tmp$ cat file1
Casey MD
David FL
Edison NJ
John CA
Juliet CA
Maddy VA
/tmp$ cat file2
Austin AZ
Bretea PA
Casey MD
David FL
Edison NJ
Jermey NC
John CA
Juliet CA
Justin NC
Maddy VA
Michael FL
Pascal CA
Rick TX
Robin NY
Scott CA
Slovas MD
/tmp$ grep --version
grep (GNU grep) 2.12

what kind of grep do you have?

Ser Olmy 05-03-2014 11:16 AM

I just ran the exact same test, and got this result:
Code:

user@test01:~$ grep -vFf f1 f2
Austin AZ
Bretea PA
Jermey NC
John CA
Juliet CA
user@test01:~$

Could there be subtle differences in the two input files, like tab vs. space delimiters?
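
A quick way to check for that kind of difference is sketched below. The tab-delimited f1 is an assumption made up for the demonstration; nothing in the thread confirms what the real files contain. sed -n l prints each line with tabs shown as \t and the line end marked by $, so a trailing space or a tab becomes visible.

```shell
workdir=$(mktemp -d)
printf 'Casey\tMD\n' > "$workdir/f1"            # tab-delimited (assumed for illustration)
printf 'Casey MD\nJohn CA\n' > "$workdir/f2"    # space-delimited

# Show invisible characters.
visible=$(sed -n l "$workdir/f1")
printf '%s\n' "$visible"

# Because of the delimiter mismatch, nothing is filtered out of f2.
unfiltered=$(grep -vFf "$workdir/f1" "$workdir/f2" || true)

# Collapse every run of whitespace to one space in both files, then retry.
sed 's/[[:space:]][[:space:]]*/ /g' "$workdir/f1" > "$workdir/f1.norm"
sed 's/[[:space:]][[:space:]]*/ /g' "$workdir/f2" > "$workdir/f2.norm"
filtered=$(grep -vFf "$workdir/f1.norm" "$workdir/f2.norm" || true)
printf '%s\n' "$filtered"

rm -rf "$workdir"
```

If the two files really do differ only in whitespace, normalizing both before the comparison should make the grep behave as expected.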

maddyfreaks 05-03-2014 11:22 AM

Here is what I am using:
Code:

$ uname -a
Linux svsrac1.localdomain 3.8.13-26.2.3.el6uek.x86_64 #2 SMP Wed Apr 16 02:51:10 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Oracle Linux Server release 6.5
Kernel \r on an \m

$ cat f1
Casey  MD
David  FL
Edison  NJ

$ cat f2
Austin  AZ
Bretea  PA
Casey  MD
David  FL
Edison  NJ
Jermey  NC
John    CA
Juliet  CA

$ grep -vFf f1 f2
Austin  AZ
Bretea  PA
Casey  MD
David  FL
Edison  NJ
Jermey  NC
John    CA
Juliet  CA

$ which grep
/bin/grep

$ grep --version
GNU grep 2.6.3


pan64 05-03-2014 12:36 PM

try grep -vFxf file1 file2
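
For reference, a small sketch of what the extra -x changes: it makes each pattern match only a whole line instead of any substring. The names below are invented for the demonstration and are not from the files in this thread.

```shell
workdir=$(mktemp -d)
printf 'Ann MD\n' > "$workdir/file1"
printf 'Ann MD\nJoAnn MD\nRick TX\n' > "$workdir/file2"

# -F: fixed strings, -f: read patterns from file1, -v: print non-matching lines.
# Without -x a pattern matches anywhere in a line, so "JoAnn MD" is dropped too.
substring_match=$(grep -vFf "$workdir/file1" "$workdir/file2")

# With -x a pattern must match the entire line, so "JoAnn MD" is kept.
whole_line_match=$(grep -vFxf "$workdir/file1" "$workdir/file2")

printf 'without -x:\n%s\n' "$substring_match"
printf 'with -x:\n%s\n' "$whole_line_match"

rm -rf "$workdir"
```

Whole-line matching is stricter, so it will not help if the lines differ in invisible whitespace.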

jpollard 05-03-2014 05:51 PM

The command you are all looking for is called "comm", which lists the records that are common to both files, the records that are in file1 but not file2, and the records that are in file2 but not file1.

For most things, the two files should be sorted first.

allend 05-03-2014 08:32 PM

So 'grep -vFf file1 file2' is working for me with grep 2.18 (latest stable), for pan64 with grep 2.12 (24-Apr-2012), but not for maddyfreaks with the older grep 2.6.3 (02-Apr-2010).

It would have been nice if grep worked to avoid the sort before using comm.

kathirvel 05-05-2014 02:08 PM

To the best of my knowledge, the following command will print the data which is not present in file1.

diff file1 file2 | grep -i ">" | awk '{print $2"\t\t"$3}'

Please reply if this helps you.

Thanks,
Kathirvel.S
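
A sketch of how that pipeline behaves on a shortened copy of the sample data. One small change is assumed here: the grep pattern is anchored as '^>' so that only diff's "added in file2" lines are selected. The scratch directory and variable name are illustrative.

```shell
workdir=$(mktemp -d)
printf 'Casey MD\nDavid FL\n' > "$workdir/file1"
printf 'Austin AZ\nCasey MD\nDavid FL\nRick TX\n' > "$workdir/file2"

# diff marks lines present only in the second file with a leading "> ".
# awk then sees ">" as $1, the name as $2 and the state as $3.
only_in_file2=$(diff "$workdir/file1" "$workdir/file2" | grep '^>' | awk '{ print $2 "\t\t" $3 }')
printf '%s\n' "$only_in_file2"

rm -rf "$workdir"
```

The output columns are separated by two tabs, as in the original command.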

jpollard 05-06-2014 07:30 AM

comm -1 -3 file1 file2

will do that. And without the reformatting and stripping processes.

Sorting is only needed if the files are out of order.

If they are out of order, then every method requires multiple passes to find out if something in one is not in the other (one pass through the second file for every record in the first). It is an N^2 problem...

Which is why sorting is done first.
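
Putting the sort and comm steps together, a minimal sketch on a shortened, deliberately unsorted copy of the sample data. The scratch directory and the .sorted file names are assumptions for the example.

```shell
workdir=$(mktemp -d)
printf 'David FL\nCasey MD\n' > "$workdir/file1"
printf 'Rick TX\nAustin AZ\nDavid FL\nCasey MD\n' > "$workdir/file2"

# comm expects both inputs in sorted order, so sort them first.
sort "$workdir/file1" > "$workdir/file1.sorted"
sort "$workdir/file2" > "$workdir/file2.sorted"

# -1 hides lines found only in file1, -3 hides lines common to both,
# which leaves the lines found only in file2.
only_in_file2=$(comm -1 -3 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$only_in_file2"

rm -rf "$workdir"
```

Here the result is "Austin AZ" and "Rick TX", with both columns intact.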


All times are GMT -5.