LinuxQuestions.org
Old 05-02-2014, 09:46 PM   #1
maddyfreaks
Member
 
Registered: May 2011
Posts: 70

Rep: Reputation: 0
Help with a script (tried but no luck)


Hi, I have 2 files and I want to perform a minus (set difference) between file2 and file1.

file2 has all the data and file1 has only some of it.
All I want is 2 things:
1. Generate the difference between file2 and file1.
2. Add single quotes to the start and end of column 1 of file1, using either awk or sed.

I tried using a while loop but had no luck with the output.
Both files have 2 columns.

Your help is appreciated.

file1

NAME State
------ ------
Casey MD
David FL
Edison NJ
John CA
Juliet CA
Maddy VA

file2
NAME State
------ ------
Austin AZ
Bretea PA
Casey MD
David FL
Edison NJ
Jermey NC
John CA
Juliet CA
Justin NC
Maddy VA
Michael FL
Pascal CA
Rick TX
Robin NY
Scott CA
Slovas MD

Last edited by maddyfreaks; 05-02-2014 at 09:48 PM. Reason: Updated File Info Clearly
 
Old 05-02-2014, 10:33 PM   #2
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 2,404

Rep: Reputation: Disabled
Do you have to know which file the unique lines belong to? If not, the following procedure should do the trick:
  1. Create a combined file with all the entries minus the heading (tail -n +3 will omit the first two lines)
  2. Sort the combined file with sort
  3. Pipe the resulting file through uniq -u
How does the quote thing fit into this? Is that a separate, unrelated task?
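
The three steps above can be sketched as a single pipeline. This is only a minimal illustration: the scratch directory, the shortened sample data, and the variable name unique_lines are assumptions, not part of the original files.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample data with the two-line heading from the first post.
printf '%s\n' 'NAME State' '------ ------' 'Casey MD' 'David FL' > file1
printf '%s\n' 'NAME State' '------ ------' 'Austin AZ' 'Casey MD' 'David FL' 'Rick TX' > file2

# 1. drop the headings, 2. sort the combined lines, 3. keep lines seen only once.
unique_lines=$( { tail -n +3 file1; tail -n +3 file2; } | sort | uniq -u )
printf '%s\n' "$unique_lines"
```

With this sample data the pipeline prints Austin AZ and Rick TX. Note that it reports lines unique to either file, so it only answers the question when file1 is a strict subset of file2.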
 
Old 05-02-2014, 11:08 PM   #3
maddyfreaks
Member
 
Registered: May 2011
Posts: 70

Original Poster
Rep: Reputation: 0
Thanks for your reply.

I just pulled some sample data. My real files have 100 - 200 lines; file1 has only a small dataset and file2 has more data.

So I want to pull what's not in file1 and print both columns.

Yes, the single quote thing is separate ... assuming we get the difference from above as below
John
Casey

then this is what I need:
'John',
'Casey',
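
One possible way to produce that format is with awk. This is a sketch only, assuming the difference lines arrive on standard input with the name in column 1; the variable names q and quoted are made up for the illustration.

```shell
# Hypothetical input: two "difference" lines, name in column 1.
# q holds a single quote so no shell quote escaping is needed inside awk.
quoted=$(printf '%s\n' 'John CA' 'Casey MD' | awk -v q="'" '{ print q $1 q "," }')
printf '%s\n' "$quoted"
```

Passing the quote character in with -v keeps the awk program inside ordinary single quotes, which is usually easier to read than backslash escapes.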
 
Old 05-03-2014, 12:48 AM   #4
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,433

Rep: Reputation: 1353
Code:
grep -vFf file1 file2 | sed "s/\(^.*\) \(.*$\)/\'\1\' \'\2\'/"
produces
Quote:
'Austin' 'AZ'
'Bretea' 'PA'
'Jermey' 'NC'
'Justin' 'NC'
'Michael' 'FL'
'Pascal' 'CA'
'Rick' 'TX'
'Robin' 'NY'
'Scott' 'CA'
'Slovas' 'MD'

Last edited by allend; 05-03-2014 at 01:06 AM.
 
Old 05-03-2014, 11:56 AM   #5
maddyfreaks
Member
 
Registered: May 2011
Posts: 70

Original Poster
Rep: Reputation: 0
Can someone please help me with the difference issue, i.e. how to pull only the things which are missing from file1?

Appreciate your help
 
Old 05-03-2014, 12:04 PM   #6
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 8,133

Rep: Reputation: 2272
What about grep -vFf file1 file2 from post #4? Have you tried it? What do you mean by 'Appreciate your help'?
 
Old 05-03-2014, 12:09 PM   #7
maddyfreaks
Member
 
Registered: May 2011
Posts: 70

Original Poster
Rep: Reputation: 0
Tried that, no luck.

See below, please:

$ cat f1
Casey MD
David FL
Edison NJ

$ cat f2
Austin AZ
Bretea PA
Casey MD
David FL
Edison NJ
Jermey NC
John CA
Juliet CA

$ grep -vFf f1 f2
Austin AZ
Bretea PA
Casey MD
David FL
Edison NJ
Jermey NC
John CA
Juliet CA

The grep output should not print Casey/David/Edison.

As I said, all I need is to print the data which is not in file1.
 
Old 05-03-2014, 12:16 PM   #8
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 8,133

Rep: Reputation: 2272
It definitely works:
Code:
/tmp$ grep -vFf file1 file2
Austin AZ
Bretea PA
Jermey NC
Justin NC
Michael FL
Pascal CA
Rick TX
Robin NY
Scott CA
Slovas MD
/tmp$ cat file1
Casey MD
David FL
Edison NJ
John CA
Juliet CA
Maddy VA
/tmp$ cat file2
Austin AZ
Bretea PA
Casey MD
David FL
Edison NJ
Jermey NC
John CA
Juliet CA
Justin NC
Maddy VA
Michael FL
Pascal CA
Rick TX
Robin NY
Scott CA
Slovas MD
/tmp$ grep --version
grep (GNU grep) 2.12
What kind of grep do you have?
 
Old 05-03-2014, 12:16 PM   #9
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 2,404

Rep: Reputation: Disabled
I just ran the exact same test, and got this result:
Code:
user@test01:~$ grep -vFf f1 f2
Austin AZ
Bretea PA
Jermey NC
John CA
Juliet CA
user@test01:~$
Could there be subtle differences in the two input files, like tab vs. space delimiters?
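
A quick way to check for that kind of difference is sketched below. The file contents are hypothetical (f1 uses a tab, f2 a space), and the .norm file names are made up for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files: the lines look alike on screen but differ byte for byte.
printf 'Casey\tMD\n' > f1
printf '%s\n' 'Austin AZ' 'Casey MD' > f2

# "sed -n l" shows a tab as \t and marks the end of each line with $.
sed -n l f1
sed -n l f2

# Rebuild every line as "field1 field2" with a single space, then compare again.
awk '{ print $1, $2 }' f1 > f1.norm
awk '{ print $1, $2 }' f2 > f2.norm
before=$(grep -vFf f1 f2)
after=$(grep -vFf f1.norm f2.norm)
printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

Before normalizing, grep still prints Casey MD because the tab-delimited pattern never matches the space-delimited line; after normalizing, only Austin AZ remains.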
 
Old 05-03-2014, 12:22 PM   #10
maddyfreaks
Member
 
Registered: May 2011
Posts: 70

Original Poster
Rep: Reputation: 0
Here is what I am using:
Code:
$ uname -a
Linux svsrac1.localdomain 3.8.13-26.2.3.el6uek.x86_64 #2 SMP Wed Apr 16 02:51:10 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Oracle Linux Server release 6.5
Kernel \r on an \m

$ cat f1
Casey   MD
David   FL
Edison  NJ

$ cat f2
Austin  AZ
Bretea  PA
Casey   MD
David   FL
Edison  NJ
Jermey  NC
John    CA
Juliet  CA

$ grep -vFf f1 f2
Austin  AZ
Bretea  PA
Casey   MD
David   FL
Edison  NJ
Jermey  NC
John    CA
Juliet  CA

$ which grep
/bin/grep

$ grep --version
GNU grep 2.6.3
 
Old 05-03-2014, 01:36 PM   #11
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 8,133

Rep: Reputation: 2272
try grep -vFxf file1 file2
 
Old 05-03-2014, 06:51 PM   #12
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,604

Rep: Reputation: 1241
The command you are all looking for is called "comm", which will list the records that are common to both files, the records that are in file 1 but not file 2, and the records that are in file 2 but not file 1.

For most things, the two files should be sorted first.
 
Old 05-03-2014, 09:32 PM   #13
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,433

Rep: Reputation: 1353
So 'grep -vFf file1 file2' is working for me with grep 2.18 (latest stable), and for pan64 with grep 2.12 (24-Apr-2012), but not for maddyfreaks with the older grep 2.6.3 (02-Apr-2010).

It would have been nice if grep worked to avoid the sort before using comm.
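
For what it's worth, an awk lookup table is one way to avoid the sort. Nobody in the thread posted this, so treat it as a sketch under assumptions: whitespace-separated two-column files, a non-empty first file, and made-up sample data and variable names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical unsorted inputs with mixed tab/space delimiters.
printf 'David\tFL\nCasey   MD\n' > f1
printf '%s\n' 'John CA' 'Casey MD' 'Austin AZ' 'David FL' > f2

# While reading the first file (NR == FNR), remember each name/state pair.
# While reading the second file, print only pairs that were never remembered.
missing=$(awk 'NR == FNR { seen[$1, $2]; next } !(($1, $2) in seen)' f1 f2)
printf '%s\n' "$missing"
```

Because awk compares fields rather than whole lines, tab vs. space differences between the two files do not matter here, and the output keeps the order of the second file.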
 
Old 05-05-2014, 03:08 PM   #14
kathirvel
Member
 
Registered: Jan 2011
Location: Bangalore
Distribution: RHEL,OEL
Posts: 57

Rep: Reputation: 0
To the best of my knowledge, the following command will print the data which is not present in file1.

diff file1 file2 | grep -i ">" | awk '{print $2"\t\t"$3}'

Please reply if this helps you.

Thanks,
Kathirvel.S
 
Old 05-06-2014, 08:30 AM   #15
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,604

Rep: Reputation: 1241
comm -1 -3 file1 file2

will do that. And without the reformatting and stripping processes.

Sorting is only needed if the files are out of order.

If they are out of order, then a naive comparison requires multiple passes to find out if something in one is not in the other (one pass through the second file for every record in the first). It is an N^2 problem...

Which is why sorting is done first.
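
Putting the sort and the comm call together might look like the sketch below. The sample data and the .sorted file names are assumptions for illustration; LC_ALL=C is used so that sort and comm agree on the ordering.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs, deliberately out of order.
printf '%s\n' 'David FL' 'Casey MD' > file1
printf '%s\n' 'Rick TX' 'Casey MD' 'Austin AZ' 'David FL' > file2

# comm expects both inputs sorted in the same collation order.
LC_ALL=C sort file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted

# -1 drops lines found only in file1, -3 drops lines common to both,
# leaving the lines that appear only in file2.
only_in_file2=$(LC_ALL=C comm -1 -3 file1.sorted file2.sorted)
printf '%s\n' "$only_in_file2"
```

Like grep -F, comm compares whole lines, so the tab vs. space caveat raised earlier in the thread applies here too.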

Last edited by jpollard; 05-06-2014 at 08:33 AM.
 
  

