LinuxQuestions.org
Old 02-14-2013, 09:08 PM   #1
sysmicuser
Member
 
Registered: Mar 2010
Posts: 247

Rep: Reputation: 0
Is that limitation of shell script?


Hi Guys,

I suspect this may be a limitation of shell scripting, but I believe in Unix and shell scripts, so I am taking it to the next level by asking on this forum.

The task is to compare file1 (attached) with file2 (attached). interested_nos should contain the numbers that are in file1 but not in file2.

MS Excel does this in a fraction of a second, so why does the shell script take 2.5 hours? Moreover, it does not produce the expected result.

Please note that, to meet the attachment size limit (no more than 256 KB), I have truncated approximately 3K records from file1.

I have also attached the shell script. It is meant to do the job, but after 2.5 hours it still had not produced the expected output.


I am curious to know why that is the case.

Your assistance would be highly appreciated.

P.S.: The attached file1 and file2 have already been sorted.
Attached Files
File Type: txt file1.txt (244.6 KB, 16 views)
File Type: txt file2.txt (130.3 KB, 13 views)
File Type: txt comparison.sh.txt (208 Bytes, 17 views)
 
Old 02-14-2013, 10:24 PM   #2
stormpunk
LQ Newbie
 
Registered: Mar 2004
Distribution: windows 7
Posts: 21

Rep: Reputation: 2
Looks to me like you're reading the entire file2 for every value in file1, which means reading it 22k times.

There are better ways to do this, but you could read both files into arrays and then walk through them in numeric order. Since your script references two files that have "sorted" in the filename, you have no reason to re-read the low values of file2. Each time you find a match, short-circuit back to file1 and continue through file2 from where you left off.

Here's a link for part of what I said in case you need help with that.
http://www.linuxquestions.org/questi...newbie-545840/

If you can, just use the diff program, or look at some implementations of it for all kinds of inspiration.
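The array walk described above could be sketched roughly like this. It is only a sketch under assumptions: both files hold one integer per line with no leading zeros, are sorted in ascending numeric order, use Unix line endings, and the shell is bash 4 or later (for mapfile). The function name only_in_first is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the numbers found in sorted file $1 but not in sorted file $2.
# Assumes one integer per line, ascending numeric order, no leading zeros, no \r characters.
only_in_first() {
    local -a a b
    mapfile -t a < "$1"
    mapfile -t b < "$2"
    local i=0 j=0
    while (( i < ${#a[@]} )); do
        # Advance through file2 only while its values are smaller than the
        # current file1 value; j is never reset, so each file is walked once.
        while (( j < ${#b[@]} && b[j] < a[i] )); do
            j=$((j + 1))
        done
        # Either file2 is exhausted or b[j] >= a[i]; print when there is no match.
        if (( j >= ${#b[@]} || b[j] != a[i] )); then
            printf '%s\n' "${a[i]}"
        fi
        i=$((i + 1))
    done
}

# Example call (file names taken from the thread):
#   only_in_first file1.sorted file2.sorted > interested_nos
```

Because the position in file2 is remembered between iterations, the work grows with the combined length of the two files rather than with their product.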
 
Old 02-14-2013, 11:17 PM   #3
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Rep: Reputation: 285
File2 has a carriage return (^M) at the end of each line, because it was copied from a Windows system.
So, first remove all carriage returns from file2 and then invoke the script again:
Code:
~$ awk '{gsub(/\r/,"",$0);print $0}' file2.sorted  > /tmp/sorted.txt
-------- OR --------
~$ sed -e 's/\r//g' file2.sorted > /tmp/sorted.txt
~$ cat /tmp/sorted.txt > file2.sorted; rm /tmp/sorted.txt
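To check whether a file actually contains carriage returns before and after the cleanup, something like the following might help. The helper names has_cr and strip_cr are made up for this sketch; tr -d '\r' is used because the \r escape in sed patterns is, as far as I know, a GNU extension and may not work with every sed.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original script.

# Succeeds (exit status 0) if file $1 contains at least one carriage return.
has_cr() {
    grep -q $'\r' "$1"
}

# Write a copy of file $1 with all carriage returns removed to file $2.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example (file names taken from the thread):
#   has_cr file2.sorted && strip_cr file2.sorted /tmp/sorted.txt
```

With a stray \r on the end of a value, a numeric test such as [ 123 -eq "123^M" ] fails with "integer expression expected", so no line of file2 ever matches.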
Then invoke your script:
Code:
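#!/bin/bash
#set -xv
# For each number in file1.sorted, scan file2.sorted for a match.
while read -r rline
do
        f=0
        while read -r cline
        do
                if [ "$rline" -eq "$cline" ] ; then
                        f=1
                        break
                fi
        done < file2.sorted
        # No match found in file2.sorted: record the number.
        if [ "$f" != 1 ] ; then
                echo "$rline" >> interested_nos
        fi
done < file1.sorted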

Last edited by shivaa; 02-15-2013 at 04:54 AM. Reason: Typo in cmd
 
Old 02-14-2013, 11:22 PM   #4
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
Does this give the numbers you expect?
Code:
comm -23 file*|wc -l
13636

comm -13 file*|wc -l
1981
comm http://linux.die.net/man/1/comm

Basically, test with known file diffs to check, but I think you'll find this works.
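One way to test with known diffs is a pair of tiny made-up files where the answer is obvious. The sketch below is only an illustration: the sample numbers and the function name demo_comm are invented, and the inputs are passed through plain sort (not sort -n), since comm expects its inputs in the same lexical order that sort produces.

```shell
#!/bin/bash
# Check comm's column options against tiny inputs with a known answer.
# The sample values are made up for illustration.
demo_comm() {
    local dir
    dir=$(mktemp -d)
    printf '%s\n' 100 200 300 400 | sort > "$dir/file1"
    printf '%s\n' 200 400 500     | sort > "$dir/file2"

    echo "only in file1:"; comm -23 "$dir/file1" "$dir/file2"   # hide columns 2 and 3
    echo "only in file2:"; comm -13 "$dir/file1" "$dir/file2"   # hide columns 1 and 3
    echo "in both:";       comm -12 "$dir/file1" "$dir/file2"   # hide columns 1 and 2

    rm -r "$dir"
}

demo_comm
```

Here 100 and 300 should come back as "only in file1", 500 as "only in file2", and 200 and 400 as "in both".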

Note that I had to convert file2.txt to Unix end-of-line format.
MS uses \r\n, *nix uses \n.
Best to run the dos2unix http://linux.die.net/man/1/dos2unix cmd on all the files before using any *nix tools.

Your method is slow because it does a lot of compares (every line of file1 against file2, re-reading file2 each time) and shell script is an interpreted language.

comm is compiled C:
Code:
ldd /usr/bin/comm
	linux-vdso.so.1 =>  (0x00007fff2ef59000)
	libc.so.6 => /lib64/libc.so.6 (0x00000033ece00000)
	/lib64/ld-linux-x86-64.so.2 (0x00000033eca00000)
 
Old 02-15-2013, 12:42 AM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,508

Rep: Reputation: 1894
I would agree that diff or comm is probably a better choice, but should you need to do more with the data, the following returned instantaneously on the current data:
Code:
awk 'FNR==NR{_[$0];next}$0 in _' file2.txt file1.txt
I chose this order of the files so that the smaller one is read into the array. This was after the line endings had been fixed, as assumed above, but awk can cope with that too if you will be working with Windows-based files regularly (let me know if required).
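For anyone wondering how the one-liner works: while awk reads the first file named on the command line, FNR==NR is true, so each line is stored as an array key and awk moves on; for the second file, a line is printed only when `$0 in _` is true, i.e. when that line was already seen in the first file. As written, it therefore prints the lines of file1.txt that are also in file2.txt. The sketch below spells that out and adds a negated variant for "in file1 but not in file2" that also drops a trailing \r. The function names and the array name seen are made up for illustration.

```shell
#!/bin/bash
# Hypothetical wrappers around the awk idiom; names are for illustration only.

# Lines of $1 that also appear in $2 (what the one-liner above prints).
in_both() {
    awk 'FNR == NR { seen[$0]; next }   # first file read: remember each line as a key
         $0 in seen' "$2" "$1"
}

# Lines of $1 that do NOT appear in $2, ignoring a trailing carriage return.
# Caveat: if $2 is empty, FNR == NR stays true for $1 and nothing is printed.
not_in_second() {
    awk 'FNR == NR { sub(/\r$/, ""); seen[$0]; next }
         { sub(/\r$/, "") }
         !($0 in seen)' "$2" "$1"
}

# Example (file names taken from the thread):
#   not_in_second file1.txt file2.txt > interested_nos
```

The lookup `$0 in seen` is a hash lookup, so the second file is checked line by line without re-reading the first one.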
 
Old 02-17-2013, 06:05 PM   #6
sysmicuser
Member
 
Registered: Mar 2010
Posts: 247

Original Poster
Rep: Reputation: 0
@chrism01

I did try comm the other day (after the files were sorted and converted to Unix format with the dos2unix command), but it didn't work.

However, today I did the same and it worked wonderfully! With comm there is essentially no need for that script; it works beautifully. Thank you very much for the enlightenment.

@grail

May I ask you what that command is doing?
Code:
awk 'FNR==NR{_[$0];next}$0 in _' file2.txt file1.txt

I redirected the output to a third file and compared it with interested_nos, which was obtained using the following command.
Code:
comm -23 file1.sorted file2.sorted >> interested_nos
Please help me understand this.

Thank you.

Last edited by sysmicuser; 02-17-2013 at 06:08 PM.
 
Old 02-17-2013, 06:14 PM   #7
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
Glad it helped; add that to your bookmarks/mental list.
 
  

