11-02-2016, 09:01 PM  #1
Member
Registered: Jul 2016
Location: Hungary
Distribution: Debian
Posts: 69
cut and grep commands find no rows
Hello,
I would like to remove the rows found in the file found_160k.txt from the file 160k-1.txt.
Is there a line-count limitation with this command? (It reports 0 matching rows.)
Code:
root@SAMSUNG:~# cut -d: -f1 found_160k.txt | grep -vf- 160k-1.txt | wc -l
0
found_160k.txt (3000 rows) contains lines like:
Code:
00373f5500d74281d926ed11d84b1168:amigo':123456789
160k-1.txt (160 000 rows) contains lines like:
Code:
00373f5500d74281d926ed11d84b1168:amigo'
Thank you in advance.
11-02-2016, 09:28 PM  #2
LQ Guru
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524
The grep command makes no sense. You're selecting an inverted match of nothing. Are you trying to find the number of lines that were not cut? Grep doesn't know what "-f1" in the cut command means. The dash after the f in the grep command should not be there. I'm not sure what you're attempting to do, but omitting the grep command would give you the number of lines in the file.
1 member found this post helpful.
11-02-2016, 09:47 PM  #3
Member
Registered: Jul 2016
Location: Hungary
Distribution: Debian
Posts: 69
Original Poster
Thank you very much.
I just want to remove the rows found in the file "found_160k.txt" from the file "160k-1.txt".
11-02-2016, 09:53 PM  #4
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
Have you tried a smaller sample set to see why your command is not working? Most commands do have some type of limitation; however, if you hit one you would usually get an error message.
A first simple test to see whether it is a limit would be to make a copy of the 'found' file and add a single entry that should get returned.
I would add that this is often a case where you could use a single tool like awk instead of two commands which might have issues. See the sketch below.
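Something like this minimal awk sketch might do it (an illustration only; it assumes the hash before the first ':' is the key in both files, and the output file name is a placeholder):
Code:
awk -F: 'NR==FNR { seen[$1]; next } !($1 in seen)' found_160k.txt 160k-1.txt > remaining.txt
NR==FNR is true only while reading the first file, so its hashes are collected into the seen array; lines from the second file are printed only when their hash was never seen.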
1 member found this post helpful.
11-02-2016, 10:16 PM  #5
Member
Registered: Jul 2016
Location: Hungary
Distribution: Debian
Posts: 69
Original Poster
I tried with a smaller sample set. It worked, so maybe it is a limitation.
Thanks.
11-02-2016, 11:41 PM  #6
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
Maybe try using Perl / Python / Ruby, as these may handle large files better than the commands being used.
Another option could be to use xargs to send the data to grep?
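As a rough Perl illustration (a sketch only, not tested against your data; it assumes the key is everything before the first ':' and the output file name is a placeholder):
Code:
perl -F: -ane 'if (@ARGV) { $seen{$F[0]} = 1 } else { print unless $seen{$F[0]} }' found_160k.txt 160k-1.txt > remaining.txt
While the first file is being read, @ARGV still holds the remaining file name, so its first fields get recorded; lines of the second file are printed only when their first field was never recorded.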
1 member found this post helpful.
11-03-2016, 03:05 AM  #7
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,752
If the files are in the same order, you could use the utility "comm" to show which lines are unique to the second file. The different options, such as -1 and -3, can be combined.
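For instance, a minimal sketch (the file names here are placeholders; both files must already be sorted):
Code:
comm -13 sorted_found.txt sorted_160k.txt
-13 suppresses column 1 (lines unique to the first file) and column 3 (lines in both), leaving only the lines unique to the second file.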
1 member found this post helpful.
11-03-2016, 05:58 AM  #8
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 24,269
Yes, cut | grep -v -f - file should work in general. If it works with a smaller set, you need to check the error code returned. Probably it ran out of memory, or something "strange" happened.
But without real data and a way to reproduce it, we cannot give you a correct answer (just guesses...). A quick way to check the per-stage exit codes is sketched below.
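For example (a bash sketch; PIPESTATUS holds one exit status per pipeline stage):
Code:
cut -d: -f1 found_160k.txt | grep -v -f - 160k-1.txt | wc -l
echo "exit codes (cut, grep, wc): ${PIPESTATUS[@]}"
Note that grep exits with status 1 when it simply selects no lines and 2 on a real error, so a 1 from grep here would just mean nothing matched.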
1 member found this post helpful.
11-03-2016, 07:39 AM  #9
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,574
Given
Code:
bash-4.4$ cat 160-k1.txt
00373f5500d74281d926ed11d84b1168:amigo'
00473f5500d74281d926ed11d84b1168:amigo'
00573f5500d74281d926ed11d84b1168:amigo'
bash-4.4$ cat found_160k.txt
00373f5500d74281d926ed11d84b1168:amigo':123456789
00473f5500d74281d926ed11d84b1168:amigo':123456789
00773f5500d74281d926ed11d84b1168:amigo':123456789
then
Code:
bash-4.4$ join -t ":" -v1 160-k1.txt found_160k.txt
00573f5500d74281d926ed11d84b1168:amigo'
Note - from 'man join'
Quote:
Important: FILE1 and FILE2 must be sorted on the join fields.
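If the real files are not already sorted, a sketch using process substitution to sort them on the join field first (bash syntax):
Code:
join -t ":" -v1 <(sort -t: -k1,1 160-k1.txt) <(sort -t: -k1,1 found_160k.txt)
-v1 prints the lines of the first file that have no match in the second, joined on the first ':'-separated field.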
1 member found this post helpful.
11-04-2016, 05:35 AM  #10
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,752
freeroute, how did you solve it with the large files?
11-04-2016, 07:32 AM  #11
Member
Registered: Jul 2016
Location: Hungary
Distribution: Debian
Posts: 69
Original Poster
Quote:
Originally Posted by Turbocapitalist
freeroute, how did you solve it with the large files?
Hi,
Thanks for your question.
This weekend I will try a solution.
Do you have a suggestion, maybe? Someone told me to try awk, so I will, though unfortunately I have never used awk.
11-04-2016, 07:41 AM  #12
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,752
My suggestion was with "comm". For example:
Code:
comm -1 -3 <(cut -d : -f 1-2 found.txt | sort) <(sort longlist.txt)
Though I don't know the internal workings well enough to reduce its memory use or predict which part might run out.
11-04-2016, 07:47 AM  #13
Member
Registered: Jul 2016
Location: Hungary
Distribution: Debian
Posts: 69
Original Poster
Thanks. It would be great if the "comm" command works. I will reply once I have tried it. (I am on a desktop PC now; this evening I can try on my laptop, which has only 1 GB RAM...)
11-04-2016, 05:33 PM  #14
Member
Registered: Jul 2016
Location: Hungary
Distribution: Debian
Posts: 69
Original Poster
Quote:
Originally Posted by Turbocapitalist
My suggestion was with "comm". For example:
Code:
comm -1 -3 <(cut -d : -f 1-2 found.txt | sort) <(sort longlist.txt)
Though I don't know the internal workings to know how to reduce memory dependence or predict which parts might cause it to run out.
I read the manual and examples for the "comm" command. It is a very simple and very useful command. It works. Thank you very much for your help again.