LinuxQuestions.org (https://www.linuxquestions.org/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   cut and grep commands not found rows (https://www.linuxquestions.org/questions/linux-newbie-8/cut-and-grep-commands-not-found-rows-4175592763/)

freeroute 11-02-2016 09:01 PM

cut and grep commands not found rows
 
Hello,

I would like to remove the rows listed in the file found_160k.txt from the file 160k-1.txt.

Is there some line-count limitation with this command? I ask because the number of rows found is 0:

root@SAMSUNG:~# cut -d: -f1 found_160k.txt | grep -vf- 160k-1.txt | wc -l
0

found_160k.txt (3000 rows) contains lines like:

00373f5500d74281d926ed11d84b1168:amigo':123456789

160k-1.txt (160,000 rows) contains lines like:

00373f5500d74281d926ed11d84b1168:amigo'


Thank you in advance.

AwesomeMachine 11-02-2016 09:28 PM

The grep command makes no sense. You're selecting an inverted match of nothing. Are you trying to find the number of lines that were not cut? Grep doesn't know what "-f1" in the cut command means. The dash after f in the grep command should not be there. I'm not sure what you're attempting to do, but omitting the grep command would give you the number of lines in the file.

freeroute 11-02-2016 09:47 PM

Thank you very much.
I just want to remove the rows found in the "found_160k.txt" file from the file "160k-1.txt".

grail 11-02-2016 09:53 PM

Have you tried with a smaller sample set to see why your command is not working? Most commands do have some type of limitation; however, if you hit it you would normally get an error message.
A first simple test to see if it is a limit thing would be to make a copy of the 'found' file and add a single entry which should get returned.

I would add that this is often a case where you could use a single tool like awk instead of 2 commands which might have issues, as in the sketch below :)
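For example (an untested sketch, assuming the second field never itself contains a ":"): this reads the 'found' file first and remembers its first two fields, then prints only the lines of the big file that were not remembered:

Code:

awk -F: 'NR==FNR { seen[$1 FS $2]; next } !($0 in seen)' found_160k.txt 160k-1.txt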

freeroute 11-02-2016 10:16 PM

I tried with a smaller sample set. It worked. So maybe it is a limitation. :(
Thanks.

grail 11-02-2016 11:41 PM

Maybe try using Perl / Python / Ruby, as these may handle the files better than the commands being used.

Another option could be to use xargs to send the data to grep?
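I have not worked out the xargs route with grep -f, but a related idea (a rough sketch; the chunk size and temporary file names are arbitrary) is to split the pattern list into chunks so grep never holds all 3000 patterns at once:

Code:

# build the pattern list (first two fields = the short-line format)
cut -d: -f1-2 found_160k.txt > patterns.txt
split -l 500 patterns.txt chunk.
cp 160k-1.txt remaining.txt
# each pass removes the lines matching one chunk (-F fixed strings, -x whole line)
for f in chunk.*; do
    grep -vxFf "$f" remaining.txt > tmp.txt
    mv tmp.txt remaining.txt
done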

Turbocapitalist 11-03-2016 03:05 AM

If the files are in the same order, you could use the utility "comm" to show which lines are unique to the second file. The different options, such as -1 and -3, can be combined.
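For example, with two small sorted files (throwaway names):

Code:

$ printf 'a\nb\nc\n' > one.txt
$ printf 'b\nc\nd\n' > two.txt
$ comm -1 -3 one.txt two.txt
d

Column 1 is lines only in the first file, column 2 is lines only in the second, and column 3 is lines in both; -1 and -3 suppress the first and third columns, leaving the lines unique to the second file.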

pan64 11-03-2016 05:58 AM

Yes, cut | grep -v -f - file should work in general. If it works with a smaller set, you need to check the error code returned. Probably it ran out of memory, or something "strange" happened.

But without the real data and a way to reproduce the problem we cannot give you a correct answer (just guesses...)
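In bash you can see the exit status of every stage of the pipeline right after it runs (PIPESTATUS must be read immediately, before any other command):

Code:

cut -d: -f1 found_160k.txt | grep -vf - 160k-1.txt | wc -l
echo "cut=${PIPESTATUS[0]} grep=${PIPESTATUS[1]} wc=${PIPESTATUS[2]}"

For grep, 0 means lines were selected, 1 means none were, and 2 means a real error occurred.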

allend 11-03-2016 07:39 AM

Given
Code:

bash-4.4$ cat 160k-1.txt
00373f5500d74281d926ed11d84b1168:amigo'
00473f5500d74281d926ed11d84b1168:amigo'
00573f5500d74281d926ed11d84b1168:amigo'
bash-4.4$ cat found_160k.txt
00373f5500d74281d926ed11d84b1168:amigo':123456789
00473f5500d74281d926ed11d84b1168:amigo':123456789
00773f5500d74281d926ed11d84b1168:amigo':123456789

then
Code:

bash-4.4$ join -t ":" -v1 160k-1.txt found_160k.txt
00573f5500d74281d926ed11d84b1168:amigo'

Note - from 'man join'
Quote:

Important: FILE1 and FILE2 must be sorted on the join fields.
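If the files are not already sorted, process substitution can take care of that on the fly (this sorts on the whole line, which works here because each line begins with the join field):

Code:

bash-4.4$ join -t ":" -v1 <(sort 160k-1.txt) <(sort found_160k.txt)
00573f5500d74281d926ed11d84b1168:amigo'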

Turbocapitalist 11-04-2016 05:35 AM

freeroute, how did you solve it with the large files?

freeroute 11-04-2016 07:32 AM

Quote:

Originally Posted by Turbocapitalist (Post 5626815)
freeroute, how did you solve it with the large files?

Hi,

Thanks for your question.
This weekend I will try a solution.
Do you have a suggestion, maybe? Someone told me to try awk, so I will try it, though unfortunately I have never used awk.

Turbocapitalist 11-04-2016 07:41 AM

My suggestion was with "comm". For example:

Code:

comm -1 -3 <(cut -d : -f 1-2 found.txt | sort) <(sort longlist.txt)
Though I don't know comm's internal workings well enough to say how to reduce its memory use or predict which parts might cause it to run out.
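One thing that may help on a small machine: GNU sort accepts a buffer-size cap and a temp directory, so it spills to disk instead of eating RAM (the 64M figure is just an example):

Code:

comm -1 -3 <(cut -d : -f 1-2 found.txt | sort -S 64M -T /tmp) <(sort -S 64M -T /tmp longlist.txt)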

freeroute 11-04-2016 07:47 AM

Thanks. It would be great if the "comm" command works. I will reply once I have tried it. I am on a desktop PC now; this evening I can try on my laptop (it has only 1 GB RAM)...

freeroute 11-04-2016 05:33 PM

Quote:

Originally Posted by Turbocapitalist (Post 5626859)
My suggestion was with "comm". For example:

Code:

comm -1 -3 <(cut -d : -f 1-2 found.txt | sort) <(sort longlist.txt)
Though I don't know comm's internal workings well enough to say how to reduce its memory use or predict which parts might cause it to run out.

I read the manual and examples for the "comm" command. It is a very simple and very useful command. It works. Thank you very much for your help again. :)

