LinuxQuestions.org (https://www.linuxquestions.org/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   cut and grep commands not found rows (https://www.linuxquestions.org/questions/linux-newbie-8/cut-and-grep-commands-not-found-rows-4175592763/)

freeroute 11-02-2016 09:01 PM

cut and grep commands not found rows
 
Hello,

I would like to remove the rows listed in the file found_160k.txt from the file 160k-1.txt.

Is there some line-count limitation with this command? I ask because the number of rows found is 0:

root@SAMSUNG:~# cut -d: -f1 found_160k.txt | grep -vf- 160k-1.txt | wc -l
0

found_160k.txt (3000 rows) contains lines like:

00373f5500d74281d926ed11d84b1168:amigo':123456789

160k-1.txt (160,000 rows) contains lines like:

00373f5500d74281d926ed11d84b1168:amigo'


Thank you in advance.

AwesomeMachine 11-02-2016 09:28 PM

The grep command makes no sense. You're selecting an inverted match of nothing. Are you trying to find the number of lines that were not cut? Grep doesn't know what "-f1" in the cut command means. The dash after f in the grep command should not be there. I'm not sure what you're attempting to do, but omitting the grep command would give you the number of lines in the file.

freeroute 11-02-2016 09:47 PM

Thank you very much.
I just want to remove the rows found in the "found_160k.txt" file from the file "160k-1.txt".

grail 11-02-2016 09:53 PM

Have you tried with a smaller sample set to see why your command is not working? Most commands do have some type of limitation; however, if you hit it you would normally get an error message.
A first simple test to see if it is a limit thing would be to make a copy of the 'found' file and add a single entry which should get returned.

I would add that this is often a case where you could use a single tool like awk instead of 2 commands which might have issues, as in the sketch below :)
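For example (an untested sketch, assuming the second field never itself contains a ":"): this reads the 'found' file first and remembers its first two fields, then prints only the lines of the big file that were not remembered:

Code:

awk -F: 'NR==FNR { seen[$1 FS $2]; next } !($0 in seen)' found_160k.txt 160k-1.txt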

freeroute 11-02-2016 10:16 PM

I tried with a smaller sample set. It worked. So maybe it is a limitation. :(
Thanks.

grail 11-02-2016 11:41 PM

Maybe try using Perl / Python / Ruby, as these may handle the files better than the commands being used.

Another option could be to use xargs to send the data to grep?
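I have not worked out the xargs route with grep -f, but a related idea (a rough sketch; the chunk size and temporary file names are arbitrary) is to split the pattern list into chunks so grep never holds all 3000 patterns at once:

Code:

# build the pattern list (first two fields = the short-line format)
cut -d: -f1-2 found_160k.txt > patterns.txt
split -l 500 patterns.txt chunk.
cp 160k-1.txt remaining.txt
# each pass removes the lines matching one chunk (-F fixed strings, -x whole line)
for f in chunk.*; do
    grep -vxFf "$f" remaining.txt > tmp.txt
    mv tmp.txt remaining.txt
done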

Turbocapitalist 11-03-2016 03:05 AM

If the files are in the same order, you could use the utility "comm" to show which lines are unique to the second file. The different options, such as -1 and -3, can be combined.
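For example, with two small sorted files (throwaway names):

Code:

$ printf 'a\nb\nc\n' > one.txt
$ printf 'b\nc\nd\n' > two.txt
$ comm -1 -3 one.txt two.txt
d

Column 1 is lines only in the first file, column 2 is lines only in the second, and column 3 is lines in both; -1 and -3 suppress the first and third columns, leaving the lines unique to the second file.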

pan64 11-03-2016 05:58 AM

Yes, cut | grep -v -f - file should work in general. If it works with a smaller set, you need to check the error code returned. Probably it ran out of memory, or something "strange" happened.

But without the real data and a way to reproduce the problem we cannot give you a correct answer (just guesses...)
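In bash you can see the exit status of every stage of the pipeline right after it runs (PIPESTATUS must be read immediately, before any other command):

Code:

cut -d: -f1 found_160k.txt | grep -vf - 160k-1.txt | wc -l
echo "cut=${PIPESTATUS[0]} grep=${PIPESTATUS[1]} wc=${PIPESTATUS[2]}"

For grep, 0 means lines were selected, 1 means none were, and 2 means a real error occurred.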

allend 11-03-2016 07:39 AM

Given
Code:

bash-4.4$ cat 160k-1.txt
00373f5500d74281d926ed11d84b1168:amigo'
00473f5500d74281d926ed11d84b1168:amigo'
00573f5500d74281d926ed11d84b1168:amigo'
bash-4.4$ cat found_160k.txt
00373f5500d74281d926ed11d84b1168:amigo':123456789
00473f5500d74281d926ed11d84b1168:amigo':123456789
00773f5500d74281d926ed11d84b1168:amigo':123456789

then
Code:

bash-4.4$ join -t ":" -v1 160k-1.txt found_160k.txt
00573f5500d74281d926ed11d84b1168:amigo'

Note - from 'man join'
Quote:

Important: FILE1 and FILE2 must be sorted on the join fields.
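If the files are not already sorted, process substitution can take care of that on the fly (this sorts on the whole line, which works here because each line begins with the join field):

Code:

bash-4.4$ join -t ":" -v1 <(sort 160k-1.txt) <(sort found_160k.txt)
00573f5500d74281d926ed11d84b1168:amigo'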

Turbocapitalist 11-04-2016 05:35 AM

freeroute, how did you solve it with the large files?

freeroute 11-04-2016 07:32 AM

Quote:

Originally Posted by Turbocapitalist (Post 5626815)
freeroute, how did you solve it with the large files?

Hi,

Thanks for your question.
This weekend I will try a solution.
Do you have a suggestion, maybe? Someone told me to try awk, so I will try it, though unfortunately I have never used awk.

Turbocapitalist 11-04-2016 07:41 AM

My suggestion was with "comm". For example:

Code:

comm -1 -3 <(cut -d : -f 1-2 found.txt | sort) <(sort longlist.txt)
Though I don't know comm's internal workings well enough to say how to reduce its memory use or predict which parts might cause it to run out.
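One thing that may help on a small machine: GNU sort accepts a buffer-size cap and a temp directory, so it spills to disk instead of eating RAM (the 64M figure is just an example):

Code:

comm -1 -3 <(cut -d : -f 1-2 found.txt | sort -S 64M -T /tmp) <(sort -S 64M -T /tmp longlist.txt)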

freeroute 11-04-2016 07:47 AM

Thanks. It would be great if the "comm" command works. I will reply once I have tried it. I am on a desktop PC now; this evening I can try on my laptop (it has only 1 GB RAM)...

freeroute 11-04-2016 05:33 PM

Quote:

Originally Posted by Turbocapitalist (Post 5626859)
My suggestion was with "comm". For example:

Code:

comm -1 -3 <(cut -d : -f 1-2 found.txt | sort) <(sort longlist.txt)
Though I don't know comm's internal workings well enough to say how to reduce its memory use or predict which parts might cause it to run out.

I read the manual and examples for the "comm" command. It is a very simple and very useful command. It works. Thank you very much for your help again. :)

