I'm working with some VERY large CSV data files (bigger than 100 MB), and I'm looking for a specific number. I know the number before I search, and I know which column it will be in if it's in the file. Right now I've been using
Code:
grep -w "12345" file.txt > found.txt
but in a very large file 12345 can occur in several places that are not in the column I want to search.
Is there an option in grep that I can use to specify which column to search for 12345 in? I couldn't find one.
i'm working with some VERY large csv data files ...
druuna gave good advice. I'll offer another idea which may be helpful when dealing with VERY large files.
Sometimes you are searching for a specific string and know there is one and only one match. As soon as that match is found, there is no value in continuing the search through the rest of the file. In that case you may terminate the search with awk's exit statement.
Similarly, you may quit after finding the third match (or whatever suits your purposes).
Here are some examples which use the famous poem by Edgar Allan Poe, "The Raven."
Code:
# Assumes $Raven holds the path to a text file of the poem
# and $OutFile holds the path to a scratch output file.
echo; echo "Method of LQ Member danielbmartin #1"
awk ' $4 ~ "Nevermore" {print}' "$Raven" > "$OutFile"
echo "ALL lines containing 'Nevermore' in the fourth word ..."; cat "$OutFile"
echo; echo "Method of LQ Member danielbmartin #2"
# exit stops awk as soon as the first match has been printed.
awk ' $4 ~ "Nevermore" {print;exit}' "$Raven" > "$OutFile"
echo "The FIRST line containing 'Nevermore' in the fourth word ..."; cat "$OutFile"
echo; echo "Method of LQ Member danielbmartin #3"
# k counts matches; only the second one is printed, then awk exits.
awk ' $4 ~ "Nevermore" {if (++k==2) {print; exit}}' "$Raven" > "$OutFile"
echo "The SECOND line containing 'Nevermore' in the fourth word ..."; cat "$OutFile"
echo; echo "Method of LQ Member danielbmartin #4"
# Every match is printed; awk exits once the third has been printed.
awk ' $4 ~ "Nevermore" {print; if (++k==3) {exit}}' "$Raven" > "$OutFile"
echo "The FIRST 3 lines containing 'Nevermore' in the fourth word ..."; cat "$OutFile"
This is the output generated by that code.
Code:
Method of LQ Member danielbmartin #1
ALL lines containing 'Nevermore' in the fourth word ...
Quoth the raven, 'Nevermore.'
Meant in croaking 'Nevermore.'
Quoth the raven, 'Nevermore.'
Quoth the raven, 'Nevermore.'
Quoth the raven, 'Nevermore.'
Quoth the raven, 'Nevermore.'
Method of LQ Member danielbmartin #2
The FIRST line containing 'Nevermore' in the fourth word ...
Quoth the raven, 'Nevermore.'
Method of LQ Member danielbmartin #3
The SECOND line containing 'Nevermore' in the fourth word ...
Meant in croaking 'Nevermore.'
Method of LQ Member danielbmartin #4
The FIRST 3 lines containing 'Nevermore' in the fourth word ...
Quoth the raven, 'Nevermore.'
Meant in croaking 'Nevermore.'
Quoth the raven, 'Nevermore.'
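
Applied to the comma-separated files from the original question, the same early-exit idea might look like the sketch below. The sample data, the column number (3), and the variable names are made up for illustration. It also assumes a simple CSV with no quoted fields containing embedded commas, since -F, splits on every comma.

```shell
#!/bin/sh
# Hypothetical sample data standing in for a very large CSV file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
id,code,value
1,12345,99999
2,99999,12345
3,12345,12345
4,11111,12345
EOF

# -F, splits each line on commas. $3 == "12345" compares the whole
# third field as a string, so 12345 in any other column is ignored.
# exit stops reading at the first hit instead of scanning the rest.
first=$(awk -F, '$3 == "12345" {print; exit}' "$csv")
echo "$first"

# Print matches and stop after the second one.
first_two=$(awk -F, '$3 == "12345" {print; if (++k == 2) exit}' "$csv")
echo "$first_two"

rm -f "$csv"
```

Using == rather than ~ means a field such as 123456 does not count as a match, which is roughly what grep -w was being used for.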
Daniel B. Martin
Last edited by danielbmartin; 02-01-2013 at 12:47 PM.
Reason: Minor cosmetic improvements.
Except that that approach first discards all the other columns, so grep can only match the number, which you already know, in the one column that remains. It may be useful for determining whether the value appears in the file, and perhaps what line number it's on (with the -n switch), but not for much else.
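
If both the line number and the complete record are wanted, one possible sketch is to let awk do the column test and print its own line counter. The sample data and the choice of column 3 are assumptions for illustration, and the comma splitting assumes no quoted fields with embedded commas.

```shell
#!/bin/sh
# Hypothetical sample data; the column of interest is the third one.
csv=$(mktemp)
cat > "$csv" <<'EOF'
10,12345,x
20,500,12345
30,12345,12345
EOF

# NR is awk's current line number, so this prints "line:record",
# much like grep -n, but only the third field is tested.
numbered=$(awk -F, '$3 == "12345" {print NR ":" $0}' "$csv")
echo "$numbered"

rm -f "$csv"
```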
Speaking of grep, it is usually possible to create a regex that matches everything up to, and including, the column you want.
Code:
grep -Ew '^([0-9]+[ ]+){4}12345' infile.txt
This matches 12345 in the fifth column, assuming that the file is space-delimited and the columns contain only digits. It would have to be customized for each data format you want to use it on.
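
For the comma-separated files in the original question, the same regex idea might be adapted roughly as follows. This is only a sketch: the sample data is invented, the target is assumed to be the fifth column, and fields are assumed never to contain quoted commas.

```shell
#!/bin/sh
# Hypothetical sample data; the column of interest is the fifth one.
csv=$(mktemp)
cat > "$csv" <<'EOF'
a,b,c,d,12345,f
12345,b,c,d,e,f
a,b,c,d,123456,f
a,b,12345,d,99,f
a,b,c,d,12345
EOF

# ([^,]*,){4} skips exactly four comma-terminated fields from the start
# of the line; (,|$) requires the fifth field to end right after 12345.
matches=$(grep -E '^([^,]*,){4}12345(,|$)' "$csv")
echo "$matches"

rm -f "$csv"
```

Here the trailing (,|$) takes over the job of -w, so 123456 in the fifth column is not reported.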