LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
02-01-2013, 10:40 AM   #1
atjurhs
Member
Registered: Aug 2012
Posts: 311
Reputation: Disabled
question on column searching with grep


Hi guys,

I'm working with some VERY large CSV data files (bigger than 100 MB). I'm looking for a specific number that I know before I search, and I know which column it will be in if it's in the file at all. Right now I've been using
Code:
 grep -w "12345" file.txt > found.txt
But in a very large file, 12345 can occur in several places that are not in the column I want to search.

Is there an option in grep that I can use to specify which column to search for 12345 in? I couldn't find one.

Thanks, Tabby

Last edited by atjurhs; 02-01-2013 at 10:42 AM.
 
02-01-2013, 11:24 AM   #2
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Reputation: 2405
This looks like a job for awk, not grep:
Code:
 awk -F";" ' $4 ~ "^12345$" { print }' infile
The above assumes ; as the separator (the -F";" part). The 4 in $4 is the column number.
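Since the question is about comma-separated files, here is a minimal sketch of druuna's approach with a comma as the separator, assuming plain CSV where no quoted field contains a comma. The file sample.csv and its contents are made up for illustration; the column number and the value are passed in with awk's -v option instead of being hard-coded.

```shell
# Hypothetical sample data (assumes plain CSV, no quoted commas):
# 12345 appears in column 2 of line 1 and in column 4 of line 3.
printf '%s\n' \
  'a,12345,x,99' \
  'b,7,y,123456' \
  'c,8,z,12345' > sample.csv

col=4        # column to search (1-based)
val=12345    # value to look for

# -F, makes the comma the field separator.
# $col == val compares the chosen field with val; both look like numbers,
# so awk compares them numerically and 123456 does not match.
awk -F, -v col="$col" -v val="$val" '$col == val' sample.csv > found.txt

cat found.txt
```

Comparing one field with == (instead of running grep -w over the whole line) is what keeps the 12345 in column 2 of line 1 out of the result.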
 
02-01-2013, 12:36 PM   #3
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881
Reputation: 660
Quote:
Originally Posted by atjurhs
i'm working with some VERY large csv data files ...
druuna gave good advice. I'll offer another idea which may be helpful when dealing with VERY large files.

Sometimes you are searching for a specific string and know there is one and only one match. As soon as that match is found, there is no value in continuing the search through the rest of the file. In that case you can terminate the search with awk's exit statement.

Similarly you may quit after finding the third match (or whatever suits your purposes).

Here are some examples which use the famous poem by Edgar Allan Poe, "The Raven."
Code:
# Assumed setup (not in the original post): Raven is the path to a
# plain-text copy of the poem and OutFile is where results are written.
# Both file names are placeholders.
Raven=raven.txt
OutFile=found.txt

echo; echo "Method of LQ Member danielbmartin #1"
awk '$4 ~ "Nevermore" {print}' "$Raven" > "$OutFile"
echo "ALL lines containing 'Nevermore' in the fourth word ..."; cat "$OutFile"

echo; echo "Method of LQ Member danielbmartin #2"
awk '$4 ~ "Nevermore" {print; exit}' "$Raven" > "$OutFile"
echo "The FIRST line containing 'Nevermore' in the fourth word ..."; cat "$OutFile"

echo; echo "Method of LQ Member danielbmartin #3"
awk '$4 ~ "Nevermore" {if (++k==2) {print; exit}}' "$Raven" > "$OutFile"
echo "The SECOND line containing 'Nevermore' in the fourth word ..."; cat "$OutFile"

echo; echo "Method of LQ Member danielbmartin #4"
awk '$4 ~ "Nevermore" {print; if (++k==3) {exit}}' "$Raven" > "$OutFile"
echo "The FIRST 3 lines containing 'Nevermore' in the fourth word ..."; cat "$OutFile"
This is the output generated by that code.
Code:
Method of LQ Member danielbmartin #1
ALL lines containing 'Nevermore' in the fourth word ...
Quoth the raven, 'Nevermore.'
Meant in croaking 'Nevermore.'
Quoth the raven, 'Nevermore.'
Quoth the raven, 'Nevermore.'
Quoth the raven, 'Nevermore.'
Quoth the raven, 'Nevermore.'

Method of LQ Member danielbmartin #2
The FIRST line containing 'Nevermore' in the fourth word ...
Quoth the raven, 'Nevermore.'

Method of LQ Member danielbmartin #3
The SECOND line containing 'Nevermore' in the fourth word ...
Meant in croaking 'Nevermore.'

Method of LQ Member danielbmartin #4
The FIRST 3 lines containing 'Nevermore' in the fourth word ...
Quoth the raven, 'Nevermore.'
Meant in croaking 'Nevermore.'
Quoth the raven, 'Nevermore.'
Daniel B. Martin

Last edited by danielbmartin; 02-01-2013 at 12:47 PM. Reason: Minor cosmetic improvements.
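Applied to the original CSV question, the early-exit idea might look like the sketch below. It assumes comma-separated data with no quoted commas, column 4, and a hypothetical file data.csv; n is an illustrative limit on the number of matches, mirroring Method #4 above.

```shell
# Hypothetical sample CSV: 12345 appears in column 4 on rows 1, 3 and 4.
printf '%s\n' \
  '1,a,x,12345' \
  '2,b,y,500' \
  '3,c,z,12345' \
  '4,d,w,12345' > data.csv

# Value assumed to be unique: print the first match in column 4, then stop reading.
awk -F, -v col=4 -v val=12345 '$col == val { print; exit }' data.csv > first.txt

# Print at most n matches, then stop (k counts matches seen so far).
awk -F, -v col=4 -v val=12345 -v n=2 \
  '$col == val { print; if (++k == n) exit }' data.csv > firstn.txt
```

On a 100 MB file the saving depends on where the match sits: awk stops reading at the exit, so a match near the top avoids scanning the rest of the file.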
 
02-01-2013, 06:33 PM   #4
allend
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371
Reputation: 2750
This could also be done using
Code:
cut -d, -f<column number> file.txt | grep "12345" > found.txt
 
02-04-2013, 09:57 AM   #5
David the H.
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852
Reputation: 2037
Quote:
Originally Posted by allend
This could also be done using
Code:
cut -d, -f<column number> file.txt | grep "12345" > found.txt
Except that this first discards all the other columns, so grep can only return the number, which you already know, from the one column that remains. It may be useful for determining whether the value appears in the file, and perhaps what line number it's on (with the -n switch), but not for much else.
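A minimal sketch of that limited use, assuming comma-separated data without quoted commas and a hypothetical file data.csv: grep -x requires the whole extracted field to equal 12345 (so 123456 is not reported), -n prefixes each hit with its line number, and sed then prints the full rows back out of the original file.

```shell
# Hypothetical sample data.
printf '%s\n' \
  'a,12345,x,99' \
  'b,7,y,123456' \
  'c,8,z,12345' > data.csv

# Keep only column 4; -x matches the whole field, -n adds the line number.
cut -d, -f4 data.csv | grep -nx '12345' > hits.txt

# Turn each "N:12345" hit into N and print that full row from the original file.
for n in $(cut -d: -f1 hits.txt); do
  sed -n "${n}p" data.csv
done > rows.txt
```

Each sed call rereads the file, so on very large inputs a single awk pass remains the better fit.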


Speaking of grep, it is usually possible to create a regex that matches everything up to, and including, the column you want.

Code:
grep -Ew '^([0-9]+[ ]+){4}12345' infile.txt
This will match the fifth column, assuming that the file is space-delimited and the columns only contain digits. It would have to be customized to suit each individual data format you'd want to use it on.
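For comma-separated data, the same regex idea could be adapted as in this minimal sketch, assuming no quoted field contains a comma (the file data.csv is hypothetical). [^,]* matches the contents of any field, including an empty one, {3} skips exactly three fields, and (,|$) makes sure 12345 is the whole fourth field rather than the start of a longer number.

```shell
# Hypothetical sample data; the last row has an empty third field.
printf '%s\n' \
  'a,12345,x,99' \
  'b,7,y,123456' \
  'c,8,z,12345' \
  'd,9,,12345,extra' > data.csv

# ^([^,]*,){3}  skips exactly three fields from the start of the line.
# 12345(,|$)    requires the fourth field to be exactly 12345.
grep -E '^([^,]*,){3}12345(,|$)' data.csv > found.txt
```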

It's much better to just use awk for this.
 
02-04-2013, 12:47 PM   #6
atjurhs
Member
Registered: Aug 2012
Posts: 311
Original Poster
Reputation: Disabled
Yep, druuna's answer looks very straightforward.
 
  

