LinuxQuestions.org
05-23-2012, 11:53 AM   #1
udiubu
Member
 
Registered: Oct 2011
Posts: 73

Rep: Reputation: Disabled
matching string in specific column and delete line


Dear all,

I have a text file like the one below:

ab 3 alpha
cd 4 beta
xs 12 gamma
cd 3 dexsa
ab 1 chayxe
yx 14 tony

I would like to delete the lines containing "xs" and "yx" in column one, so that my result file looks like the one below:

ab 3 alpha
cd 4 beta
cd 3 dexsa
ab 1 chayxe

grep -v "xs" would of course also remove lines where "xs" occurs anywhere else in the text (for example in "dexsa").

How can I solve this?

Any suggestion is highly appreciated.

Best,

Udiubu
 
05-23-2012, 12:01 PM   #2
udiubu
Member
 
Registered: Oct 2011
Posts: 73

Original Poster
Rep: Reputation: Disabled
This works:

awk '$1 !~ /xs$/' infile

However, how can I list more than one string to match? I mean not just "xs", but "yx" as well.

Thanks,

Udiubu
 
05-23-2012, 12:02 PM   #3
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Using awk:
Code:
awk '!($1 ~ "xs" || $1 ~ "yx")' file
Here no action is specified after the expression, so awk applies its default action and prints the entire line whenever the expression is true. Literally, the expression means:
Code:
NOT ( $1 matches "xs" OR $1 matches "yx" )
Another form, using character lists in a regular expression:
Code:
awk '$1 !~ /[xy][sx]/' file
The first form is longer but more readable. Hope this helps.
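As a quick check of the first form, here is a minimal sketch run against the sample from post #1. The temporary file and the variable names `sample`, `result` and `explicit` are assumptions made for illustration; the second command only spells out the default action.

```shell
# Recreate the sample from post #1 in a temporary file (the file is an
# assumption for this sketch; any input file works the same way).
sample=$(mktemp)
cat > "$sample" <<'EOF'
ab 3 alpha
cd 4 beta
xs 12 gamma
cd 3 dexsa
ab 1 chayxe
yx 14 tony
EOF

# No action block is given, so awk prints the whole line whenever the
# expression is true.
result=$(awk '!($1 ~ "xs" || $1 ~ "yx")' "$sample")

# The same filter with the default action written out explicitly.
explicit=$(awk '!($1 ~ "xs" || $1 ~ "yx") { print $0 }' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Only the first column is tested, so "dexsa" and "chayxe" in column three do not cause their lines to be dropped.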

Last edited by colucix; 05-23-2012 at 12:05 PM.
 
05-23-2012, 12:36 PM   #4
udiubu
Member
 
Registered: Oct 2011
Posts: 73

Original Poster
Rep: Reputation: Disabled
Colucix you're always the best!

Thanks a lot!
 
05-24-2012, 04:01 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
awk is generally the most appropriate tool to use when working with column-delimited text.

But grep can be used here. You just need to give it a regular expression that targets the appropriate line patterns.

Code:
grep -Ev '^(xs|yx)\>' infile
The expression breaks down as "^", the beginning of the line, "(xs|yx)", either of the strings "xs" or "yx", and "\>", a positional anchor matching the end of a word.
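A minimal sketch of what the "\>" anchor buys you, assuming GNU grep ("\>" is a GNU extension rather than part of POSIX extended regular expressions). The extra line "xsa 5 extra" is hypothetical, added only to show a first column that merely starts with "xs". The second command is a more portable spelling that treats only whitespace as the column delimiter.

```shell
# Sample from post #1 plus one hypothetical line ("xsa 5 extra") whose first
# column merely starts with "xs" and should therefore be kept.
sample=$(mktemp)
printf '%s\n' 'ab 3 alpha' 'cd 4 beta' 'xs 12 gamma' 'cd 3 dexsa' \
    'ab 1 chayxe' 'yx 14 tony' 'xsa 5 extra' > "$sample"

# GNU-style word-end anchor, as in the command above.
gnu=$(grep -Ev '^(xs|yx)\>' "$sample")

# Portable spelling: the key must be followed by whitespace or end of line.
portable=$(grep -Ev '^(xs|yx)([[:space:]]|$)' "$sample")

printf '%s\n' "$portable"
rm -f "$sample"
```

Both commands drop the "xs" and "yx" lines and keep "xsa 5 extra"; without an end anchor, '^(xs|yx)' alone would remove that line too.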

As you can see, this particular example is quite easy, as you just need to target the first word on the line. For columns in the middle of the line, the regex would have to be more complex.
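To illustrate the point about middle columns, here is a sketch that drops every line whose second column is exactly "3". The task itself is a made-up example, and the grep pattern assumes whitespace-separated columns with no leading whitespace on the line.

```shell
sample=$(mktemp)
printf '%s\n' 'ab 3 alpha' 'cd 4 beta' 'xs 12 gamma' 'cd 3 dexsa' \
    'ab 1 chayxe' 'yx 14 tony' > "$sample"

# grep has to describe everything before the target column: one run of
# non-whitespace (column one), the separator, then the key followed by
# whitespace or end of line.
with_grep=$(grep -Ev '^[^[:space:]]+[[:space:]]+3([[:space:]]|$)' "$sample")

# awk addresses the column directly; comparing against the string "3"
# forces a string comparison on field two.
with_awk=$(awk '$2 != "3"' "$sample")

printf '%s\n' "$with_awk"
rm -f "$sample"
```

The awk version stays the same length no matter which column is tested, which is why it tends to be the easier tool for this kind of job.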

If you don't already know about regular expressions, I highly recommend taking the time to learn. It's perhaps the single biggest "bang for the buck" topic you can learn in coding. All the major text editing tools support them.

Here are a few regular expressions tutorials:
http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html
http://www.regular-expressions.info/


Speaking of regex, Colucix's last example has a slight flaw.

Code:
awk '$1 !~ /[xy][sx]/' file
"[xy][sx]" will match all combinations of those characters, so "xx" and "ys" would also be eliminated from the output. Also, it relies on the assumption that the field only has two characters, as it would also match any longer entry with those characters in it, such as "abxscd".

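A small sketch that makes the flaw visible. The list of first-column values in `keys` is hypothetical; only "xs" and "yx" are meant to be removed.

```shell
# Hypothetical first-column values, one per line.
keys='ab
xs
yx
xx
ys
abxscd'

# What survives the character-list filter.
loose=$(printf '%s\n' "$keys" | awk '$1 !~ /[xy][sx]/')

# Values the character-list filter removes even though they are neither
# "xs" nor "yx".
collateral=$(printf '%s\n' "$keys" | awk '$1 ~ /[xy][sx]/ && $1 !~ /^(xs|yx)$/')

printf 'kept:\n%s\n' "$loose"
printf 'wrongly removed:\n%s\n' "$collateral"
```

Only "ab" is kept; "xx", "ys" and "abxscd" are removed along with the two intended values.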
So it would be better to use a similar expression to the one I used in grep.

Code:
awk '$1 !~ /^(xs|yx)$/' file
Since we're only testing field one, we can use the more natural "$" line-ending anchor, instead of the "\>" word anchor.
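For comparison, a sketch of the anchored regex next to two equivalent ways of saying "column one is exactly one of these strings". The values in `keys` and the variable names `list`, `item` and `drop` are assumptions for illustration; the lookup-table form may be handy when the list of strings grows.

```shell
# Hypothetical first-column values, one per line.
keys='ab
xs
yx
xx
ys
abxscd'

# Anchored regex: the whole field must be "xs" or "yx".
strict=$(printf '%s\n' "$keys" | awk '$1 !~ /^(xs|yx)$/')

# Plain string comparison, no regex involved.
equal=$(printf '%s\n' "$keys" | awk '$1 != "xs" && $1 != "yx"')

# Lookup table built from a space-separated list passed in with -v.
table=$(printf '%s\n' "$keys" | awk -v list='xs yx' '
    BEGIN { n = split(list, item, " "); for (i = 1; i <= n; i++) drop[item[i]] = 1 }
    !($1 in drop)')

printf '%s\n' "$strict"
```

All three keep "ab", "xx", "ys" and "abxscd" and remove only the exact matches.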

Last edited by David the H.; 05-24-2012 at 04:09 PM. Reason: fixed an oops
 
05-25-2012, 02:29 AM   #6
udiubu
Member
 
Registered: Oct 2011
Posts: 73

Original Poster
Rep: Reputation: Disabled
Hi David,

Thanks for the excellent info.
Your links were exactly what I was looking for.

Best,

Udiubu
 
Tags
delete, grep, line, match


