LinuxQuestions.org

udiubu 05-23-2012 11:53 AM

matching string in specific column and delete line
 
Dear all,

I have a text file like the one below:

ab 3 alpha
cd 4 beta
xs 12 gamma
cd 3 dexsa
ab 1 chayxe
yx 14 tony

I would like to delete the lines that have "xs" or "yx" in column one, so that my result file looks like this:

ab 3 alpha
cd 4 beta
cd 3 dexsa
ab 1 chayxe

grep -v "xs" would of course also match any other occurrence of "xs" anywhere in the text.
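To see the problem concretely, here is a small sketch that feeds the sample lines above to plain `grep -v` with both strings. It assumes the columns are separated by single spaces, and the variable name `sample` is only for illustration. Because "dexsa" contains "xs" and "chayxe" contains "yx", those two wanted lines are thrown away as well:

```shell
#!/bin/sh
# Sample data from the post above, held in a variable instead of a file.
sample='ab 3 alpha
cd 4 beta
xs 12 gamma
cd 3 dexsa
ab 1 chayxe
yx 14 tony'

# -v inverts the match; each -e adds another pattern.
# The patterns match anywhere on the line, not just in column one.
printf '%s\n' "$sample" | grep -v -e xs -e yx
# prints only:
# ab 3 alpha
# cd 4 beta
```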

How can I solve this?

Any suggestion is highly appreciated.

Best,

Udiubu

udiubu 05-23-2012 12:01 PM

This works:

awk '$1 !~ /xs$/' infile

However, how can I list more than one string to match? I mean not just "xs", but "yx" as well.

Thanks,

Udiubu

colucix 05-23-2012 12:02 PM

Using awk:
Code:

awk '!($1 ~ "xs" || $1 ~ "yx")' file
Here no action is specified after the expression, so every time the expression is true awk prints the entire line (the default action). Literally, the expression means:
Code:

NOT ( $1 matches "xs" OR $1 matches "yx" )
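A quick way to check the expression is to run it on the sample data from the first post. This sketch pipes the lines in with printf rather than reading a file:

```shell
#!/bin/sh
# Feed the six sample lines to awk (one printf argument per line).
# There is no action block, so awk prints each line for which the pattern is true.
printf '%s\n' 'ab 3 alpha' 'cd 4 beta' 'xs 12 gamma' \
              'cd 3 dexsa' 'ab 1 chayxe' 'yx 14 tony' |
awk '!($1 ~ "xs" || $1 ~ "yx")'
# prints:
# ab 3 alpha
# cd 4 beta
# cd 3 dexsa
# ab 1 chayxe
```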
Another form, using character lists in a regular expression:
Code:

awk '$1 !~ /[xy][sx]/' file
The first suggestion is longer but more readable. Hope this helps.
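When the goal is an exact match on the whole first column, a different technique (not one used in this thread, so take it as an alternative sketch) is to pass the unwanted values in with `-v` and store them as keys of an awk array. The names `drop`, `list` and `bad` are arbitrary. Because `in` tests for an exact key, a first column such as "abxscd" can never match by accident:

```shell
#!/bin/sh
# Space-separated list of first-column values to remove (example values).
drop='xs yx'

printf '%s\n' 'ab 3 alpha' 'cd 4 beta' 'xs 12 gamma' \
              'cd 3 dexsa' 'ab 1 chayxe' 'yx 14 tony' |
awk -v drop="$drop" '
    # Build the lookup table once, before any input is read.
    BEGIN { n = split(drop, list, " "); for (i = 1; i <= n; i++) bad[list[i]] = 1 }
    # Print a line only if its first field is not a key of bad.
    !($1 in bad)
'
```

Adding another value is then just a matter of extending `drop`, for example `drop='xs yx zz'`.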

udiubu 05-23-2012 12:36 PM

Colucix, you're always the best!

Thanks a lot!

David the H. 05-24-2012 04:01 PM

awk is generally the most appropriate tool to use when working with column-delimited text.

But grep can be used here. You just need to give it a regular expression that targets the appropriate line patterns.

Code:

grep -Ev '^(xs|yx)\>' infile
The expression breaks down as "^", the beginning of the line, "(xs|yx)", either of the strings "xs" or "yx", and "\>", a positional anchor matching the end of a word.
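Here is a sketch running that command on the sample data. Note that `\>` is a GNU grep extension rather than POSIX; if your grep lacks it, requiring whitespace or the end of the line right after the match is a portable stand-in (a substitution, not part of the command above):

```shell
#!/bin/sh
sample='ab 3 alpha
cd 4 beta
xs 12 gamma
cd 3 dexsa
ab 1 chayxe
yx 14 tony'

# GNU grep: \> anchors the match at the end of a word.
printf '%s\n' "$sample" | grep -Ev '^(xs|yx)\>'

# POSIX ERE alternative: the match must be followed by whitespace or the end of the line.
printf '%s\n' "$sample" | grep -Ev '^(xs|yx)([[:space:]]|$)'
```

Each command prints the four wanted lines: "ab 3 alpha", "cd 4 beta", "cd 3 dexsa" and "ab 1 chayxe".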

As you can see, this particular example is quite easy, since you only need to target the first two characters on the line. For columns in the middle of the line, the regex would have to be more complex.
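To give an idea of the extra complexity, here's a sketch (an illustrative example, not part of the original question) that drops lines whose second column is exactly "3". It assumes whitespace-separated columns with no leading whitespace. grep has to spell out how to skip over the first column, while awk just compares `$2`:

```shell
#!/bin/sh
sample='ab 3 alpha
cd 4 beta
xs 12 gamma
cd 3 dexsa
ab 1 chayxe
yx 14 tony'

# grep: start of line, a run of non-blanks (column one), a run of blanks,
# then "3" followed by a blank or the end of the line (so a value like "31" won't match).
printf '%s\n' "$sample" | grep -Ev '^[^[:space:]]+[[:space:]]+3([[:space:]]|$)'

# awk splits the fields itself, so the same test is a plain comparison on $2.
printf '%s\n' "$sample" | awk '$2 != "3"'
```

Both commands keep "cd 4 beta", "xs 12 gamma", "ab 1 chayxe" and "yx 14 tony".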

If you don't already know about regular expressions, I highly recommend taking the time to learn them. It's perhaps the single biggest "bang for the buck" topic you can learn in coding. All the major text editing tools support them.

Here are a few regular expression tutorials:
http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html
http://www.regular-expressions.info/


Speaking of regex, Colucix's last example has a slight flaw.

Code:

awk '$1 !~ /[xy][sx]/' file
"[xy][sx]" will match every combination of those characters, so "xx" and "ys" would also be eliminated from the output. It also relies on the assumption that the field has only two characters, since it would also match any longer entry containing those characters, such as "abxscd".

So it would be better to use an expression similar to the one I used with grep.

Code:

awk '$1 !~ /^(xs|yx)$/' file
Since we're only testing field one, we can use the more natural "$" anchor, which in awk matches the end of the string being tested (here, the end of $1), instead of the "\>" word anchor.
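A small side-by-side sketch of both awk patterns, using made-up first-column values (not from the original data) chosen to show the difference. The character-list version throws away everything except "ab", while the anchored version removes only the exact values "xs" and "yx":

```shell
#!/bin/sh
# Made-up test lines: the two real targets, three near misses and one normal line.
lines='xs 0 z
yx 0 z
xx 1 a
ys 2 b
abxscd 3 c
ab 4 d'

printf '%s\n' 'character lists /[xy][sx]/:'
printf '%s\n' "$lines" | awk '$1 !~ /[xy][sx]/'

printf '%s\n' 'anchored alternation /^(xs|yx)$/:'
printf '%s\n' "$lines" | awk '$1 !~ /^(xs|yx)$/'
```

The first awk prints only "ab 4 d"; the second keeps "xx 1 a", "ys 2 b", "abxscd 3 c" and "ab 4 d".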

udiubu 05-25-2012 02:29 AM

Hi David,

Thanks for the excellent info.
Your links were exactly what I was looking for.

Best,

Udiubu


All times are GMT -5.