LinuxQuestions.org - [SOLVED] matching string in specific column and delete line

Dear all,

I have a txt like the one below:

ab 3 alpha
cd 4 beta
xs 12 gamma
cd 3 dexsa
ab 1 chayxe
yx 14 tony

I would like to delete the lines that have "xs" or "yx" in column one, so that my result file looks like this:

ab 3 alpha
cd 4 beta
cd 3 dexsa
ab 1 chayxe

grep -v "xs" would of course also remove lines where "xs" occurs anywhere else in the text (for example "dexsa" in column three).

How can I solve this?

Any suggestion is highly appreciated.

Best,

Udiubu

This works:

awk '$1 !~ /xs$/' infile

However, how can I list more than one string to match? I mean not just "xs", but "yx" as well.

Thanks,

Udiubu

Using awk:

Code:

awk '!($1 ~ "xs" || $1 ~ "yx")' file

No action is specified after the expression, so awk applies its default action and prints the entire line whenever the expression is true. Read literally, the expression means:

Code:

NOT ( $1 matches "xs" OR $1 matches "yx" )
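To make the default-action point concrete, here is a small sketch that runs the same filter twice, once as posted and once with the action written out. The temporary file and the variable names are assumptions made for illustration; the data is the sample from the first post.

```shell
# Recreate the sample data from the first post in a temporary file.
sample=$(mktemp)
printf '%s\n' 'ab 3 alpha' 'cd 4 beta' 'xs 12 gamma' \
              'cd 3 dexsa' 'ab 1 chayxe' 'yx 14 tony' > "$sample"

# Pattern only: awk prints every line for which the pattern is true.
implicit=$(awk '!($1 ~ "xs" || $1 ~ "yx")' "$sample")

# Same pattern with the default action spelled out explicitly.
explicit=$(awk '!($1 ~ "xs" || $1 ~ "yx") { print $0 }' "$sample")

printf '%s\n' "$implicit"
rm -f "$sample"
```

Both commands should print the same four lines, because "dexsa" and "chayxe" sit in column three and are never tested.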

Another form, using character lists in a regular expression:

Code:

awk '$1 !~ /[xy][sx]/' file

The first form is longer but more readable. Hope this helps.

Colucix you're always the best!

Thanks a lot!

awk is generally the most appropriate tool to use when working with column-delimited text.

But grep can be used here. You just need to give it a regular expression that targets the appropriate line patterns.

Code:

grep -Ev '^(xs|yx)\>' infile

The expression breaks down as "^", the beginning of the line, "(xs|yx)", either of the strings "xs" or "yx", and "\>", a positional anchor matching the end of a word.
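One caveat, offered as a side note rather than a correction: as far as I know "\>" is an extension and not part of POSIX extended regular expressions, and it treats any non-word character as a word end, so a first column such as "xs-1" would match too. A sketch of a variant that instead requires whitespace or end of line after the string follows; the extra "xs-1" line and the variable names are made up for the demonstration.

```shell
# Sample data plus one invented line ("xs-1 ...") to show the difference.
sample=$(mktemp)
printf '%s\n' 'ab 3 alpha' 'xs 12 gamma' 'xs-1 9 delta' \
              'yx 14 tony' 'cd 3 dexsa' > "$sample"

# Require whitespace (or end of line) right after xs/yx instead of a word boundary.
portable=$(grep -Ev '^(xs|yx)([[:space:]]|$)' "$sample")

printf '%s\n' "$portable"
rm -f "$sample"
```

Here the "xs-1" line is kept, because column one is not exactly "xs".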

As you can see this particular example is quite easy, as you just need to target the first two characters on the line. For columns in the middle of the line, the regex would have to be more complex.
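To illustrate how the regex grows for a column in the middle of the line, here is a rough sketch that drops lines whose second column is exactly "3". The choice of column two and the value "3" is only an assumption for the example; the awk version is included for comparison.

```shell
# Sample data from the first post.
sample=$(mktemp)
printf '%s\n' 'ab 3 alpha' 'cd 4 beta' 'xs 12 gamma' \
              'cd 3 dexsa' 'ab 1 chayxe' 'yx 14 tony' > "$sample"

# grep: skip the first field and its separator, then match "3" as a whole field.
with_grep=$(grep -Ev '^[^[:space:]]+[[:space:]]+3([[:space:]]|$)' "$sample")

# awk: compare the second field directly (string comparison).
with_awk=$(awk '$2 != "3"' "$sample")

printf '%s\n' "$with_grep"
rm -f "$sample"
```

The grep pattern has to describe everything before the target column, while awk only needs the field number. Note that the grep version assumes lines do not start with whitespace.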

If you don't already know about regular expressions, I highly recommend taking the time to learn. It's perhaps the single biggest "bang for the buck" topic you can learn in coding. All the major text editing tools support them.

Here are a few regular expressions tutorials:
http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html
http://www.regular-expressions.info/

Speaking of regex, Colucix's last example has a slight flaw.

Code:

awk '$1 !~ /[xy][sx]/' file

"[xy][sx]" will match all combinations of those characters, so "xx" and "ys" would also be eliminated from the output. Also, it relies on the assumption that the field only has two characters, as it would also match any longer entry containing those characters, such as "abxscd".

So it would be better to use a similar expression to the one I used in grep.

Code:

awk '$1 !~ /^(xs|yx)$/' file

Since we're only testing field one, "^" and "$" anchor to the start and end of that field, so we can use the more natural "$" anchor instead of the "\>" word anchor.
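A quick way to see the difference is to run both expressions on input that contains the problem cases named above. The test lines below are invented for the demonstration, and the plain string comparison in the last command is an alternative not mentioned earlier in the thread. The same substring caveat applies to the unanchored '$1 ~ "xs"' form.

```shell
# Invented test data: two real targets plus the false-positive cases.
sample=$(mktemp)
printf '%s\n' 'xs 4 d' 'yx 5 e' 'xx 1 a' 'ys 2 b' \
              'abxscd 3 c' 'ab 6 f' > "$sample"

# Character lists: also drops "xx", "ys" and "abxscd".
loose=$(awk '$1 !~ /[xy][sx]/' "$sample")

# Anchored alternation: drops only fields that are exactly "xs" or "yx".
anchored=$(awk '$1 !~ /^(xs|yx)$/' "$sample")

# Plain string comparison gives the same result without any regex.
exact=$(awk '$1 != "xs" && $1 != "yx"' "$sample")

printf 'loose:\n%s\nanchored:\n%s\n' "$loose" "$anchored"
rm -f "$sample"
```

The character-list version keeps only the "ab" line, while the anchored and exact versions keep the "xx", "ys", "abxscd" and "ab" lines.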

Hi David,

Thanks for the excellent info.
Your links were exactly what I was looking for.

Best,

Udiubu