LinuxQuestions.org - deleting lines from a file with specific pattern using AWK

Programming forum: https://www.linuxquestions.org/questions/programming-9/deleting-lines-from-a-file-with-specific-pattern-using-awk-812506/

gandhigaurav1986

06-06-2010 12:09 PM


Hi,

I have a file which contains millions of records. It contains 12 columns separated by "||" (the delimiter).

The first two fields contain the first name and last name of a person. Now my requirement is to delete all those records from this file for which:

The first two fields do not contain any alphabetic character.

For example, I have the following records in the file:

gaurav||gandhi||123||456||789
#a%bcd||123abc||89|90||91
12345||@@@||89||123||234
***||!!!!||98||76||90

The last two lines should be removed from this file, since the first two fields of these two records do not contain any alphabetic character.
Please help me out on this.......

colucix

06-06-2010 12:25 PM

Hi and welcome to LinuxQuestions! If the other fields do not contain alphabetic characters, as in your example, you can simply do:

Code:

awk '/[a-zA-Z]/' file

or using sed:

Code:

sed '/[a-zA-Z]/!d' file

otherwise you should match the two fields specifically, for example by means of something like:

Code:

awk -F"|" '$1 ~ /[a-zA-Z]/ && $3 ~ /[a-zA-Z]/' file

Hope this helps.
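To make the caveat concrete, here is a small sketch using three records from the question plus one hypothetical record (made up for this example) whose name fields contain no letters but whose third field does. The whole-line regex keeps that record; the per-field test drops it.

```shell
# First three records come from the question; the fourth is hypothetical,
# with letters only in a later field.
sample='gaurav||gandhi||123||456||789
12345||@@@||89||123||234
***||!!!!||98||76||90
12345||@@@||abc||1||2'

# Whole-line test: keeps any record with a letter anywhere on the line.
printf '%s\n' "$sample" | awk '/[a-zA-Z]/'
# gaurav||gandhi||123||456||789
# 12345||@@@||abc||1||2

# Per-field test (single "|" as FS, so the last name is $3):
# keeps a record only if both name fields contain a letter.
printf '%s\n' "$sample" | awk -F'|' '$1 ~ /[a-zA-Z]/ && $3 ~ /[a-zA-Z]/'
# gaurav||gandhi||123||456||789
```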

grail

06-06-2010 08:53 PM

Slight adjustment to colucix's last entry, as the delimiter is two pipes (and in case you weren't aware, you will need to redirect to a new file):

Code:

awk -F"||" '$1 ~ /[a-zA-Z]/ && $3 ~ /[a-zA-Z]/' file > new_file

syg00

06-06-2010 10:28 PM

Does that work ?. And if it does, wouldn't that be $2 ?.

grail

06-07-2010 12:51 AM

Quote:

Does that work ?. And if it does, wouldn't that be $2 ?.

Seems in my haste I should have done a little testing :redface:

Code:

awk -F"[|][|]" '$1 ~ /[a-zA-Z]/ && $2 ~ /[a-zA-Z]/' file > new_file
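A quick way to check the corrected command is to run it on the four sample records from the question. The file names file and new_file follow the thread; the scratch directory created with mktemp is an assumption added for the demo.

```shell
# Recreate the question's sample in a scratch directory (assumed for the demo).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
cat > file <<'EOF'
gaurav||gandhi||123||456||789
#a%bcd||123abc||89|90||91
12345||@@@||89||123||234
***||!!!!||98||76||90
EOF

# "[|][|]" matches exactly two literal pipes, so $1 and $2 are the names.
# Keep a record only if both names contain at least one ASCII letter.
awk -F"[|][|]" '$1 ~ /[a-zA-Z]/ && $2 ~ /[a-zA-Z]/' file > new_file
cat new_file
# gaurav||gandhi||123||456||789
# #a%bcd||123abc||89|90||91
```

Redirecting straight back with `> file` would truncate file before awk reads it, which is why the output goes to new_file; once it looks right, `mv new_file file` replaces the original.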

colucix

06-07-2010 01:06 AM

Actually I used a single pipe as delimiter and $3 to match the second field ($2 was the null string between the first two pipes).
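A one-line check of this point, using the first sample record from the question: with a single | as the field separator, the empty string between the two pipes becomes $2, so the last name lands in $3.

```shell
line='gaurav||gandhi||123||456||789'
# Single-character FS: every "|" separates two fields, including the empty one between "||".
echo "$line" | awk -F'|' '{ printf "NF=%d $1=[%s] $2=[%s] $3=[%s]\n", NF, $1, $2, $3 }'
# NF=9 $1=[gaurav] $2=[] $3=[gandhi]
```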

syg00

06-07-2010 01:23 AM

My comment was directed at @grail post, not yours @colucix.
I'll be more specific in future ... ;)

colucix

06-07-2010 02:28 AM

Mine too. :)

For the sake of the OP, if he ever pops up again: the field separator in awk can be either a single character, which is taken literally, or a regular expression. Any FS longer than one character is interpreted as an extended regular expression, and since | is the alternation operator there, "||" is not a reliable way to say "two literal pipes".

In the second example posted by grail, each pipe sits inside its own character list [...], where | loses its special meaning, so the regular expression [|][|] matches exactly two consecutive pipes and you can actually use them as the field separator.

Cheers!
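To see why the character lists matter, here is a small sketch, assuming a POSIX awk such as gawk or mawk: outside a character list, | acts as regular-expression alternation, while inside [...] it is an ordinary character.

```shell
# Outside a character list, "|" is alternation: FS "a|b" splits on either letter.
echo 'xaybz' | awk -F'a|b' '{ print NF }'
# 3   (fields: x, y, z)

# Inside character lists, "|" is literal: "[|][|]" matches exactly two pipes.
echo 'gaurav||gandhi||123||456||789' | awk -F'[|][|]' '{ print NF, $2 }'
# 5 gandhi
```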

grail

06-07-2010 03:11 AM

yes ... yes ... shoot me down .. lol

@colucix - thanks for the explanation :)

syg00

06-07-2010 04:34 AM

o.k., let's continue the education (mine).
Why is "[|][|]" considered regex (in this context) but [||] isn't - [||]+ works. (remember I'm still coming to terms with awk).

colucix

06-07-2010 09:58 AM

Quote:

Originally Posted by syg00 (Post 3995146)

o.k., let's continue the education (mine).
Why is "[|][|]" considered regex (in this context) but [||] isn't - [||]+ works. (remember I'm still coming to terms with awk).

Actually both are considered regexps, but [||] is a character list that means "match a single character, be it either | or |" (a needless redundancy). Instead, [||]+ (which is the same as [|]+) matches one or more consecutive pipes, since + means "one or more occurrences" in extended regular expressions. grail's solution

Code:

[|][|]

matches exactly two consecutive characters, each one taken from a character list.

The same applies if you use something like

Code:

[|&;][|&;]

which matches any of these nine combinations:

Code:

|| |& |; && &| &; ;; ;| ;&
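The difference between [|][|] and [|]+ shows up on the question's second record, which contains a lone pipe in 89|90. A sketch, assuming a POSIX awk such as gawk or mawk; the last line exercises the [|&;][|&;] example on a made-up string.

```shell
# Second record from the question; note the single pipe inside "89|90".
rec='#a%bcd||123abc||89|90||91'

# "[|][|]" needs exactly two pipes, so "89|90" stays one field.
echo "$rec" | awk -F'[|][|]' '{ print NF, $3 }'
# 4 89|90

# "[|]+" (same as "[||]+") also splits on the lone pipe.
echo "$rec" | awk -F'[|]+' '{ print NF, $3 }'
# 5 89

# "[|&;][|&;]" splits on any two-character combination of |, & and ;.
echo 'a||b&;c;;d' | awk -F'[|&;][|&;]' '{ print NF }'
# 4
```

For the sample data, $1 and $2 come out the same with either separator, so the OP's filter behaves identically; the choice matters once later fields are used.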

gandhigaurav1986

06-07-2010 10:30 PM

Thanks a lot guys.... my problem is solved now :)

grail

06-08-2010 02:08 AM

Quote:

my problem is solved now

Don't forget to mark as SOLVED then :)

All times are GMT -5.