LinuxQuestions.org

oldfogey 10-02-2012 11:42 AM

awk commands/script to extract lines with a specific format
 
Hi. I'm trying to extract lines from a file, each of which has a certain format. I can write awk commands to do parts of this as individual steps, but I'd like to link everything together so I don't have to issue a sequence of individual commands.

Here's a sample portion of the input file:

F9JcQEcS21=SQZK6if6mfnA
pc\sender4
Wed12:25:17
abc124
ba1
clp1491
N18471
Tam-port
dd
e.44
flash9
1c-64-94-98-87-11

I'd like to extract all lines from the input file that:

1) contain only alphanumeric characters (any letters must be lowercase a-z)
2) begin with at least 2 letters
3) end with a digit.

So the properly extracted lines from the sample above would be:

abc124
ba1
clp1491
flash9

Thanks in advance for your suggestions!

kakaka 10-02-2012 01:16 PM

Based on what you show as desired output, it sounds as if you want to output lines that start with a minimum of two lower case letters, end with digits, and contain only lower case letters and digits. Using your example data as input, this command gets the desired output:

Code:

gawk --posix '/^([a-z]{2,})([0-9]+)$/ { print $0 }'
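
For anyone who wants to try this against the sample data, here is a minimal sketch rather than a definitive version. The scratch file and the function name extract_ids are made up for illustration. It spells {2,} as [a-z][a-z]+, which should match the same lines without depending on interval-expression support (some awk builds, such as older mawk, lack it). It also sets LC_ALL=C, on the assumption that you want [a-z] to mean plain ASCII lowercase, since POSIX leaves range behaviour unspecified in other locales.

```shell
# Hypothetical helper: print lines made of 2+ lowercase letters
# followed by 1+ digits. Reads the named files, or stdin if none.
extract_ids() {
    # [a-z][a-z]+ is equivalent to [a-z]{2,} but needs no interval support.
    LC_ALL=C awk '/^[a-z][a-z]+[0-9]+$/' "$@"
}

# Sample input from the first post, written to a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
F9JcQEcS21=SQZK6if6mfnA
pc\sender4
Wed12:25:17
abc124
ba1
clp1491
N18471
Tam-port
dd
e.44
flash9
1c-64-94-98-87-11
EOF

result=$(extract_ids "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With the sample above this should print abc124, ba1, clp1491 and flash9, one per line.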

cortman 10-02-2012 01:26 PM

You can also do this with grep:

Code:

egrep "^([a-z]){2}[a-z]*[0-9]*[0-9]$"

Then redirect the output to a file or pipe it to xargs.
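
A rough sketch of the "send it to a file" step, under a few assumptions: the file names are invented for illustration, grep -E is used in place of egrep (recent GNU grep releases warn that egrep is obsolescent), and the pattern is shortened to [a-z]{2,}[0-9]+, which describes the same set of lines as the one above.

```shell
# Hypothetical file names, for illustration only.
workdir=$(mktemp -d)
input="$workdir/input.txt"
output="$workdir/matches.txt"

# A few lines taken from the sample data.
printf '%s\n' abc124 ba1 dd e.44 flash9 N18471 > "$input"

# -E: extended regular expressions.
# -x: the pattern must match the whole line, so ^ and $ are not needed.
LC_ALL=C grep -E -x '[a-z]{2,}[0-9]+' "$input" > "$output"

# Show what was kept, then clean up the scratch directory.
kept=$(cat "$output")
printf '%s\n' "$kept"
rm -rf "$workdir"
```

Only abc124, ba1 and flash9 should end up in the output file here.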

oldfogey 10-02-2012 01:34 PM

Fantastic! You guys are tremendous!

Thank you.

SecretCode 10-02-2012 02:39 PM

Quote:

Originally Posted by kakaka (Post 4795130)
Based on what you show as desired output, it sounds as if you want to output lines that start with a minimum of two lower case letters, end with digits, and contain only lower case letters and digits. Using your example data as input, this command gets the desired output:

Code:

gawk --posix '/^([a-z]{2,})([0-9]+)$/ { print $0 }'

This would produce the example output, but the description given would also allow digits followed by letters, in lines like abc123def99.

If the description is correct and the examples were simply missing such a case, you would need:

Code:

gawk --posix '/^[a-z]{2}[a-z0-9]*[0-9]+$/ { print $0 }'
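
To see where the two patterns disagree, a small side-by-side check can help. This is only a sketch: the variable names strict and loose and the helper check are hypothetical, and grep -E stands in for gawk on the assumption that both treat these particular patterns the same way.

```shell
strict='^[a-z]{2,}[0-9]+$'         # letters first, then digits, nothing else
loose='^[a-z]{2}[a-z0-9]*[0-9]+$'  # letters and digits may mix after the first two

# Hypothetical helper: print "yes" if string $2 matches pattern $1, else "no".
check() {
    if printf '%s\n' "$2" | LC_ALL=C grep -E -q "$1"; then
        echo yes
    else
        echo no
    fi
}

# Compare both patterns on a plain case, the mixed case, and a too-short case.
for s in abc124 abc123def99 a1; do
    printf '%-12s strict=%-3s loose=%s\n' "$s" \
        "$(check "$strict" "$s")" "$(check "$loose" "$s")"
done
```

Both patterns accept abc124 and reject a1. Only the second, looser pattern accepts abc123def99.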

oldfogey 10-02-2012 02:50 PM

You're correct. A string such as abc123def99 would not be desired in the output.

Each output line must be at least 2 letters followed by at least 1 digit.

Thanks for the catch!


All times are GMT -5.