Remembering patterns and printing only those patterns using sed

bernie82 · 05-25-2005, 04:01 AM

Hi there, I am looking for some advice with 'sed'. I basically need to remember certain parts of each line in a text file and print only those parts. I can create a regular expression for this but am not sure how to remember and print these parts.
The part I want to remember is a string of spaces, letters (A-Za-z) and commas, followed by a one- or two-digit number.
Thanks in advance.

gerrit_daniels · 05-25-2005, 08:29 AM

I'm not completely sure what you mean, but if I understand correctly, you can use grep's -o option. With it, grep prints only the part of each line that matches your regular expression instead of the whole line.
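A minimal sketch of the grep -o approach, assuming GNU or BSD grep (-o is not part of POSIX) and a hypothetical tab-separated input in which the wanted part is a run of letters, commas and spaces followed by a one- or two-digit number at the end of the line:

```shell
#!/bin/sh
# Hypothetical input: an ID, a tab, then "Surname, Forename NN".
# -o prints only the matched text, one match per output line.
# -E selects extended regular expressions, so {1,2} needs no backslashes.
# The $ anchor stops the match from grabbing the first digits of a longer number.
printf '1234\tWarner, David 42\n5678\tSmith, Anne 7\n' |
  grep -oE '[A-Za-z, ]+ [0-9]{1,2}$'
```

This prints "Warner, David 42" and "Smith, Anne 7"; the leading IDs are dropped because the tab is not in the bracket expression.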

Hope this helps

bernie82 · 05-25-2005, 05:55 PM

Yep, that does help. To make it clearer: I want to remember two parts of an expression and print only those two parts. I think I need to use the ampersand (&) or the numbers \1 to \9 on the right-hand side of the expression.
For example:

David Warner 123456789 0 5 50
Michael FitzPatrick 234567891 1 4 45
Hans Williamson 345678912 0 3 44

From each line I just want to print the long string of digits and the last number, using sed.
If this still isn't clear, I apologise, but thanks for looking anyway.
Cheers.
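The ampersand and the numbers 1-9 mentioned above are sed's two ways of reusing matched text in the replacement part of an s command: & stands for the whole match, while \1 to \9 stand for whatever the first to ninth \( \) group matched. A small sketch using POSIX basic regular expressions on the first sample line:

```shell
#!/bin/sh
line='David Warner 123456789 0 5 50'

# & inserts the entire match: wrap the first run of digits in brackets.
printf '%s\n' "$line" | sed 's/[0-9][0-9]*/[&]/'
# -> David Warner [123456789] 0 5 50

# \1 and \2 insert what the first and second \( \) groups matched;
# here they are used to swap the first two words.
printf '%s\n' "$line" | sed 's/^\([A-Za-z]*\) \([A-Za-z]*\)/\2 \1/'
# -> Warner David 123456789 0 5 50
```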

Tinkster · 05-25-2005, 06:35 PM

If there are no people with middle names or initials, and the number of
fields is always the same, you could use
awk '{print $3, $NF}' file

If neither holds, something like
awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[[:digit:]]+$/) {print $i, $NF; next}}' file
should work. It prints the first field made up entirely of digits, followed by the last field.
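A sketch comparing the two awk approaches (fixed field positions versus searching for the first all-digit field), run on two sample lines plus one hypothetical line with a middle initial, shows why the loop is needed when names vary in length:

```shell
#!/bin/sh
# Sample lines from the thread plus a hypothetical line with a middle initial.
data='David Warner 123456789 0 5 50
Michael FitzPatrick 234567891 1 4 45
Anna J. Smith 456789123 1 2 38'

# Fixed layout: field 3 is the ID and $NF is the last field.
# For the "Anna J. Smith" line this prints "Smith 38", because the ID is field 4.
printf '%s\n' "$data" | awk '{print $3, $NF}'

# Variable layout: print the first all-digit field and the last field,
# then skip to the next line so only one pair is printed per line.
printf '%s\n' "$data" | awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+$/) { print $i, $NF; next }
}'
```

The second command prints "123456789 50", "234567891 45" and "456789123 38". [0-9] is used instead of [[:digit:]] because some older awk implementations do not support POSIX character classes.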

Cheers,
Tink

gerrit_daniels · 05-26-2005, 01:48 AM

If you want to use sed:

Code:

sed -r -e 's/(^[^[:digit:]]*)([[:digit:]]*)( [[:digit:]] [[:digit:]] )([[:digit:]]{2}$)/\2 \4/' "$FILE"

I haven't tried this out, so it might be wrong, but it should work as follows. The parentheses split the regular expression into four groups. The first group matches zero or more non-digit characters at the beginning of the line, the second matches zero or more digits, the third matches the two single digits in the middle (each surrounded by spaces), and the fourth matches the last two digits at the end of the line. Only the second and fourth groups are kept in the output.

PS: If it doesn't work, you might try removing the leading '^' and trailing '$'. They anchor the match to the beginning and end of the line rather than matching any characters themselves.
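To check the extended-regex version against the sample lines from the question, a self-contained run might look like this. It assumes a sed that accepts -E (current GNU and BSD sed do; -r is GNU sed's older spelling of the same option):

```shell
#!/bin/sh
# Sample lines from the question.
data='David Warner 123456789 0 5 50
Michael FitzPatrick 234567891 1 4 45
Hans Williamson 345678912 0 3 44'

# Four groups; the replacement keeps only group 2 (the ID) and group 4 (the last number).
printf '%s\n' "$data" |
  sed -E 's/(^[^[:digit:]]*)([[:digit:]]*)( [[:digit:]] [[:digit:]] )([[:digit:]]{2}$)/\2 \4/'
```

This prints "123456789 50", "234567891 45" and "345678912 44"; the anchors cause no problem here.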

Hope this helps

osvaldomarques · 05-26-2005, 05:18 PM

Hi Bernie82,

Just my two cents. I wrote a sed command without the '-r' option, in the form we learned before open-source tools came along.

Code:

sed -e 's/^[^0-9]*\([0-9][0-9]*\) .*[^0-9]\([0-9]\{2\}\)$/\1 \2/' "$FILE"

The parentheses define the expressions to be remembered; in this version (basic regular expressions) they must be escaped. The caret (^) at the start of the expression means the beginning of the line, while as the first character inside a bracket expression it negates the set, so [^0-9] means any character that is not a digit. The dollar sign means the end of the line. Since you only need to remember two expressions, you don't have to parenthesize the whole pattern, just the parts you want. In the replacement part of the s command, you refer to the remembered expressions by escaped numbers in the order you declared them, hence the "\1 \2".
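The expression above requires exactly two digits at the end of the line (\{2\}). Since the question mentioned one- or two-digit numbers, the interval can be widened to \{1,2\}; the [^0-9] just before the group keeps the match from starting in the middle of a number. A sketch using one sample line and one hypothetical line whose last number has a single digit:

```shell
#!/bin/sh
# One sample line from the thread and one hypothetical line ending in a single digit.
# In basic regular expressions, groups and intervals are written \( \) and \{ \}.
printf '%s\n' 'David Warner 123456789 0 5 50' 'Jo Bloggs 567891234 1 1 9' |
  sed 's/^[^0-9]*\([0-9][0-9]*\) .*[^0-9]\([0-9]\{1,2\}\)$/\1 \2/'
```

This prints "123456789 50" and "567891234 9".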
However, you must select the printable lines beforehand: if the regular expression doesn't match, sed passes the entire line to the output unchanged. You can copy the regular expression from the sed command into a grep placed before sed in the pipeline.
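A sketch of that grep filter, plus a sed-only alternative that is not from the original post: sed's -n option turns off automatic printing, and the p flag on the s command prints a line only when the substitution succeeded. The header line in the data is hypothetical, added to show a line that does not match:

```shell
#!/bin/sh
# Hypothetical header line plus two sample lines from the thread.
data='Name ID A B Score
David Warner 123456789 0 5 50
Hans Williamson 345678912 0 3 44'

re='^[^0-9]*\([0-9][0-9]*\) .*[^0-9]\([0-9]\{2\}\)$'

# Without filtering, the header line is printed unchanged.
printf '%s\n' "$data" | sed "s/$re/\1 \2/"

# Filter with grep first (grep uses basic regular expressions by default,
# so the same \( \) and \{ \} syntax works), then rewrite with sed.
printf '%s\n' "$data" | grep "$re" | sed "s/$re/\1 \2/"

# sed only: -n suppresses automatic printing, p prints successful substitutions.
printf '%s\n' "$data" | sed -n "s/$re/\1 \2/p"
```

The last two commands both print "123456789 50" and "345678912 44". The sed -n form avoids writing the expression twice.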