Problem with sed regexp
hi,
I have a problem with the sed command, because I don't know how to write this rule. I want to transform this string:
foo CONST "STRING CONST" foo
into this:
foo <const>CONST</const> <string>"STRING CONST"</string> foo
and NOT into this:
foo <const>CONST</const> <string>"STRING <const>CONST</const>"</string> foo
Right now I have this faulty regexp:
export LC_ALL=C
line="foo CONST \"STRING CONST\" foo"
line=`echo "$line" |sed -e "s/\([A-Z][A-Z0-9_]*\)/<const>\1<\/const>/g"`
line=`echo "$line" |sed -e "s/\([\"].*[\"]\)/<string>\1<\/string>/g"`
Thank you for your advice, and sorry for my bad English. |
Maybe something like this?
Code:
line="foo CONST \"STRING CONST\" foo" |
Sorry, my fault
Quote:
line="foo CONST \"STRING CONST\" foo"
line=$(echo "$line" | sed 's/\([A-Z][A-Z0-9_]*\)/<const>\1<\/const>/g')
line=$(echo "$line" | sed 's/\(".*"\)/<string>\1<\/string>/g') |
Do you mean you have multiple occurrences of CONST and "STRING CONST" on the same line?
|
Quote:
"foofoo" BAR "foo" BAR BAR "BAR foo" "BAR"
and the result I want is:
<string>"foofoo"</string> <const>BAR</const> <string>"foo"</string> <const>BAR</const> <const>BAR</const> <string>"BAR foo"</string> <string>"BAR"</string> |
The problem with using regular expressions here is that there is no easy way to identify what lies outside pairs of double quotes. What lies inside is a little more straightforward:
Code:
/"[^"]*"/
1. Add the <string> and </string> tags around the quoted strings. Assuming there are no @ characters in the text, add an opening @ and a closing @ for reasons that will become clear later:
Code:
$ line='"foofoo" BAR "foo" BAR BAR "BAR BAR foo BAR foo" "BAR"'
Code:
$ line=$(echo "$line" | sed -r 's/([A-Z]+)/<const>\1<\/const>/g')
Code:
$ line=$(echo "$line" | sed -r ':again; s/(@<string>[^@]*)<const>([^@]+)<\/const>([^@]*<\/string>@)/\1\2\3/; t again')
Code:
$ line=$(echo "$line" | sed 's/@//g') |
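The commands in the post above can be put together into one runnable sketch. Note that the sed command for step 1 did not survive in the post, so the one used here is a reconstruction (an assumption) inferred from the `@<string>...</string>@` shape that the later loop expects. The sketch also assumes GNU sed (for `-r` and for a label on a one-line script) and an input that contains no @ characters.

```shell
#!/bin/sh
# Sketch of the marker approach; assumes GNU sed and no @ in the input.
line='"foofoo" BAR "foo" BAR BAR "BAR BAR foo BAR foo" "BAR"'

# Step 1 (reconstructed, an assumption): tag quoted strings and fence them with @.
line=$(echo "$line" | sed -r 's/("[^"]*")/@<string>\1<\/string>@/g')

# Step 2: tag every run of capital letters, including those inside strings.
line=$(echo "$line" | sed -r 's/([A-Z]+)/<const>\1<\/const>/g')

# Step 3: loop, unwrapping one <const> pair inside a fenced string per pass.
line=$(echo "$line" | sed -r ':again; s/(@<string>[^@]*)<const>([^@]+)<\/const>([^@]*<\/string>@)/\1\2\3/; t again')

# Step 4: drop the @ fences.
line=$(echo "$line" | sed 's/@//g')

echo "$line"
```

The @ fences are what let step 3 tell "inside a string" from "outside": `[^@]*` cannot cross a fence, so a `<const>` is only unwrapped when it sits between an opening `@<string>` and its own closing `</string>@`.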
Thank you for your answer, but I don't know which characters will be in the text. The text is the output of the strace program. I read somewhere about sed's hold space and pattern space features, but I don't know how to use them. Thank you for your solution, but it is not universal (what if I have an @ character in the text?). |
Ok. Since sed can manage hexadecimal ASCII codes, you can choose a control character which most likely does not appear in the input line, for example the group separator (GS):
Code:
Dec Hex Oct Char
Code:
$ line='"foofoo" BAR "foo" BAR BAR "BAR BAR foo BAR foo" "BAR"'
Code:
BEGIN { FS = "\""; OFS = "" } |
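As a rough sketch of the control-character idea, the fence can be kept in a shell variable and spliced into the sed scripts. The function name `tag_line` and the variable `gs` are hypothetical, the step 1 command is again a reconstruction, and GNU sed is assumed for `-r`. The GS byte is produced with `printf '\035'` (octal for hex 1D) rather than a sed escape, so the sketch does not depend on how sed treats escapes inside bracket expressions.

```shell
#!/bin/sh
# Sketch: same marker technique, with GS (octal 035, hex 1D) as the fence
# instead of @. Assumes GNU sed and an input that contains no GS byte.
gs=$(printf '\035')

# tag_line is a hypothetical helper name: tags one line and prints it.
tag_line() {
    printf '%s\n' "$1" |
    sed -r "s/(\"[^\"]*\")/${gs}<string>\1<\/string>${gs}/g" |
    sed -r 's/([A-Z]+)/<const>\1<\/const>/g' |
    sed -r ":again; s/(${gs}<string>[^${gs}]*)<const>([^${gs}]+)<\/const>([^${gs}]*<\/string>${gs})/\1\2\3/; t again" |
    sed "s/${gs}//g"
}

# An @ in the input is now harmless.
tag_line '"a@b" BAR "BAR foo" BAR'
```

This only moves the restriction: the input must not contain a GS byte, which is far less likely than an @ in strace output but is still an assumption.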
Thank you for your solution again. Last question:
Is this solution POSIX-compliant? I ask because I read here that sed has no -r option: http://pubs.opengroup.org/onlinepubs...ities/sed.html |
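One way to avoid `-r` is to write the same steps with basic regular expressions: `\( \)` for groups and `x x*` in place of `x+`. The sketch below is one possible rewrite, not a tested reference solution. The function name `tag_line_posix` is hypothetical, the step 1 command is a reconstruction, and the constant pattern is the one from the first post. The label and the `t` command each get their own `-e`, because a portable sed reads a label up to the end of the line.

```shell
#!/bin/sh
# Sketch: the marker technique using only BRE syntax and separate -e
# fragments. Still assumes the input contains no @ characters.
# LC_ALL=C keeps [A-Z] limited to ASCII capitals, as in the first post.
tag_line_posix() {
    printf '%s\n' "$1" | LC_ALL=C sed \
        -e 's/\("[^"]*"\)/@<string>\1<\/string>@/g' \
        -e 's/\([A-Z][A-Z0-9_]*\)/<const>\1<\/const>/g' \
        -e ':again' \
        -e 's/\(@<string>[^@]*\)<const>\([^@]*\)<\/const>\([^@]*<\/string>@\)/\1\2\3/' \
        -e 't again' \
        -e 's/@//g'
}

tag_line_posix 'foo CONST "STRING CONST" foo'
```

Because everything runs in one sed process, `t again` also sees the substitutions made by the first two commands, so the loop may run one extra, harmless pass before it falls through to the final cleanup.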
I wonder how well your description of the input matches the actual output of strace. It is best to use real samples instead of a wordy explanation, because regular expressions are very finicky. Could you describe what the output for the following input sample should be?
Code:
open("Webinar", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 |
Sorry, if you use "strace -o file", the lines won't be split. The program output was getting mixed in with the strace output.
Code:
open("Webinar", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 |
OK, the source input file is: http://dl.dropbox.com/u/21850274/strace.txt
The output file is: http://dl.dropbox.com/u/21850274/out.html
Look at the string on the 9th line. It is wrong. I highlight the strings and constants like this:
# string highlight
line=`echo "$line" |sed 's/\(\"[^"]*\"\)/<span class\="string">\1<\/span>/g'`
# constant highlight
line=`echo "$line" |sed 's/\([^_A-Za-z0-9]\)\([A-Z][_A-Z0-9]*\)\([^_A-Za-z0-9]\)/\1<span class="const">\2<\/span>\3/g$ |
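An alternative that needs no marker character is to split each line on double quotes, as the awk fragment earlier in the thread (`FS = "\""`) hints: pieces with an even index were inside quotes, pieces with an odd index were outside. The sketch below is a guess at that idea, not the original script. The names `highlight` and `wrap_consts` are hypothetical, and it assumes strings contain no escaped `\"` and that HTML-special characters need no escaping.

```shell
#!/bin/sh
# Sketch: quote-splitting highlighter in POSIX awk.
# Assumes no escaped \" inside strings and no HTML escaping is needed.
highlight() {
    printf '%s\n' "$1" | LC_ALL=C awk '
    # Wrap each whole identifier that looks like a constant.
    function wrap_consts(s,    out, tok) {
        out = ""
        while (match(s, /[A-Za-z0-9_]+/)) {
            tok = substr(s, RSTART, RLENGTH)
            out = out substr(s, 1, RSTART - 1)
            if (tok ~ /^[A-Z][A-Z0-9_]*$/)
                tok = "<span class=\"const\">" tok "</span>"
            out = out tok
            s = substr(s, RSTART + RLENGTH)
        }
        return out s
    }
    {
        n = split($0, part, "\"")
        res = ""
        for (i = 1; i <= n; i++) {
            if (i % 2 == 0)   # even pieces were inside double quotes
                res = res "<span class=\"string\">\"" part[i] "\"</span>"
            else              # odd pieces are outside quotes
                res = res wrap_consts(part[i])
        }
        print res
    }'
}

highlight 'open("Webinar", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3'
```

Since the quoted pieces are never passed to `wrap_consts`, capitals inside a string are left alone, and the span tags added around strings are never rescanned for constants.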