Linux - General
This Linux forum is for general Linux questions and discussion. If it is Linux-related and doesn't seem to fit in any other forum, then this is the place.
hi,
I have a problem with the sed command, because I don't know how to write this rule:
I want to transform this string:
foo CONST "STRING CONST" foo
into this:
foo <const>CONST</const> <string>"STRING CONST"</string> foo
and NOT into this:
foo <const>CONST</const> <string>"STRING <const>CONST</const>"</string> foo
Right now I have this faulty regexp:
export LC_ALL=C
line="foo CONST \"STRING CONST\" foo"
line=`echo "$line" |sed -e "s/\([A-Z][A-Z0-9_]*\)/<const>\1<\/const>/g"`
line=`echo "$line" |sed -e "s/\([\"].*[\"]\)/<string>\1<\/string>/g"`
Thank you for your advice, and sorry for my bad English.
line="foo CONST \"STRING CONST\" foo"
line=$(echo "$line" | sed 's/\([A-Z][A-Z0-9_]*\)/<const>\1<\/const>/')
line=$(echo "$line" | sed 's/\(".*"\)/<string>\1<\/string>/')
The first sed adds the <const> and </const> tags only to the first uppercase word on the line; the second one wraps the double-quoted text in <string> and </string> tags. I'm not sure if this matches your requirement.
Sorry, that was my mistake; the correct form is:
line="foo CONST \"STRING CONST\" foo"
line=$(echo "$line" | sed 's/\([A-Z][A-Z0-9_]*\)/<const>\1<\/const>/g')
line=$(echo "$line" | sed 's/\(".*"\)/<string>\1<\/string>/g')
The problem with regular expressions is that there is no easy way to say "whatever is outside a pair of double quotes". Matching what is inside is a little more straightforward:
Code:
/"[^"]*"/
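To see why the negated character class matters, here is a small sketch comparing it with the greedy ".*" used in the earlier posts. The input line and the square-bracket markers are made up purely for illustration:

```shell
line='a "one" b "two" c'

# Greedy: ".*" runs from the first quote to the LAST quote on the line,
# so both strings and the text between them are matched as one unit.
greedy=$(printf '%s\n' "$line" | sed 's/".*"/[&]/g')

# Bounded: "[^"]*" stops at the next quote, so each quoted string
# is matched separately.
bounded=$(printf '%s\n' "$line" | sed 's/"[^"]*"/[&]/g')

printf '%s\n' "$greedy" "$bounded"
# a ["one" b "two"] c
# a ["one"] b ["two"] c
```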
This pattern matches every quoted string (even if there are several on the same line). Literally, it matches an opening double quote, followed by zero or more characters other than a double quote, followed by the closing double quote. That said, the solution to your issue is a bit tricky. Here is what I've done:
1. Add the <string> and </string> tags around the quoted strings. Assuming there are no @ characters in the text, also add an opening @ and a closing @, for reasons that will become clear later:
Code:
$ line='"foofoo" BAR "foo" BAR BAR "BAR BAR foo BAR foo" "BAR"'
$ line=$(echo "$line" | sed -r 's/("[^"]*")/@<string>\1<\/string>@/g')
$ echo "$line"
@<string>"foofoo"</string>@ BAR @<string>"foo"</string>@ BAR BAR @<string>"BAR BAR foo BAR foo"</string>@ @<string>"BAR"</string>@
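2. Now add the <const> and </const> tags around all the uppercase words, even those inside double quotes. The substitution is the same one used in the GS variant further down, s/([A-Z]+)/<const>\1<\/const>/g with sed -r.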
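Step 2 can be sketched on its own as follows. It starts from the output of step 1 and uses the same substitution that appears in the GS variant later in the thread; the LC_ALL=C setting is borrowed from the first post so that [A-Z] behaves as a plain ASCII range:

```shell
export LC_ALL=C

# Output of step 1, copied from above
line='@<string>"foofoo"</string>@ BAR @<string>"foo"</string>@ BAR BAR @<string>"BAR BAR foo BAR foo"</string>@ @<string>"BAR"</string>@'

# Step 2: wrap every run of uppercase letters, inside or outside the quotes.
# The lowercase tag names "string" and "const" are not touched.
line=$(echo "$line" | sed -r 's/([A-Z]+)/<const>\1<\/const>/g')
echo "$line"
```

At this point the uppercase words inside the quoted strings are tagged as well, which is what step 3 undoes.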
3. Now repeatedly remove every <const> </const> pair found between a pair of @ characters, that is, inside every <string> </string> pair. This is the reason for adding @: a bracket expression such as [^@] can only exclude single characters, so I need a single-character delimiter to stand in for "any text that does not contain the multi-character pattern <string> or </string>":
Code:
$ line=$(echo "$line" | sed -r ':again; s/(@<string>[^@]*)<const>([^@]+)<\/const>([^@]*<\/string>@)/\1\2\3/; t again')
$ echo "$line"
@<string>"foofoo"</string>@ <const>BAR</const> @<string>"foo"</string>@ <const>BAR</const> <const>BAR</const> @<string>"BAR BAR foo BAR foo"</string>@ @<string>"BAR"</string>@
4. Now remove the @ characters and the trick is done:
Code:
$ line=$(echo "$line" | sed 's/@//g')
$ echo "$line"
<string>"foofoo"</string> <const>BAR</const> <string>"foo"</string> <const>BAR</const> <const>BAR</const> <string>"BAR BAR foo BAR foo"</string> <string>"BAR"</string>
Feel free to ask for any clarification. Hope this helps.
Thank you for your answer, but I don't know which characters will be in the text. The text is the output of the strace program.
I read somewhere about the hold space and pattern space features in sed, but I can't get them to work. Thank you for your solution, but it is not universal (what if there is an @ character in the text?).
OK. Since GNU sed understands hexadecimal escapes such as \x1d, you can choose a control character that is very unlikely to appear in the input line, for example the group separator (GS):
Code:
Dec Hex Oct Char
29 1d 035 GS (group separator)
In this case you simply have to substitute @ with \x1d in the sed commands:
Code:
$ line='"foofoo" BAR "foo" BAR BAR "BAR BAR foo BAR foo" "BAR"'
$ line=$(echo "$line" | sed -r 's/("[^"]*")/\x1d<string>\1<\/string>\x1d/g')
$ line=$(echo "$line" | sed -r 's/([A-Z]+)/<const>\1<\/const>/g')
$ line=$(echo "$line" | sed -r ':again; s/(\x1d<string>[^\x1d]*)<const>([^\x1d]+)<\/const>([^\x1d]*<\/string>\x1d)/\1\2\3/; t again')
$ line=$(echo "$line" | sed 's/\x1d//g')
$ echo "$line"
<string>"foofoo"</string> <const>BAR</const> <string>"foo"</string> <const>BAR</const> <const>BAR</const> <string>"BAR BAR foo BAR foo"</string> <string>"BAR"</string>
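The \xHH escape is documented as a GNU sed extension, so other sed implementations may not accept it. As a sketch under that assumption, the separator byte can be produced with printf instead and interpolated into the script; the variable names gs and result are mine, the input is the string from the first post, and the constant pattern is the [A-Z][A-Z0-9_]* from the original question:

```shell
export LC_ALL=C

# ASCII group separator (octal 035), produced by printf rather than by sed
gs=$(printf '\035')

line='foo CONST "STRING CONST" foo'

result=$(printf '%s\n' "$line" | sed -E \
  -e "s|(\"[^\"]*\")|${gs}<string>\1</string>${gs}|g" \
  -e 's|[A-Z][A-Z0-9_]*|<const>&</const>|g' \
  -e ':again' \
  -e "s|(${gs}<string>[^${gs}]*)<const>([^${gs}]+)</const>([^${gs}]*</string>${gs})|\1\2\3|" \
  -e 't again' \
  -e "s|${gs}||g")

printf '%s\n' "$result"
# foo <const>CONST</const> <string>"STRING CONST"</string> foo
```

Using | as the s delimiter avoids escaping the slash in the closing tags, and chaining the steps with -e runs them in one sed process instead of four.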
As an alternative, here is a more straightforward awk solution. In awk you can easily distinguish between quoted and unquoted text: just use the double quote as the field separator, so that even-numbered fields are the quoted strings:
Code:
BEGIN { FS = "\""; OFS = "" }
{
    for ( i = 1; i <= NF; i++ )
        if ( i % 2 == 0 )
            $i = "<string>\"" $i "\"</string>"
        else
            $i = gensub(/([A-Z][A-Z0-9_]*)/,"<const>\\1</const>","g",$i)
    print
}
Please note the empty string as output field separator: the blank spaces are already inside the fields, and the double quotes are put back explicitly around the even-numbered fields.
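gensub is a GNU awk function, so the script above needs gawk. A possible portable variant, sketched here on the string from the first post, uses the standard gsub with & (the matched text) in the replacement; the variable name result is mine:

```shell
export LC_ALL=C

line='foo CONST "STRING CONST" foo'

result=$(printf '%s\n' "$line" | awk '
BEGIN { FS = "\""; OFS = "" }
{
    for (i = 1; i <= NF; i++)
        if (i % 2 == 0)
            # even fields are the text between a pair of double quotes
            $i = "<string>\"" $i "\"</string>"
        else
            # odd fields are outside the quotes: tag the uppercase words
            gsub(/[A-Z][A-Z0-9_]*/, "<const>&</const>", $i)
    print
}')

printf '%s\n' "$result"
# foo <const>CONST</const> <string>"STRING CONST"</string> foo
```

Both versions assume that every double quote on the line is a string delimiter. A line with an odd number of quotes, or with backslash-escaped quotes inside a string (which strace output may well contain), would shift the even/odd alternation and be tagged incorrectly.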
I wonder how well your description of the input matches the actual output of strace. It is best to use real samples instead of a wordy explanation, because regular expressions are very finicky. Could you describe what the output for the following input sample would be?
Note that the write commands are split over two lines. This is when you usually need the hold space: build up both lines there, swap the result back into the pattern space, and then include a `\n' in the left-hand side of the substitution. So you will probably need more than one command to accomplish what you want.
It would still be appreciated if you showed us what the output should look like. I have no idea how "constant" and "foo" correspond to lines in an strace log, so I don't know what the result would look like.
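The hold-space technique mentioned above can be sketched as follows. The two-line input is made up for illustration (it is not real strace output), and the sketch simply joins all input lines into one so that a quoted string split across lines becomes matchable by "[^"]*":

```shell
# Hypothetical record whose quoted string is split over two lines
joined=$(printf '%s\n' 'alpha "STRING' ' CONST" omega' |
  sed -n '
    # H appends a newline plus the current line to the hold space
    H
    # On the last line: fetch the hold space, drop the leading newline
    # that the first H left behind, remove the inner newlines, and print
    ${ x; s/^\n//; s/\n//g; p; }
  ')

printf '%s\n' "$joined"
# alpha "STRING CONST" omega
```

A real log would need a rule for deciding which lines belong together; joining the whole input, as done here, is only the simplest case.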
The -r option here just avoids having to escape some special characters, namely the parentheses and the plus sign. You can safely remove the -r option and escape these characters instead.
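As a small sketch of that equivalence, the first two substitutions are shown below with and without -r on a made-up input. In the version without -r the parentheses are escaped, and [A-Z][A-Z]* is used in place of + because \+ in basic regular expressions is, as far as I know, a GNU extension rather than standard:

```shell
export LC_ALL=C

line='"foo" BAR "BAR"'

# Extended regular expressions (-r): bare parentheses and +
ere=$(printf '%s\n' "$line" | sed -r \
  -e 's/("[^"]*")/@<string>\1<\/string>@/g' \
  -e 's/([A-Z]+)/<const>\1<\/const>/g')

# Basic regular expressions: escaped parentheses, X X* instead of X+
bre=$(printf '%s\n' "$line" | sed \
  -e 's/\("[^"]*"\)/@<string>\1<\/string>@/g' \
  -e 's/\([A-Z][A-Z]*\)/<const>\1<\/const>/g')

printf '%s\n' "$ere" "$bre"
```

Both commands should print the same line.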