Code:
$ awk '
> BEGIN { a = "1abc 2def"
> b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a)
> print b }'
2def 1abc
This matches two strings of one or more characters separated by a space. OK.
Code:
$ awk '
> BEGIN { a = "1abc 2def"
> b = gensub(/() ()/, "\\2 \\1", "g", a)
> print b }'
1abc 2def
This matches a space surrounded by two "null strings" (the empty groups) and replaces it with a space surrounded by the same two matched "null strings". In other words, it does nothing. To see this better, consider the following two examples:
Code:
$ awk '
> BEGIN { a = "1abc 2def"
> b = gensub(/() ()/, "XXX", "g", a)
> print b }'
1abcXXX2def
Here the space is replaced by the string constant "XXX". And:
Code:
$ awk '
> BEGIN { a = "1abc 2def"
> b = gensub(/()/, "X", "g", a)
> print b }'
X1XaXbXcX X2XdXeXfX
Here every "null string" is replaced by "X". I'm not sure about your last example. Could you explain a bit more what you want to achieve?
To me the first two examples are not the same...
Thanks for answering. I have read about four awk manuals, but they say little about parentheses and how to work with them. I need to escape some characters. The characters are in a block of text (a variable). I want to replace / * + ( ) [ ] with \/ \* \+ \( \) \[ \]
Code:
awk 'BEGIN { a = " / * + ( ) [ ] "
b = gensub(/([/*+()[]])/, "\\/1", "g", a)
print b }'
I have never seen a character class used inside parentheses, so I don't know whether this is correct. I would expect the command to find any one of the characters in the class and add a backslash before it.
Edit:
You surprised me: I did not know I could work with a "null string" this way. Your last example is clever.
Well... first we have to analyze the replacement string. Take a look at the caveats about literal backslashes and ampersands explained in the GNU awk user's guide, in particular "Table 8.5 Escape Sequence Processing for gensub". It states that you can obtain a literal backslash followed by the matched text using \\\\&.
Let's try a simple example: we want to match a literal asterisk:
Code:
$ awk 'BEGIN { a = "This is an asterisk *"
> b = gensub(/(*)/, "\\\\&", "g", a)
> print b }'
This is an asterisk \*
It works. Now we want to match either an asterisk or a plus sign, so we use a character list:
Code:
$ awk 'BEGIN { a = "These are an asterisk * and a plus +"
> b = gensub(/([*+])/, "\\\\&", "g", a)
> print b }'
These are an asterisk \* and a plus \+
Ok. Let's add a slash:
Code:
$ awk 'BEGIN { a = "Here we go * + /"
> b = gensub(/([*+/])/, "\\\\&", "g", a)
> print b }'
Here we go \* \+ \/
This shows us that a slash inside the character list does not act as the closing slash of the regular expression, at least with the gawk used here. We are lucky.
Now the difficult part: square brackets. We have to be careful because if we put a closing square bracket in the wrong place, awk might think we want to close the character list. Let's try:
Code:
$ awk 'BEGIN { a = "Here we go * + / ]"
> b = gensub(/([*]+/])/, "\\\\&", "g", a)
> print b }'
awk: cmd. line:1: fatal: Unmatched ( or \(: /([*]+/
Naah... wrong place. It thinks we have closed the character list, so that the following slash closes the regular expression and the first open parenthesis ( remains unmatched.
Code:
$ awk 'BEGIN { a = "Here we go * + / ]"
> b = gensub(/([]*+/])/, "\\\\&", "g", a)
> print b }'
awk: cmd. line:1: fatal: Unmatched [ or [^: /([]*+/
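This is weird. Placed at the beginning of the character list, the closing bracket should be interpreted literally, but instead we get an unmatched [. The error message gives a hint: the regular expression it reports, /([]*+/, has been cut at the inner slash, so the scanner apparently paired [] as a complete list, while the regex engine then took the ] as a literal and never found a closing bracket. What if we escape it?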
Code:
$ awk 'BEGIN { a = "Here we go * + / ]"
> b = gensub(/([\]*+/])/, "\\\\&", "g", a)
> print b }'
Here we go \* \+ \/ \]
Hey... this seems to work! But pay attention to the following: this time we don't escape the closing square bracket, but we add an opening one somewhere inside the character list:
Code:
$ awk 'BEGIN { a = "Here we go * + / ] ["
> b = gensub(/([]*+[/])/, "\\\\&", "g", a)
> print b }'
Here we go \* \+ \/ \] \[
This works. The two square brackets inside the character list are interpreted literally and we have matched the opening bracket at the same time. We have fooled awk!
Now the parentheses, but I prefer to leave you the pleasure (?) of finding out the caveats, if you don't mind. Here is one of the working solutions:
Code:
$ awk 'BEGIN { a = "Here we go * + / ] [ ( )"
> b = gensub(/([])(*+[/])/, "\\\\&", "g", a)
> print b }'
Here we go \* \+ \/ \] \[ \( \)
In any case, keep in mind that escaping square brackets and parentheses inside a character list is the most straightforward solution. Maybe.
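A rough sketch of the escaped variant, not taken from the thread: escape_specials is a made-up helper name, and gsub stands in for gensub only so that the sketch does not depend on gawk. Both brackets and the slash are escaped inside the list, so it should not matter how the scanner pairs brackets.

```shell
# escape_specials is a hypothetical helper name used only in this sketch.
# It prefixes each of / * + ( ) [ ] in its argument with a backslash.
escape_specials() {
    printf '%s\n' "$1" | awk '{
        # \] \[ and \/ are escaped inside the list; ( ) * + need no escape there.
        # "\\\\&" reaches gsub as \\& : a literal backslash, then the matched text.
        gsub(/[\]\[()*+\/]/, "\\\\&")
        print
    }'
}

escape_specials 'Here we go * + / ] [ ( )'
```

With the test string from the examples above, this should give the same line as the last gensub example.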
I am again having some trouble understanding. First, I don't understand the term caveat(s). Is there a synonym?
I read the following table but still didn't understand what is written there:
Table 8.4: POSIX 2001 rules for sub
OK, now I had a problem understanding the part about \\\\&
I didn't understand your sentence: "It states you can obtain a literal backslash ... \\\\&." But I think I understand now. So the \\\\ produces a backslash and the & is like a reference to the content of the parentheses? To one or more characters in the parentheses?
gensub(/(*)/, "\\\\&", "g"
I will study your next examples tomorrow. I looked at them just now, but I am too tired to follow them. I would use something like ([a-zA-Z[]]), but is that correct, or is the list interpreted as closed by the first closing bracket? Or is your way better: ([[a-zA-Z]])?
1. If you are not using the numbered references (for example, "\\1"), you do not need the round brackets in the regex, as in /(*)/. You only need () if you are going to use \\1.
2. & - in sub, gsub or gensub, an & in the replacement stands for whatever text the regex matched: /<whatever is in here>/ matches some text, and & is that matched text.
3. \\\\ - four backslashes are required. Two (\\) would be reduced to a single \, which only acts as an escape for the next character, in your case the &, so gensub would see \& and output a plain &, which is not what you want. With \\\\, gensub sees \\&: an escaped backslash plus your ampersand, which as above is a copy of whatever the regex matched.
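The three points above can be tried side by side. This is only a sketch: demo_replacements is a made-up helper name, gsub is used instead of gensub so that it also runs in awks without gensub, and the regex deliberately has no parentheses because no "\\1" is used.

```shell
# demo_replacements is a hypothetical helper name used only in this sketch.
# It applies three different replacement strings to every * in its argument.
demo_replacements() {
    printf '%s\n' "$1" | awk '{
        s1 = $0; gsub(/[*]/, "<&>", s1)    # & alone: the matched text
        s2 = $0; gsub(/[*]/, "\\&", s2)    # gsub sees \& : a literal ampersand
        s3 = $0; gsub(/[*]/, "\\\\&", s3)  # gsub sees \\& : a backslash, then the match
        print s1
        print s2
        print s3
    }'
}

demo_replacements 'a*b'
```

For the input a*b the three lines should be a<*>b, a&b and a\*b.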
http://dictionary.reference.com/browse/caveat
We don't escape the closing square bracket but we add an opening one somewhere inside the character list:
Code:
b = gensub(/([]*+[/])/, "\\\\&", "g", a)
This works. The two square brackets inside the character list are interpreted literally and we have matched the opening bracket at the same time. We have fooled awk!
Hey, I don't understand how this can work. I would expect the brackets [] to be interpreted as the beginning and end of an empty character class.
Does it mean that if the right bracket ] comes directly after the left bracket [, it is interpreted as a normal character? Then the second left bracket [ is interpreted as a normal character, and the second right bracket ] is interpreted as the end of the character class.
There was a thread a little while back on LQ about character classes and how to include square brackets [].
What was discovered is that as long as the first character after the opening square bracket is the closing square bracket, it is treated as an item in the list and not as the closing bracket, whereas all other items behave the same as in any other regex character class.
Therefore, as long as it starts with:
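The rule described in this post can be checked with a small sketch. mark_brackets is a made-up helper name, and gsub is used instead of gensub only to avoid depending on gawk. The list [][] is read as: opening [, a literal ] (because it comes first), a literal [, and the closing ].

```shell
# mark_brackets is a hypothetical helper name used only in this sketch.
# It wraps every square bracket in its argument in < and >.
mark_brackets() {
    printf '%s\n' "$1" | awk '{
        # [][] : the first ] directly follows the opening [, so it is an item
        # of the list; the second [ is an item too; the last ] closes the list.
        gsub(/[][]/, "<&>")
        print
    }'
}

mark_brackets 'a ] b [ c'
```

For the input shown, the expected line is a <]> b <[> c.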