[...]
* tail
** sub level 1
[1] task foo
** sub level 2
[2.1.c] task bar
* tasks
** [1] task foo
some description
on several lines
** [2] task baz
some description
on several lines
** [2.1.c] task bar
some description
on several lines
* other sections
...
I'd like to write a script that:
Code:
1. Locates the last task listed in the * tail section.
In this example, it is [2.1.c] task bar.
2. Displays the description of that task, as found in the * tasks section.
In this example, it should display:
** [2.1.c] task bar
some description
on several lines
I'm curious whether there's a simple sed/awk solution to this. If not, I will turn to Python.
But simpler is better.
There are a few ways to access the last line of a variable; perhaps the simplest is to split on newlines and access the last element of the resulting array.
Awk's split works slightly differently from other languages: it fills the array passed as its second argument and returns the number of elements, so after my_array_len = split(input_string, my_array, "\n") the last line is my_array[my_array_len].
Here "input_string" stands for whatever variable or expression contains the lines, and "my_array" and "my_array_len" are variables that can be named however you like.
(If there's a trailing newline in the input, use my_array_len - 1 to skip the empty last element.)
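A minimal sketch of that split call, runnable from a shell. The sample string is made up for illustration, and the variable names are the placeholders used above:

```shell
last_line=$(awk 'BEGIN {
    # Made-up input; in a real script this would be a field or a variable
    input_string = "first\nsecond\nthird"
    # split() fills my_array (indexed from 1) and returns the element count
    my_array_len = split(input_string, my_array, "\n")
    print my_array[my_array_len]
}')
printf '%s\n' "$last_line"
```

Because awk arrays created by split start at index 1, the returned count is also the index of the last element.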
#!/usr/bin/gawk -f
BEGIN {
    # Records are split at each top-level "* " heading, fields at each "**" sub-heading.
    RS="\n\\* ";
    FS="\n\\*\\*";
}
NR==3 {
    fieldno=NF-3;
    # The task is the last line of that field.
    l=split($fieldno,A,"\n");
    task=A[l];
    print("task is", task);
    printf("it was found on third record and %sth field\n", fieldno);
}
NR==4 {
    for(i = NF; i > 0; i--) {
        #if ($i ~ "learning to use getopts") {
        if ($i ~ task ) {
            printf("expression found in %sth row, %sth field\n",NR,i);
            print $i;
        }
    }
}
The problem is that the task variable contains the special characters "[" and "]", so the condition if ($i ~ task) never matches.
With the other test, if ($i ~ "learning to use getopts"), I get a match:
Code:
12:19:53 ~/CODE/TMP -2- $ ./awk ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
task is [16.1.1] learning to use getopts
it was found on third record and 15th field
expression found in 4th row, 43th field
[16.1.1] learning to use getopts
12:33:22 ~/CODE/TMP -2- $
But with the if ($i ~ task) code I get no match:
Code:
12:33:22 ~/CODE/TMP -2- $ ./awk ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
task is [16.1.1] learning to use getopts
it was found on third record and 15th field
12:33:36 ~/CODE/TMP -2- $
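One way to see why the match fails: used as a dynamic regex, the leading "[16.1.1]" is read as a bracket expression, that is, a single character out of 1, 6 or ".", rather than as literal text. A small sketch, where the field value is a made-up stand-in for the real record:

```shell
regex_demo=$(awk 'BEGIN {
    task  = "[16.1.1] learning to use getopts"
    # Made-up stand-in for the field that holds the task
    field = " [16.1.1] learning to use getopts\nsome description"
    # As a regex, "[16.1.1]" means: one character that is 1, 6 or "."
    literal = (field ~ task) ? "match" : "no match"
    # A string with a single such character in front does match
    single = ("1 learning to use getopts" ~ task) ? "match" : "no match"
    print "literal text: " literal
    print "single character: " single
}')
printf '%s\n' "$regex_demo"
```

In the real field, the character just before " learning" is "]", which is not in the set, so the pattern has nothing to match.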
NR==4 {
    for(i = NF; i > 0; i--) {
        #if ($i ~ "learning to use getopts") {
        gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",text);
        if ($i ~ task ) {
            printf("expression found in %sth row, %sth field\n",NR,i);
            print $i;
        }
    }
}
15:04:28 ~/CODE/TMP -2- $ ./awk ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
task is [16.1.1] learning to use getopts
it was found on third record and 15th field
15:16:50 ~/CODE/TMP -2- $
but index did.
Code:
NR==4 {
    for(i = NF; i > 0; i--) {
        #if ($i ~ "learning to use getopts") {
        # gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",text);
        # if ($i ~ task) {
        if (index($i,task)) {
            printf("expression found in %sth row, %sth field\n",NR,i);
            print $i;
        }
    }
}
15:16:50 ~/CODE/TMP -2- $ ./awk ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
task is [16.1.1] learning to use getopts
it was found on third record and 15th field
expression found in 4th row, 43th field
[16.1.1] learning to use getopts
15:17:26 ~/CODE/TMP -2- $
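The reason index works here is that it compares plain characters and never interprets its arguments as a regex. A short sketch with the same made-up field value:

```shell
index_demo=$(awk 'BEGIN {
    task  = "[16.1.1] learning to use getopts"
    field = " [16.1.1] learning to use getopts\nsome description"
    # index() returns the 1-based position of the substring, or 0 if it is absent
    print index(field, task)
    print index(field, "[16.1.2]")
}')
printf '%s\n' "$index_demo"
```

Any non-zero result counts as true in awk, which is why if (index($i,task)) is enough as a test.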
Oops, you were right ^^', I didn't check the name of the variable.
But there's something intriguing: if I use the original variable task, I get a very big load of backslashes printed out (see https://i.imgur.com/mgoKUtJ.png).
Code:
NR==4 {
    for(i = NF; i > 0; i--) {
        gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",task);
        if ($i ~ task) {
            printf("expression found in %sth row, %sth field\n",NR,i);
            print $i;
        }
    }
}
If I use a copy of the variable, it works as expected:
Code:
NR==4 {
    for(i = NF; i > 0; i--) {
        taskc=task;
        gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",taskc);
        if ($i ~ taskc) {
            printf("expression found in %sth row, %sth field\n",NR,i);
            print $i;
        }
    }
}
But there's something intriguing: if I use the original variable task, I get a very big load of backslashes printed out
...
If I use a copy of the variable, it works as expected
Interesting. I guess there's a quirk related to how it modifies the variable in place, and the act of assignment changes some property to resolve that.
Unless there's some documented reason for it, you should check which implementation + version of Awk you're using and probably raise it as a bug.
edit: I was distracted earlier - it's because the gsub is occurring inside a loop, and each iteration adds/doubles backslashes. If used, it should be done prior to the loop.
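A small sketch of that effect, assuming an awk that follows the POSIX rules for the gsub replacement string (the thread uses gawk). The shortened task string and the looped, escaped and hits names are made up for illustration:

```shell
loop_demo=$(awk 'BEGIN {
    task = "[2.1.c]"
    looped = task
    for (i = 1; i <= 3; i++) {
        # Bug: every pass also escapes the backslashes added by the previous pass
        gsub(/[$^*()+\[\]{}.?\\|]/, "\\\\&", looped)
        print "pass " i ": " looped
    }
    # Fix: escape once, before the loop, then reuse the result
    escaped = task
    gsub(/[$^*()+\[\]{}.?\\|]/, "\\\\&", escaped)
    for (i = 1; i <= 3; i++)
        if ("** [2.1.c] task bar" ~ escaped) hits++
    print "hits with a single escape: " hits
}')
printf '%s\n' "$loop_demo"
```

Copying the variable at the top of each iteration resets it before every gsub, which is why that variant appeared to work.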
Quote:
Originally Posted by ychaouche
Code:
gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",task);
This is handy. Should I turn it into a function? I might use it in future scripts.
Yep - it was translated from one of the functions in a regex library for a different language.
I was a little surprised it wasn't already a built-in function.
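Wrapped as a function, it could look like the sketch below. The name regex_escape is hypothetical; the pattern and replacement are the ones quoted above. The example also embeds the escaped text in a larger, anchored pattern:

```shell
escape_demo=$(awk '
# regex_escape is a hypothetical name, not an awk built-in
function regex_escape(str) {
    # str is a local copy, so the variable passed in is left untouched
    gsub(/[$^*()+\[\]{}.?\\|]/, "\\\\&", str)
    return str
}
BEGIN {
    task = "[2.1.c] task bar"
    # Larger pattern: a whole "** " heading line, anchored at both ends
    pattern = "^\\*\\* " regex_escape(task) "$"
    exact  = ("** [2.1.c] task bar" ~ pattern) ? "match" : "no match"
    longer = ("** [2.1.c] task bar, continued" ~ pattern) ? "match" : "no match"
    print pattern
    print exact
    print longer
    print task
}')
printf '%s\n' "$escape_demo"
```

Since awk passes scalar arguments by value, calling the function inside a loop does not pile up backslashes in task.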
Yep, the benefit of escaping is for use inside a larger pattern; when not doing that, the index function is simpler, clearer, more efficient, etc.
Also, I wasn't paying attention earlier - the excess slashes are due to the replace being performed inside a loop; if used it needs to only be done once (hence why resetting the variable immediately prior hid the issue).
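Putting the pieces together, here is one possible line-oriented sketch of the original request. It is not the script from the thread: it avoids regex matching on the task text by comparing heading lines as plain strings, in the same spirit as index. It assumes the * tail section comes before * tasks, that headings have no leading or trailing whitespace, and it uses a temporary copy of the sample layout from the first post:

```shell
# Hypothetical sample file reproducing the layout shown in the first post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
* tail
** sub level 1
[1] task foo
** sub level 2
[2.1.c] task bar
* tasks
** [1] task foo
some description
on several lines
** [2] task baz
some description
on several lines
** [2.1.c] task bar
some description
on several lines
* other sections
...
EOF

description=$(awk '
# A line starting with "* " opens a new top-level section
/^\* / { section = $0; printing = 0; next }
# Inside "* tail", remember the last non-empty line that is not a "** " heading
section == "* tail" && NF && !/^\*\* / { last = $0 }
# Inside "* tasks", print from the heading equal to "** " last up to the next heading
section == "* tasks" {
    if (/^\*\* /) printing = ($0 == ("** " last))
    if (printing) print
}
' "$sample")
printf '%s\n' "$description"
rm -f "$sample"
```

A single pass is enough here only because the task name is known before the * tasks section is read; with the sections in the other order, the file would need to be read twice or buffered.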