LinuxQuestions.org
Old 09-29-2022, 11:02 AM   #1
ychaouche
Member
 
Registered: Mar 2017
Distribution: Mint, Debian, Q4OS, Mageia, KDE Neon
Posts: 364
Blog Entries: 1

Rep: Reputation: 47
Question Extracting text at a dynamic location


Consider a file that is structured like this

Code:
[...]

* tail
** sub level 1
[1] task foo
** sub level 2
[2.1.c] task bar

* tasks
** [1] task foo
some description
on several lines
** [2] task baz
some description
on several lines
** [2.1.c] task bar
some description
on several lines

* other sections
 ...
I'd like to write a script that does the following:

Code:
1. Locate the last task listed in the * tail section.
   In this example, it is [2.1.c] task bar

2. Display the description of that task found in the * tasks section.
   In this example, it should display:

** [2.1.c] task bar
some description
on several lines

I'm curious whether there's a simple sed/awk solution to this. If not, I will turn to Python.
But simpler is better.
 
Old 09-29-2022, 12:32 PM   #2
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,573

Rep: Reputation: 2534

This sounds familiar... https://www.linuxquestions.org/questions/programming-9/awk-sed-bash-script-to-display-last-changelog-entry-4175716978

That's a very similar problem, so yes, Awk can do this too, and you should be able to apply what you've learned from that thread.

Here are some hints: Use "\n\* " as a record separator, "\n\*\* " as a field separator, and "$NF" to refer to the last field in a record.

Using that information, have a go yourself, and if you get stuck, show your efforts.
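As a minimal sketch of those hints (not boughtonp's own code), the following recreates a shortened version of the sample layout from post #1 in a temporary file and prints the last field of the "tail" record. It assumes an awk that accepts a regular expression as RS (gawk and mawk do; POSIX leaves this unspecified). The temporary file and the function name last_tail_field are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file mirroring the layout from post #1 (shortened).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
[...]

* tail
** sub level 1
[1] task foo
** sub level 2
[2.1.c] task bar

* tasks
** [1] task foo
some description
on several lines
** [2.1.c] task bar
some description
on several lines
EOF

# Records are separated by "\n* ", fields by "\n** ".
# Record 1 is the preamble before the first section,
# record 2 is the "tail" section; $NF is its last sub level.
last_tail_field() {
    awk -v RS='\n\\* ' -v FS='\n\\*\\* ' 'NR==2 { print $NF }' "$1"
}

last_tail_field "$sample"
```

The field printed here still spans two lines (the sub level heading and the task line) and ends with a newline, because the blank line before the next section stays inside the record.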


Last edited by boughtonp; 09-29-2022 at 12:33 PM.
 
Old 10-03-2022, 05:30 AM   #3
ychaouche
Member
 
Registered: Mar 2017
Distribution: Mint, Debian, Q4OS, Mageia, KDE Neon
Posts: 364

Original Poster
Blog Entries: 1

Rep: Reputation: 47
Thanks boughtonp! A little update: I need to access the last line of a field that has multiple lines,
like so:

Quote:
$ awk -v RS='\n\\* ' -v FS='\n\\*\\* ' 'NR==3 {print $(NF-3)}' ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
2022-09-19
09:44:58
[11.1] Plan

09:48:00
[15] override


09:50:45
[16] notes.search.recent

10:08:33
[16.1] refactoring de toutes les fonctions de recherche de notes

10:20:42
[16.1.1] learning to use getopts
$
In this example, I need to extract the last line, which is:

Quote:
[16.1.1] learning to use getopts
 
Old 10-04-2022, 12:09 PM   #4
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,573

Rep: Reputation: 2534

There are a few ways to access the last line of a variable; perhaps the simplest is to split on newlines and access the last element of the resulting array.

Awk's split works slightly differently from other languages: it fills the array you pass in and returns the number of elements:
Code:
my_array_len = split(input_string,my_array,"\n");
print my_array[my_array_len];
Where "input_string" is changed to whatever variable/expression contains the lines, and "my_array" and "my_array_len" are variables that can be named however you like.

(If there's a trailing newline in the input, the last element is empty, so subtract 1 from the index to counter that.)
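A small sketch of that idea, with the "-1" adjustment generalised to skip any number of trailing empty elements. The helper name last_line and the INPUT environment variable are assumptions for illustration; the text is passed through ENVIRON so that awk does not interpret backslashes in it.

```shell
#!/bin/sh
# last_line is a hypothetical helper, not something from the thread.
last_line() {
    INPUT=$1 awk 'BEGIN {
        # split() fills the array "lines" and returns the element count
        n = split(ENVIRON["INPUT"], lines, "\n")
        # a trailing newline leaves an empty last element, so step back over it
        while (n > 0 && lines[n] == "") n--
        if (n > 0) print lines[n]
    }'
}

field='sub level 2
[2.1.c] task bar
'
last_line "$field"
```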

 
Old 10-04-2022, 04:14 PM   #5
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,768

Rep: Reputation: 1192
Or pipe it to tail -1
Code:
awk ... | tail -1
 
Old 10-05-2022, 03:55 AM   #6
ychaouche
Member
 
Registered: Mar 2017
Distribution: Mint, Debian, Q4OS, Mageia, KDE Neon
Posts: 364

Original Poster
Blog Entries: 1

Rep: Reputation: 47
Thank you boughtonp! That code will save me a gratuitous call to length(), as in my code:

Code:
18:25:20 ~ -1- $ awk  -v RS='\n\\* ' -v FS='\n\\*\\* ' 'NR==3 {fieldno=NF-3; split($fieldno,A,"\n"); print A[length(A)] }'  ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
[16.1.1] learning to use getopts
18:25:31 ~ -1- $
@MadeInGermany it's a shame to use another subprocess when you can do everything inside awk.
 
Old 10-05-2022, 06:42 AM   #7
ychaouche
Member
 
Registered: Mar 2017
Distribution: Mint, Debian, Q4OS, Mageia, KDE Neon
Posts: 364

Original Poster
Blog Entries: 1

Rep: Reputation: 47
The code is now in its own file

Code:
#!/usr/bin/gawk -f 
BEGIN {
    RS="\n\\* ";
    FS="\n\\*\\*";
}

NR==3 {
    fieldno=NF-3; 
    l=split($fieldno,A,"\n"); 
    task=A[l];
    print("task is", task);
    printf("it was found on third record and %sth field\n", fieldno);
} 

NR==4 {
    for(i = NF; i > 0; i--) {
	#if ($i ~ "learning to use getopts") {
	if ($i ~ task ) {
	    printf("expression found in %sth row, %sth field\n",NR,i);
	    print $i;
	}
    } 
}


The problem is that the task variable contains the special characters "[" and "]", so when it is used as a regex the if ($i ~ task) condition never matches.
With the other test, if ($i ~ "learning to use getopts"), I get a match:

Code:
12:19:53 ~/CODE/TMP -2- $ ./awk ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
task is [16.1.1] learning to use getopts
it was found on third record and 15th field
expression found in 4th row, 43th field
 [16.1.1] learning to use getopts   

12:33:22 ~/CODE/TMP -2- $
But with the if ($i ~ task) code I get no match:

Code:
12:33:22 ~/CODE/TMP -2- $ ./awk ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
task is [16.1.1] learning to use getopts
it was found on third record and 15th field
12:33:36 ~/CODE/TMP -2- $
 
Old 10-05-2022, 07:48 AM   #8
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,573

Rep: Reputation: 2534

To do a non-regex find, use index(haystack,needle) - it returns the position of the match (1 if the needle is at the very start of the string, 0 if there is no match).

But it can be useful to convert a string to a regex pattern, by adding backslashes where required:
Code:
gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",text);
Then the pattern can have additional regex metacharacters prefixed/suffixed as required (e.g. to ensure start/end of string, variable prefixes, etc.)
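To see the difference between the two approaches, here is a rough sketch that tests the same needle both ways. The function name match_modes and the environment variable names are invented for this example. Used as a dynamic regex, "[16.1.1]" is read as a bracket expression (one character out of 1, 6 or .), which is why the ~ test fails on this input while index() finds the literal text.

```shell
#!/bin/sh
# match_modes is a hypothetical name; it reports how each test treats the needle.
match_modes() {
    HAYSTACK=$1 NEEDLE=$2 awk 'BEGIN {
        h = ENVIRON["HAYSTACK"]
        n = ENVIRON["NEEDLE"]
        # dynamic regex match: metacharacters in n are interpreted
        result = (h ~ n) ? "match" : "no match"
        printf "regex: %s\n", result
        # literal substring search: position of n in h, or 0 if absent
        printf "index: %d\n", index(h, n)
    }'
}

match_modes ' [16.1.1] learning to use getopts' '[16.1.1] learning to use getopts'
```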

 
Old 10-05-2022, 09:18 AM   #9
ychaouche
Member
 
Registered: Mar 2017
Distribution: Mint, Debian, Q4OS, Mageia, KDE Neon
Posts: 364

Original Poster
Blog Entries: 1

Rep: Reputation: 47
gsub didn't work

Code:
NR==4 {
    for(i = NF; i > 0; i--) {
	#if ($i ~ "learning to use getopts") {
	gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",text);
	if ($i ~ task ) {
	    printf("expression found in %sth row, %sth field\n",NR,i);
	    print $i;
	}
    } 
}

15:04:28 ~/CODE/TMP -2- $ ./awk ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
task is [16.1.1] learning to use getopts
it was found on third record and 15th field
15:16:50 ~/CODE/TMP -2- $

but index did.

Code:
NR==4 {
    for(i = NF; i > 0; i--) {
	#if ($i ~ "learning to use getopts") {
	# gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",text);
	# if ($i ~ task) {
	if (index($i,task)) {
	    printf("expression found in %sth row, %sth field\n",NR,i);
	    print $i;
	}
    } 
} 




15:16:50 ~/CODE/TMP -2- $ ./awk ~/NOTES/LOG/TASKS/nouvelle-vm-dns.flow
task is [16.1.1] learning to use getopts
it was found on third record and 15th field
expression found in 4th row, 43th field
 [16.1.1] learning to use getopts

15:17:26 ~/CODE/TMP -2- $
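Putting the pieces from this thread together for the sample layout in post #1, one possible sketch looks like the following. It is not the poster's final script: it selects sections by name instead of by record number, the function name show_last_task and the temporary sample file are made up, and it assumes an awk with regex RS support (gawk or mawk).

```shell
#!/bin/sh
# Hypothetical sample file following the layout from post #1.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
[...]

* tail
** sub level 1
[1] task foo
** sub level 2
[2.1.c] task bar

* tasks
** [1] task foo
some description
on several lines
** [2] task baz
some description
on several lines
** [2.1.c] task bar
some description
on several lines

* other sections
 ...
EOF

show_last_task() {
    awk -v RS='\n\\* ' -v FS='\n\\*\\* ' '
    $1 == "tail" {
        # last non-empty line of the last sub level is the task name
        n = split($NF, lines, "\n")
        while (n > 0 && lines[n] == "") n--
        task = lines[n]
    }
    $1 == "tasks" && task != "" {
        for (i = NF; i > 1; i--) {
            if (index($i, task) == 1) {   # literal match at start of the field
                desc = $i
                sub(/\n+$/, "", desc)     # drop trailing newline(s)
                print "** " desc
                exit
            }
        }
    }' "$1"
}

show_last_task "$sample"
```

Because index() is only checked for position 1, a task name that is a prefix of another task's name would also match; for stricter matching, the escaped-regex approach with anchors may be a better fit.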
 
Old 10-05-2022, 09:30 AM   #10
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,573

Rep: Reputation: 2534

The "text" bit was intended to be generic - in your context you'd want something like:

Code:
...
taskrx = task
gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",taskrx);
if ($i ~ taskrx) {
...
But this approach is more for when you want to add to the pattern - if index is sufficient, that's the simpler and more efficient approach.

 
Old 10-05-2022, 10:59 AM   #11
ychaouche
Member
 
Registered: Mar 2017
Distribution: Mint, Debian, Q4OS, Mageia, KDE Neon
Posts: 364

Original Poster
Blog Entries: 1

Rep: Reputation: 47
oops, you were right ^^', didn't check the name of the variable.

But there's something intriguing: if I use the original variable task, I get a very big load of backslashes printed out (see https://i.imgur.com/mgoKUtJ.png).

Code:
NR==4 {
    for(i = NF; i > 0; i--) {
	gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",task);
	if ($i ~ task) {
	    printf("expression found in %sth row, %sth field\n",NR,i);
	    print $i;
	}
    } 
}
If I use a copy of the variable, it works as expected:


Code:
NR==4 {
    for(i = NF; i > 0; i--) {
	taskc=task;
	gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",taskc);
	if ($i ~ taskc) {
	    printf("expression found in %sth row, %sth field\n",NR,i);
	    print $i;
	}
    } 
}
 
Old 10-05-2022, 11:05 AM   #12
ychaouche
Member
 
Registered: Mar 2017
Distribution: Mint, Debian, Q4OS, Mageia, KDE Neon
Posts: 364

Original Poster
Blog Entries: 1

Rep: Reputation: 47
Code:
 gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",task);
This is handy. Should I turn it into a function? I might use it in future scripts.
 
Old 10-05-2022, 11:17 AM   #13
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,573

Rep: Reputation: 2534
Quote:
Originally Posted by ychaouche View Post
But there's something intriguing: if I use the original variable task, I get a very big load of backslashes printed out
...

If I use a copy of the variable, it works as expected
Interesting. I guess there's a quirk related to how it modifies the variable in place, and the act of assignment changes some property to resolve that.

Unless there's some documented reason for it, you should check which implementation + version of Awk you're using and probably raise it as a bug.


edit: I was distracted earlier - it's because the gsub is occurring inside a loop, and each iteration adds/doubles backslashes. If used, it should be done prior to the loop.


Quote:
Originally Posted by ychaouche View Post
Code:
 gsub(/[$^*()+\[\]{}.?\\|]/,"\\\\&",task);
This is handy. Should I turn it into a function? I might use it in future scripts.
Yep - it was translated from one of the functions in a regex library for a different language.

I was a little surprised it wasn't already a built-in function.
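One way such a function could look, as a sketch only: regex_escape and escaped_match are hypothetical names, and the gsub call is the one posted in this thread. Awk passes scalars by value, so the function works on its own copy and the caller's variable is left untouched. The example then wraps the escaped text in anchors, which is the case where escaping is preferable to index().

```shell
#!/bin/sh
# escaped_match and regex_escape are hypothetical names for illustration.
escaped_match() {
    HAYSTACK=$1 NEEDLE=$2 awk '
    # Return a copy of s with regex metacharacters backslash-escaped.
    function regex_escape(s) {
        gsub(/[$^*()+\[\]{}.?\\|]/, "\\\\&", s)
        return s
    }
    BEGIN {
        # allow leading spaces, then require the literal text up to the end
        rx = "^ *" regex_escape(ENVIRON["NEEDLE"]) "$"
        print rx
        result = (ENVIRON["HAYSTACK"] ~ rx) ? "match" : "no match"
        print result
    }'
}

escaped_match ' [16.1.1] learning to use getopts' '[16.1.1] learning to use getopts'
```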


Last edited by boughtonp; 10-05-2022 at 05:11 PM.
 
Old 10-05-2022, 01:32 PM   #14
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,768

Rep: Reputation: 1192
Oh my dear
Go for the index() function!
 
Old 10-05-2022, 05:17 PM   #15
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,573

Rep: Reputation: 2534
Quote:
Originally Posted by MadeInGermany View Post
Go for the index() function!
Yep, the benefit of escaping is for use inside a larger pattern; when not doing that, the index function is simpler, clearer, more efficient, etc.


Also, I wasn't paying attention earlier - the excess backslashes are due to the replace being performed inside a loop; if used, it needs to be done only once (which is why resetting the variable immediately prior hid the issue).
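The effect is easy to reproduce outside the original script. In this sketch (escape_n_times is a made-up helper name), the same gsub is applied repeatedly to one variable; from the second pass on, the backslashes added earlier are themselves escaped, so the string keeps growing.

```shell
#!/bin/sh
# escape_n_times is a hypothetical helper: apply the escaping gsub n times.
escape_n_times() {
    TEXT=$1 awk -v n="$2" 'BEGIN {
        s = ENVIRON["TEXT"]
        for (i = 1; i <= n; i++)
            # each pass also escapes the backslashes added by the previous pass
            gsub(/[$^*()+\[\]{}.?\\|]/, "\\\\&", s)
        print s
    }'
}

escape_n_times '[2.1.c]' 1   # escaped once, usable as a regex
escape_n_times '[2.1.c]' 2   # escaped twice, no longer matches the literal text
```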

 
  



Tags
awk, sed

