LinuxQuestions.org
Old 11-03-2021, 03:45 AM   #1
Faki
Member
 
Registered: Oct 2021
Posts: 574

Rep: Reputation: Disabled
Using comment symbol for matching comment sections


I have the following bash function to extract sections between `## Mode: org` and `## # End of org`, using `#` as the comment character.

I would like to enable other comment characters instead of handling only `#`. Using a Character Class `[#;c!C]` could be a good plan.

I also want to allow a user-defined literal string for the texinfo comment identifier `@c`.

Code:
 
capture ()
 {
  local efile="$1"
  begorg='^[[:space:]]*## Mode: org$'
  endorg='^[[:space:]]*## # End of org$'
  awk -v bego="$begorg" -v endo="$endorg" \
    '$0 ~ bego { found=1; next } 
     $0 ~ endo { found=0; } 
     found { sub(/^[[:space:]]*#+[[:space:]]*/,""); print }' "$efile"
 }

Last edited by Faki; 11-03-2021 at 05:14 AM.
 
Old 11-03-2021, 03:50 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,129

Rep: Reputation: 7374
And what exactly is your problem?
 
Old 11-03-2021, 04:15 AM   #3
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,847

Rep: Reputation: 1223
A [ ] bracket expression is an OR over single characters.
awk uses ERE, which has | for OR and ( ) for grouping.
Code:
([#;!]|@[Cc])
matches one of # ; ! or an @ followed by C or c.
The ( ) limit the scope of the | in case something is appended (or prepended).
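
As a minimal sketch of that alternation in use (the helper name `strip_lead` and the sample lines are made up for illustration, not taken from the thread):

```shell
# Hypothetical helper: strip one leading comment marker using ERE alternation.
strip_lead ()
{
  awk '{ sub(/^[[:space:]]*([#;!]|@[Cc])[[:space:]]*/, ""); print }'
}

# "c plain" is left alone: a bare c is not one of the alternatives.
printf '%s\n' '# hash' '; semi' '@c texinfo' '@C upper' 'c plain' | strip_lead
```

Because the group is anchored between `^[[:space:]]*` and `[[:space:]]*`, the `|` only chooses between the two marker forms and does not split the whole pattern in two.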
 
Old 11-03-2021, 04:26 AM   #4
Faki
Member
 
Registered: Oct 2021
Posts: 574

Original Poster
Rep: Reputation: Disabled
I would like to use a variable for the character class, and also handle C-language comments (`//`).
Capital `C` is for Fortran fixed-form files.

For single comment characters `[#;!C]` I would like to allow any number of comment characters, hence `([#;!C])+`. The other comment types (e.g. `//` for C and `@c` for Texinfo) do not repeat, so they need an alternation rather than a bracket expression:
`(//|@c)`.

Code:
capture ()
{
 local efile="$1"
 
 local begorg endorg charcl

 i="1"
 if [ "$i" = "1" ]; then 
   charcl='^[[:space:]]*(#|;|!)+[[:space:]]*' 
 elif [ "$i" = "2" ]; then
   charcl='^[[:space:]]*(//|@c)[[:space:]]*' 
 fi 
 
 begorg="${charcl}"'Mode: org$'
 endorg="${charcl}"'# End of org$'
 
 awk -v ccls="$charcl" -v bego="$begorg" -v endo="$endorg" \
   '$0 ~ bego { found=1; next } 
    $0 ~ endo { found=0; } 
    found { sub(/ccls/,""); print }' "$efile"
}

Last edited by Faki; 11-03-2021 at 07:42 AM.
 
Old 11-03-2021, 05:28 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,129

Rep: Reputation: 7374
That's why you need to write more than one sub call:
Code:
sub(/^[[:space:]]*#+[[:space:]]*/,"")
sub(/^[[:space:]]*\/\/[[:space:]]*/,"")
... whatever ...
Or you can specify a complex regex too, but that would be the hard way:
Code:
sub(/^[[:space:]]*<your regexp>[[:space:]]*/,"")
In that case I would probably rather use perl.
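
One way to chain several sub calls, as a rough sketch: `sub()` returns the number of replacements it made (0 or 1), so `||` stops at the first style that matched. The helper name `strip_any` is hypothetical, and requiring whitespace after `@c` is an assumption made here so that Texinfo commands such as `@code` are not clipped.

```shell
# Hypothetical helper: try one sub() per comment style, stopping at the first hit.
strip_any ()
{
  awk '{
    # sub() returns 1 once it replaced something, which short-circuits the chain.
    sub(/^[[:space:]]*#+[[:space:]]*/, "")   ||
    sub(/^[[:space:]]*;+[[:space:]]*/, "")   ||
    sub(/^[[:space:]]*\/\/[[:space:]]*/, "") ||
    sub(/^[[:space:]]*@c[[:space:]]+/, "")
    print
  }'
}

printf '%s\n' '  ## one' ';; two' '// three' '@c four' 'five' | strip_any
```

Inside a `/.../` regex literal the slashes of `//` have to be escaped as `\/\/`; in a regex held in a string they do not.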
 
Old 11-03-2021, 06:21 AM   #6
Faki
Member
 
Registered: Oct 2021
Posts: 574

Original Poster
Rep: Reputation: Disabled
I agree. Have updated the function but have to recognise the language (whether `#`, `;`, `!`).

To avoid overly complicated patterns, an if statement can distinguish between single-character comments (`#`, `;`, `!`) and two-character comments (`//`, `@c`).

Last edited by Faki; 11-03-2021 at 06:37 AM.
 
Old 11-03-2021, 06:24 AM   #7
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,129

Rep: Reputation: 7374
You can specify the language, and in an awk BEGIN block set the regex depending on that language.
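
A sketch of that idea, under some assumptions: the function name `capture_lang`, its argument order, and the language names `c` and `texinfo` are invented for illustration, and anything else falls back to the single-character styles.

```shell
# Hypothetical variant of capture: the caller names the language explicitly.
capture_lang ()
{
  local lang="$1" efile="$2"
  awk -v lang="$lang" '
    BEGIN {
      # Pick the comment-prefix regex once, before any input is read.
      if      (lang == "c")       cc = "//"
      else if (lang == "texinfo") cc = "@c"
      else                        cc = "[#;!]+"
      pre  = "^[[:space:]]*(" cc ")[[:space:]]*"
      bego = pre "Mode: org$"
      endo = pre "# End of org$"
    }
    $0 ~ bego { found = 1; next }
    $0 ~ endo { found = 0 }
    found     { sub(pre, ""); print }
  ' "$efile"
}
```

It would be called as `capture_lang c file.c` or `capture_lang texinfo manual.texi`. The regex is built as a string, so it is passed to `sub()` as a variable rather than between slashes.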
 
Old 11-03-2021, 07:03 AM   #8
Faki
Member
 
Registered: Oct 2021
Posts: 574

Original Poster
Rep: Reputation: Disabled
I would like the function to figure out the language automatically by checking the comment character on the mode line "Mode: org".

Or I can use the extension of the input file.
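
For the extension route, a small sketch: the helper name `comment_style` and the extension-to-style table are assumptions chosen for illustration and would need adjusting to the files actually in use.

```shell
# Hypothetical helper: guess the comment regex fragment from the file extension.
comment_style ()
{
  case "${1##*.}" in
    c|h)          printf '%s\n' '//' ;;   # C-style line comments
    texi|texinfo) printf '%s\n' '@c' ;;   # Texinfo
    el|lisp)      printf '%s\n' ';+' ;;   # one or more semicolons
    *)            printf '%s\n' '#+' ;;   # default: shell-like hashes
  esac
}

comment_style manual.texi
```

The printed fragment can then be spliced into the full pattern, for example `charcl="^[[:space:]]*($(comment_style "$efile"))[[:space:]]*"`.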

Last edited by Faki; 11-03-2021 at 07:52 AM.
 
Old 11-03-2021, 07:39 AM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,129

Rep: Reputation: 7374
So you need to implement that (in awk).
Code:
/<pattern>/ { set variable to A }
/other pattern/ { set variable to B }
... use this variable in sub
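
Filled in, that outline might look like the sketch below. It assumes the comment style is whatever precedes "Mode: org" on the start line; the function name `capture_auto` and the variables `lead`, `cc` and `pre` are made up for illustration.

```shell
# Hypothetical variant of capture: learn the comment style from the mode line.
capture_auto ()
{
  local efile="$1"
  awk '
    # Start marker: remember whatever precedes "Mode: org" as the comment style.
    /^[[:space:]]*([#;!]+|@c|\/\/)[[:space:]]*Mode: org$/ {
      lead = $0
      sub(/^[[:space:]]*/, "", lead)            # drop indentation
      sub(/[[:space:]]*Mode: org$/, "", lead)   # drop the marker text
      # Single-character styles may repeat; the two-character ones may not.
      if (lead ~ /^[#;!]+$/) cc = "[" substr(lead, 1, 1) "]+"
      else                   cc = lead
      pre = "^[[:space:]]*(" cc ")[[:space:]]*"
      found = 1
      next
    }
    found && $0 ~ (pre "# End of org$") { found = 0; next }
    found { sub(pre, ""); print }
  ' "$efile"
}
```

Each section sets `pre` afresh, so a file that mixes `##`, `;;` and `@c` sections is handled one section at a time.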
 
Old 11-03-2021, 09:50 AM   #10
Faki
Member
 
Registered: Oct 2021
Posts: 574

Original Poster
Rep: Reputation: Disabled
I have modified the function to check for any of the comment styles in a single regex.

Code:
capture ()
{
 local efile="$1"
 
 local charcl begorg endorg

 charcl='^[[:space:]]*([#;!]+|@c|\/\/)[[:space:]]*' 
 begorg="${charcl}"'Mode: org$'
 endorg="${charcl}"'# End of org$'
 
 awk -v ccls="$charcl" -v bego="$begorg" -v endo="$endorg" \
   '$0 ~ bego { found=1; next }
    $0 ~ endo { found=0; }
    found { sub(/ccls/,""); print }' "$efile"
}
But when I tried it, the comment characters were still displayed rather than deleted.

Code:
awk: warning: escape sequence `\/' treated as plain `/'
 #  Assigns shell positional parameters or changes the values of shell
 #  options.  The -- option assigns the positional parameters to the
 #  arguments of {set}, even when some of them start with an option
 #  prefix `-'.
 ;  Assigns shell positional parameters or changes the values of shell
 ;  options.  The -- option assigns the positional parameters to the
 ;  arguments of {set}, even when some of them start with an option
 ;  prefix `-'.
 @c Assigns shell positional parameters or changes the values of shell
 @c options.  The -- option assigns the positional parameters to the
 @c arguments of {set}, even when some of them start with an option
 @c prefix `-'.
This was the input

Code:
 ## Mode: org
 #  Assigns shell positional parameters or changes the values of shell
 #  options.  The -- option assigns the positional parameters to the
 #  arguments of {set}, even when some of them start with an option
 #  prefix `-'.
 ## # End of org

 ;; Mode: org
 ;  Assigns shell positional parameters or changes the values of shell
 ;  options.  The -- option assigns the positional parameters to the
 ;  arguments of {set}, even when some of them start with an option
 ;  prefix `-'.
 ;; # End of org
 
 @c Mode: org
 @c Assigns shell positional parameters or changes the values of shell
 @c options.  The -- option assigns the positional parameters to the
 @c arguments of {set}, even when some of them start with an option
 @c prefix `-'.
 @c # End of org
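
A likely explanation for both symptoms: in awk, `/ccls/` is a regex literal that matches the four characters "ccls", not the contents of the variable `ccls`, so `sub()` never finds anything to delete. Passing the variable itself, `sub(ccls,"")`, makes awk treat its string value as a dynamic regex. The warning arises because `-v` assignments process backslash escapes and `/` needs no escaping in a string regex. A sketch of the function with those two changes, assuming the start and end patterns are stored in `begorg` and `endorg` (the names handed to awk):

```shell
capture ()
{
  local efile="$1"
  local charcl begorg endorg

  # "/" needs no escaping in a dynamic regex, so plain // avoids the warning.
  charcl='^[[:space:]]*([#;!]+|@c|//)[[:space:]]*'
  begorg="${charcl}"'Mode: org$'
  endorg="${charcl}"'# End of org$'

  awk -v ccls="$charcl" -v bego="$begorg" -v endo="$endorg" \
    '$0 ~ bego { found=1; next }
     $0 ~ endo { found=0; }
     found { sub(ccls,""); print }' "$efile"
}
```

With the input above this should print only the text of each section, without the leading `#`, `;` or `@c`.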

Last edited by Faki; 11-03-2021 at 01:19 PM.
 
  



Tags
bash, matching


