I have a source file with comments.
The comment delimiters are (* and *);
everything between them is a comment.
The (* and *) can be on different lines,
in which case all the lines between (* and *)
are comments. I want to delete all comments
from the file. I found that it can be done
with a sed one-liner like this:
# delete section of file between two regular expressions (inclusive)
sed -n '/tag1/,/tag2/d'
I tried
sed -n "/(\*/,/\*)/d"
sed -n "/\(\*/,/\*\)/d"
sed -n "/{(\*}/,/{\*)}/d"
sed -n "\*/:,:\*):d"
None of these worked. So my question is: how do I do that?
What is the regular expression for the tags (* and *)?
awk '/\(\*/ {
    match($0, /\(\*/)               # find where the (* is
    print substr($0, 1, RSTART-1)   # print only from the start of the line up to the (*
    f = 1                           # set a flag: we are now inside a comment
    next
}
/\*\)/ && f {                       # the flag is set (a (* was seen) and this line has a *)
    match($0, /\*\)/)               # find where the *) is
    print substr($0, RSTART+2)      # print from just after the *) to the end of the line
    f = 0                           # clear the flag and go search for the next (*
    next
}
!/\(\*/ && !/\*\)/ && f { next }    # skip lines that lie entirely inside a comment
1' file
output
Code:
# more file
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?
# ./test.sh
this is a word
This is a line with a
And one with no comments
So what's this
anything behind comments?
Last edited by ghostdog74; 06-22-2009 at 12:20 AM.
Reason: show output
~/tmp$ cat odd_comments
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?
~/tmp$ sed -r '/\(\*/,/\*\)/d' odd_comments
And one with no comments
Neither solution deals with comments embedded within a single line, with data before and/or after them. A simple sed command piped into ghostdog74's awk offering worked fine.
When it gets to that point, I start to think perl.
I have to admit to something of a strained relationship with sed across line breaks.
Historically, I've just given up and written something small in python or C.
I'd actually suggest a tokenised approach would be best: start with a count of zero, increment the count each time the pattern "(*" is found, decrement it each time "*)" is found, and output a character only while the count is zero. That allows for nested comments, etc.
Pretty simple to do, and probably quicker than playing around with sed (at least for me).
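The counting idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name strip_comments is made up for this example, the whole input is assumed to fit in memory, and a stray "*)" outside any comment is simply copied through.

```python
def strip_comments(text: str) -> str:
    """Remove (* ... *) comments, allowing them to nest."""
    out = []
    depth = 0  # how many comments we are currently inside
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "(*":
            depth += 1          # entering a (possibly nested) comment
            i += 2
        elif pair == "*)" and depth > 0:
            depth -= 1          # leaving the innermost open comment
            i += 2
        else:
            if depth == 0:      # only text outside all comments is kept
                out.append(text[i])
            i += 1
    return "".join(out)
```

Calling strip_comments(open("file").read()) would return the file's text with the comments removed; line breaks that fall outside comments are kept.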
Edit - syg00 beat me to it.
Last edited by billymayday; 06-21-2009 at 08:51 PM.
Thanks for the help.
I had hoped that there was a sed one-liner that could do that.
It turns out that sed has a defect. Not good. We need to find
whoever is responsible and punish him severely. Where should I file a complaint?
Could you please explain to me how your awk code works?
'sed -n' only prints lines specified with the p command or the p flag of the s command. So none of the sed commands in the first post will give any output.
For sed '/tag1/,/tag2/d' to work, tag1 and tag2 have to be on different lines.
Complete lines are deleted, not just the tags and characters between the tags.
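To make the range behaviour concrete, here is a rough Python emulation of what sed '/tag1/,/tag2/d' does. It is not sed itself, and the function name sed_range_delete is invented for this sketch. It assumes the usual sed rule that the end pattern is only tested on lines after the one that opened the range.

```python
import re

def sed_range_delete(lines, start, end):
    """Mimic sed '/start/,/end/d': drop whole lines from a start match
    through the next line that matches end."""
    kept = []
    in_range = False
    for line in lines:
        if in_range:
            # The end pattern is only tested on lines after the opening one.
            if re.search(end, line):
                in_range = False
            continue                 # every line in the range is dropped whole
        if re.search(start, line):
            in_range = True          # dropped in full, including text before the tag
            continue
        kept.append(line)
    return kept
```

Run on the sample file with the patterns r"\(\*" and r"\*\)", this keeps only "And one with no comments": the first line opens a range that is not closed until "lines *)", which is why a comment that starts and ends on one line swallows the lines after it.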
The following method needs GNU sed.
It replaces *) with \a, a GNU escape extension which produces or matches the BEL character (ASCII 7).
(It doesn't have to be \a; any character which will never be in a comment will do.)
Changing the end tag into a single character enables "greedy matching" to be limited using [^\a]*\a
':a N;$!ba' puts the whole file into the pattern space.
's/\*)/\a/g' replaces all *) with \a
's/(\*[^\a]*\a//g' deletes all the comments
Code:
sed ':a N;$!ba; s/\*)/\a/g; s/(\*[^\a]*\a//g' infile > outfile
# Input
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?
(*comment*)some data (*comment*)some more data
# Output
this is a word
This is a line with a
And one with no comments
So what's this anything behind comments?
some data some more data
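For anyone who finds the sed steps hard to follow, the same two substitutions can be written out in Python. This is only a sketch mirroring the approach above, not a replacement for it: the function name strip_comments_sentinel is made up, and it assumes, as the sed version does, that the BEL character never occurs in the input.

```python
import re

SENTINEL = "\a"  # BEL (ASCII 7), assumed never to appear in the input

def strip_comments_sentinel(text: str) -> str:
    """Delete (* ... *) comments using a one-character end marker."""
    # Step 1: turn every two-character end tag *) into the single sentinel.
    marked = text.replace("*)", SENTINEL)
    # Step 2: delete from each (* up to the nearest sentinel.
    # [^\a]* cannot run past a sentinel, so the match stops at the first one.
    return re.sub(r"\(\*[^\a]*\a", "", marked)
```

As with the sed command, a stray *) outside any comment would be left in the output as a BEL character, and nested comments are not handled.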
If loss of formatting (the line break in the split comment) is
not an issue, a perl one-liner will do, too.
Code:
perl -0777 -pe 's/\(\*.*?\*\)//gs' file
this is a word
This is a line with a
And one with no comments
So what's this anything behind comments?