LinuxQuestions.org


igor.R 06-21-2009 06:08 PM

how to remove comments with sed
 
Hi, everybody,

I have the following problem:

I have source code with comments.
The comment delimiters are (* and *);
everything between them is a comment.
The (* and *) can be on different lines,
in which case all the lines between them
are comments too. I want to delete all comments
from the file. I found that it can be done
with a sed one-liner like this:

# delete section of file between two regular expressions (inclusive)
sed -n '/tag1/,/tag2/d'


I tried

sed -n "/(\*/,/\*)/d"

sed -n "/\(\*/,/\*\)/d"

sed -n "/{(\*}/,/{\*)}/d"

sed -n ":(\*/:,:\*):d"

None of these worked. So my question is: how do I do that?
What are the regular expressions for the tags (* and *)?

Thanks for your help.

Tinkster 06-21-2009 06:39 PM

Code:

sed -r '/\(\*/,/\*\)/d' odd_comments
Note that this will butcher lines where there's code
before or after your comment, too, though.



Cheers,
Tink

ghostdog74 06-21-2009 07:13 PM

Code:

awk '/\(\*/{
  match($0,/\(\*/) #find where the (* is
  print substr($0,1,RSTART-1)  # print only from start of line to where the (* is
  f=1 #set a flag
  next
}
/\*\)/ && f{ #if flag is set ( ie (* is found ) and *) is found
  match($0,/\*\)/)    #find where the *) is
  print substr($0,RSTART+2)  #print from where *) is till the end of line
  f=0 #remove flag and go search for the next (*
  next 
}
!/\(\*/ && !/\*\)/ && f{next} #skip lines that lie entirely inside an open comment
1' file

output
Code:

# more file
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?

# ./test.sh
this is a word
This is a line with a

And one with no comments
So what's this
 anything behind comments?


syg00 06-21-2009 07:16 PM

Hopefully Tink meant Note

I might be inclined to only do this if there was only whitespace before and after the comment indicators.

igor.R 06-21-2009 07:50 PM

Quote:

Originally Posted by Tinkster (Post 3581742)
Code:

sed -r '/\(\*/,/\*\)/d' odd_comments
Not that this will butcher lines where there's code
before or after your comment, too, though.



Cheers,
Tink


Uh. This did not work. It does not butcher lines where there's code.
It removes all lines everywhere. Very weird indeed.

Tinkster 06-21-2009 08:27 PM

Hmmm ...
Code:

~/tmp$ cat odd_comments                                                                                   
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?


~/tmp$ sed -r '/\(\*/,/\*\)/d' odd_comments                                                                                       
And one with no comments

Works here - copied & pasted verbatim.




Cheers,
Tink

Tinkster 06-21-2009 08:33 PM

Quote:

Originally Posted by syg00 (Post 3581760)
Hopefully Tink meant Note

I might be inclined to only do this if there was only whitespace before and after the comment indicators.

Indeed.

billymayday 06-21-2009 08:35 PM

Doesn't he want the output to be
Quote:

this is a word
This is a line with a
And one with no comments
So what's this
anything behind comments?
?

ghostdog74 06-21-2009 08:40 PM

I would believe so.

Tinkster 06-21-2009 08:41 PM

I'm just guiding him in the use of the sed-invocation
he spotted :D ...

syg00 06-21-2009 08:46 PM

Neither solution deals with comments embedded within a single line, with data before and/or after. A simple sed piped into ghostdog74's awk offering worked fine.
When it gets to that point, I start to think perl.

billymayday 06-21-2009 08:49 PM

I have to admit to something of a strained relationship with sed across line breaks.

Historically, I've just given up and written something small in python or C.

I'd actually suggest a tokenised approach would be best, i.e. start with a count of zero, increment the count each time the pattern "(*" is found, decrement it each time "*)" is found, and output a character only while the count is zero. That allows for nested comments, etc.

Pretty simple to do, and probably quicker than playing around with sed (at least for me).

Edit - syg00 beat me to it.
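
For what it's worth, the counting idea above could be sketched in Python roughly like this. The function name strip_comments and its parameters are made up for illustration, and the sketch assumes the whole file fits in memory as one string:

```python
def strip_comments(text, open_tag="(*", close_tag="*)"):
    """Remove (* ... *) comments, using a depth counter so nesting works.

    Hypothetical helper for illustration; not taken from any library.
    """
    out = []
    depth = 0  # number of comments currently open
    i = 0
    while i < len(text):
        if text.startswith(open_tag, i):
            # An opening tag always increases the depth, even inside a comment.
            depth += 1
            i += len(open_tag)
        elif depth > 0 and text.startswith(close_tag, i):
            # A closing tag only counts while a comment is open.
            depth -= 1
            i += len(close_tag)
        else:
            # Ordinary character: copy it only when outside every comment.
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


# prints "a  b": the nested comment is removed in one piece
print(strip_comments("a (* x (* y *) z *) b"))
```

Because it works on characters rather than lines, code before and after a comment on the same line is kept, and a stray "*)" outside any comment is copied through unchanged.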

igor.R 06-21-2009 11:57 PM

Quote:

Originally Posted by ghostdog74 (Post 3581756)
Code:

awk '/\(\*/{
  match($0,/\(\*/)
  print substr($0,1,RSTART-1) 
  f=1
  next
}
/\*\)/ && f{
  match($0,/\*\)/) 
  print substr($0,RSTART+2)
  f=0
  next 
}
!/\(\*/ && !/\*\)/ && f{next}
1' file

output
Code:

# more file
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?

# ./test.sh
this is a word
This is a line with a

And one with no comments
So what's this
 anything behind comments?


Thanks for the help.
I had hoped that there was a sed one-liner that could do that.
It turns out that sed has a defect. :( Not good. We need to find
whoever is responsible and severely :D punish him. Where should I file a complaint? :D:D

Could you please explain to me how your awk code works?

Kenhelm 06-22-2009 06:03 AM

'sed -n' suppresses automatic printing: only lines specified with the p command or the p flag of the s command are printed. So none of the sed commands in the first post will give any output.
For sed '/tag1/,/tag2/d' to work, tag1 and tag2 have to be on different lines, because sed only starts looking for tag2 on the line after the one that matched tag1.
Complete lines are deleted, not just the tags and characters between the tags.

The following method needs GNU sed.
It replaces *) with \a, a GNU escape extension which produces or matches a BEL character (ASCII 7).
(It doesn't have to be \a; any character which will never be in a comment will do.)
Changing the end tag into a single character enables "greedy matching" to be limited using [^\a]*\a

':a N;$!ba' puts the whole file into the pattern space.
's/\*)/\a/g' replaces all *) with \a
's/(\*[^\a]*\a//g' deletes all the comments
Code:

sed ':a N;$!ba; s/\*)/\a/g; s/(\*[^\a]*\a//g' infile > outfile

# Input
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?
(*comment*)some data (*comment*)some more data

# Output
this is a word
This is a line with a
And one with no comments
So what's this  anything behind comments?
some data some more data
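
The same sentinel trick can be tried outside sed. Below is a rough Python sketch of the two substitutions, assuming (as above) that BEL never occurs in the input. The name strip_comments_sentinel is hypothetical, and the whole file is read into one string, much like the ':a N;$!ba' loop does for the pattern space:

```python
import re

SENTINEL = "\a"  # BEL, ASCII 7; assumed never to appear in the source file


def strip_comments_sentinel(text):
    """Delete (* ... *) comments using a one-character stand-in for "*)"."""
    # Step 1: turn every two-character end tag "*)" into the single sentinel.
    marked = text.replace("*)", SENTINEL)
    # Step 2: delete "(*", any run of non-sentinel characters, and the sentinel.
    # The negated class [^\a] also matches newlines, so comments may span
    # lines, and it stops at the first sentinel, which limits greedy matching.
    return re.sub(r"\(\*[^\a]*\a", "", marked)


# prints "some data some more data"
print(strip_comments_sentinel("(*comment*)some data (*comment*)some more data"))
```

One caveat of this sketch: a stray *) that is not preceded by a (* stays in the result as a BEL character, so the input is assumed to have balanced, non-nested comment tags.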


Tinkster 06-22-2009 11:54 AM

If loss of formatting (the line break in the split comment) is
not an issue, a perl one-liner will do, too.
Code:

perl -0777 -pe 's/\(\*.*?\*\)//gs' file
this is a word
This is a line with a
And one with no comments
So what's this  anything behind comments?



All times are GMT -5.