LinuxQuestions.org
Old 06-21-2009, 07:08 PM   #1
igor.R
Member
 
Registered: Mar 2004
Location: Atlanta
Distribution: Redhat 9.0
Posts: 49

Rep: Reputation: 16
how to remove comments with sed


Hi, everybody,

I have the following problem:

I have source code with comments.
The comment delimiters are (* and *), and
everything between them is a comment.
The (* and *) can be on different lines,
in which case all the lines between (* and *)
are part of the comment. I want to delete all comments
from the file. I found that this can be done
with a sed one-liner like this:

# delete section of file between two regular expressions (inclusive)
sed -n '/tag1/,/tag2/d'


I tried

sed -n "/(\*/,/\*)/d"

sed -n "/\(\*/,/\*\)/d"

sed -n "/{(\*}/,/{\*)}/d"

sed -n "\*/:,:\*):d"

None of these worked. So my question is: how do I do that?
What are the regular expressions for the tags (* and *)?

Best,

Thank you for your help.
 
Old 06-21-2009, 07:39 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,005
Blog Entries: 11

Rep: Reputation: 903
Code:
sed -r '/\(\*/,/\*\)/d' odd_comments
Note, though, that this deletes whole lines, so it will also butcher
lines where there's code before or after your comment.



Cheers,
Tink

Last edited by Tinkster; 06-21-2009 at 09:37 PM. Reason: [b]e[/b] ... thx syg00 :D
 
Old 06-21-2009, 08:13 PM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Code:
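awk '/\(\*/{
  match($0,/\(\*/) #find where the (* is
  head=substr($0,1,RSTART-1)   # text from start of line to where the (* is
  rest=substr($0,RSTART+2)     # text after the (*
  if (match(rest,/\*\)/)) {    # the comment also closes on this line
    print head substr(rest,RSTART+2)  # keep the text on both sides, do not set the flag
    next
  }
  print head   # comment continues on later lines: print only the text before it
  f=1 #set a flag
  next
}
/\*\)/ && f{ #if flag is set ( ie (* is found ) and *) is found
   match($0,/\*\)/)    #find where the *) is
   print substr($0,RSTART+2)  #print from where *) is till the end of line
   f=0 #remove flag and go search for the next (*
   next
}
!/\(\*/ && !/\*\)/ && f{next} #skip lines that lie entirely inside a comment
1' file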
output
Code:
# more file
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?

# ./test.sh
this is a word
This is a line with a

And one with no comments
So what's this
 anything behind comments?

Last edited by ghostdog74; 06-22-2009 at 01:20 AM. Reason: show output
 
Old 06-21-2009, 08:16 PM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,505

Rep: Reputation: 1079
Hopefully Tink meant Note

I might be inclined to only do this if there was only whitespace before and after the comment indicators.
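
A minimal sketch of what that restriction could look like, assuming a sed that accepts -E (GNU sed does; -r is the older spelling). Both addresses are anchored so that only lines with nothing but whitespace before (* and after *) are touched. The function name strip_standalone_comments is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: delete comments only when they occupy whole lines.
# First expression: a comment that opens and closes on one otherwise empty line.
# Second expression: a range from a line starting with "(*" to a line ending in "*)".
strip_standalone_comments() {
  sed -E \
    -e '/^[[:space:]]*\(\*.*\*\)[[:space:]]*$/d' \
    -e '/^[[:space:]]*\(\*/,/\*\)[[:space:]]*$/d' "$@"
}

printf '%s\n' 'code 1' '(* one-line *)' 'code 2' '  (* multi' '  line *)' \
  'code 3 (* trailing *)' | strip_standalone_comments
```

This prints code 1, code 2 and the untouched line code 3 (* trailing *). A line such as (* a *) code (* b *) would still be deleted whole, because the first pattern only checks how the line starts and ends.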
 
Old 06-21-2009, 08:50 PM   #5
igor.R
Member
 
Registered: Mar 2004
Location: Atlanta
Distribution: Redhat 9.0
Posts: 49

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by Tinkster
Code:
sed -r '/\(\*/,/\*\)/d' odd_comments
Not that this will butcher lines where there's code
before or after your comment, too, though.



Cheers,
Tink

Uh. This did not work. It does not butcher lines where there's code.
It removes all lines everywhere. Very weird indeed.
 
Old 06-21-2009, 09:27 PM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,005
Blog Entries: 11

Rep: Reputation: 903
Hmmm ...
Code:
~/tmp$ cat odd_comments                                                                                     
this is a word (* followed by comment *)
This is a line with a (* split 
comment over three 
lines *)
And one with no comments 
So what's this (* gonna do
if there's *) anything behind comments?


~/tmp$ sed -r '/\(\*/,/\*\)/d' odd_comments                                                                                        
And one with no comments
Works here - copied & pasted verbatim.




Cheers,
Tink
 
Old 06-21-2009, 09:33 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,005
Blog Entries: 11

Rep: Reputation: 903
Quote:
Originally Posted by syg00
Hopefully Tink meant Note

I might be inclined to only do this if there was only whitespace before and after the comment indicators.
Indeed.
 
Old 06-21-2009, 09:35 PM   #8
billymayday
Guru
 
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678

Rep: Reputation: 122
Doesn't he want the output to be
Quote:
this is a word
This is a line with a
And one with no comments
So what's this
anything behind comments?
?
 
Old 06-21-2009, 09:40 PM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
I would believe so.
 
Old 06-21-2009, 09:41 PM   #10
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,005
Blog Entries: 11

Rep: Reputation: 903
I'm just guiding him in the use of the sed-invocation
he spotted :D ...
 
Old 06-21-2009, 09:46 PM   #11
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,505

Rep: Reputation: 1079
Neither solution deals with comments embedded within a single line, with data before and/or after them. A simple sed piped into ghostdog74's awk offering worked fine.
When it gets to that point, I start to think Perl.
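
The sed stage of that pipeline is not shown in the thread. One guess at what it might look like, assuming a sed with -E: a substitution that removes only comments which open and close on the same line, leaving multi-line comments for a second pass such as the awk script above. The function name strip_inline_comments is hypothetical.

```shell
#!/bin/sh
# Hypothetical first stage: remove comments that open and close on one line.
# After the opening "(*" the group consumes either a non-star character or a
# run of stars followed by something other than "*" or ")", so the match
# cannot run past the first "*)".
strip_inline_comments() {
  sed -E 's/\(\*([^*]|\*+[^*)])*\*+\)//g' "$@"
}

printf '%s\n' '(*c1*)some data (*c2 * still c2*)more data' 'x (* opens here' |
  strip_inline_comments
```

The first line comes out as "some data more data"; the second is passed through unchanged because its comment does not close on that line.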
 
Old 06-21-2009, 09:49 PM   #12
billymayday
Guru
 
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678

Rep: Reputation: 122
I have to admit to something of a strained relationship with sed across line breaks.

Historically, I've just given up and written something small in python or C.

I'd actually suggest a tokenised approach would be best: start with a count of zero, increment the count each time the pattern "(*" is found, decrement it each time "*)" is found, and output a character only while the count is zero. That allows for nested comments, etc.

Pretty simple to do, and probably quicker than playing around with sed (at least for me).

Edit - syg00 beat me to it.
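
The counting idea can be sketched in awk instead of Python or C. This is only an illustration under stated assumptions: the function name strip_nested_comments is made up, a stray "*)" outside any comment is kept as ordinary text, and lines that a comment empties completely are dropped.

```shell
#!/bin/sh
# Hypothetical helper implementing the depth-counter idea in POSIX awk.
strip_nested_comments() {
  awk '
  {
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      two = substr($0, i, 2)
      if (two == "(*") { depth++; i++; continue }              # opener: one level deeper
      if (two == "*)" && depth > 0) { depth--; i++; continue } # closer: back up one level
      if (depth == 0) out = out substr($0, i, 1)               # outside comments: keep it
    }
    # Assumption: drop lines that a comment emptied, keep genuinely blank lines.
    if (out != "" || (n == 0 && depth == 0)) print out
  }' "$@"
}

printf '%s\n' 'a (* x (* nested *) y *) b' 'c (* open' 'still comment' '*) d' |
  strip_nested_comments
```

The depth variable carries over from line to line, which is what lets a comment span several lines without any special multi-line handling.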

Last edited by billymayday; 06-21-2009 at 09:51 PM.
 
Old 06-22-2009, 12:57 AM   #13
igor.R
Member
 
Registered: Mar 2004
Location: Atlanta
Distribution: Redhat 9.0
Posts: 49

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by ghostdog74
Code:
awk '/\(\*/{
  match($0,/\(\*/)
  print substr($0,1,RSTART-1)  
  f=1
  next
}
/\*\)/ && f{
   match($0,/\*\)/)   
   print substr($0,RSTART+2) 
   f=0
   next   
}
!/\(\*/ && !/\*\)/ && f{next}
1' file
output
Code:
# more file
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?

# ./test.sh
this is a word
This is a line with a

And one with no comments
So what's this
 anything behind comments?
Thanks for the help.
I had hoped that there was a sed one-liner that could do that.
It turns out that sed has a defect. Not good. We need to find
the person responsible and punish him severely. Where should I file a complaint?

Could you please explain to me how your awk code works?

Last edited by igor.R; 06-22-2009 at 12:58 AM.
 
Old 06-22-2009, 07:03 AM   #14
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 333

Rep: Reputation: 141
'sed -n' only prints lines specified with the p command or the p flag of the s command. So none of the sed commands in the first post will give any output.
For sed '/tag1/,/tag2/d' to work, tag1 and tag2 have to be on different lines.
Complete lines are deleted, not just the tags and characters between the tags.
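
A small demonstration of these three points, using a made-up sample (the sample function is not from the thread). The regular expression is the one from the first attempt in post #1, which is fine as a basic regular expression once -n is dropped.

```shell
#!/bin/sh
# Made-up sample: two code lines around a two-line comment.
sample() { printf '%s\n' 'keep 1' '(* start' 'end *)' 'keep 2'; }

# With -n nothing is printed automatically and the script has no p command,
# so this produces no output at all:
sample | sed -n '/(\*/,/\*)/d'

# Without -n the lines outside the range are printed:
sample | sed '/(\*/,/\*)/d'

# A comment that opens and closes on one line starts a range that runs on to
# the NEXT line containing "*)", so "keep 2" is deleted as well here:
printf '%s\n' 'keep 1' 'x (* c *)' 'keep 2' '(* a' 'b *)' 'keep 3' |
  sed '/(\*/,/\*)/d'
```

The second command prints keep 1 and keep 2; the third prints only keep 1 and keep 3.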

The following method needs GNU sed.
It replaces *) with \a, a GNU escape extension which produces or matches a BEL character (ASCII 7).
(It doesn't have to be \a; any character which will never be in a comment will do.)
Changing the end tag into a single character enables "greedy matching" to be limited using [^\a]*\a

':a N;$!ba' puts the whole file into the pattern space.
's/\*)/\a/g' replaces all *) with \a
's/(\*[^\a]*\a//g' deletes all the comments
Code:
sed ':a N;$!ba; s/\*)/\a/g; s/(\*[^\a]*\a//g' infile > outfile

# Input
this is a word (* followed by comment *)
This is a line with a (* split
comment over three
lines *)
And one with no comments
So what's this (* gonna do
if there's *) anything behind comments?
(*comment*)some data (*comment*)some more data

# Output
this is a word
This is a line with a
And one with no comments
So what's this  anything behind comments?
some data some more data
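
A possible variation on the same technique, still assuming GNU sed. With ':a N;$!ba', GNU sed prints the pattern space and quits when N finds no next line, so a file consisting of a single line would come out with its comments intact. Testing for the last line before calling N avoids that. The function name strip_comments_gnu_sed is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the same GNU sed technique.
# Loop: unless this is the last line, append the next line and repeat.
# Then turn every "*)" into BEL and delete from each "(*" to the next BEL.
strip_comments_gnu_sed() {
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
      -e 's/\*)/\a/g' \
      -e 's/(\*[^\a]*\a//g' "$@"
}

printf '%s\n' 'x (* c *) y' | strip_comments_gnu_sed
printf '%s\n' 'a (* one *)' 'b (* two' 'three *) c' | strip_comments_gnu_sed
```

As in the original command, a stray *) outside any comment would be left behind as a BEL character.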
 
Old 06-22-2009, 12:54 PM   #15
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,005
Blog Entries: 11

Rep: Reputation: 903
If loss of formatting (the line-break in the split comment) is
not an issue a perl one liner will do, too.
Code:
perl -nle '$/ = "" ; s/\(\*[^\)]+\*\)//gm; print $_' file
this is a word 
This is a line with a 
And one with no comments
So what's this  anything behind comments?
 
  

