LinuxQuestions.org
Old 01-09-2008, 12:35 PM   #1
jgombos
Member
 
Registered: Jul 2003
Posts: 256

Rep: Reputation: 32
Need a script to remove last comma in a file


I've written a sed script that does almost everything I want. The job that remains is to remove the last comma of a file. A typical file resembles this after sed is done with it:
Code:
enum MyEnum
{
   element1, //some comment
   element2, //some comment
   element3, //some comment
};
The C++ compiler won't accept this because the last element has a comma. And there seems to be no way to make sed give special treatment to the last line. I could almost pull it off by using the sed "N" command to combine two lines, and do a replacement on "element3, //some comment\n};" for example, but it wouldn't work on enums that have an even number of elements.

Can awk handle this job? It needs to be scripted in a language that is commonly available in the Make environment, because it will have to be embedded in a makefile.

Each enum gets its own file, btw.

Last edited by jgombos; 01-09-2008 at 12:45 PM.
 
Old 01-09-2008, 01:20 PM   #2
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,775

Rep: Reputation: 481
If the files are all short like that, you might just flip the text of the entire file with 'rev' and strip out the last comma before flipping it back.
 
Old 01-09-2008, 01:20 PM   #3
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 903
Here's my offering, in perl.
Code:
#! /usr/bin/perl -w
#
#   LQjgombos.pl  - delete last comma from file
#
#   Usage: LQjgombos.pl file.txt

use strict;

    open( INFILE, $ARGV[0] ) || die "Cannot read $ARGV[0]: $!\n";
    my @file = <INFILE>;
    close INFILE;
    my $i = @file-1;
    while( $file[$i] !~ m/,/ ){ $i--; }
    $file[$i] =~ s/,//;
    open( OUTFILE, ">$ARGV[0]" ) || die "Cannot write $ARGV[0]: $!\n";
    print OUTFILE @file;
    close OUTFILE;
Assumes the input file obeys the specified format.
--- rod.
 
Old 01-09-2008, 02:19 PM   #4
jgombos
Member
 
Registered: Jul 2003
Posts: 256

Original Poster
Rep: Reputation: 32
Quote:
Originally Posted by theNbomr View Post
Here's my offering, in perl.
...
Assumes the input file obeys the specified format.
--- rod.
Thanks for the help!
I'm figuring I'll be able to embed that into a makefile like this:
Code:
define strip-last-comma
         $(PL) -e 'use strict;\
                   open( INFILE, $ARGV[0] ) || die "Cannot read $ARGV[0]: $!\n";\
                   my @file = <INFILE>;\
                   close INFILE;\
                   my $i = @file-1;\
                   while( $file[$i] !~ m/,/ ){ $i--; }\
                   $file[$i] =~ s/,//;\
                   open( OUTFILE, ">$ARGV[0]" ) || die "Cannot write $ARGV[0]: $!\n";\
                   print OUTFILE @file;\
                   close OUTFILE;' $@
endef
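One wrinkle I'll have to watch for: make expands $ itself, so the perl variables would presumably need their dollar signs doubled ($$) for make to pass them through untouched ($@ stays single, since that's the make target). Also, the backslash-newlines inside the single quotes would reach perl as-is, so it is probably safer to keep the program on one line. A rough, untested sketch:
Code:
define strip-last-comma
	$(PL) -e 'use strict; open(INFILE, $$ARGV[0]) || die "Cannot read $$ARGV[0]: $$!\n"; my @file = <INFILE>; close INFILE; my $$i = @file - 1; $$i-- while $$file[$$i] !~ m/,/; $$file[$$i] =~ s/,//; open(OUTFILE, ">$$ARGV[0]") || die "Cannot write $$ARGV[0]: $$!\n"; print OUTFILE @file; close OUTFILE;' $@
endef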
 
Old 01-09-2008, 02:21 PM   #5
jgombos
Member
 
Registered: Jul 2003
Posts: 256

Original Poster
Rep: Reputation: 32
Quote:
Originally Posted by gnashley View Post
If the files are all short like that, you might just flip the text of the entire file with 'rev' and strip out the last comma before flipping it back.
The rev command apparently reverses the sequence of characters on each line, but it doesn't change the order of lines. If it had flipped the order of lines, then it might have been useful to couple it with a sed script.
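A quick test shows the behaviour:
Code:
$ printf 'one,\ntwo,\n' | rev
,eno
,owt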
 
Old 01-09-2008, 08:43 PM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Please ignore this if I misunderstood your requirement:
Code:
# sed 's|, *\/\/| \/\/|g' file
enum MyEnum
{
   element1 //some comment
   element2 //some comment
   element3 //some comment
};
 
Old 01-09-2008, 11:10 PM   #7
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,154

Rep: Reputation: 333
Here's an awk program (since you asked):
Code:
$ cat test.awk
# If the line contains an open bracket . . .
/{/ {   prior=$0; # Save the line
        getline;  # Read the next line
        test = ((match($0,"(^.*)(//.*$)",list) != 0) ? list[1] : $0); # Strip any comment
        while (match(test, /} *;/) == 0) { # loop until a closing bracket is found
                print prior;
                prior=$0;
                getline;
                test = ((match($0,"(^.*)(//.*$)",list) != 0) ? list[1] : $0);
        }
# Process the line preceding the closing bracket
        if (match(prior,"(^.*)(,)( *//.*$)",list) != 0 ) {
                print list[1] " " list[3]; # If here, it contained a comma preceding the comment
        }
        else {if (match(prior,"(^.*)(,)( *$)", list) != 0) {
                print list[1]; # If here, we have a comma but no comment
                }
                else {
                        print prior; # If here, no comma to remove
                }
        }
}
# No open bracket? Just copy it to stdout
/^[^{]*$/ {print;}
<edit>
Note: The code does not handle multiple brackets, or lines with both an opening and a closing bracket.
</edit>
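The three-argument form of match() used above is a GNU awk extension, so the script should be run with gawk, e.g.:
Code:
gawk -f test.awk file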

Last edited by PTrenholme; 01-09-2008 at 11:16 PM.
 
Old 01-10-2008, 04:41 PM   #8
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 70
Quote:
Originally Posted by jgombos View Post
The rev command apparently reverses the sequence of characters on each line, but it doesn't change the order of lines. If it had flipped the order of lines, then it might have been useful to couple it with a sed script.
You could always use tac instead. But beware when using either tac or rev—these are GNU utilities that may not be present on Solaris or *BSD machines.
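For instance, bracketing a sed call with tac should do it (an untested sketch; the 0,/regexp/ address form is itself a GNU sed extension). Reversing the line order makes the last comma-bearing line the first one sed touches, so only that comma is replaced:
Code:
tac file | sed '0,/,/s/,/ /' | tac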

Btw, here is a sed one-liner which replaces the final comma with a space (it makes some assumptions about the formatting: a series of lines each containing one comma, with the closing brace and semicolon on the line immediately following the line containing the final comma). So it will obviously have trouble if multi-line comments are dispersed through the enum.
Code:
sed -e '/,/{x;/^$/d};/};/{H;x;s/,/ /}' file
If you want to deal with a file containing multiple enums, just add a small substitution:
Code:
sed -e '/,/{x;/^$/d};/};/{H;s/.*//;x;s/,/ /}' file
The above filter will work with the following example file:
Code:
enum MyEnum
{
   element1, //some comment
   element2, //some comment
   element3, //some comment
};

enum YourEnum
{
   Y1, //some comment
   Y2, //some comment
   Y3, //some comment
   Y4, //some comment
};
 
Old 01-11-2008, 02:04 PM   #9
jgombos
Member
 
Registered: Jul 2003
Posts: 256

Original Poster
Rep: Reputation: 32
Thanks for the alternate methods, folks. It's great to have a few solutions to choose from. All this may turn out to be in vain, however: I've discovered that porting my original shell script into the Makefile is more complex than I had estimated (possible, but tedious). I'll probably at least incorporate one of these scripts into the bash script, though, and run it manually.

Osor, I'm glad you came up with a sed method. I've run into a similar problem before (needing sed to act on the last occurrence of an expression in a stream), and the sed mailing list said there's no practical way of knowing when something appears last in a file (or last within a part of a file). I'll have to get familiar with what the hold space can do for me.

That "N" command is not as useful as one would expect. Sed could really use a variant of the N command that will execute on every line (effectively processing every line twice), as opposed to just the even lines from the first line in the address range.

Last edited by jgombos; 01-11-2008 at 02:08 PM.
 
Old 01-11-2008, 09:04 PM   #10
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 70
Quote:
Originally Posted by jgombos View Post
Osor, I'm glad you came up with a sed method. I've run into a similar problem before (needing sed to take an action on the last occurrance of an expression in a stream), and the sed mailing list said there's no practical way of knowing when something appears last in a file (or last of a part of a file). I'll have to get familiar with what the holding pattern will do for me.
There’s no way of knowing where something appears in a file relative to the last line. This is because sed is a stream editor and your file could itself be stdin. So sed will process line by line, and if it reaches an EOF, it will know that the last line was entered (but it cannot arrive at the second-to-last line and predict that the next line will be the last). The hold space comes in handy, but can also be abused.

For example, here is how you would deal with the original problem in an ed script:
Code:
echo -e '$-1s/,/ /\n,p' | ed - file
If you want to use this sort of functionality in sed, you have to “cheat” by abusing the hold buffer. Effectively, you can read the entire file into one “line” and operate on it after that.

For example, in a normal use of sed:
Code:
sed -e 'commands' file
The file is split up into lines, and on each line commands is executed. So for a file that looks like this:
Code:
line1
line2
line3
…
lineN-1
lineN
the normal use of sed looks kind of like this:
Code:
execute commands on "line1"
execute commands on "line2"
execute commands on "line3"
…
execute commands on "lineN"
If you want to “cheat” when using sed you do something like this:
Code:
sed -e '1h;1!H;$!d;${s/.*//;x};commands' file
Now, all lines are reduced to one and your commands are executed on that line. It looks kind of like this:
Code:
execute commands on "line1\nline2\nline3\n…\nlineN"
So to get rid of the last comma in the whole file you can do:
Code:
sed -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /' file
The reason I call this “cheating” is that the purpose of the hold buffer is to buffer (not to hold the entire file). POSIX specifies that the minimum size of the hold buffer is 8192 bytes, so a portable script should assume only so much. GNU sed happens to have a dynamically-sized hold buffer, so it will accommodate any such cheating, but other implementations might not be so forthcoming. The hold buffer is supposed to hold small amounts of text (two or three lines’ worth at most).

So you could also do this (a proper/portable use of the hold buffer):
Code:
sed -e '$!x;1d;${H;x};commands'
Which looks like this:
Code:
execute commands on "line1"
execute commands on "line2"
execute commands on "line3"
…
execute commands on "lineN-2"
execute commands on "lineN-1\nlineN"
The only difference is that if you use absolute addresses in commands, you need an offset of +1 (but the last line is always “$”).

So you might use the above as a generic way to solve the similar problems you’ve run into before. For example, an alternate answer to your question is:
Code:
sed -e '$!x;1d;${H;x};$s/,/ /'

Last edited by osor; 01-11-2008 at 09:29 PM.
 
Old 01-12-2008, 02:39 PM   #11
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi.

The flexibility of perl allows us to avoid some of the complexity that osor discussed.

In particular, this sample script essentially scans from the end of the file rather than scanning every line from the beginning. In a small file, this is not much of an improvement, but as the file gets larger, the improvement may be useful: regular expression matching can be expensive.

The trade-off is that the entire file is held in an array. This is not a source of complexity in the script however, because perl manages its own memory. Naturally, for really large files, the load on the system may become more noticeable.

Still, with all of that, the script is fairly small:
Code:
#!/usr/bin/perl

# @(#) p1       Demonstrate operation on last matched line.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;
my ($p) = `basename $0`;

my ($pattern) = shift || die " $p: need a pattern.\n";
print " (debug, pattern is :$pattern:)\n" if $debug;

my (@l) = <>;
my ($lines) = scalar @l;
my ($i);
my ($hit) = 0;
print " (debug, read $lines lines.)\n" if $debug;

for ($i = $lines-1 ; $i >= 0 ; $i-- ) {
  if ( $l[$i] =~ /$pattern/ ) {
        $l[$i] =~ s/$pattern//;
        my($t1) = ++$i;
        print " (debug, matched and changed at line $t1)\n" if $debug;
        $hit = 1;
        last;
  }
}

print @l;

warn " (Warning - no hits found.)\n" if not $hit;

exit(0);
Producing (using the original data):
Code:
% ./p1 "," data1
 (debug, pattern is :,:)
 (debug, read 6 lines.)
 (debug, matched and changed at line 5)
enum MyEnum
{
   element1, //some comment
   element2, //some comment
   element3 //some comment
};
The operation need not be a simple deletion; it can be anything, on any line in the file, that can be expressed in perl code.

Best wishes ... cheers, makyo
 
Old 01-12-2008, 04:56 PM   #12
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
With GNU Awk:

Code:
awk 'NR==FNR{if($0~/,/)x=FNR;next}FNR==x{$0=gensub(/,([^,]*)$/," \\1",1)}1' data data
or (for small files, with array):

Code:
awk '{ x[NR] = $0 } /,/ { y = FNR 
} END {
	for(i=1; i<=NR; i++)
		print (i == y ? x[i] = gensub(/,([^,]*)$/, " \\1", 1, x[i]) : x[i]) 
}' data

Last edited by radoulov; 01-12-2008 at 05:15 PM.
 
Old 01-12-2008, 05:24 PM   #13
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 70
Quote:
Originally Posted by makyo View Post
The flexibility of perl allows us to avoid some of the complexity that osar discussed.
Or you could keep some of the complexity… that is, flexibility + complexity = sweet one-liner

For example,
Code:
perl -ne 'if(eof){$t.=$_;$t=~s/,/ /}print $t; $t=$_' file
As you can tell, the entire file is not read into an array all at once (to save memory), but at the same time, you forgo applying the regular expression to each and every line (you save it for the last line only).
 
Old 01-13-2008, 10:23 AM   #14
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Oops, overlooked post

Hi.

I was doing some timings to see if there were any remarkable differences in the methods posted. As I read the posts, I see that I had overlooked rod's post #3 in perl. It is written in more idiomatic perl than the one I posted. I used the same technique as he did -- read the entire file, search backwards from the end to find the last match, and so on. Mine allows the pattern to be passed in (a small convenience, easily added) and also checks for a missing pattern, where rod's enters an unterminated loop; his rewrites the file in place, a good design for inclusion in the Makefile. Both scripts are general in that they do not assume that the line to be changed is next-to-last.

Apologies to rod for omitting his contribution -- it was unintentional.

It also came to me later that osor's one-liner essentially uses "$t" as the equivalent of the sed hold buffer, but omitting the regular-expression scan on every line is an improvement (the first print of $t might be undefined, but it seemed to cause no trouble) ... cheers, makyo
 
Old 01-13-2008, 07:25 PM   #15
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 70
Quote:
Originally Posted by makyo View Post
I was doing some timings to see if there were any remarkable differences in the methods posted.
So what are the results? Personally, I don’t like the slurp-then-edit technique which is common among many perl hackers. Slurping is necessary for some problems, but in others, it seems forced. However, I think slurp-then-edit may (counterintuitively) be faster (especially on today’s machines) than line-by-line editing.

If you want to resort to slurp-then-edit, what’s wrong with something like this?
Code:
perl -w0pe 's/(.*),/$1 /s' file
You could modify the above to remove the first comma on the last comma-containing line (so if commas existed in the final comment, they would be ignored). Unfortunately, there is no such thing as a zero-width, variable-length lookbehind, so we have to use a very big backreference (though this might be optimized out). As an alternative to a big backreference, you could use a lookahead:
Code:
perl -w0pe 's/,(?=[^,]*$)/ /s' file
You might also test this one (which is more along the lines of yours):
Code:
perl -we '@t=reverse<>;s/,/ /&&last for@t;print reverse @t' file
Or this one:
Code:
perl -w0pe '$_=reverse;s/,/ /;$_=reverse' file
Please post any results you find.
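Even a crude harness with bash's time builtin over makyo's sample data (output thrown away) should show any gross differences, e.g.:
Code:
for i in 1 2 3; do time perl -w0pe 's/(.*),/$1 /s' data1 > /dev/null; done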

Last edited by osor; 01-14-2008 at 12:24 PM. Reason: Added more examples
 
  

