LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (http://www.linuxquestions.org/questions/programming-9/)
-   -   Need a script to remove last comma in a file (http://www.linuxquestions.org/questions/programming-9/need-a-script-to-remove-last-comma-in-a-file-612372/)

jgombos 01-09-2008 11:35 AM

Need a script to remove last comma in a file
 
I've written a sed script that does almost everything I want. The job that remains is to remove the last comma of a file. A typical file resembles this after sed is done with it:
Code:

enum MyEnum
{
  element1, //some comment
  element2, //some comment
  element3, //some comment
};

The C++ compiler won't accept this because the last element has a comma. And there seems to be no way to make sed give special treatment to the last line. I could almost pull it off by using the sed "N" command to combine two lines, and do a replacement on "element3, //some comment\n};" for example, but it wouldn't work on enums that have an even number of elements.

Can awk handle this job? It needs to be scripted in a language that is common in the Make environment, because it will have to be embedded in a makefile.

Each enum gets its own file, btw.

gnashley 01-09-2008 12:20 PM

If the files are all short like that, you might just flip the text of the entire file with 'rev' and strip out the last comma before flipping it back.

theNbomr 01-09-2008 12:20 PM

Here's my offering, in perl.
Code:

#! /usr/bin/perl -w
#
#  LQjgombos.pl  - delete last comma from file
#
#  Usage: LQjgombos.pl file.txt

use strict;

    open( INFILE, $ARGV[0] ) || die "Cannot read $ARGV[0]: $!\n";
    my @file = <INFILE>;
    close INFILE;
    my $i = @file-1;
    while( $file[$i] !~ m/,/ ){ $i--; }
    $file[$i] =~ s/,//;
    open( OUTFILE, ">$ARGV[0]" ) || die "Cannot write $ARGV[0]: $!\n";
    print OUTFILE @file;
    close OUTFILE;

Assumes the input file obeys the specified format.
--- rod.

jgombos 01-09-2008 01:19 PM

Quote:

Originally Posted by theNbomr (Post 3016887)
Here's my offering, in perl.
...
Assumes the input file obeys the specified format.
--- rod.

Thanks for the help!
I'm figuring I'll be able to embed that into a makefile like this:
Code:

define strip-last-comma
        $(PL) -e 'use strict;\
                  open( INFILE, $ARGV[0] ) || die "Cannot read $ARGV[0]: $!\n";\
                  my @file = <INFILE>;\
                  close INFILE;\
                  my $i = @file-1;\
                  while( $file[$i] !~ m/,/ ){ $i--; }\
                  $file[$i] =~ s/,//;\
                  open( OUTFILE, ">$ARGV[0]" ) || die "Cannot write $ARGV[0]: $!\n";\
                  print OUTFILE @file;\
                  close OUTFILE;' $@
endef


jgombos 01-09-2008 01:21 PM

Quote:

Originally Posted by gnashley (Post 3016886)
If the files are all short like that, you might just flip the text of the entire file with 'rev' and strip out the last comma before flipping it back.

The rev command apparently reverses the sequence of characters on each line, but it doesn't change the order of lines. If it had flipped the order of lines, then it might have been useful to couple it with a sed script.

ghostdog74 01-09-2008 07:43 PM

Please ignore this if I misunderstood your requirement:
Code:

# sed 's|, *\/\/| \/\/|g' file
enum MyEnum
{
  element1 //some comment
  element2 //some comment
  element3 //some comment
};


PTrenholme 01-09-2008 10:10 PM

Here's an awk program (since you asked):
Code:

$ cat test.awk
# If the line contains an open bracket . . .
/{/ {  prior=$0; # Save the line
        getline;  # Read the next line
        test = ((match($0,"(^.*)(//.*$)",list) != 0) ? list[1] : $0); # Strip any comment
        while (match(test, /} *;/) == 0) { # loop until a closing bracket is found
                print prior;
                prior=$0;
                getline;
                test = ((match($0,"(^.*)(//.*$)",list) != 0) ? list[1] : $0);
        }
# Process the line preceding the closing bracket
        if (match(prior,"(^.*)(,)( *//.*$)",list) != 0 ) {
                print list[1] " " list[3]; # If here, it contained a comma preceding the comment
        }
        else {if (match(prior,"(^.*)(,)( *$)", list) != 0) {
                print list[1]; # If here, we have a comma but no comment
                }
                else {
                        print prior; # If here, no comma to remove
                }
        }
}
# No open bracket? Just copy it to stdout
/^[^{]*$/ {print;}

<edit>
Note: The code does not handle multiple brackets nor lines with both open and closing brackets.
</edit>

osor 01-10-2008 03:41 PM

Quote:

Originally Posted by jgombos (Post 3016953)
The rev command apparently reverses the sequence of characters on each line, but it doesn't change the order of lines. If it had flipped the order of lines, then it might have been useful to couple it with a sed script.

You could always use tac instead. But beware when using either tac or rev—these are GNU utilities that may not be present on Solaris or *BSD machines.
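For what it's worth, the tac approach can be sketched like this (a sketch only: it relies on GNU tac and on GNU sed's "0,/regexp/" address form, and the file name is made up for the example):
Code:

```shell
# Flip the line order, blank out the first comma we meet (which is the
# file's last comma), then flip the order back.
printf '%s\n' 'enum MyEnum' '{' \
    '  element1, //some comment' \
    '  element2, //some comment' \
    '  element3, //some comment' '};' > enum.h

tac enum.h | sed '0,/,/s/,/ /' | tac
```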

Btw, here is a sed one-liner which replaces the final comma with a space. It makes some assumptions about the formatting: there is a series of lines, each containing one comma, and the closing brace and semicolon sit on the line immediately following the one with the final comma. So it will obviously have trouble if multi-line comments are dispersed through the enum.
Code:

sed -e '/,/{x;/^$/d};/};/{H;x;s/,/ /}' file
If you want to deal with a file containing multiple enums, just add a small substitution:
Code:

sed -e '/,/{x;/^$/d};/};/{H;s/.*//;x;s/,/ /}' file
The above filter will work with the following example file:
Code:

enum MyEnum
{
  element1, //some comment
  element2, //some comment
  element3, //some comment
};

enum YourEnum
{
  Y1, //some comment
  Y2, //some comment
  Y3, //some comment
  Y4, //some comment
};
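Saving that sample under a made-up name and running the multi-enum one-liner over it should turn each enum's final comma into a space:
Code:

```shell
cat > enums.txt <<'EOF'
enum MyEnum
{
  element1, //some comment
  element2, //some comment
  element3, //some comment
};

enum YourEnum
{
  Y1, //some comment
  Y2, //some comment
};
EOF

# Hold each comma line for one cycle; on seeing "};", pull the held
# line back and replace its comma.
sed -e '/,/{x;/^$/d};/};/{H;s/.*//;x;s/,/ /}' enums.txt
```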


jgombos 01-11-2008 01:04 PM

Thanks for the alternate methods, folks. It's great to have a few solutions to choose from. All this may turn out to be in vain, however: I've discovered that porting my original shell script into the Makefile is more complex than I had estimated (possible, but tedious). I'll probably at least incorporate one of these scripts into the bash script and run it manually.

Osor, I'm glad you came up with a sed method. I've run into a similar problem before (needing sed to act on the last occurrence of an expression in a stream), and the sed mailing list said there's no practical way of knowing when something appears last in a file (or in part of a file). I'll have to get familiar with what the hold buffer can do for me.

That "N" command is not as useful as one would expect. Sed could really use a variant of the N command that executes on every line (effectively processing each line twice), as opposed to pairing only alternate lines from the first line in the address range.

osor 01-11-2008 08:04 PM

Quote:

Originally Posted by jgombos (Post 3019259)
Osor, I'm glad you came up with a sed method. I've run into a similar problem before (needing sed to take an action on the last occurrance of an expression in a stream), and the sed mailing list said there's no practical way of knowing when something appears last in a file (or last of a part of a file). I'll have to get familiar with what the holding pattern will do for me.

There’s no way of knowing where something appears in a file relative to the last line. This is because sed is a stream editor and your file could itself be stdin. So sed will process line by line, and if it reaches an EOF, it will know that the last line was entered (but it cannot arrive at the second-to-last line and predict that the next line will be the last). The hold space comes in handy, but can also be abused.

For example, here is how you would deal with the original problem in an ed script:
Code:

echo -e '$-1s/,/ /\n,p' | ed - file
If you want to use this sort of functionality in sed, you have to “cheat” by abusing the hold buffer. Effectively, you can read the entire file into one “line” and operate on it after that.

For example, in a normal use of sed:
Code:

sed -e 'commands' file
The file is split up into lines, and on each line commands is executed. So for a file that looks like this:
Code:

line1
line2
line3

lineN-1
lineN

the normal use of sed looks kind of like this:
Code:

execute commands on "line1"
execute commands on "line2"
execute commands on "line3"

execute commands on "lineN"

If you want to “cheat” when using sed you do something like this:
Code:

sed -e '1h;1!H;$!d;${s/.*//;x};commands' file
Now, all lines are reduced to one and your commands are executed on that line. It looks kind of like this:
Code:

execute commands on "line1\nline2\nline3\n…\nlineN"
So to get rid of the last comma in the whole file you can do:
Code:

sed -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /' file
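As a quick check, piping a small enum through it (the data is supplied inline rather than from a file):
Code:

```shell
# All lines are collected into one pattern space, then a single greedy
# substitution rewrites the last comma in the whole "line".
printf '%s\n' 'enum E' '{' '  e1, //c' '  e2, //c' '};' |
    sed -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /'
```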
The reason I call this “cheating” is that the purpose of the hold buffer is to buffer (not to hold the entire file). POSIX specifies that the minimum size of the hold buffer is 8192 bytes, so a portable script should assume only so much. GNU sed happens to have a dynamically-sized hold buffer, so it will accommodate any such cheating, but other implementations might not be so forthcoming. The hold buffer is supposed to hold small amounts of text (two or three lines’ worth at most).

So you could also do this (a proper/portable use of the hold buffer):
Code:

sed -e '$!x;1d;${H;x};commands'
Which looks like this:
Code:

execute commands on "line1"
execute commands on "line2"
execute commands on "line3"

execute commands on "lineN-2"
execute commands on "lineN-1\nlineN"

The only difference is that if you use absolute addresses in commands, you need an offset of +1 (but the last line is always “$”).

So you might use the above as a generic way to solve the similar problems you’ve run into before. For example, an alternate answer to your question is:
Code:

sed -e '$!x;1d;${H;x};$s/,/ /'
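A quick sanity check of the portable variant on the sample enum (input supplied on stdin, since the one-liner takes no file argument):
Code:

```shell
# Only the last two lines ever share the pattern space, so the hold
# buffer never holds more than a single line at a time.
printf '%s\n' 'enum E' '{' '  e1, //c' '  e2, //c' '};' |
    sed -e '$!x;1d;${H;x};$s/,/ /'
```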

makyo 01-12-2008 01:39 PM

Hi.

The flexibility of perl allows us to avoid some of the complexity that osor discussed.

In particular, this sample script essentially scans from the end of the file rather than scanning every line from the beginning. In a small file, this is not much of an improvement, but as the file gets larger, the improvement may be useful: regular expression matching can be expensive.

The trade-off is that the entire file is held in an array. This is not a source of complexity in the script however, because perl manages its own memory. Naturally, for really large files, the load on the system may become more noticeable.

Still, with all of that, the script is fairly small:
Code:

#!/usr/bin/perl

# @(#) p1      Demonstrate operation on last matched line.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;
my ($p) = `basename $0`;
chomp $p;    # drop the trailing newline from the backticks

my ($pattern) = shift || die " $p: need a pattern.\n";
print " (debug, pattern is :$pattern:)\n" if $debug;

my (@l) = <>;
my ($lines) = scalar @l;
my ($i);
my ($hit) = 0;
print " (debug, read $lines lines.)\n" if $debug;

for ($i = $lines-1 ; $i >= 0 ; $i-- ) {
  if ( $l[$i] =~ /$pattern/ ) {
        $l[$i] =~ s/$pattern//;
        my($t1) = ++$i;
        print " (debug, matched and changed at line $t1)\n" if $debug;
        $hit = 1;
        last;
  }
}

print @l;

warn " (Warning - no hits found.)\n" if not $hit;

exit(0);

Producing (using the original data):
Code:

% ./p1 "," data1
 (debug, pattern is :,:)
 (debug, read 6 lines.)
 (debug, matched and changed at line 5)
enum MyEnum
{
  element1, //some comment
  element2, //some comment
  element3 //some comment
};

The operation need not be a simple deletion, it can be anything on any line in the file that can be expressed in perl code.

Best wishes ... cheers, makyo

radoulov 01-12-2008 03:56 PM

With GNU Awk:

Code:

awk 'NR==FNR{if($0~/,/)x=FNR;next}FNR==x{$0=gensub(/,([^,]*)$/," \\1",1)}1' data data
or (for small files, with array):

Code:

awk '{ x[NR] = $0 } /,/ { y = FNR
} END {
        for(i=1; i<=NR; i++)
                print (i == y ? x[i] = gensub(/,([^,]*)$/, " \\1", 1, x[i]) : x[i])
}' data
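On systems without gawk, the same two-pass idea can be sketched in POSIX awk, using match() and substr() in place of gensub() (the file name and contents here are just for illustration):
Code:

```shell
printf '%s\n' 'enum E' '{' '  e1, //c' '  e2, //c' '};' > data

# Pass 1 records the number of the last comma-bearing line; pass 2
# splices that line back together around its final comma.
awk 'NR==FNR { if ($0 ~ /,/) x = FNR; next }
     FNR == x { n = match($0, /,[^,]*$/)
                $0 = substr($0, 1, n-1) " " substr($0, n+1) }
     1' data data
```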


osor 01-12-2008 04:24 PM

Quote:

Originally Posted by makyo (Post 3020279)
The flexibility of perl allows us to avoid some of the complexity that osar discussed.

Or you could keep some of the complexity… that is, flexibility + complexity = sweet one-liner :D

For example,
Code:

perl -ne 'if(eof){$t.=$_;$t=~s/,/ /}print $t; $t=$_' file
As you can tell, the entire file is not read into an array all at once (to save memory), but at the same time, you forgo applying the regular expression to each and every line (you save it for the last line only).

makyo 01-13-2008 09:23 AM

Oops, overlooked post
 
Hi.

I was doing some timings to see if there were any remarkable differences in the methods posted. As I read the posts, I see that I had overlooked rod's post #3 in perl. It is written in more idiomatic perl than the one I posted, and uses the same technique -- read the entire file, search backwards from the end for the last match, and so on. Mine takes the pattern as an argument (a small convenience, easily added) and also checks for a missing pattern, where rod's enters an unterminated loop; his rewrites the file in place, a good design for the Makefile inclusion. Both scripts are general in that they do not assume the line to be changed is next-to-last.

Apologies to rod for omitting his contribution -- it was unintentional.

It also came to me later that osor's one-liner essentially uses "$t" as the equivalent of the sed hold buffer, and it is an improvement to skip the regex scan on every line (the first print of $t might be undefined, but that seemed to cause no trouble) ... cheers, makyo

osor 01-13-2008 06:25 PM

Quote:

Originally Posted by makyo (Post 3020991)
I was doing some timings to see if there were any remarkable differences in the methods posted.

So what are the results? Personally, I don’t like the slurp-then-edit technique which is common among many perl hackers. Slurping is necessary for some problems, but in others, it seems forced. However, I think slurp-then-edit may (counterintuitively) be faster (especially on today’s machines) than line-by-line editing.

If you want to resort to slurp-then-edit, what’s wrong with something like this?
Code:

perl -w0pe 's/(.*),/$1 /s' file
You could modify the above to remove the first comma on the last comma-containing line (so if commas existed in the final comment, they would be ignored). Unfortunately, there is no such thing as a zero-width, variable-length lookbehind, so we have to use a very big backreference (though this might be optimized out). As an alternative to a big backreference, you could use a lookahead:
Code:

perl -w0pe 's/,(?=[^,]*$)/ /s' file
You might also test this one (which is more along the lines of yours):
Code:

perl -we '@t=reverse<>;s/,/ /&&last for@t;print reverse @t' file
Or this one:
Code:

perl -w0pe '$_=reverse;s/,/ /;$_=reverse' file
Please post any results you find.

