[SOLVED] awk: remove the last character in the file

cristalp · 11-02-2011, 05:03 AM

Dear Experts,

I have a file with multiple lines that looks like this:

Code:

aaaaaaaaaaaaaaaa,
bbbbbbbbbbbbbbbb,
cccccccccccccccc,
dddddddddddddddd,
eeeeeeeeeeeeeeee,

I want to remove the last comma in the last line. The output file should look like:

Code:

aaaaaaaaaaaaaaaa,
bbbbbbbbbbbbbbbb,
cccccccccccccccc,
dddddddddddddddd,
eeeeeeeeeeeeeeee

I tried

Code:

awk '{gsub(/,$/,"");print}' FILENAME

and

Code:

sed 's#[\]$##' FILENAME

Both of these commands remove the trailing comma from every line in the file, which is not what I want. So how can I remove just the last comma, ideally with awk?

Thanks a lot!

crts · 11-02-2011, 05:24 AM

Hi,

If the line you want the comma removed from really is the last line of the file, i.e. it is not followed by a blank line, then you could try something like:

Code:

sed '$ s/,$//' file
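
Since the question asked for awk, the same idea can be sketched in portable awk, assuming again that the comma sits on the very last line. awk only learns that a line was the last one at end of input, so the sketch holds each line back by one and strips the comma in END. The function name strip_last_comma is only a placeholder for illustration.

```shell
# strip_last_comma is a hypothetical helper name, not a standard tool.
strip_last_comma() {
    # Print each line one step late; the line still held at END is the
    # last one, so only that line loses its trailing comma.
    awk 'NR > 1 { print prev }
         { prev = $0 }
         END { if (NR) { sub(/,$/, "", prev); print prev } }' "$@"
}

# Try it on three sample lines; only the final comma should disappear.
printf 'aaaa,\nbbbb,\ncccc,\n' | strip_last_comma
```

Like the sed command, this does nothing useful if blank lines follow the last comma.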

Nominal Animal · 11-02-2011, 08:32 AM

This GNU awk snippet keeps newlines intact, and removes the final comma even if there are empty lines following it:

Code:

gawk 'BEGIN { RS=",[\t\n\v\f\r ]*[\n\r]+" } { printf("%s%s", nl, $0) ; nl=RT } END { sub(/^,/, "", nl); printf("%s", nl) }'

The idea is to use a record (line) separator consisting of a comma, optional whitespace, and one or more newlines. Using the automatic variable RT provided by GNU awk, we retain each record separator but only output it just before the next record. When all records have been output, the comma (if any) is stripped from the final record separator, and that separator is output.

The end result is that the file stays exactly the same, except when there is a final comma followed by optional whitespace and at least one newline; then the comma is stripped away.
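
To see what RT actually holds with this separator, a small sketch like the following may help. It only prints each record together with the length of the separator that ended it; show_rt is a made-up helper name, and GNU awk is assumed because RT does not exist in other awks.

```shell
# show_rt is a hypothetical helper; it requires gawk (RT is a gawk extension).
show_rt() {
    # Same RS as in the snippet above: comma, optional whitespace, newline(s).
    gawk 'BEGIN { RS = ",[\t\n\v\f\r ]*[\n\r]+" }
          { printf("record=%s RT-length=%d\n", $0, length(RT)) }' "$@"
}
```

For instance, `printf 'aa,\nbb, \n\n' | show_rt` should report an RT length of 2 for aa (comma plus newline) and 4 for bb (comma, space, two newlines), which is why trailing empty lines end up inside the final separator.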

Note that if there is no newline after the final comma, i.e. the comma is the last character in the file (except for optional spaces and tabs), it is not stripped. If you suspect you may have such files, it is better to use a slightly more complicated variant that handles that case too:

Code:

gawk 'BEGIN { RS=",[\t\n\v\f\r ]*[\n\r]+" } { printf("%s%s", ln, nl); ln = $0; nl = RT } END { if (length(nl) > 0) printf("%s%s", ln, gensub(/^,/, "", "g", nl)); else printf("%s", gensub(/,([\t\v\f ]*)$/, "\\1", "g", ln)) }'

Reuti · 11-03-2011, 08:26 AM

Will you feed this to another application that needs the final LF? If not, the head command might work too:

Code:

$ head -c -2 FILENAME

But it will remove the final LF along with the comma. The negative byte count is a GNU head extension.
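
If the final LF is needed after all, one possible workaround is to put it back after head has cut the last two bytes. This is only a sketch: it assumes GNU head and a file that ends in exactly a comma followed by one LF, and drop_final_comma is a name invented for illustration.

```shell
# drop_final_comma is a hypothetical helper name.
drop_final_comma() {
    # head -c -2 (GNU coreutils) prints everything except the last 2 bytes,
    # here the final comma and the LF; echo then appends a single LF again.
    head -c -2 "$1"
    echo
}
```

Called as `drop_final_comma FILENAME`, it writes the result to standard output. Unlike the sed and awk approaches, it never checks that the removed bytes really are a comma and a newline.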

cristalp · 11-03-2011, 09:47 AM

Quote:

Originally Posted by Reuti

This is really smart!! Thanks!!!

cristalp · 11-03-2011, 10:19 AM

Quote:

Originally Posted by Nominal Animal

I got the point, thanks for the detailed explanation!!! I did not even know about the RT variable before. It seems like a really powerful tool.