LinuxQuestions.org

cristalp 11-02-2011 06:03 AM

awk: remove the last character in the file
 
Dear Experts,

I have a multi-line file that looks like this:
Code:

aaaaaaaaaaaaaaaa,
bbbbbbbbbbbbbbbb,
cccccccccccccccc,
dddddddddddddddd,
eeeeeeeeeeeeeeee,

I want to remove the last comma in the last line. The output file should look like:
Code:

aaaaaaaaaaaaaaaa,
bbbbbbbbbbbbbbbb,
cccccccccccccccc,
dddddddddddddddd,
eeeeeeeeeeeeeeee

I tried
Code:

awk '{gsub(/,$/,"");print}' FILENAME
and
Code:

sed 's#[\]$##' FILENAME
Both of these commands remove the comma from every line of the file, which is not what I wanted. How can I remove only the last comma using awk?

Thanks a lot!

crts 11-02-2011 06:24 AM

Hi,

If the line you want the comma removed from is the last line of the file (i.e. it is not followed by a blank line), you could try something like:
Code:

sed '$ s/,$//' file
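
Since the question asked for awk specifically, here is a minimal sketch of the same last-line-only idea in plain awk. Like the sed command above, it assumes the comma to remove sits at the very end of the last line; the function name strip_last_comma is made up for this example.

```shell
# Hypothetical helper: strip a trailing comma from the last line only.
# Works with any POSIX awk; each line is held back until the next one arrives.
strip_last_comma() {
    awk '
        NR > 1 { print prev }          # prev was not the last line, so print it as-is
        { prev = $0 }                  # remember the current line
        END {
            if (NR > 0) {              # nothing to do for empty input
                sub(/,$/, "", prev)    # only the last line is edited
                print prev
            }
        }
    ' "$@"
}

# Example with shortened sample data from the question:
printf 'aaaa,\nbbbb,\ncccc,\n' | strip_last_comma
```

Holding back one line is what lets awk know, in the END block, which line was the last one. Note that the output always ends with a newline, even if the input did not.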

Nominal Animal 11-02-2011 09:32 AM

This GNU awk snippet keeps newlines intact, and removes the final comma even if there are empty lines following it:
Code:

gawk 'BEGIN { RS=",[\t\n\v\f\r ]*[\n\r]+" } { printf("%s%s", nl, $0) ; nl=RT } END { sub(/^,/, "", nl); printf("%s", nl) }' FILENAME
The idea is to use a record (line) separator consisting of a comma, optional whitespace, and one or more newlines. Using the automatic variable RT provided by GNU awk, we retain each record separator, but output it only just before the next record. When all records have been output, the comma (if any) is stripped from the final record separator, and that separator is output.

The end result is that the file stays exactly the same, except when there is a final comma followed by optional whitespace and at least one newline; then the comma is stripped away.
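
To see how such a separator splits the input, the following sketch prints each record together with the length of the text that RT matched. The function name show_records and the sample input are made up for illustration, and it assumes gawk is installed (RT does not exist in other awk implementations).

```shell
# Hypothetical helper: show each record and how long its matched separator was.
show_records() {
    gawk 'BEGIN { RS = ",[\t\n\v\f\r ]*[\n\r]+" }
          { printf("record=[%s] RT length=%d\n", $0, length(RT)) }' "$@"
}

# Run the demo only where gawk is available.
command -v gawk >/dev/null 2>&1 && printf 'a,\nb,\n\nc,\n' | show_records
```

With this input the second record should report a three-character separator (the comma plus two newlines), which is how empty lines after a comma end up inside RT rather than in the next record.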

Note that if there is no newline after the final comma, i.e. the comma is the last character in the file (except for optional spaces and tabs), it is not stripped. If you suspect you may have such files, it is better to use a slightly more complicated variant that handles that case too:
Code:

gawk 'BEGIN { RS=",[\t\n\v\f\r ]*[\n\r]+" }
{ printf("%s%s", ln, nl); ln = $0; nl = RT }   # hold back the newest record
END {
    if (length(nl) > 0)   # input ended with a separator
        printf("%s%s", ln, gensub(/^,/, "", "g", nl))
    else                  # bare comma at end of input
        printf("%s", gensub(/,([\t\v\f ]*)$/, "\\1", "g", ln))
}' FILENAME

Reuti 11-03-2011 09:26 AM

Will you feed this to any other application and need the final LF? Otherwise the head command might work too:
Code:

$ head -c -2 FILENAME
But it will remove the comma plus the final LF.
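
Because head -c -2 drops the last two bytes whatever they are, a cautious variant can first check that the file really ends in a comma followed by LF. This is only a sketch: the function name strip_comma_lf is hypothetical, and the negative byte count is a GNU head extension, so it may not work with other head implementations.

```shell
# Hypothetical helper: drop the trailing ",\n" only if the file really ends that way.
strip_comma_lf() {
    # Command substitution strips the trailing newline, so a file ending
    # in ",\n" yields exactly "," here.
    if [ "$(tail -c 2 "$1")" = "," ]; then
        head -c -2 "$1"      # negative count: GNU head only
    else
        cat "$1"             # leave anything else untouched
    fi
}

# Usage (writes the result to a new file):
#   strip_comma_lf FILENAME > FILENAME.new
```

As with the plain head command, the result no longer ends with a newline when the comma is removed.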

cristalp 11-03-2011 10:47 AM

Quote:

Originally Posted by Reuti (Post 4514845)

This is really smart!! Thanks!!!

cristalp 11-03-2011 11:19 AM

Quote:

Originally Posted by Nominal Animal (Post 4514098)

I got the point, thanks for the detailed explanation!!! I did not even know about the variable RT before. gawk seems to be a really powerful tool.


All times are GMT -5.