How to use sed to remove carriage returns except #

Johng · 03-12-2019, 12:35 AM

I have a gpx file which I wish to convert to a csv file. After adding a comma to the end of every line, I want to use sed to remove the carriage return from all lines except those ending with a character such as #

syg00 · 03-12-2019, 12:51 AM

Quote:

Originally Posted by Johng

I have a gpx file which I wish to convert to a csv file. After adding a comma to the end of every line, I want to use sed to remove the carriage return from all lines except those ending with a character such as #

If you add a comma to the end of every line, none will end with "#" or any other character.
I would also be very surprised if a carriage return was used. Ahhhh - dammit. I always forget Windoze screws everything up. Is this a Windows-formatted file with CRLF line endings?

So define your data and your requirements properly, and maybe someone can help. Show some input samples (in translated hex if needed to show the location of the c/r), and desired output.
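
One way to produce the kind of dump asked for here is od -c, which prints a carriage return as \r and a newline as \n. The two-line input below is a made-up sample, not the OP's file, and tr -d '\r' is shown only as one common way to strip carriage returns if any turn up.

```shell
# Hypothetical sample: the first line ends in CR LF, the second in LF only.
with_cr=$(printf 'a,\r\nb,\n' | od -c)
printf '%s\n' "$with_cr"

# Same input with every carriage return deleted before dumping it.
without_cr=$(printf 'a,\r\nb,\n' | tr -d '\r' | od -c)
printf '%s\n' "$without_cr"
```

If the dump of the real file shows no \r at all, the file has plain newline endings and there is no carriage return to remove.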

Turbocapitalist · 03-12-2019, 01:19 AM

It's doable, assuming that the lines themselves do not already have commas anywhere within them.

What have you tried so far and where are you stuck? I expect that you're making use of the N command.

Johng · 03-12-2019, 02:19 AM

Thank you for the quick replies.
I have a file with 1400 lines. I have added a comma to the end of each line. Around 140 lines end with a </wpt>,
To make a csv file I need to remove the carriage return from all the lines except those ending with </wpt>,

I have looked at a number of posts using M N n etc, but can't make sense of them, that's why I am asking for help.

Turbocapitalist · 03-12-2019, 02:25 AM

Can you show a few sanitized lines of input?

If the ending of the record is variable, then that complicates things a little. sed is best when things are very consistent. Otherwise you're better off with a perl script.

One way would be to put a label (:) at the beginning of the script and then, if the end of record marker is not found, to append the next line to the pattern space (N) and branch to that label. If the end of record marker is present, then substitute (s///) the newline characters with commas and then print.

Johng · 03-12-2019, 03:06 AM

This is first 30 lines of the file:

Quote:

<wpt lon="174.99636841" lat="-41.14546967">,
<ele>22.4763</ele>,
<time>2017-11-16T06:24:46Z</time>,
<name>S1</name>,
<cmt>17-OCT-17 9:56:01</cmt>,
<sym>Golf Course</sym>,
<extensions>,
<ql:key>15108134862163</ql:key>,
</extensions>,
</wpt>,
<wpt lon="174.99630737" lat="-41.14559937">,
<ele>21.515</ele>,
<time>2017-11-16T06:24:46Z</time>,
<name>S2</name>,
<cmt>17-OCT-17 9:56:44</cmt>,
<sym>Golf Course</sym>,
<extensions>,
<ql:key>15108134862174</ql:key>,
</extensions>,
</wpt>,
<wpt lon="174.99623108" lat="-41.14572906">,
<ele>21.0344</ele>,
<time>2017-11-16T06:24:46Z</time>,
<name>S3</name>,
<cmt>17-OCT-17 9:57:11</cmt>,
<sym>Golf Course</sym>,
<extensions>,
<ql:key>15108134862185</ql:key>,
</extensions>,
</wpt>,

The idea is to remove all the carriage returns from the above to produce three lines ending with " </wpt>, "

Turbocapitalist · 03-12-2019, 03:12 AM

Thanks. The method described in #5 above works on that data. Just say s/\n/ /g instead of s/\n/,/g

syg00 · 03-12-2019, 03:42 AM

Stop saying "carriage return" - that is a totally different action (and code-point) to "newline".

Whenever you need to go to this extent with sed, then you are using the wrong tool IMHO. I long ago eschewed perl for awk for this sort of thing - use the "</wpt>," as a record separator and simply remove all the newlines.
Removing carriage returns has no effect using the data as presented.

Johng · 03-12-2019, 03:54 AM

OK, it's newline.

I tried: sed -i.bak 's/\n/ /g' GPXfile.txt
The new file was the same as the original. Obviously I missed something!

Turbocapitalist · 03-12-2019, 04:01 AM

Quote:

Originally Posted by Johng

Obviously I missed something!

The N command along with an end-of-record test. The awk way might be easier, but if you stay with a sed script, make a loop around the N command and stay in that loop until you hit the string you are using as the end-of-record marker. Then substitute the newlines with commas and let the result pass to stdout.

It's hard to coach through the process because that scripting language is a little terse, and it is very tempting to just post a ready answer, or an alternative in awk or perl. But everyone is better off with increased learning, which is one of the goals of this site.

syg00 · 03-12-2019, 04:14 AM

sed, like almost all *nix tools, is stream oriented. You get the stream until an end of record (usually a null or newline) is reached. You do not get the eor character. Hence you need to take action to ensure it's put back in if you really need it - the "N" command in this case. Then you take them out at the end. How arcane.
You need to do something similar with any tool, but sed just makes it as opaque as possible. It belongs in another time, but is massively useful nonetheless.
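
A small sketch of that behaviour, using a made-up two-line input: run line by line, the substitution finds no newline to replace, while after N the embedded newline is there to match.

```shell
# Line by line: sed strips the trailing newline before the script runs,
# so s/\n/ /g has nothing to match and the input comes back unchanged.
unchanged=$(printf 'a,\nb,\n' | sed 's/\n/ /g')
printf '%s\n' "$unchanged"

# N appends the next input line to the pattern space, separated by an
# embedded newline that s/// can now see.
joined=$(printf 'a,\nb,\n' | sed 'N;s/\n/ /g')
printf '%s\n' "$joined"
```

The first command prints the two lines as they were; the second prints the single line "a, b,".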

Johng · 03-12-2019, 04:37 AM

Quote:

But everyone is better off with increased learning, which is one of the goals of this site.

OK, thank you. My current task is a conservation project, which I was hoping to progress. Maybe it would be quicker to use a text editor and hit the delete key 1400 times. But then, learning by example is another option?

Turbocapitalist · 03-12-2019, 05:19 AM

In both examples I have # in place of the end-of-record separator, which is actually <\/wpt>.

Code:

sed -e ':a;
        /#/!{
                N;
                ba;
        }; 
        s/\n/,/g;' bigfile.data > bigfile.csv

The :a is a label, a target to which the script can jump.
The /#/!{...} means do the following clause if the pattern is not found.
N reads in the next line from the data stream and appends it to the pattern space.
ba jumps (branches) to :a. That is the loop.
Once the /#/ pattern is found, the clause is skipped and the script drops through to s///
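
As a rough sketch, the same loop can be pointed at the real marker and at lines that already end in commas, replacing each newline with a space as suggested earlier in the thread. The input is a trimmed version of the posted sample, and the $! guard is an addition here so that the loop stops cleanly if the last line lacks the marker.

```shell
# Trimmed sample in the thread's format, commas already appended.
input='<wpt lon="174.99636841" lat="-41.14546967">,
<name>S1</name>,
</wpt>,
<wpt lon="174.99630737" lat="-41.14559937">,
<name>S2</name>,
</wpt>,'

# Keep appending lines until the pattern space ends with </wpt>,
# then turn the embedded newlines into spaces.
result=$(printf '%s\n' "$input" | sed '
:a
/<\/wpt>,$/!{
    $!{
        N
        ba
    }
}
s/\n/ /g
')
printf '%s\n' "$result"
```

This prints one line per waypoint, each ending in " </wpt>,".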

AWK requires a bit of a trick to use a designated Output Field Separator instead of just passing the data through unmodified. It is, in my opinion, non-obvious, so I'll just show it.

Code:

awk '{$1=$1}1' OFS=',' RS="#\n" ORS="#\n" bigfile.data > bigfile.csv

The $1=$1 copies the first field to itself again, forcing the record to be rebuilt using the designated Output Field Separator (OFS). The 1 is a second clause, but being 1 it is always true. The default action for a true pattern is to print. It would be equivalent to 1{print $0}

AWK is just a bunch of if-then statements, written in shorthand.
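
A minimal illustration of the $1=$1 trick on one-line inputs. The first two inputs are made up; the third is a line from the posted sample and shows a caveat: with the default field separator, spaces inside a line are field boundaries too, so they also turn into commas.

```shell
# Without touching a field, the record is printed exactly as read.
plain=$(printf 'a b c\n' | awk '1' OFS=',')
printf '%s\n' "$plain"

# Assigning $1 to itself makes awk rebuild the record with OFS.
rebuilt=$(printf 'a b c\n' | awk '{$1=$1}1' OFS=',')
printf '%s\n' "$rebuilt"

# Caveat: the space in "Golf Course" is a field boundary as well.
golf=$(printf '<sym>Golf Course</sym>\n' | awk '{$1=$1}1' OFS=',')
printf '%s\n' "$golf"
```

The three commands print "a b c", "a,b,c" and "<sym>Golf,Course</sym>" respectively.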

syg00 · 03-12-2019, 05:28 AM

It should be noted that both those (excellent) examples presume that the OP had not already added the commas. Given the current state of the file, this might suffice.

Code:

awk '{gsub("\n", "") ; print $0 RS}' RS='</wpt>,' bigfile.data > bigfile.csv

Linux is all about choice ...
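
The one-liner above relies on a multi-character RS, which gawk and mawk accept but POSIX awk does not guarantee, and it may print one extra "</wpt>," line when the file ends with a newline after the last marker. A sketch of another choice that avoids both, assuming every record ends with a line ending in </wpt>, (the input is a trimmed version of the posted sample):

```shell
# Trimmed sample in the thread's format, commas already appended.
input='<wpt lon="174.99636841" lat="-41.14546967">,
<name>S1</name>,
</wpt>,
<wpt lon="174.99630737" lat="-41.14559937">,
<name>S2</name>,
</wpt>,'

# Append each line to the current record; print and reset the record
# when the end-of-record line is reached. END flushes a partial record.
csv=$(printf '%s\n' "$input" | awk '
{ rec = rec $0 }
/<\/wpt>,$/ { print rec; rec = "" }
END { if (rec != "") print rec }
')
printf '%s\n' "$csv"
```

Each waypoint comes out as a single line with the newlines removed and nothing inserted in their place.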

Johng · 03-12-2019, 05:49 AM

Thank you Turbocapitalist and syg00.
That was perfect. I will be able to proceed with my project, and study the examples to advance my knowledge.
Cheers