LinuxQuestions.org
03-12-2019, 12:35 AM   #1
Johng
Member
 
Registered: Feb 2002
Location: NZ
Distribution: Kubuntu, Mint
Posts: 408

Rep: Reputation: 31
How to use sed to remove carriage returns except #


I have a GPX file which I wish to convert to a CSV file. After adding a comma to the end of every line, I want to use sed to remove the carriage return from all lines except those ending with a character such as #.
 
03-12-2019, 12:51 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Quote:
Originally Posted by Johng
I have a GPX file which I wish to convert to a CSV file. After adding a comma to the end of every line, I want to use sed to remove the carriage return from all lines except those ending with a character such as #.
If you add a comma to the end of every line, none will end with "#" or any other character.
I would also be very surprised if a carriage return was used. Ahhhh, dammit. I always forget Windoze screws everything up. Is this a Windows-formatted file with CRLF line endings?

So define your data and your requirements properly, and maybe someone can help. Show some input samples (as a hex dump if needed, to show where the CRs are) and the desired output.
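For anyone following along, here is a minimal sketch (POSIX shell, standard tools only) of how to check for and remove carriage returns. `od -c` prints each byte, so Windows line endings show up as `\r  \n`, and `tr -d '\r'` deletes every CR byte. The function names `has_cr` and `strip_cr` are hypothetical helpers, not part of any tool.

```sh
#!/bin/sh
# Sketch only: detect and strip carriage returns (CR, byte 0x0d).
# has_cr and strip_cr are hypothetical helper names for illustration.

CR=$(printf '\r')   # a literal CR, without relying on the bash-only $'\r'

# Succeeds (exit status 0) if the named files, or stdin, contain a CR.
has_cr() {
    grep -q "$CR" "$@"
}

# Copies stdin to stdout with every CR removed, turning CRLF into LF.
strip_cr() {
    tr -d '\r'
}

# Example usage:
#   od -c GPXfile.txt | head          # CRLF endings appear as \r  \n
#   has_cr GPXfile.txt && strip_cr < GPXfile.txt > GPXfile.unix.txt
```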

Last edited by syg00; 03-12-2019 at 01:29 AM. Reason: crlf
 
03-12-2019, 01:19 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,308
Blog Entries: 3

Rep: Reputation: 3721
It's doable, assuming that the lines themselves do not already have commas anywhere within them.

What have you tried so far and where are you stuck? I expect that you're making use of the N command.
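To see what N actually does, here is a tiny sketch: N appends a newline plus the next input line to the pattern space, so only after N is there a `\n` for `s///` to match. `join_pairs` is a hypothetical name used for illustration.

```sh
#!/bin/sh
# Sketch: the sed N command joins the next line onto the pattern space,
# separated by an embedded newline, which s/\n/,/ can then replace.
# join_pairs is a hypothetical helper name.

join_pairs() {
    sed 'N;s/\n/,/'
}

# Example:
#   printf 'a\nb\nc\nd\n' | join_pairs
# prints
#   a,b
#   c,d
```

With an odd number of input lines, GNU sed prints the last line unchanged when N runs out of input, whereas a strictly POSIX sed may quit without printing it.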
 
03-12-2019, 02:19 AM   #4
Johng
Member
 
Registered: Feb 2002
Location: NZ
Distribution: Kubuntu, Mint
Posts: 408

Original Poster
Rep: Reputation: 31
Thank you for the quick replies.
I have a file with 1400 lines. I have added a comma to the end of each line. Around 140 lines end with "</wpt>,".
To make a CSV file I need to remove the carriage return from all the lines except those ending with "</wpt>,".

I have looked at a number of posts using M, N, n etc., but can't make sense of them; that's why I'm asking for help.
 
03-12-2019, 02:25 AM   #5
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,308
Blog Entries: 3

Rep: Reputation: 3721
Can you show a few sanitized lines of input?

If the ending of the record is variable, then that complicates things a little. sed is best when things are very consistent. Otherwise you're better off with a perl script.

One way would be to put a label (:) at the beginning of the script and then, if the end of record marker is not found, to append the next line to the pattern space (N) and branch to that label. If the end of record marker is present, then substitute (s///) the newline characters with commas and then print.
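A minimal sketch of that loop, assuming (as the OP says) that each record ends with the line `</wpt>,` and that every line already ends in a comma, so the embedded newlines are simply deleted rather than replaced. `join_records` is a hypothetical name; each command gets its own `-e` so the label and braces are terminated portably.

```sh
#!/bin/sh
# Sketch of the label / N / branch loop: keep appending lines until the
# pattern space ends with the record marker, then delete the embedded
# newlines. Assumes each record ends with the line "</wpt>," and that every
# line already has a trailing comma. join_records is a hypothetical name.

join_records() {
    sed -e ':a' \
        -e '/<\/wpt>,$/!{' \
        -e 'N' \
        -e 'ba' \
        -e '}' \
        -e 's/\n//g'
}

# Example usage:
#   join_records < GPXfile.txt > GPXfile.csv
```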
 
03-12-2019, 03:06 AM   #6
Johng
Member
 
Registered: Feb 2002
Location: NZ
Distribution: Kubuntu, Mint
Posts: 408

Original Poster
Rep: Reputation: 31
These are the first 30 lines of the file:
Quote:
<wpt lon="174.99636841" lat="-41.14546967">,
<ele>22.4763</ele>,
<time>2017-11-16T06:24:46Z</time>,
<name>S1</name>,
<cmt>17-OCT-17 9:56:01</cmt>,
<sym>Golf Course</sym>,
<extensions>,
<ql:key>15108134862163</ql:key>,
</extensions>,
</wpt>,
<wpt lon="174.99630737" lat="-41.14559937">,
<ele>21.515</ele>,
<time>2017-11-16T06:24:46Z</time>,
<name>S2</name>,
<cmt>17-OCT-17 9:56:44</cmt>,
<sym>Golf Course</sym>,
<extensions>,
<ql:key>15108134862174</ql:key>,
</extensions>,
</wpt>,
<wpt lon="174.99623108" lat="-41.14572906">,
<ele>21.0344</ele>,
<time>2017-11-16T06:24:46Z</time>,
<name>S3</name>,
<cmt>17-OCT-17 9:57:11</cmt>,
<sym>Golf Course</sym>,
<extensions>,
<ql:key>15108134862185</ql:key>,
</extensions>,
</wpt>,
The idea is to remove all the carriage returns from the above to produce three lines, each ending with "</wpt>,".
 
03-12-2019, 03:12 AM   #7
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,308
Blog Entries: 3

Rep: Reputation: 3721
Thanks. The method described in #5 above works on that data. Just say s/\n/ /g instead of s/\n/,/g
 
03-12-2019, 03:42 AM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Stop saying "carriage return": that is a totally different character (and code point) from "newline".

Whenever you need to go to these lengths with sed, you are using the wrong tool IMHO. I long ago abandoned perl in favour of awk for this sort of thing: use "</wpt>," as the record separator and simply remove all the newlines.
Removing carriage returns has no effect using the data as presented.
 
2 members found this post helpful.
03-12-2019, 03:54 AM   #9
Johng
Member
 
Registered: Feb 2002
Location: NZ
Distribution: Kubuntu, Mint
Posts: 408

Original Poster
Rep: Reputation: 31
OK, it's newline.

I tried: sed -i.bak 's/\n/ /g' GPXfile.txt
The new file was the same as the original. Obviously I missed something!
 
03-12-2019, 04:01 AM   #10
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,308
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by Johng
Obviously I missed something!
The N command along with an end-of-record test. The awk way might be easier, but if you stay with a sed script, make a loop around the N command and stay in that loop until you hit the string you are using as the end-of-record marker. Then substitute the newlines with commas and let the result pass to stdout.

It's hard to coach someone through the process because that scripting language is a little terse, and it is very tempting to just post a ready-made answer, or an alternative in awk or perl. But everyone is better off with increased learning, which is one of the goals of this site.
 
1 members found this post helpful.
03-12-2019, 04:14 AM   #11
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
sed, like almost all *nix tools, is stream-oriented. It reads the stream one record at a time, up to an end-of-record character (usually a newline, sometimes a null), and that end-of-record character is not part of what your script sees, which is why s/\n/ /g on its own changes nothing. So if you really need it, you have to take action to put it back in, which is what the "N" command does here. Then you take the newlines out again at the end. How arcane.
You need to do something similar with any tool, but sed just makes it as opaque as possible. It belongs to another time, but is massively useful nonetheless.
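The point about not getting the end-of-record character is easy to check. In this sketch (the function names are hypothetical), sed's `s/\n/ /g` leaves the input unchanged because each line's newline is removed before the script runs, whereas `tr`, which works on the raw bytes, does see the newlines.

```sh
#!/bin/sh
# Sketch: sed strips each line's trailing newline before running the
# script (and adds it back on output), so s/\n/ /g finds nothing to change.
sed_attempt() {
    sed 's/\n/ /g'
}

# tr operates on the raw byte stream, newlines included.
tr_attempt() {
    tr '\n' ' '
}

# printf 'a,\nb,\n' | sed_attempt   # unchanged: "a," and "b," on two lines
# printf 'a,\nb,\n' | tr_attempt    # "a, b, " with no trailing newline
```

Because tr joins everything into a single line, it only does half the job here: the record boundaries still have to be put back, which is what the N loop or an awk record separator takes care of.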
 
03-12-2019, 04:37 AM   #12
Johng
Member
 
Registered: Feb 2002
Location: NZ
Distribution: Kubuntu, Mint
Posts: 408

Original Poster
Rep: Reputation: 31
Quote:
But everyone is better off with increased learning, which is one of the goals of this site.
OK, thank you. My current task is a conservation project, which I was hoping to progress. Maybe it would be quicker to use a text editor and hit the delete key 1400 times. But then, isn't learning by example another option?
 
03-12-2019, 05:19 AM   #13
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,308
Blog Entries: 3

Rep: Reputation: 3721
In both examples I use # as a stand-in for the end-of-record marker, which is actually </wpt> (written <\/wpt> inside a sed regex).

Code:
sed -e ':a;
        /#/!{
                N;
                ba;
        }; 
        s/\n/,/g;' bigfile.data > bigfile.csv
The :a is a label that the script can jump to.
The /#/!{...} means: run the enclosed commands if the pattern is not found.
N reads the next line of input and appends it to the pattern space, after an embedded newline.
ba jumps (branches) back to :a. That is the loop.
Once the /#/ pattern is found, the script drops through to s///, which replaces the embedded newlines with commas.

AWK needs a bit of a trick to make it apply a designated Output Field Separator instead of just passing the data through unmodified. In my opinion it's non-obvious, so I'll just show it.

Code:
awk '{$1=$1}1' FS='\n' OFS=',' RS="#\n" ORS="#\n" bigfile.data > bigfile.csv
Setting FS='\n' makes each line a single field, so spaces inside a line (as in "Golf Course") are not turned into commas. The $1=$1 copies the first field to itself, which forces awk to rebuild the record using the designated Output Field Separator (OFS). The 1 is a second clause; being 1, it is always true, and the default action for a true pattern is to print. It is equivalent to 1{print $0}.

AWK is just a bunch of if-then statements, written in shorthand.
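To make the "if-then" reading concrete, here is a sketch of the same one-liner written out longhand, with each rule in `pattern { action }` form. It keeps the `#` stand-in marker from the post, sets FS to a newline so that spaces inside a line are not treated as field separators, and assumes an awk that accepts a multi-character RS, such as gawk or mawk. `join_longhand` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: '{$1=$1}1' written out longhand. A rule with no pattern runs for
# every record; a rule with no action prints the record.
# join_longhand is a hypothetical helper name.

join_longhand() {
    awk -v FS='\n' -v OFS=',' -v RS='#\n' -v ORS='#\n' '
        {                 # no pattern: run for every record
            $1 = $1       # assigning a field rebuilds $0 using OFS
        }
        1 {               # pattern 1 is always true
            print $0      # the default action, spelled out
        }
    '
}

# Example usage:
#   join_longhand < bigfile.data > bigfile.csv
```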

Last edited by Turbocapitalist; 03-12-2019 at 05:51 AM. Reason: file->line
 
03-12-2019, 05:28 AM   #14
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
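It should be noted that both those (excellent) examples presume that the OP had not already added the commas. Given the current state of the file, this might suffice.
Code:
awk 'NF {gsub("\n", ""); print $0 RS}' RS='</wpt>,' bigfile.data > bigfile.csv
The NF guard skips the leftover record made up of just the file's final newline, which would otherwise print a stray "</wpt>," line. A multi-character RS like this needs gawk or mawk.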
Linux is all about choice ...

Last edited by syg00; 03-12-2019 at 05:29 AM.
 
1 members found this post helpful.
03-12-2019, 05:49 AM   #15
Johng
Member
 
Registered: Feb 2002
Location: NZ
Distribution: Kubuntu, Mint
Posts: 408

Original Poster
Rep: Reputation: 31
Thank you Turbocapitalist and syg00.
That was perfect. I will be able to proceed with my project, and study the examples to advance my knowledge.
Cheers