How to use sed to remove carriage returns except #
I have a gpx file which I wish to convert to a csv file. After adding a comma to the end of every line, I want to use sed to remove the carriage return from all lines except those ending with a character such as #
If you add a comma to the end of every line, none will end with "#" or any other character. I would also be very surprised if a carriage return was used. Ahhhh - dammit. I always forget Windoze screws everything up. Is this a Windows-formatted file with lines ending in CRLF?
So define your data and your requirements properly, and maybe someone can help. Show some input samples (in translated hex if needed to show the location of the c/r), and desired output.
Last edited by syg00; 03-12-2019 at 01:29 AM.
Reason: crlf
Thank you for the quick replies.
I have a file with 1400 lines. I have added a comma to the end of each line. Around 140 lines end with a </wpt>,
To make a csv file I need to remove the carriage return from all the lines except those ending with </wpt>,
I have looked at a number of posts using M N n etc, but can't make sense of them, that's why I am asking for help.
If the ending of the record is variable, then that complicates things a little. sed is best when things are very consistent. Otherwise you're better off with a perl script.
One way would be to put a label (:) at the beginning of the script and then, if the end of record marker is not found, to append the next line to the pattern space (N) and branch to that label. If the end of record marker is present, then substitute (s///) the newline characters with commas and then print.
Stop saying "carriage return" - that is a totally different action (and code-point) to "newline".
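To make the distinction concrete, here is a minimal sketch that inspects the two characters. The two-line sample and the names crlf_sample, cr_before, cr_after and lines_after are made up for illustration; they are not from the OP's file.

```shell
# Hypothetical two-line sample with Windows (CRLF) line endings.
crlf_sample() { printf 'one\r\ntwo\r\n'; }

# od -c prints every byte: \r is the carriage return, \n is the newline.
crlf_sample | od -c

# Count the carriage returns before and after deleting them with tr.
cr_before=$(crlf_sample | tr -cd '\r' | wc -c)
cr_after=$(crlf_sample | tr -d '\r' | tr -cd '\r' | wc -c)

# The newlines are untouched, so the line count stays the same.
lines_after=$(crlf_sample | tr -d '\r' | wc -l)
echo "CR before: $cr_before, CR after: $cr_after, lines: $lines_after"
```

Deleting the carriage returns does not join any lines; only removing the newlines does that.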
Whenever you need to go to this extent with sed, then you are using the wrong tool IMHO. I long ago eschewed perl for awk for this sort of thing - use the "</wpt>," as a record separator and simply remove all the newlines.
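A rough sketch of that record-separator idea follows. The sample lines are invented, and a multi-character RS is treated as a regular expression only by gawk, mawk and similar implementations, so that is an assumption about the awk in use.

```shell
# Hypothetical sample: every line already ends with a comma,
# and each record ends with the line "</wpt>,".
sample() {
  printf '%s\n' \
    '<wpt lat="1" lon="2">,' '<name>A</name>,' '</wpt>,' \
    '<wpt lat="3" lon="4">,' '<name>B</name>,' '</wpt>,'
}

# Treat "</wpt>," plus its newline as the record separator, delete the
# newlines left inside each record, and print the marker back.
joined=$(sample | awk 'BEGIN { RS = "</wpt>,\n" }
  $0 != "" { gsub(/\n/, ""); print $0 "</wpt>," }')
printf '%s\n' "$joined"
```

Each waypoint ends up on one line, still carrying the trailing comma that was added beforehand.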
Removing carriage returns has no effect using the data as presented.
The N command along with an end-of-record test. The awk way might be easier, but if you stay with a sed script, make a loop around the N command and stay in that loop until you hit the string you are using as the end-of-record marker. Then substitute the newlines with commas and let the result pass to stdout.
It's hard to coach through the process: that scripting language is a little terse, and it is very tempting to just post a ready answer, or an alternative in awk or perl. But everyone is better off with increased learning, which is one of the goals of this site.
sed, like almost all *nix tools, is stream oriented. You get the stream until an end of record (usually a null or newline) is reached. You do not get the eor character. Hence you need to take action to ensure it's put back in if you really need it - the "N" command in this case. Then you take them out at the end. How arcane.
You need to do similar using any tool, but sed just makes it as opaque as possible. It belongs in another time, but is massively useful nonetheless.
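A small illustration of that point, using a made-up two-line input: the newline only becomes visible to s/// after N has pulled a second line into the pattern space.

```shell
# Without N the pattern space holds one line and no newline,
# so s/\n/,/ has nothing to match and the lines stay separate.
no_join=$(printf 'a\nb\n' | sed 's/\n/,/')

# N appends the next line, with a newline between the two,
# so the same substitution now joins them.
joined=$(printf 'a\nb\n' | sed 'N;s/\n/,/')

printf '%s\n' "$no_join" "$joined"
```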
"But everyone is better off with increased learning, which is one of the goals of this site."
OK, thank you. My current task is a conservation project, which I was hoping to progress. Maybe it would be quicker to use a text editor and hit the delete key 1400 times. But then, learning by example is another option?
The :a is a label, a target to which the script can jump.
The /#/!{...} means do the following clause if the pattern is not found.
N reads in the next line from the data stream and appends it to the pattern space.
ba jumps (branches) to :a. That is the loop.
If the /#/ pattern is found, the script drops through to s///
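A script consistent with that description might look like the sketch below. It is a reconstruction, not necessarily the exact command posted, and the five-line input is invented. It assumes the last line of the input contains the # marker; lines left over after the final marker are handled differently by different sed implementations.

```shell
# Hypothetical input: records end on a line containing "#".
joined=$(printf '%s\n' 'a' 'b' 'c#' 'd' 'e#' | sed ':a
/#/!{
  N
  ba
}
s/\n/,/g')

printf '%s\n' "$joined"
```

Each record is collected in the pattern space line by line, and only once the marker turns up are the embedded newlines replaced with commas.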
AWK requires a bit of a trick to use a designated Output Field Separator instead of just passing the data through unmodified. It is, in my opinion, non-obvious, so I'll just show it.
The $1=$1 copies the first field to itself again, forcing a reformatting of the output using the designated Output Field Separator (OFS). The 1 is a second clause, but being 1 it is always true. The default action when a condition is true is to print. It would be the equivalent of 1{print $0}
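As a sketch of how those two clauses could fit together (the separator settings and the sample input are assumptions, and a multi-character RS needs gawk, mawk or a similar awk):

```shell
# Hypothetical input: records end with "#" at the end of a line.
# RS splits records at "#" plus newline, FS splits fields at newlines,
# and OFS is what $1=$1 uses when it rebuilds the record.
csv=$(printf '%s\n' 'a' 'b' 'c#' 'd' 'e#' |
  awk 'BEGIN { RS = "#\n"; FS = "\n"; OFS = "," } { $1 = $1 } 1')

printf '%s\n' "$csv"
```

Because the # is part of the record separator here, it does not appear in the output. Without the $1=$1 assignment the record would be printed unmodified, embedded newlines and all.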
AWK is just a bunch of if-then statements, written in shorthand.
Last edited by Turbocapitalist; 03-12-2019 at 05:51 AM.
Reason: file->line
It should be noted that both those (excellent) examples presume that the commas the OP has already added are not there. Given the current state of the file, this might suffice.
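The command originally posted here is not preserved, so the following is only a guess at the kind of thing meant: the same sed loop, but deleting the newlines instead of turning them into commas. The sample lines are invented, and the sketch assumes plain newline line endings (a trailing carriage return would stop the $ anchor from matching).

```shell
# Hypothetical sample: commas already appended, records end with "</wpt>,".
sample() {
  printf '%s\n' \
    '<wpt lat="1" lon="2">,' '<name>A</name>,' '</wpt>,' \
    '<wpt lat="3" lon="4">,' '<name>B</name>,' '</wpt>,'
}

# Keep appending lines until the pattern space ends with "</wpt>,",
# then delete the embedded newlines; the commas are already in place.
joined=$(sample | sed ':a
/<\/wpt>,$/!{
  N
  ba
}
s/\n//g')

printf '%s\n' "$joined"
```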