LinuxQuestions.org
Old 05-27-2020, 06:01 AM   #1
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Rep: Reputation: 35
[SOLVED] ssed removes \r?\n


Hello,

I'm using super-sed 3.62, and noticed that ssed seems to remove (\r)\n from the regex before proceeding:

Code:
#NOK
ssed -R "s@^.+<time>.+</time>\r?\n@@g" < test.gpx > test.TIME.gpx

#OK
ssed -R "s@^.+<time>.+</time>@@g" < test.gpx > test.TIME.gpx
Is there a work-around?

Thank you.

Last edited by littlebigman; 05-27-2020 at 07:29 AM.
 
Old 05-27-2020, 06:13 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4120
Stream-based tools (almost) never see the newline. Try your first test without it.
 
Old 05-27-2020, 06:29 AM   #3
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
ssed is obsoleted by GNU sed anyway. It was a fork of GNU sed created to test some experimental features that have long since found their way back into GNU sed. What distro are you using? Something Debian-based, I suppose? Or is it FreeBSD? It seems nobody else packages ssed anymore. Unless it's a very old release like Debian 8 (Jessie), there's no point in using ssed.

It seems like you're trying to parse an XML/SGML document with sed. Please don't. sed is a line-oriented tool, HTML/XML/SGML is in no way line-oriented.

If \n is at the end of the line, your sed expression won't work, but sed can handle a \n embedded in the pattern space by other means, e.g. via its N and G commands.
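
A minimal sketch of that point, using a made-up two-line input. sed strips the trailing newline before it runs the script, so `\n` has nothing to match until `N` appends the next line to the pattern space:

```shell
# sed reads "a", strips its newline, and only then applies the script,
# so s/\n/X/ finds nothing to replace on either line.
plain=$(printf 'a\nb\n' | sed 's/\n/X/')

# N appends the next input line to the pattern space, separated by an
# embedded newline, which the regex can now match.
joined=$(printf 'a\nb\n' | sed 'N;s/\n/X/')

printf '%s\n' "$plain" "$joined"
```

The first command leaves the input unchanged; the second joins the two lines with an X.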

Last edited by shruggy; 05-27-2020 at 08:47 AM.
 
Old 05-27-2020, 06:29 AM   #4
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
Debian.

A regex is fine when parsing a single line, like it is here.

But I'd like to also remove the \r?\n. Without it, I end up with a blank line.

Last edited by littlebigman; 05-27-2020 at 06:31 AM.
 
Old 05-27-2020, 06:35 AM   #5
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,599

Rep: Reputation: 2546

Regular sed allows changing the line delimiter from \n to \0 (null character), using the -z option, but that was added in 2012 and the last release of super-sed appears to be 2005 so I guess it doesn't support that.

You could try comparing the codebases to see how much effort it is to update super-sed with new features.

Or, for this specific example that doesn't appear to be doing anything special, just use regular sed.
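
For reference, a hedged sketch of the -z approach, assuming GNU sed 4.2.2 or later; the sample is a made-up CRLF fragment modelled on a GPX track point. With -z the whole file (provided it contains no NUL bytes) becomes a single pattern space, so `\r?\n` can be matched. Note that `^` then anchors only at the very start of that buffer, which is why the pattern uses `[^\n]*` to stay on one physical line:

```shell
# Made-up CRLF sample modelled on a GPX track point.
sample=$(mktemp)
printf '<trkpt>\r\n\t<ele>71.75</ele>\r\n\t<time>2020-05-13T15:31:04Z</time>\r\n</trkpt>\r\n' > "$sample"

# -z: records are NUL-separated, so the whole file is one "line" here.
# [^\n]* cannot cross a newline, so each match covers a single line.
# tr only strips the remaining carriage returns to make the result easy to read.
cleaned=$(sed -z -E 's@[^\n]*<time>[^\n]*</time>\r?\n@@g' "$sample" | tr -d '\r')
rm -f "$sample"

printf '%s\n' "$cleaned"
```

The `<time>` line is removed together with its line terminator, so no blank line is left behind.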

 
Old 05-27-2020, 06:41 AM   #6
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
I still don't understand what you are trying to do. Could you provide a sample of the input data?

Flag M can be specified for the s command in GNU sed. Is that what you're looking for?
 
Old 05-27-2020, 06:53 AM   #7
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
I used ssed because it supports non-greediness, but I'll be happy to use GNU sed (4.4) if it does what I need.

I'm simply trying to remove all the lines that contain time in GPX files, including the carriage return:

Code:
<trk>
	<trkseg>
		<trkpt lon="2.325975" lat="48.821938">
			<ele>71.75</ele>
			<time>2020-05-13T15:31:04Z</time>
		</trkpt>

Last edited by littlebigman; 05-27-2020 at 06:57 AM.
 
Old 05-27-2020, 06:55 AM   #8
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,849

Rep: Reputation: 7309
Quote:
Originally Posted by littlebigman View Post
Debian.

A regex is fine when parsing a single line, like it is here.

But I'd like to also remove the \r?\n. Without it, I end up with a blank line.
Again, sed does not see \n, because that is the delimiter. In other words: the input is split into lines, and sed commands work on these lines.
If you want to make sed work differently you need to check the option -z (see man sed).
 
Old 05-27-2020, 06:58 AM   #9
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
Because you replied before my own reply.

The magic of non-face to face conversations :-)

Code:
NOK sed -E "s@^.+<time>.+</time>\r?\n@@g" < test.gpx > test.TIME.gpx

sed -E "s@^.+<time>.+</time>\r$@@g" < test.gpx > test.TIME.gpx
sed: -e expression #1, char 24: unterminated `s' command

NOK sed -E "s@^.+<time>.+</time>\n@@g" < test.gpx > test.TIME.gpx

CTRL+V CTRL+M
NOK sed -E "s@^.+<time>.+</time>^M@@g" < test.gpx > test.TIME.gpx
NOK sed -e "s@^.+<time>.+</time>^M@@g" < test.gpx > test.TIME.gpx

sed -i "s@^.+<time>.+</time>$@@g" < test.gpx > test.TIME.gpx
sed: -e expression #1, char 22: unterminated `s' command

NOK sed -z "s@^.+<time>.+</time>\r?\n@@g" < test.gpx > test.TIME.gpx

NOK sed -z "s@^.+<time>.+</time>\n@@g" < test.gpx > test.TIME.gpx
NOK sed -z 's@^.+<time>.+</time>\n@@g' < test.gpx > test.TIME.gpx
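
A note on the two "unterminated `s' command" errors above: inside double quotes the shell expands `$@` (the positional parameters, normally empty at an interactive prompt) before sed runs, so sed never sees the `$` or the closing `@` delimiter. A small sketch, with a made-up one-line input, showing what sed actually receives and how single quotes avoid the problem:

```shell
# Clear the positional parameters so "$@" expands to nothing, as it
# typically does at an interactive prompt.
set --

# Double quotes: the shell replaces $@ before sed ever sees the script.
seen_by_sed=$(printf '%s' "s@^.+<time>.+</time>$@@g")

# Single quotes: the script reaches sed untouched, so $ anchors the match.
out=$(printf 'a <time>x</time>\n' | sed -E 's@^.+<time>.+</time>$@@')

printf 'double-quoted script: %s\n' "$seen_by_sed"
printf 'single-quoted result: [%s]\n' "$out"
```

The double-quoted script arrives as `s@^.+<time>.+</time>@g`, which has only two delimiters and is 22 characters long, matching the "char 22" in the error message.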

Last edited by littlebigman; 05-27-2020 at 07:08 AM.
 
Old 05-27-2020, 07:20 AM   #10
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
Code:
xml2 <test.gpx|grep -v ^/gpx/trk/trkseg/trkpt/time=|2xml

Last edited by shruggy; 05-27-2020 at 07:22 AM.
 
Old 05-27-2020, 07:29 AM   #11
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
Thank you. I'll try xml2/2xml if I can't get sed to work.
 
Old 05-27-2020, 08:30 AM   #12
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
xmlstarlet is also an option. Actually, it was my first choice, but it's picky about namespaces. E.g. the sample GPX document from Wikipedia includes xmlns attributes, so you'll need to run it like this:
Code:
xmlstarlet ed -d //_:time test.gpx
OTOH, the example from OSM Wiki doesn't, so:
Code:
xmlstarlet ed -d //time test.gpx
Since I know nothing about your gpx data, I opted for the dumber xml2.

Last edited by shruggy; 05-27-2020 at 09:06 AM.
 
Old 05-27-2020, 09:39 AM   #13
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
Thanks.

The first examples I found about sed on the Net didn't mention that it strips the trailing newline from each line before applying the regex.

This will remove all empty lines, which is fine in my particular case:
Code:
sed -r "s@^.+<time>.+</time>@@g" < test.CRLF.gpx > test.CRLF.TIME.gpx
sed -r "/^$/d" test.CRLF.TIME.gpx > test.CRLF.TIME.stripped.gpx
 
Old 05-27-2020, 09:44 AM   #14
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
This can be done in one sed command:
Code:
sed -E 's@^.+<time>.+</time>@@g;/^$/d'
I still don't understand why you cannot just remove the lines in question:
Code:
sed '\@<time>.\+</time>@d'
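
To illustrate that last command, a sketch on a made-up CRLF fragment, assuming GNU sed (for `\+`). The `d` command deletes the whole pattern space and sed prints nothing for that line, so the terminator, LF or CRLF, goes with it and no blank line is left behind:

```shell
# Delete every line containing a <time> element; the other lines pass
# through unchanged. tr turns the surviving carriage returns into "!"
# purely to make them visible in the output.
kept=$(printf '<ele>71.75</ele>\r\n<time>2020-05-13T15:31:04Z</time>\r\n</trkpt>\r\n' |
  sed '\@<time>.\+</time>@d' | tr '\r' '!')

printf '%s\n' "$kept"
```

Only the two non-time lines remain, each still carrying its original carriage return.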

Last edited by shruggy; 05-27-2020 at 10:04 AM.
 
Old 05-27-2020, 09:47 AM   #15
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,599

Rep: Reputation: 2546
Quote:
Originally Posted by littlebigman View Post
I used ssed because it supports non-greediness...
Fair enough, but I'd rather go with Perl itself in that situation. (You're not using non-greediness here, and you don't need to.)

For basic substitutions it can be a simple case of switching from "sed 'EXPR'" to "perl -pe 'EXPR'".

For example:
Code:
perl -pe 's@^.+<time>.+</time>\r?\n@@g' < test.gpx > test.TIME.gpx
(That appears to remove newlines despite being in sed-like mode.)

Of course, that should be written as:
Code:
perl -pe 's@^[\t ]*<time>[^<]+</time>\r?\n@@g' < test.gpx > test.TIME.gpx
Because otherwise the .+ is likely to match to the end of the input before backtracking, whereas using accurate character classes will match only the characters needed (which is both more efficient and makes the intent clearer).


As various people have said, parsing markup with regex can be difficult and is not the best option - but there's a difference between handling wild HTML and simply removing lines from consistent well-formed XML - so this specific example is possibly ok - but if you're not 100% certain that's what you have and will always have, use a markup-aware tool instead.


Last edited by boughtonp; 05-27-2020 at 09:49 AM.
 
  



Tags
gpx, xml


