LinuxQuestions.org
11-23-2022, 01:38 PM #1
__John_L.
sed with wildcard matching


I have a DOS file with several record sequences as follows:

...
Online
1234 Main St
Anytown
...
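A file with this layout can be recreated for testing with printf, and od -c makes the DOS line endings visible as \r \n pairs. This is only a sketch: the values are the placeholders from the question, and test.txt is the file name used there.

```shell
# Write one three-line record sequence with DOS (CRLF) line endings,
# using the placeholder values from the question.
printf 'Online\r\n1234 Main St\r\nAnytown\r\n' > test.txt

# Dump the raw bytes: every line ends in \r \n rather than just \n.
od -c test.txt
```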

I'm trying to edit the file with the following sed command:

sed -z 's/Online\r\n\(.*\)\r\n\(.*\)/Online\r\n\1 \2/g' test.txt

but there is no change in the stdout stream, much to my surprise.

Thanks in advance for any insight you can provide.
 
11-23-2022, 01:50 PM #2
Turbocapitalist
Run it through the dos2unix utility first. Or use tr or Perl.

Code:
dos2unix -n input.file.txt output.file.txt

tr -d '\r' < input.file.txt > output.file.txt

perl -p -e 's/\r\n/\n/g' input.file.txt > output.file.txt
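With the carriage returns gone, the join no longer has to account for \r at all. A minimal sketch, assuming GNU sed (for -z, and for \r in the replacement) and that "Online" never appears at the end of some other line; output.txt is an illustrative name, and the last step puts the DOS line endings back:

```shell
# Sample input from the question (DOS line endings).
printf 'Online\r\n1234 Main St\r\nAnytown\r\n' > test.txt

# 1. tr removes every \r.
# 2. sed -z sees the whole file at once, so \n can be matched; the line
#    after "Online" keeps its text but its trailing \n becomes a space.
# 3. The second sed appends \r to every line again (GNU sed syntax).
tr -d '\r' < test.txt |
  sed -z 's/Online\n\([^\n]*\)\n/Online\n\1 /g' |
  sed 's/$/\r/' > output.txt
```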
But since you mention records, this might be a task for Perl with the -a and -l options or just plain AWK. Please describe in a bit more detail what you aim to do.
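A minimal AWK sketch of that record-oriented approach, assuming every sequence is exactly three records and starts with a line that is exactly "Online"; the variables state and street and the file output.txt are illustrative names, and the output has Unix line endings:

```shell
# Sample input from the question (DOS line endings).
printf 'Online\r\n1234 Main St\r\nAnytown\r\n' > test.txt

awk '
  { sub(/\r$/, "") }                                   # drop the DOS carriage return
  state == 1 { street = $0; state = 2; next }          # 2nd record: hold it
  state == 2 { print street " " $0; state = 0; next }  # 3rd record: join it to the 2nd
  $0 == "Online" { state = 1 }                         # 1st record starts a sequence
  { print }                                            # everything else passes through
  END { if (state == 2) print street }                 # input ended after the 2nd record
' test.txt > output.txt
```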
 
11-23-2022, 03:20 PM #3
boughtonp

What is the output of the "sed --version" and "uname" commands?

Change ".*" to "[^\r\n]*", i.e:
Code:
sed -rz 's/(Online\r?\n)([^\r\n]*)\r?\n([^\r\n]*)/\1\2 \3/g' test.txt
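With -z the whole file (which contains no NUL bytes) is read into a single pattern space, so an s command without the g flag stops after the first match, i.e. after the first record sequence. A quick check, assuming GNU sed; two.txt and the variables once and all are hypothetical names used only for this comparison:

```shell
# Two record sequences with DOS line endings (placeholder data): 6 line breaks.
printf 'Online\r\nA St\r\nTownA\r\nOnline\r\nB St\r\nTownB\r\n' > two.txt

# Without g: only the first sequence is joined, so one line break disappears.
once=$(sed -Ez 's/(Online\r?\n)([^\r\n]*)\r?\n([^\r\n]*)/\1\2 \3/' two.txt | tr -cd '\n' | wc -c)
# With g: every sequence is joined, so two line breaks disappear.
all=$(sed -Ez 's/(Online\r?\n)([^\r\n]*)\r?\n([^\r\n]*)/\1\2 \3/g' two.txt | tr -cd '\n' | wc -c)

echo "line breaks without g: $once, with g: $all"
```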
 
11-23-2022, 03:34 PM #5
__John_L. (Original Poster)
I'd like to remove the newline at the end of the 2nd of 3 records in a series of records, "Online" being the contents of the 1st record.

I'd prefer to understand sed before moving to perl or awk.
 
11-23-2022, 03:38 PM #6
__John_L. (Original Poster)
sed v4.9
uname v9.1
 
11-23-2022, 04:20 PM #7
boughtonp
Quote:
Originally Posted by __John_L.
I'd like to remove the new-line in the 2nd of 3 records in a series of records, "Online" being the contents of the 1st record.
That's not what Turbocapitalist was asking for - you've just repeated what the command is doing, not what you're trying to achieve.

Quote:
I'd prefer to understand sed before moving to perl or awk.
If you want to learn Sed, that's fine, but you shouldn't consider learning it a requirement or prerequisite (either first or at all).

Depending on the actual structure of the data, Awk might make working with it easier, because it is oriented around records and fields.

Perl has a far more powerful regex engine than Sed (or Awk), and has more flexibility if you need to move beyond simple substitutions.


Quote:
Originally Posted by __John_L.
uname v9.1
I didn't say "uname --version"; I'm asking for confirmation of the OS you're using.


 
11-25-2022, 01:42 AM #8
MadeInGermany
Your capture groups are wrong: you must capture what you want to keep.
And .* is greedy; in -z mode it can span the entire file, up to the last line.
Concatenating lines can be done without -z:
Code:
sed '/Online/{N;N;s/\r\n/ /g;}' test.txt
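The N command appends the next line to the pattern space, and the line break between them becomes a \n. Because sed does not strip the \r, each DOS line break appears as \r\n, which the s command replaces with a space.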
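If "Online" itself should stay on its own line (as the replacement in the original question implies), a variation on the same idea is to print the "Online" line with n before appending. A sketch, assuming GNU sed for \r in the regex; output.txt is an illustrative name:

```shell
# Sample input from the question (DOS line endings).
printf 'Online\r\n1234 Main St\r\nAnytown\r\n' > test.txt

# On a line that is exactly "Online" (with or without a trailing \r):
#   n  prints that line and loads the 2nd line into the pattern space,
#   N  appends the 3rd line, separated by \n,
#   s  replaces the first \r\n (or \n) between them with a space.
sed '/^Online\r*$/{n;N;s/\r*\n/ /;}' test.txt > output.txt
```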
 
11-25-2022, 10:48 AM #9
boughtonp
Quote:
Originally Posted by MadeInGermany
And .* is greedy; in -z mode it might span over the entire file till the last line.
That's an important point I neglected to make: the ".*" will consume as much as possible (i.e. the entire rest of the file) before gradually backtracking (as little as possible) in order for the pattern to find a match.

In a multi-record file this will result in incorrect behaviour, because \1 will be most of the rest of the file, and \2 may well be empty (if there's a trailing \r\n); otherwise it will be the last line of the last record.

That's the reason for using "[^\r\n]*" instead of ".*": in almost all cases when people write "." they really want "[^delimiter]". (Some will use the lazy ".*?", which can work but is less efficient; it is only available in engines such as Perl's, not in sed's POSIX regular expressions.)
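The effect is easy to see on a file with two record sequences: the command from the question replaces only one line break in the entire file, because its first match already runs to the end, while the bounded version joins each sequence. A sketch assuming GNU sed; two.txt and the variables greedy and bounded are illustrative names:

```shell
# Two record sequences with DOS line endings (placeholder data): 6 line breaks.
printf 'Online\r\nA St\r\nTownA\r\nOnline\r\nB St\r\nTownB\r\n' > two.txt

# The command from the question: .* can run across the whole file,
# so g finds only one match and a single line break is replaced.
greedy=$(sed -z 's/Online\r\n\(.*\)\r\n\(.*\)/Online\r\n\1 \2/g' two.txt | tr -cd '\n' | wc -c)

# The same substitution with each captured field limited to one line,
# so every sequence matches separately.
bounded=$(sed -z 's/Online\r\n\([^\r\n]*\)\r\n\([^\r\n]*\)/Online\r\n\1 \2/g' two.txt | tr -cd '\n' | wc -c)

echo "line breaks: input 6, greedy $greedy, bounded $bounded"
```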

 