sed with wildcard matching

__John_L. · 11-23-2022, 12:38 PM

I have a DOS file with several record sequences as follows:

...
Online
1234 Main St
Anytown
...

I'm trying to edit the file with the following sed command:

sed -z 's/Online\r\n\(.*\)\r\n\(.*\)/Online\r\n\1 \2/g' test.txt

but there is no change in the stdout stream, much to my surprise.

Thanks in advance for any insight you can provide.

Turbocapitalist · 11-23-2022, 12:50 PM

Run it through the dos2unix utility first. Or use tr or Perl.

Code:

dos2unix -n input.file.txt output.file.txt

tr -d '\r' < input.file.txt > output.file.txt

perl -p -e 's/\r\n/\n/g' input.file.txt > output.file.txt

But since you mention records, this might be a task for Perl with the -a and -l options or just plain AWK. Please describe in a bit more detail what you aim to do.
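To confirm that a file really has DOS line endings, and that one of the conversions above removed them, you can count the lines that end in a carriage return. This is only a sketch on made-up two-line sample data; the variable names are illustrative and not from the thread.

```shell
cr=$(printf '\r')   # a literal carriage return, built portably

# Count CR-terminated lines before conversion.
before=$(printf 'Online\r\n1234 Main St\r\n' | grep -c "${cr}\$" || true)

# Count again after stripping CRs with tr, as suggested above.
after=$(printf 'Online\r\n1234 Main St\r\n' | tr -d '\r' | grep -c "${cr}\$" || true)

echo "CR-terminated lines before: $before, after: $after"
# CR-terminated lines before: 2, after: 0
```

The `|| true` is there because grep exits non-zero when it finds no matching lines, which would abort a script running under `set -e`.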

boughtonp · 11-23-2022, 02:20 PM

What is the output of the "sed --version" and "uname" commands?

Change ".*" to "[^\r\n]*", i.e.:

Code:

sed -rz 's/(Online)\r?\n([^\r\n]*)\r?\n([^\r\n]*)/\1\n\2 \3/'
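Here is a minimal sketch of that substitution on made-up sample data, assuming GNU sed (for -z, -E and the \r escape). It differs from the command as posted in two ways, both of which are assumptions about what is wanted. A trailing g flag is added, because with -z the whole file is one record and a substitution without g stops after the first match. The replacement also writes \r\n rather than \n, so the "Online" line keeps its DOS ending.

```shell
# Two "Online" records with CRLF endings (hypothetical sample data).
joined=$(
  printf 'Online\r\n1234 Main St\r\nAnytown\r\nOnline\r\n56 Oak Ave\r\nSpringfield\r\n' |
  sed -Ez 's/(Online)\r?\n([^\r\n]*)\r?\n([^\r\n]*)/\1\r\n\2 \3/g' |
  tr -d '\r'   # strip CRs only so the result displays cleanly in a terminal
)
printf '%s\n' "$joined"
# Online
# 1234 Main St Anytown
# Online
# 56 Oak Ave Springfield
```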

__John_L. · 11-23-2022, 02:34 PM

I'd like to remove the new-line in the 2nd of 3 records in a series of records, "Online" being the contents of the 1st record.

I'd prefer to understand sed before moving to perl or awk.

__John_L. · 11-23-2022, 02:38 PM

sed v4.9
uname v9.1

boughtonp · 11-23-2022, 03:20 PM

Quote:

Originally Posted by __John_L.

I'd like to remove the new-line in the 2nd of 3 records in a series of records, "Online" being the contents of the 1st record.

That's not what Turbocapitalist was asking for - you've just repeated what the command is doing, not what you're trying to achieve.

Quote:

I'd prefer to understand sed before moving to perl or awk.

If you want to learn Sed that's fine, but you shouldn't consider it a requirement or prerequisite to learn it (either first or at all).

Depending on the actual structure of the data, Awk might make working with it easier, because of how it is oriented around records and fields.

Perl has a far more powerful regex engine than Sed (or Awk), and has more flexibility if you need to move beyond simple substitutions.

Quote:

Originally Posted by __John_L.

uname v9.1

I didn't say "uname --version"; I'm asking for confirmation of the OS you're using.

MadeInGermany · 11-25-2022, 12:42 AM

Your capture groups are wrong; you must capture what you want to keep.
And .* is greedy; in -z mode it might span over the entire file till the last line.
Concatenating lines can be done without -z:

Code:

sed '/Online/{N;N;s/\r\n/ /g;}' test.txt

The N command appends the next line to the input buffer, and the newline in between becomes a \n
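Note that the command above joins all three lines, so "Online" ends up on the same line as the address. If "Online" should stay on its own line, as in the replacement from the first post, one possible variant (a sketch assuming GNU sed, not something posted in this thread) uses n to print the "Online" line first and then joins only the following two lines. The "Other" line in the sample is made up to show that unrelated lines pass through untouched.

```shell
# n prints the "Online" line and loads the next one; N appends the third line;
# the single s then replaces the one CRLF left inside the pattern space.
joined=$(
  printf 'Online\r\n1234 Main St\r\nAnytown\r\nOther\r\n' |
  sed '/^Online\r$/{n;N;s/\r\n/ /;}' |
  tr -d '\r'   # strip CRs only so the result displays cleanly in a terminal
)
printf '%s\n' "$joined"
# Online
# 1234 Main St Anytown
# Other
```

Anchoring the address as /^Online\r$/ keeps it from triggering on lines that merely contain the word "Online".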

boughtonp · 11-25-2022, 09:48 AM

Quote:

Originally Posted by MadeInGermany

And .* is greedy; in -z mode it might span over the entire file till the last line.

That's an important point I neglected to make: the ".*" will consume as much as possible (i.e. the entire rest of the file) before gradually backtracking (as little as possible) in order for the pattern to find a match.

In a multi-record file this will result in incorrect behaviour, because \1 will be most of the rest of the file and \2 may well be empty (if there's a trailing \r\n); otherwise \2 would be the last line of the last record.

That's the reason for using "[^\r\n]*" instead of ".*" - in almost all cases when people write "." they really want "[^delimiter]" (some will use ".*?" which can work but is less efficient).
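The difference can be seen side by side with a small sketch, assuming GNU sed and a made-up two-record sample. The greedy form is the command from the first post and the bounded form only swaps ".*" for "[^\r\n]*". With the greedy pattern nearly all of the file ends up inside \1, so every line break survives. That may be why the original command appeared to change nothing.

```shell
input='Online\r\n1234 Main St\r\nAnytown\r\nOnline\r\n56 Oak Ave\r\nSpringfield\r\n'

# Greedy: \1 swallows everything up to the last CRLF in the file.
greedy=$(printf "$input" |
  sed -z 's/Online\r\n\(.*\)\r\n\(.*\)/Online\r\n\1 \2/g' | tr -d '\r')

# Bounded: each group stops at the end of its own line.
bounded=$(printf "$input" |
  sed -z 's/Online\r\n\([^\r\n]*\)\r\n\([^\r\n]*\)/Online\r\n\1 \2/g' | tr -d '\r')

printf 'greedy:\n%s\n\nbounded:\n%s\n' "$greedy" "$bounded"
```

The `tr -d '\r'` at the end of each pipeline is only there to make the results readable in a terminal.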