LinuxQuestions.org

__John_L. 11-23-2022 12:38 PM

sed with wildcard matching
 
I have a DOS file with several record sequences as follows:

...
Online
1234 Main St
Anytown
...

I'm trying to edit the file with the following sed command:

sed -z 's/Online\r\n\(.*\)\r\n\(.*\)/Online\r\n\1 \2/g' test.txt

but there is no change in the stdout stream, much to my surprise.

Thanks in advance for any insight you can provide.

Turbocapitalist 11-23-2022 12:50 PM

Run it through the dos2unix utility first. Or use tr or Perl.

Code:

dos2unix -n input.file.txt output.file.txt

tr -d '\r' < input.file.txt > output.file.txt

perl -p -e 's/\r\n/\n/g' input.file.txt > output.file.txt

But since you mention records, this might be a task for Perl with the -a and -l options or just plain AWK. Please describe in a bit more detail what you aim to do.

boughtonp 11-23-2022 02:20 PM


 
What is output of "sed --version" and "uname" commands?

Change ".*" to "[^\r\n]*", i.e.:
Code:

sed -rz 's/(Online)\r?\n([^\r\n]*)\r?\n([^\r\n]*)/\1\n\2 \3/'

__John_L. 11-23-2022 02:34 PM

I'd like to remove the new-line in the 2nd of 3 records in a series of records, "Online" being the contents of the 1st record.

I'd prefer to understand sed before moving to perl or awk.

__John_L. 11-23-2022 02:38 PM

sed v4.9
uname v9.1

boughtonp 11-23-2022 03:20 PM

Quote:

Originally Posted by __John_L. (Post 6393964)
I'd like to remove the new-line in the 2nd of 3 records in a series of records, "Online" being the contents of the 1st record.

That's not what Turbocapitalist was asking for - you've just repeated what the command is doing, not what you're trying to achieve.

Quote:

I'd prefer to understand sed before moving to perl or awk.
If you want to learn Sed that's fine, but you shouldn't consider it a requirement or prerequisite to learn it (either first or at all).

Depending on the actual structure of the data, Awk might make working with it easier, because of how it is oriented around records and fields.
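
As a rough illustration of the record-oriented style, here is a minimal Awk sketch. It assumes every "Online" line is followed by exactly two more lines; the sample file and the names sample, addr, city and awk_joined are made up for the example.

```shell
# Hypothetical one-record sample with DOS (CRLF) line endings.
sample=$(mktemp)
printf 'Online\r\n1234 Main St\r\nAnytown\r\n' > "$sample"

# ORS puts the CRLF back on output; the trailing tr is only there so the
# result is easy to read and compare in a terminal.
awk_joined=$(awk -v ORS='\r\n' '
    { sub(/\r$/, "") }                      # strip the CR from the current line
    $0 == "Online" {
        print                               # keep "Online" on its own line
        getline addr; sub(/\r$/, "", addr)  # read the next line
        getline city; sub(/\r$/, "", city)  # read the line after that
        print addr " " city                 # join the two with a space
        next
    }
    { print }                               # pass every other line through
' "$sample" | tr -d '\r')

printf '%s\n' "$awk_joined"
rm -f "$sample"
```

Whether this is easier than Sed depends on what else needs to happen to the records.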

Perl has a far more powerful regex engine than Sed (or Awk), and has more flexibility if you need to move beyond simple substitutions.


Quote:

Originally Posted by __John_L. (Post 6393966)
uname v9.1

I didn't say "uname --version"; I'm asking for confirmation of the OS you're using.


MadeInGermany 11-25-2022 12:42 AM

Your capture groups are wrong; you must capture what you want to keep.
And .* is greedy; in -z mode it might span the entire file up to the last line.
Concatenating lines can be done without -z:
Code:

sed '/Online/{N;N;s/\r\n/ /g;}' test.txt
The N command appends the next line to the input buffer (the pattern space), and the newline in between becomes a \n.
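
Note that with N;N and the g flag, both line breaks in the buffer are replaced, so "Online" ends up on the same line as the address. If the goal is to keep "Online" on its own line and join only the two lines after it, one possible variant uses n instead of the first N. This is a sketch assuming GNU sed (for the \r escape); the sample file and the names sample and joined are made up.

```shell
# Hypothetical one-record sample with DOS (CRLF) line endings.
sample=$(mktemp)
printf 'Online\r\n1234 Main St\r\nAnytown\r\n' > "$sample"

# n prints "Online" and loads the next line; N appends the line after that;
# the single \r\n now inside the pattern space is replaced with a space.
joined=$(sed '/^Online\r$/{n;N;s/\r\n/ /;}' "$sample" | tr -d '\r')

printf '%s\n' "$joined"
# prints:
# Online
# 1234 Main St Anytown
rm -f "$sample"
```

The tr at the end only strips the remaining carriage returns for display; drop it to keep the DOS line endings.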

boughtonp 11-25-2022 09:48 AM

Quote:

Originally Posted by MadeInGermany (Post 6394246)
And .* is greedy; in -z mode it might span over the entire file till the last line.

That's an important point I neglected to make: the ".*" will consume as much as possible (i.e. the entire rest of the file) before gradually backtracking (as little as possible) in order for the pattern to find a match.

In a multi-record file this will result in incorrect behaviour, because \1 will be most of the rest of the file, and \2 may well be empty (if there's a trailing \r\n); otherwise it'd be the last line of the last record.

That's the reason for using "[^\r\n]*" instead of ".*" - in almost all cases when people write "." they really want "[^delimiter]" (some will use ".*?" which can work but is less efficient).
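
To see the difference side by side, here is a minimal sketch assuming GNU sed (for -z and the \r escape). The two-record sample and the names sample, greedy and bounded are made up for the example, not taken from the original data.

```shell
# Hypothetical two-record sample with DOS (CRLF) line endings.
sample=$(mktemp)
printf 'Online\r\n1234 Main St\r\nAnytown\r\nOnline\r\n99 Oak Ave\r\nSpringfield\r\n' > "$sample"

# Original pattern: the first .* runs to the end of the file and backtracks
# only as far as the last \r\n, so the records in between are left unjoined.
greedy=$(sed -z 's/Online\r\n\(.*\)\r\n\(.*\)/Online\r\n\1 \2/g' "$sample" | tr -d '\r')

# Negated class: each group stops at the first CR or LF, so every record
# is matched and joined separately.
bounded=$(sed -z 's/Online\r\n\([^\r\n]*\)\r\n\([^\r\n]*\)/Online\r\n\1 \2/g' "$sample" | tr -d '\r')

printf '%s\n' "$bounded"
# prints:
# Online
# 1234 Main St Anytown
# Online
# 99 Oak Ave Springfield
rm -f "$sample"
```

Printing "$greedy" instead shows the address and town lines still separate, which is easy to mistake for "no change".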



All times are GMT -5.