sed with wildcard matching
I have a DOS file with several record sequences as follows:
...
Online
1234 Main St
Anytown
...
I'm trying to edit the file with the following sed command:
Code:
sed -z 's/Online\r\n\(.*\)\r\n\(.*\)/Online\r\n\1 \2/g' test.txt
but there is no change in the stdout stream, much to my surprise. Thanks in advance for any insight you can provide. |
Run it through the dos2unix utility first. Or use tr or Perl.
Code:
dos2unix -n input.file.txt output.file.txt |
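For the tr route mentioned above, a minimal sketch might look like the following. The sample contents and the temporary directory are made up for illustration; only the file names come from the dos2unix example. Note that this deletes every carriage return, including any lone CR that is not part of a CRLF pair, so it is only equivalent to dos2unix for files that use CR solely in line endings.

```shell
# Hypothetical sample: three CRLF-terminated lines in a scratch directory.
dir=$(mktemp -d)
printf 'Online\r\n1234 Main St\r\nAnytown\r\n' > "$dir/input.file.txt"

# Delete all carriage returns, leaving plain LF line endings.
tr -d '\r' < "$dir/input.file.txt" > "$dir/output.file.txt"

# Count any carriage returns still present in the converted file.
remaining=$(tr -cd '\r' < "$dir/output.file.txt" | wc -c)
converted=$(cat "$dir/output.file.txt")
echo "carriage returns left: $remaining"

rm -rf "$dir"
```

Once the file has LF endings, the sed patterns no longer need to account for \r at all.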
What is the output of the "sed --version" and "uname" commands?
Change ".*" to "[^\r\n]*", i.e.: Code:
sed -rz 's/(Online)\r?\n([^\r\n]*)\r?\n([^\r\n]*)/\1\n\2 \3/' |
I'd like to remove the new-line in the 2nd of 3 records in a series of records, "Online" being the contents of the 1st record.
I'd prefer to understand sed before moving to perl or awk. |
sed v4.9
uname v9.1 |
Depending on the actual structure of the data, Awk might make working with it easier, because of how it is oriented around records and fields. Perl has a far more powerful regex engine than Sed (or Awk), and has more flexibility if you need to move beyond simple substitutions.
|
Your capture groups are wrong: you must capture what you want to keep.
And .* is greedy; in -z mode it might span the entire file up to the last line. Concatenating lines can be done without -z. Code:
sed '/Online/{N;N;s/\r\n/ /g;}' test.txt |
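Note that the command above joins all three lines into one. If the goal is to keep "Online" on its own line and join only the two lines after it (which is what the replacement in the original attempt suggests), a variant along these lines might do it. This is a sketch assuming GNU sed (for the \r escape) and a made-up two-record sample; `n` prints the "Online" line and loads the next one, `N` appends the line after that, and the substitution replaces the single CRLF between them.

```shell
# Hypothetical two-record sample with DOS (CRLF) line endings.
tmp=$(mktemp)
printf 'Online\r\n1234 Main St\r\nAnytown\r\nOnline\r\n99 Elm Rd\r\nSpringfield\r\n' > "$tmp"

# On each "Online" line: print it (n), append the following line (N),
# then replace the CRLF between the two address lines with a space.
# sed's own output keeps CRLF endings; tr strips the CRs here only
# so the result is easy to read and compare.
joined=$(sed '/^Online\r$/{n;N;s/\r\n/ /;}' "$tmp" | tr -d '\r')
printf '%s\n' "$joined"

rm -f "$tmp"
```

Because sed works line by line here, nothing can match past the current record, so greediness is not an issue.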
In a multi-record file this will result in incorrect behaviour, because \1 will be most of the rest of the file and \2 may well be empty (if there's a trailing \r\n); otherwise it'd be the last line of the last record. That's the reason for using "[^\r\n]*" instead of ".*": in almost all cases when people write "." they really want "[^delimiter]" (some will use ".*?", which can work but is less efficient). |
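To see the difference described above, here is a small side-by-side sketch. It assumes GNU sed (for -z and the \r and \n escapes inside brackets) and uses a made-up two-record sample; the first command is the original attempt, the second swaps ".*" for "[^\r\n]*".

```shell
# Hypothetical two-record sample with DOS (CRLF) line endings.
tmp=$(mktemp)
printf 'Online\r\n1234 Main St\r\nAnytown\r\nOnline\r\n99 Elm Rd\r\nSpringfield\r\n' > "$tmp"

# Greedy version: the first .* runs to the last CRLF in the file,
# so only the very end of the data is altered.
greedy=$(sed -z 's/Online\r\n\(.*\)\r\n\(.*\)/Online\r\n\1 \2/g' "$tmp" | tr -d '\r')

# Negated-class version: each group stops at the end of its own line,
# so every record is matched and joined separately.
fixed=$(sed -z 's/Online\r\n\([^\r\n]*\)\r\n\([^\r\n]*\)/Online\r\n\1 \2/g' "$tmp" | tr -d '\r')

# tr strips the CRs only to make the two results easy to compare.
printf 'greedy:\n%s\n\nfixed:\n%s\n' "$greedy" "$fixed"

rm -f "$tmp"
```

With the greedy pattern the only visible change is at the very end of the output, which would explain why the original command appeared to do nothing.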