There is more to sed(1) than s///
Tags sed, system administration
There are books written about sed(1), a stream editor which can filter and transform streams of text, because it can do a lot more than many suspect. It uses a very simple and compact language which, though limited, is concise. I used sed(1) for ages without exploring more of what it can do. It's most often used only with the s/// substitution, but it also has other capabilities:
- selecting or deleting lines
- alternate pattern delimiters
- conditional operation
- swapping strings
- pattern space and hold space
- loops
It wasn't that I couldn't explore sed(1); it was more that I did not know enough about what it could do to decide where to focus my time, so I turned mostly to the familiar perlre(1) instead.
There are several variations on sed(1). The -i option, for example, is not specified by POSIX: GNU sed(1) and the BSD versions both provide it, but they differ in how the backup suffix is given. See Sed - An Introduction and Tutorial by Bruce Barnett, as well as Sed One-Liners and Sed One-Liners Explained. The mailing list for sed(1) is still at Yahoo.
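Because in-place editing is not standardized, one portable workaround is to write the result to a temporary file and then move it over the original. What follows is only a minimal sketch: it assumes mktemp(1) with -d is available (common, but not part of POSIX), and the file name demo.txt is made up for illustration.

```shell
# Work in a scratch directory; demo.txt is a hypothetical file for illustration.
tmpdir=$(mktemp -d)
printf 'hello world\n' > "$tmpdir/demo.txt"

# Instead of sed -i, write to a new file and replace the original on success.
sed 's/world/sed/' "$tmpdir/demo.txt" > "$tmpdir/demo.txt.new" \
  && mv "$tmpdir/demo.txt.new" "$tmpdir/demo.txt"

result=$(cat "$tmpdir/demo.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The && keeps the original file untouched if sed(1) fails for any reason.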
As sed(1) is so simple, it's not uncommon to encounter nearly identical solutions that have been developed independently.
A good way, I think, to use sed(1) is to remember that certain things are "easy" and then search for a recipe or cheat sheet with the solution. And then use that as a starting point for learning. However, as one scales up, awk(1) is more powerful and Perl is more powerful still. Both are maybe easier to read when doing complex tasks. All three are also available by default on most Unix-like systems, along with grep(1).
Anyway, here are some random examples of what sed(1) is capable of.
Print Only Matching Lines
By default, sed(1) prints out everything in the pattern space as it passes through. Using the -n option suppresses that so that a p is needed to force printing when things are found or manipulated.
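Code:
sed -n '/pattern/p' < input.txt > output.txt
sed -n 's/pattern/replacement/p' < input.txt > output.txt
The first example prints only lines containing the found pattern. The second example prints only lines when a substitution has taken place.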
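To see the difference between printing and deleting on something concrete, here is a small sketch run against made-up sample input (the three fruit names are placeholders, not data from this post). It also shows the d command, which deletes matching lines and needs no -n.

```shell
# Hypothetical sample input, one word per line.
input=$(printf 'apple\nbanana\ncherry\n')

# Print only lines matching the pattern (similar to grep).
matched=$(printf '%s\n' "$input" | sed -n '/an/p')

# Print only lines where a substitution actually happened.
changed=$(printf '%s\n' "$input" | sed -n 's/rr/RR/p')

# Delete matching lines and let the default printing show the rest.
remaining=$(printf '%s\n' "$input" | sed '/an/d')

printf '%s\n' "$matched" "$changed" "$remaining"
```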
Use Slashes in Patterns without Needing Escapes
The following uses hashes for delimiters rather than slashes. This is because there are slashes in the string to be substituted. With the hashes, the slashes do not need to be escaped and can be written plainly. Anyway, it inserts a command into the beginning of each public key in the current directory, each such key consisting of only a single line.
Code:
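sed -i 's#^#command="/usr/bin/uptime -p" #' *.pub
So the basic expression there is s### rather than s///, but any character can serve as the delimiter as long as all three match. Another common one is s||| which allows both slashes and hashes in the pattern and replacement.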
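As a side-by-side comparison, the sketch below performs the same substitution twice, once with escaped slashes and once with | as the delimiter. The path /usr/local/bin/tool is a hypothetical value chosen only for the demonstration.

```shell
# Hypothetical path used only for illustration.
line='/usr/local/bin/tool'

# With slash delimiters every slash in the pattern must be escaped.
escaped=$(printf '%s\n' "$line" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter the slashes can be written plainly.
plain=$(printf '%s\n' "$line" | sed 's|/usr/local|/opt|')

printf '%s\n' "$escaped" "$plain"
```

Both commands give the same result; the second is simply easier to read.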
Swap Strings
sed(1) can swap the order of strings and change delimiters. Here it swaps what's on either side of a comma delimiter and replaces the comma with a colon. So with this data:
Code:
# From this format:
oAZkAse75vY3DUyTwXhW*K$wqoWNvN,7173fecf6b9199ac5ed5ec0d1b66ac9474ec5fb794fc55
3ZIdz2V538b@L36c@Ces%MWKvkyXri,961bddc4a121ce4f251dc3a3ece660f351a88d777ec43d
# To this format:
7173fecf6b9199ac5ed5ec0d1b66ac9474ec5fb794fc55:oAZkAse75vY3DUyTwXhW*K$wqoWNvN
961bddc4a121ce4f251dc3a3ece660f351a88d777ec43d:3ZIdz2V538b@L36c@Ces%MWKvkyXri
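You can do the manipulation in sed(1) like this:
Code:
sed -n 's/\(.*\),\([a-f0-9]*\)/\2:\1/p' input.txt > output.txt
The -n and p ensure that lines that do not match the search pattern are not printed, thus functioning a little like grep(1).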
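The same swap can be tried on a small scale. In this sketch the input lines are invented stand-ins (short strings instead of real secrets and hashes), with one line lacking a comma to show that it gets filtered out.

```shell
# Hypothetical input: two "secret,hexdigest" pairs and one line without a comma.
input=$(printf 'secretA,0a1b2c\nno delimiter here\nsecretB,3d4e5f\n')

# \1 captures everything before the comma, \2 the hex string after it;
# the replacement emits them in reverse order joined by a colon.
swapped=$(printf '%s\n' "$input" | sed -n 's/\(.*\),\([a-f0-9]*\)/\2:\1/p')

printf '%s\n' "$swapped"
```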
Conditional Operations
The utility rsync(1) can use the --dry-run option to check for changes but not actually do a transfer. The option --itemize-changes prints out a change summary. But with the two combined, even directories are printed, not just the files in them. In order to list just the files, a conditional is needed before doing the pattern substitution. The -a is short for --archive, which is the equivalent of -rlptgoD.
Code:
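rsync --dry-run --itemize-changes -a ./X/ ./Y/ \
| sed -ne '/^>f/{ s/^[^ ]* //; p; }'
Files that are in the directory ./X/ but not ./Y/ will be listed, along with files that are in both but not the same. The /^>f/{ } means to search for lines starting with a greater than sign followed by an f. Then on those lines, do what is in the curly braces, which is attempt a substitution that strips the leading change summary and then unconditionally print out the line.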
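The conditional can be tested without running rsync(1) at all. The sketch below feeds sed(1) a few hand-written lines that imitate itemized output; the file and directory names are hypothetical, and real rsync output may differ in detail.

```shell
# Hand-written lines imitating rsync --itemize-changes output (names are made up).
sample=$(printf '%s\n' \
  '.d..t...... ./' \
  'cd+++++++++ docs/' \
  '>f+++++++++ docs/new.txt' \
  '>f.st...... changed.txt')

# Only on lines starting with ">f": drop the first field, then print.
files=$(printf '%s\n' "$sample" | sed -n '/^>f/{ s/^[^ ]* //; p; }')

printf '%s\n' "$files"
```

The directory lines never reach the p command, so only the two file names come out.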
Using the Hold Space
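This is from sed One-Liners. It will double-space a text file while collapsing any existing blank lines: blank lines are deleted first, then G appends the (empty) hold space to each remaining line, which adds one blank line after it.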
Code:
sed '/^$/d;G' somefile.txt
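Here is a small sketch of that one-liner on made-up input, followed by a second well-known hold-space one-liner that reverses the order of lines. The sample lines are placeholders chosen for the demonstration.

```shell
# Hypothetical input with two consecutive blank lines after "one".
input=$(printf 'one\n\n\ntwo\nthree\n')

# Delete blank lines, then append the empty hold space to each remaining line.
spaced=$(printf '%s\n' "$input" | sed '/^$/d;G')

# Reverse line order: G appends the lines held so far, h saves the result,
# and the accumulated text is printed only on the last line.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$spaced" "$reversed"
```

In the first command the hold space is never filled, so G only contributes a newline. In the second it carries state from one line to the next.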
Loops
sed(1) loops over the input by design. Additional loops can be carried out over subsets of the incoming data.
This example is from Unix Sed Tutorial: 6 Examples for Sed Branching Operation.
Code:
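sed ':loop; s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/; t loop'
It searches for a pattern to do a substitution on. If the substitution succeeds, it loops and tries again on the same line, repeating until the pattern is no longer found. Here the effect is to insert commas as thousands separators into numbers. The basic expression doing the work is s///; t; where t is a conditional branch to a previous label.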
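To watch the loop work, the sketch below runs the same substitution on a few sample numbers. It splits the script into separate -e expressions, an assumption made for portability since some sed(1) versions read a label up to the end of the line.

```shell
# Each pass inserts one comma; "t loop" branches back while a substitution succeeded.
add_commas() {
  sed -e ':loop' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 't loop'
}

big=$(printf '1234567\n' | add_commas)
small=$(printf '999\n' | add_commas)
edge=$(printf '1000\n' | add_commas)

printf '%s\n' "$big" "$small" "$edge"
```

The helper name add_commas is made up for this example. The first number takes two passes through the loop, while 999 is left untouched because the pattern never matches.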
An Afterthought on sed(1) and XML or HTML Parsing
It's tempting to try XML or HTML parsing in sed(1). "I'll just ..." Nope. There are too many variations on how elements and attributes turn up in actual files, and so too many edge cases to chase down, for automated processing of XML or XHTML with sed(1) to be worthwhile. It's a solved problem in perl(1) though, which already has that covered in the CPAN modules XML::TreeBuilder, XML::Parser, XML::TokeParser, and their HTML equivalents. There is no point in trying to reinvent the wheel unless you are going to invent a better one, and sed(1) does not have enough power to do that for XML or XHTML. Parsing them is a job for perl(1) and not sed(1).
When choosing between grep(1), sed(1), awk(1), and perl(1), it is a matter of knowing the tools at least a little. Again, some things are easy for each one. So choose the right tool for the job. That said, sed(1) can easily do an awful lot more than just s///.
Ok. nuff sed on sed(1)
Comments
-
One other (overlooked) thing about sed, given how often I see head piped to sed, is that sed can operate on lines by line number as well. So, one can do things like:
Code:
sed -i '1,10s#^#command="/usr/bin/uptime -p" #' *.pub
One other point, something people fail to remember (or even read at all): what makes sed less than ideal for X/HTML, as you mention, is that it only has an idea of a single line, and in these files a block can span multiple lines.
Posted 10-17-2016 at 01:58 AM by goumba
Updated 10-17-2016 at 02:00 AM by goumba
-
Quote:
I have the feeling that maybe line numbers and thus operating on a file by line number was a lot more relevant in the paper teletype days. One couldn't just scroll back and forth interactively.
About the line numbers, I was thinking about how sed(1) can do ranges as well.
sed '3,5s/foo/bar/' will do replacements on lines 3 to 5, inclusive.
The criteria can also be patterns.
sed '/c/,/e/s/foo/bar/' will do replacements on lines from one that has a c to one that has an e, inclusive. Though if the first pattern is not found, it will never start trying the replacements. And if the second pattern is not found, it will keep replacing on through to the end of the file.
What's a good use-case for line numbers?
Posted 10-17-2016 at 04:45 AM by Turbocapitalist
-
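A quick sketch of the three cases, using six made-up lines that each begin with a different letter; the never-matching end pattern zzz is likewise just a placeholder.

```shell
# Hypothetical input: six lines, each starting with a different letter.
input=$(printf 'a foo\nb foo\nc foo\nd foo\ne foo\nf foo\n')

# Numeric range: substitute on lines 2 to 3 only.
by_number=$(printf '%s\n' "$input" | sed '2,3s/foo/bar/')

# Pattern range: from the first line containing c through the next containing e.
by_pattern=$(printf '%s\n' "$input" | sed '/c/,/e/s/foo/bar/')

# End pattern never matches, so the range runs on to the end of the input.
open_ended=$(printf '%s\n' "$input" | sed '/c/,/zzz/s/foo/bar/')

printf '%s\n' "$by_number" "$by_pattern" "$open_ended"
```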
I see now I said 'line number' where I really should have said 'range'. At the time I could not think of the correct term. You caught on to my idea, however.
And for fun, you can combine ranges and patterns - replace foo with bar on lines 3 to 5 inclusive, only if they contain baz:
sed '3,5{/baz/s/foo/bar/}'
Posted 10-20-2016 at 08:47 AM by goumba
-
syg00 posted a good example of conditional replacement spanning multiple lines. The idea is to remove any \n escape sequences between a string at the start of the line and the next occurring semicolon, whether it is on the same line or one of the following lines. It also leaves other lines untouched.
Code:
sed '/^Test/ {:a;/;/!{N;$!ba};s/\\n/ /g;n};' test.file
So with the following input:
Code:
Test_Macro(abc, def, "\n string1 string2 \n test string", "test string2 \n");
printf("Welcome to C world..\n");
// Some code or text
Test_Macro(asdsadas, abc.str(), "test String1\n");
Another_Macro("Done with this file...\n\n");
// Some code...
The result is:
Code:
Test_Macro(abc, def, " string1 string2 test string", "test string2 ");
printf("Welcome to C world..\n");
// Some code or text
Test_Macro(asdsadas, abc.str(), "test String1");
Another_Macro("Done with this file...\n\n");
// Some code...
And grail came up with a method using the record separator and output record separator instead:
Code:
awk '/Test/{gsub(/\\n/,"")}1' RS=";" ORS=";" file
Posted 10-28-2016 at 01:52 AM by Turbocapitalist
Updated 10-28-2016 at 02:18 AM by Turbocapitalist
-
As a followup, bradvan had asked about using sed to delete a block if it contains specific text.
The following deletes all blocks of text that begin with 'aa' and end with 'cc' if they also happen to contain 'bb' somewhere in between:
Code:
Begin='aa'; End='cc';
sed -e "/^$Begin/,/^$End/{H;\$!d;}; x;/bb/d;\${p;x;}" somefile.txt;
Posted 07-16-2018 at 01:56 PM by Turbocapitalist