There is more to sed(1) than s///

Posted 10-16-2016 at 04:16 AM by Turbocapitalist
Updated 10-16-2016 at 08:57 AM by Turbocapitalist

There are whole books written about sed(1), a stream editor that filters and transforms streams of text, because it can do a lot more than many suspect. Its language is small and terse, limited in scope but able to say a lot in few characters. I used sed(1) for ages without exploring more of what it can do. It is most often used only for the s/// substitution, but it has other capabilities as well:

selecting or deleting lines
alternate pattern delimiters
conditional operation
swapping strings
pattern space and hold space
loops

It wasn't that I couldn't explore sed(1). It was more that I did not know what it could do, so I had nothing to focus my time on and instead turned mostly to the familiar perlre(1).

There are several variations of sed(1). The -i option, for example, is an extension: GNU sed(1) has it, the BSD versions take it with slightly different syntax, and some others lack it. See Sed - An Introduction and Tutorial by Bruce Barnett, as well as Sed One-Liners and Sed One-Liners Explained. The mailing list for sed(1) is still at Yahoo.
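
Where -i is missing or behaves differently, a temporary file gives much the same result. This is only a sketch, and the file and its contents are made up for the demonstration:

```shell
# Hypothetical sample file, created only for this demonstration.
file=$(mktemp)
printf 'colour\nflavour\n' > "$file"

# Write the edited text to a temporary file, then move it over the original.
tmp=$(mktemp)
sed 's/our$/or/' "$file" > "$tmp" && mv "$tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```

The && keeps the original untouched if sed(1) fails for some reason.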

As sed(1) is so simple, it's not uncommon to encounter nearly identical solutions that have been developed independently.

A good way, I think, to use sed(1) is to remember that certain things are "easy", search for a recipe or cheat sheet with the solution, and then use that as a starting point for learning. However, as one scales up, awk(1) is more powerful and Perl is more powerful still. Both are arguably easier to read when doing complex tasks. All three are available on most systems by default, along with grep(1).

Anyway, here are some random examples of what sed(1) is capable of.

Print Only Matching Lines

By default, sed(1) prints out everything in the pattern space as it passes through. Using the -n option suppresses that so that a p is needed to force printing when things are found or manipulated.

Code:

sed -n '/pattern/p' < input.txt > output.txt
sed -n 's/pattern/replacement/p' < input.txt > output.txt

The first example prints only lines containing the found pattern. The second example prints only lines when a substitution has taken place.
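
A small sketch of both forms on made-up input, plus the d command for the opposite case of deleting matching lines. The sample text is only an illustration:

```shell
# Hypothetical sample input.
input=$(printf 'apple pie\nbanana split\napple tart\n')

# Only lines matching the pattern are printed.
matched=$(printf '%s\n' "$input" | sed -n '/apple/p')

# Only lines where a substitution took place are printed.
changed=$(printf '%s\n' "$input" | sed -n 's/banana/cherry/p')

# Without -n, d deletes the matching lines and the rest print by default.
remaining=$(printf '%s\n' "$input" | sed '/apple/d')

printf '%s\n' "$matched" "$changed" "$remaining"
```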

Use Slashes in Patterns without Needing Escapes

The following uses hashes for delimiters rather than slashes, because there are slashes in the replacement string. With the hashes, the slashes do not need to be escaped and can be written plainly. The command prepends a command="..." option to each public key in the current directory, each such key consisting of only a single line.

Code:

sed -i 's#^#command="/usr/bin/uptime -p" #' *.pub

So the basic expression there is s### rather than s/// but any character can serve as the delimiter, as long as the same one is used in all three places. Another common one is s||| which allows both slashes and hashes in the pattern and replacement.
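
As a rough comparison, here is the same substitution written with | and with /, using a made-up path. The last command shows that an address, unlike s, needs a leading backslash to introduce an alternate delimiter:

```shell
# Hypothetical path, used only for illustration.
path='/usr/local/bin/tool'

# With | as the delimiter, neither / nor # needs escaping.
with_pipe=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt#1|')

# The same substitution with / as the delimiter needs every slash escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt#1/')

# For an address, the alternate delimiter is introduced with a backslash.
by_address=$(printf '%s\n' "$path" | sed -n '\#^/usr#p')

printf '%s\n' "$with_pipe" "$with_slash" "$by_address"
```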

Swap Strings

sed(1) can swap the order of strings and change delimiters. Here it swaps what's on either side of a comma and replaces the comma with a colon. So with this data:

Code:

# From this format:
oAZkAse75vY3DUyTwXhW*K$wqoWNvN,7173fecf6b9199ac5ed5ec0d1b66ac9474ec5fb794fc55
3ZIdz2V538b@L36c@Ces%MWKvkyXri,961bddc4a121ce4f251dc3a3ece660f351a88d777ec43d

# To this format:
7173fecf6b9199ac5ed5ec0d1b66ac9474ec5fb794fc55:oAZkAse75vY3DUyTwXhW*K$wqoWNvN
961bddc4a121ce4f251dc3a3ece660f351a88d777ec43d:3ZIdz2V538b@L36c@Ces%MWKvkyXri

You can do the manipulation in sed(1) like this:

Code:

sed -n 's/\(.*\),\([a-f0-9]*\)/\2:\1/p' input.txt > output.txt

The -n and p ensure that lines that do not match the search pattern are not printed, thus functioning a little like grep(1).
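
The same swap can be sketched with extended regular expressions, where the grouping parentheses need no backslashes. This assumes a sed(1) that accepts -E, as GNU and BSD versions do. The records are made up and much shorter than the real data:

```shell
# Hypothetical records in the form token,hex-digest
input=$(printf 'alpha,0a1b2c\nno comma here\nbeta,ff00\n')

# -E enables extended regular expressions; the line without a comma
# does not match, so with -n and p it is not printed.
swapped=$(printf '%s\n' "$input" | sed -n -E 's/^(.*),([a-f0-9]*)$/\2:\1/p')

printf '%s\n' "$swapped"
```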

Conditional Operations

The utility rsync(1) can use the --dry-run option to check for changes without actually doing a transfer. The option --itemize-changes prints out a change summary. With the two together, directories are listed as well, not just the files in them. In order to list just the files, a conditional is needed before doing the pattern substitution. The -a is short for --archive, which is the equivalent of -rlptgoD.

Code:

rsync --dry-run --itemize-changes -a ./X/ ./Y/ \
| sed -ne '/^>f/{ s/^[^ ]* //;p }'

Files that are in the directory ./X/ but not ./Y/ will be listed, along with files that are in both but not the same. The /^>f/{ } means to search for lines starting with a greater-than sign followed by an f. Then on those lines, do what is in the curly braces, which is to attempt a substitution and then unconditionally print out the line.
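
To try the conditional without running rsync(1), a few made-up lines shaped like its itemized output can be fed in instead. The file names are hypothetical:

```shell
# Made-up lines shaped like rsync --itemize-changes output; no transfer is involved.
listing=$(printf '%s\n' \
  'cd+++++++++ docs/' \
  '>f+++++++++ docs/readme.txt' \
  '>f.st...... notes.txt' \
  '.d..t...... ./')

# On lines starting with >f, strip everything up to the first space, then print.
files=$(printf '%s\n' "$listing" | sed -n -e '/^>f/{ s/^[^ ]* //;p; }')

printf '%s\n' "$files"
```

The extra semicolon before the closing brace is there only for the sake of sed(1) versions that insist on it.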

Using the Hold Space

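This is from sed One-Liners. It double-spaces a text file while dropping any blank lines already present. The d deletes empty lines, and on every remaining line G appends a newline followed by the contents of the hold space, which is empty here.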

Code:

sed '/^$/d;G' somefile.txt
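
A short sketch of the hold space at work on made-up input. The first command is the one above. The second is another well-known one-liner that reverses the order of lines by building the output up in the hold space:

```shell
# Hypothetical input with two blank lines after the first line.
input=$(printf 'one\n\n\ntwo\nthree\n')

# d drops empty lines; G then appends a newline plus the (empty) hold space.
doubled=$(printf '%s\n' "$input" | sed '/^$/d;G')

# Reverse the line order: on every line but the first, G appends what has
# been held so far; h copies the result back to the hold space; $p prints
# the accumulated text once the last line has been read.
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')

printf '%s\n' "$doubled" "$reversed"
```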

Loops

sed(1) loops over the input by design. Additional loops can be carried out over subsets of the incoming data.

This example is from Unix Sed Tutorial: 6 Examples for Sed Branching Operation.

Code:

sed ':loop; s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/; t loop'

It searches for a pattern to do a substitution on. If the substitution succeeds, it loops and tries again on the same line, repeating until the pattern is no longer found. The basic expression doing the work is s///; t; where t is a conditional branch to a label, here one defined earlier in the script, taken only when a substitution has been made.
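
Here is the loop run on a few made-up numbers. The script is split into separate -e pieces, which is an assumption made for portability, since not every sed(1) accepts a semicolon after a label:

```shell
# Each pass inserts one comma before the last run of three digits that still
# has a digit in front of it; t loops back as long as a substitution was made.
with_commas=$(printf '1234567\n999\n1000\n' |
  sed -e ':loop' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 't loop')

printf '%s\n' "$with_commas"
```

The first number takes two passes, becoming 1234,567 and then 1,234,567, while 999 never matches and passes through unchanged.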

An Afterthought on sed(1) and XML or HTML Parsing

It's tempting to try XML or HTML parsing in sed(1). "I'll just ..." Nope. There are too many variations in how elements and attributes turn up in actual files, and too many edge cases to chase down, so don't try automated processing of XML or XHTML with sed(1). It's a solved problem in perl(1), which already has it covered in the CPAN modules XML::TreeBuilder, XML::Parser, XML::TokeParser, and their HTML equivalents. There is no point in reinventing the wheel unless you are going to build a better one, and sed(1) does not have enough power for that.

When choosing between grep(1), sed(1), awk(1), and perl(1), it is a matter of knowing the tools at least a little. Again, some things are easy for each one. So choose the right tool for the job. That said, sed(1) can easily do an awful lot more than just s///.

Ok. nuff sed on sed(1)

Comments

One other (overlooked) thing about sed, given how often I see head piped to sed, is that sed can operate on lines by line number as well. So, one can do things like:

Code:

sed -i '1,10s#^#command="/usr/bin/uptime -p" #' *.pub

Which will only perform your example on the first ten lines of the file(s).

One other point, something people fail to remember (or even read at all): what makes sed less than ideal for X/HTML, as you mention, is that it only has an idea of a single line, and in these files a block can span multiple lines.

Posted 10-17-2016 at 01:58 AM by goumba

Updated 10-17-2016 at 02:00 AM by goumba

Quote:

Originally Posted by goumba

One other (overlooked) thing about sed, and I frequently see head piped to sed is that sed can operate on lines by line number as well.

You're right. I thought about it when planning the post but could not come up with good use-cases. When just looking for something, I just use grep(1). If I need a little context, I add -A or -B. If it's something I need to change, such as a configuration file or a script, then I fire up the appropriate editor, especially if it's because of an error message.

I have the feeling that maybe line numbers and thus operating on a file by line number was a lot more relevant in the paper teletype days. One couldn't just scroll back and forth interactively.

About the line numbers, I was thinking about how sed(1) can do ranges as well.

sed '3,5s/foo/bar/' will do replacements on lines 3 to 5, inclusive.

The criteria can also be patterns.

sed '/c/,/e/s/foo/bar/' will do replacements on lines from one that has a c to one that has an e, inclusive. Though if the first pattern is not found, it will never start trying the replacements. And if the second pattern is not found, it will keep replacing on through to the end of the file.
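
A quick sketch of both kinds of range on made-up input, including the case where the closing pattern never turns up:

```shell
# Hypothetical six-line input.
input=$(printf 'a foo\nb foo\nc foo\nd foo\ne foo\nf foo\n')

# Range by line number: lines 3 to 5, inclusive.
by_number=$(printf '%s\n' "$input" | sed '3,5s/foo/bar/')

# Range by pattern: from a line containing c to a line containing e.
by_pattern=$(printf '%s\n' "$input" | sed '/c/,/e/s/foo/bar/')

# The closing pattern z is never found, so the range runs to the end of input.
open_ended=$(printf '%s\n' "$input" | sed '/c/,/z/s/foo/bar/')

printf '%s\n' "$by_number" "$by_pattern" "$open_ended"
```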

What's a good use-case for line numbers?

Posted 10-17-2016 at 04:45 AM by Turbocapitalist

	I see now I said 'line number' where I really should have said 'range'. At the time I could not think of the correct term. You caught on to my idea, however. And for fun, you can combine ranges and patterns - replace foo with bar on lines 3 to 5 inclusive, only if they contain baz: sed '3,5{/baz/s/foo/bar/}'
	Posted 10-20-2016 at 08:47 AM by goumba
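
The combination of a range and a pattern can be tried out on made-up input like this. The semicolon before the closing brace is an addition for the sake of stricter sed(1) versions:

```shell
# Hypothetical input: every line has foo, but only some also have baz.
input=$(printf 'foo baz\nfoo\nfoo baz\nfoo\nfoo baz\nfoo baz\n')

# Within lines 3 to 5, substitute only on lines that also contain baz.
combined=$(printf '%s\n' "$input" | sed '3,5{/baz/s/foo/bar/;}')

printf '%s\n' "$combined"
```

Lines 1 and 6 contain baz but fall outside the range, and line 4 is inside the range but lacks baz, so only lines 3 and 5 change.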

syg00 posted a good example of conditional replacement spanning multiple lines. The idea is to remove any literal \n escape sequences between a string at the start of a line and the next semicolon, whether that semicolon is on the same line or a later one. Other lines are left untouched.

Code:

sed '/^Test/ {:a;/;/!{N;$!ba};s/\\n//g;n};' test.file

So on lines starting with 'Test', sed(1) loops until either a semicolon is found or the end of the file is reached. During the loop each following line is appended to the pattern space. Once the loop exits, every literal \n sequence is removed from the whole pattern space, while the actual line breaks stay in place.

So with the following input:

Code:

Test_Macro(abc, def, "\n string1 string2 \n test string",
            "test string2 \n");
printf("Welcome to C world..\n");
// Some code or text
     
Test_Macro(asdsadas, abc.str(), "test String1\n");
Another_Macro("Done with this file...\n\n");
// Some code...

The following output would be produced:

Code:

Test_Macro(abc, def, " string1 string2  test string",
            "test string2 ");
printf("Welcome to C world..\n");
// Some code or text
     
Test_Macro(asdsadas, abc.str(), "test String1");
Another_Macro("Done with this file...\n\n");
// Some code...

Edit:

And grail came up with a method using the record separator and output record separator instead:

Code:

awk '/Test/{gsub(/\\n/,"")}1' RS=";" ORS=";" file

Posted 10-28-2016 at 01:52 AM by Turbocapitalist

Updated 10-28-2016 at 02:18 AM by Turbocapitalist

	As a followup, bradvan had asked about using sed to delete a block if it contains specific text. The following deletes all blocks of text that begin with 'aa' and end with 'cc' if they also happen to contain 'bb' somewhere in between:

	Code:

	Begin='aa'; End='cc'
	sed -e "/^$Begin/,/^$End/{H;\$!d;}; x;/bb/d;\${p;x;}" somefile.txt

	This makes use of the hold space.
	Posted 07-16-2018 at 01:56 PM by Turbocapitalist