LinuxQuestions.org


rng 12-18-2011 07:29 AM

sed or awk for text file manipulations
 
I want to learn an application for manipulating text files. Which is better in terms of features and ease of learning: awk or sed (or something else)?

tronayne 12-18-2011 08:18 AM

More likely, you'd want to learn both -- sed is a stream editor, whereas AWK is a programming language; sed can alter a file, while AWK can make decisions depending upon conditions (and do arithmetic and other handy things).

sed is quite useful when you have either a large number of files or extremely large files that you need to "clean up" or alter in some way in a stream (files are not loaded into memory; sed works on one line at a time in a pipeline). This allows multiple edits on every line and goes quick like a bunny.
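As a rough illustration of several edits applied to every line in one pass, here is a minimal sketch. The sample input is made up for the example; each -e expression is one more edit that sed runs on the current line before moving on to the next.

```shell
# Hypothetical sample input: stray whitespace and an empty line.
# Each -e adds one edit; sed applies all of them to a line, then reads the next.
cleaned=$(printf 'alpha   beta  \n\n  gamma\n' |
  sed -e 's/[[:space:]]*$//' \
      -e 's/^[[:space:]]*//' \
      -e '/^$/d')
printf '%s\n' "$cleaned"
```

The first two expressions trim trailing and leading whitespace, and the third deletes any line that ends up empty.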

AWK, on the other hand, is used to write small programs that can do large jobs quickly and efficiently.

Both of them are well worth your time to become comfortable with.

Hope this helps some.

jschiwal 12-18-2011 11:06 AM

Awk is a natural for text organized in records of fields. If you are manipulating something like a mailing list or phone list, look at using awk first.
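To give a feel for what "records of fields" means in practice, here is a small sketch that counts phone-list entries per city. The names, cities and numbers are invented for the example, and the tab-separated layout is an assumption.

```shell
# Hypothetical phone list: name, city, phone number, separated by tabs.
phones=$(printf 'Ann\tBoston\t555-0101\nBob\tAustin\t555-0102\nCy\tBoston\t555-0103\n')

# -F '\t' splits each record on tabs; $2 is the city field.
# The array n tallies how many records were seen for each city.
counts=$(printf '%s\n' "$phones" |
  awk -F '\t' '{ n[$2]++ } END { for (c in n) print c, n[c] }' | sort)
printf '%s\n' "$counts"
```

The sort at the end is only there because the order of a for (c in n) loop is not defined in AWK.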

Telengard 12-18-2011 02:36 PM

Both are worthwhile, but if I could only choose one I'd make it AWK. AWK programs tend to be easier for me to read.

AWK has features and constructs common to high-level programming languages. If you already know any C-like language, AWK should not be difficult to learn. Sed uses a language similar to that of the ed line editor, meaning single-symbol commands and a heavy reliance on regular expressions.

AWK sees use for complex tasks, such as wiki software. Sed mostly sees use for simple substitutions and deletions. People have done some very cool things in both languages though. I seem to recall that someone wrote an RPN calculator in Sed.

Here are some examples of specific programs in AWK and Sed. Fields are separated by single `\t' (tab) characters. Additions, elaborations and corrections welcome.

Output entire file
Code:

test$ awk '{print}' birthday-cake.txt
Janet  June    chocolate
Ken    June    chocolate
Jeff    November        vanilla
Dan    January vanilla
test$ sed '#' birthday-cake.txt
Janet  June    chocolate
Ken    June    chocolate
Jeff    November        vanilla
Dan    January vanilla
test$

Find lines containing specific text
Code:

test$ awk '/November/' birthday-cake.txt
Jeff    November        vanilla
test$ sed '/November/!d' birthday-cake.txt
Jeff    November        vanilla
test$

Replace specific text in specific field
Code:

test$ awk 'BEGIN {OFS = "\t"}; $3 == "vanilla" {$3 = "strawberry"}; {print}' birthday-cake.txt
Janet  June    chocolate
Ken    June    chocolate
Jeff    November        strawberry
Dan    January strawberry
test$ sed 's/^\([^'$'\t''][^'$'\t'']*'$'\t''[^'$'\t''][^'$'\t'']*'$'\t''\)vanilla/\1strawberry/' birthday-cake.txt
Janet  June    chocolate
Ken    June    chocolate
Jeff    November        strawberry
Dan    January strawberry
test$

Don't let that last one scare you. The $'\t' bit is a Bash-ism for an explicit `\t' (tab) character. I had to make sure that the substitution occurred in the third field only, so it was necessary to count the tab characters between fields. My Sed-fu is weak; surely there is a better way. ;)
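One possibly more readable variant, offered only as a sketch: keep a literal tab in a shell variable and put the Sed script in double quotes. The variable name tab and the two sample lines are made up for the example. Note that it uses [^tab]* rather than [^tab][^tab]*, so it also accepts empty first or second fields, and it anchors vanilla at the end of the line.

```shell
# Store a literal tab once; printf '\t' works in any POSIX shell, not only Bash.
tab=$(printf '\t')

# Skip two tab-terminated fields, then replace "vanilla" only if it is the
# whole third (last) field. \$ keeps the shell from treating $ specially.
result=$(printf 'Jeff\tNovember\tvanilla\nvanilla\tJune\tchocolate\n' |
  sed "s/^\([^$tab]*$tab[^$tab]*$tab\)vanilla\$/\1strawberry/")
printf '%s\n' "$result"
```

The second sample line has vanilla in the first field only, so it should come through unchanged.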

I recommend reading the manuals for each program, and then deciding for yourself which one seems best for the task at hand. Here is the manual for GNU Sed.

Most GNU/Linux systems probably have Gawk or Mawk, though there are other implementations available. It is important to know which implementation you have because they are not all the same. The command awk -Wversion should tell you which AWK you have. I recommend reading the GNU Awk User's Guide in addition to the manual for your AWK, as it is the most complete reference to the language I'm aware of.

HTH

rng 12-18-2011 08:10 PM

Thanks for the explanations.

