LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 04-24-2010, 11:59 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Replicate a field


Hello.

I have a file containing text. I want to replicate a specific field.
For example, I might want to append a copy of the second word of each line to the end of that line.

Have:
Once upon a midnight dreary, while I pondered weak and weary,
Over many a quaint and curious volume of forgotten lore,

Want:
Once upon a midnight dreary, while I pondered weak and weary, upon
Over many a quaint and curious volume of forgotten lore, many

Is there a Linux command which will do this?
I seek a basic command, not awk, not Perl, because I haven't learned those things yet.

Daniel B. Martin
 
Old 04-24-2010, 12:10 PM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
First, let's define some terms: There are no "Linux commands". There are shell commands---BASH being the most common shell---and there are a bazillion utilities, applications, etc.

For text manipulation, common utilities include SED, AWK, and Perl. (Maybe Python also)

The BASH man pages will tell you about the commands built into BASH.

Second, I do not recommend posting a question here, and then placing restrictions on what solutions are offered. In fact, since you are talking about fields, I suspect that AWK may be one of the better choices.

I assume you want to do this on a line-by-line basis. Thus, you cannot simply use one tool to grab a word into a variable and then make a second pass to add that variable to the end of the line.
 
Old 04-26-2010, 01:19 AM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Indeed ... this one screams "awk" and "perl" at the top of its lungs...

Code:
awk '{print $0", "$2}' file
You could use a shell script and treat each line like so:
Code:
...
scnd=$(echo "$line" | sed 's/[[:space:]][[:space:]]*/ /g' | cut -d" " -f2)
echo "${line}, ${scnd}"
...
The sed is in there in case there are a few consecutive spaces or tabs in the line, which would throw "cut" off.
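Filling in the elided parts, a complete runnable version of that loop might look like this (a sketch only; the file name input.txt and the sample contents are assumptions, not part of the original snippet):

```shell
#!/bin/sh
# Sketch: "input.txt" is a placeholder name; create a sample to run against.
printf '%s\n' \
  'Once upon a midnight dreary, while I pondered weak and weary,' \
  'Over many a quaint and curious volume of forgotten lore,' > input.txt

while IFS= read -r line
do
    # Squeeze runs of spaces/tabs to one space so cut's delimiter works.
    scnd=$(printf '%s\n' "$line" | sed 's/[[:space:]][[:space:]]*/ /g' | cut -d" " -f2)
    printf '%s, %s\n' "$line" "$scnd"
done < input.txt
```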

Personally I find the awk version cleaner and more concise.


Cheers,
Tink
 
Old 04-26-2010, 01:39 AM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by danielbmartin View Post
Is there a Linux command which will do this?
I seek a basic command, not awk, not Perl, because I haven't learned those things yet.
pixellany usefully defined terms. Since you didn't exclude bash (bash commands could be held as "basic"), here's a pure bash solution
Code:
#!/bin/bash

while read -r line
do
    array=( $line )
    echo "$line" "${array[1]}"
done < input.txt
EDIT: or, more neatly
Code:
#!/bin/bash
while read -r -a array
do
    echo "${array[*]}" "${array[1]}"
done < input.txt
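A quick sanity check of the indexing (bash arrays are zero-based, so ${array[1]} is the second word); the sample line is taken from the original post:

```shell
#!/bin/bash
# bash arrays are zero-based: index 1 is the second word of the line.
line='Over many a quaint and curious volume of forgotten lore,'
array=( $line )          # word-split the line into an array
echo "${array[*]} ${array[1]}"
# -> Over many a quaint and curious volume of forgotten lore, many
```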

Last edited by catkin; 04-26-2010 at 01:41 AM.
 
Old 04-26-2010, 06:30 AM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by pixellany View Post
... I do not recommend posting a question here, and then placing restrictions on what solutions are offered. ...
I respect your expertise and long service to this forum. Here is a counterargument which you may find reasonable.

This is the Newbie Forum. I am a newbie, learning Linux on my own. I can't learn all of it at once, so I'm starting with what I mistakenly called Linux commands. Commands such as sed and grep are so powerful that I want to develop competence and confidence with them before moving on to awk or Perl.

If I place no bounds on solutions some members will produce awk or Perl solutions. Then they feel betrayed when I won't use their hard work. That's because I am unwilling to use code that I don't understand.

Daniel B. Martin
 
Old 04-26-2010, 08:01 AM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by danielbmartin View Post
I respect your expertise and long service to this forum. This is a counterargument which you may find reasonable.

This is the Newbie Forum. I am a newbie, learning Linux on my own. I can't learn all of it at once, so I'm starting with what I mistakenly called Linux commands. Commands such as sed and grep are so powerful that I want to develop competence and confidence with them before moving on to awk or Perl.

If I place no bounds on solutions some members will produce awk or Perl solutions. Then they feel betrayed when I won't use their hard work. That's because I am unwilling to use code that I don't understand.

Daniel B. Martin
I totally understand your point of view---and I especially agree with the last sentence.

The only thing I can offer is that the work required to apply the wrong tool often eclipses the work required to learn the right tool. I have personally demonstrated this by coming up with some totally convoluted SED code and then watching the AWK experts swoop in with something far better.

I recommend learning all of the most common tools in the depth required to get your work done. In my case, I know SED and GREP well enough to know what problems will be difficult or even impossible. From this, I know when I need to dig back into AWK and learn a bit more.
 
Old 04-26-2010, 08:18 AM   #7
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 723
Code:
$ echo "this is test text" | sed -r 's:^([^ \t]+[ \t]+)([^ \t]+)(.*)$:\1\2\3, \2:'
this is test text, is
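For readers following along, a commented copy of that one-liner (assuming GNU sed, which provides -r for extended regular expressions and understands \t inside bracket expressions):

```shell
#!/bin/sh
# \1 = first word plus the whitespace after it
# \2 = second word
# \3 = the rest of the line
# Replacement: the whole line (\1\2\3), then ", ", then the second word again.
echo "this is test text" | sed -r 's:^([^ \t]+[ \t]+)([^ \t]+)(.*)$:\1\2\3, \2:'
# -> this is test text, is
```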
 
Old 04-26-2010, 08:28 AM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
I love it!!!! Another SED fanatic is released into the world.

This eloquently demonstrates my point above:
Quote:
The only thing I can offer is that the work required to apply the wrong tool often eclipses the work required to learn the right tool. I have personally demonstrated this by coming up with some totally convoluted SED code and then watching the AWK experts swoop in with something far better.
But then MTK's SED solution is NOT convoluted at all---it is a very simple and elegant use of backreferences.
 
Old 04-26-2010, 09:10 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
I think we can do it a little differently, with the same result:
Code:
echo "this is test text" | sed -r 's:[ \t]+([^ \t]+).*:&, \1:'
 
Old 04-26-2010, 09:28 AM   #10
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Yet another different approach...
Code:
paste -d' ' file <(cut -d' ' -f2 file)
but maybe too specific for the example shown in the original post.
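A quick demonstration against the sample from the original post (a sketch; poem.txt is a placeholder name, and the <( ) process substitution is a bash feature, so this needs bash rather than plain sh):

```shell
#!/bin/bash
# poem.txt is a placeholder name; create the sample from the original post.
printf '%s\n' \
  'Once upon a midnight dreary, while I pondered weak and weary,' \
  'Over many a quaint and curious volume of forgotten lore,' > poem.txt

# cut pulls out field 2 of every line; paste glues it back onto each line
# with a single space, matching the "Want" output exactly (no extra comma).
paste -d' ' poem.txt <(cut -d' ' -f2 poem.txt)
# -> Once upon a midnight dreary, while I pondered weak and weary, upon
# -> Over many a quaint and curious volume of forgotten lore, many
```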
 
1 member found this post helpful.
Old 04-26-2010, 10:14 AM   #11
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by danielbmartin View Post
This is the Newbie Forum. I am a newbie, learning Linux on my own. I can't learn all of it at once, so I'm starting with what I mistakenly called Linux commands. Commands such as sed and grep are so powerful that I want to develop competence and confidence with them before moving on to awk or Perl.
I respect and understand your position; I would like to offer a counterargument.

There are many commands in the toolset, each with pros and cons for solving various problems. I doubt that any of us are totally fluent with them all. It is not necessary, even if possible, to completely master each before moving on to the next. Another approach is to learn simple usage of an increasing number and gradually extend that knowledge as convenient, as need arises.

This problem suits awk particularly well, allowing Tinkster to offer the simple and comparatively comprehensible
Code:
awk '{print $0", "$2}' file
Hoping to tempt you, it breaks down like this:
  1. awk <string> file means run awk with program <string>, taking input from file.
  2. awk processes each line in turn.
  3. An awk program comprises patterns and actions; when the pattern matches the line the action is performed.
  4. In this case no pattern is given; for awk that matches all lines.
  5. The action is contained in { }.
  6. awk puts the whole line in variable $0 and parses the line into $1, $2, $3 ... words according to its word separator.
  7. The default word separator is whitespace: any run of spaces or tabs.
  8. awk's print function prints its arguments to standard output, by default the terminal.
  9. In awk, literal strings are given in double quotes.
  10. awk concatenates adjacent strings.
  11. Thus $0", "$2 is the whole line, followed by comma and space followed by the second word of the line. For every line of file, awk prints that to standard output.
 
1 member found this post helpful.
Old 04-27-2010, 07:03 PM   #12
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by catkin View Post
... This problem suits awk particularly well, allowing Tinkster to offer the simple and comparatively comprehensible
Code:
awk '{print $0", "$2}' file
Hoping to tempt you, it breaks down like this:
  1. awk <string> file means run awk with program <string>, taking input from file.
  2. awk processes each line in turn.
  3. An awk program comprises patterns and actions; when the pattern matches the line the action is performed.
  4. In this case no pattern is given; for awk that matches all lines.
  5. The action is contained in { }.
  6. awk puts the whole line in variable $0 and parses the line into $1, $2, $3 ... words according to its word separator.
  7. The default word separator is whitespace: any run of spaces or tabs.
  8. awk's print function prints its arguments to standard output, by default the terminal.
  9. In awk, literal strings are given in double quotes.
  10. awk concatenates adjacent strings.
  11. Thus $0", "$2 is the whole line, followed by comma and space followed by the second word of the line. For every line of file, awk prints that to standard output.
Thank you for the detailed explanation. It whets my appetite for learning awk.

Some respondents misread the original post. The objective is to append the second word in each line to that line. There was no need for an additional comma. With that clarification, several of the offered code segments could be simplified.
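For example, dropping the literal ", " from the awk suggestion is enough: a bare comma between print arguments inserts awk's output field separator (a space by default), which reproduces the "Want" lines exactly. A sketch (poem.txt is a placeholder name for the sample):

```shell
#!/bin/bash
# poem.txt is a placeholder name; recreate the sample from the original post.
printf '%s\n' \
  'Once upon a midnight dreary, while I pondered weak and weary,' \
  'Over many a quaint and curious volume of forgotten lore,' > poem.txt

# A comma between print arguments emits OFS (a space by default),
# so this appends " <second word>" with no literal comma.
awk '{print $0, $2}' poem.txt
# -> Once upon a midnight dreary, while I pondered weak and weary, upon
# -> Over many a quaint and curious volume of forgotten lore, many
```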

Daniel B. Martin
 
Old 04-27-2010, 07:08 PM   #13
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by colucix View Post
Yet another different approach...
Code:
paste -d' ' file <(cut -d' ' -f2 file)
but maybe too specific for the example shown in the original post.
Love it! One line of code, and more readable than some of the other suggested code segments. (More readable, at least, to this newbie.)

Technical Excellence may be defined as "completeness of function coupled with economy of means." Your solution qualifies as TE!

For extra credit: show how the output may be directed to a file rather than standard output.

Daniel B. Martin
 
Old 04-28-2010, 02:53 AM   #14
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by danielbmartin View Post
show how the output may be directed to a file rather than standard output.
Standard output can be directed to a file using the output redirection operator ">" as in
Code:
ls > my_file
Sometimes a command produces standard error output as well. If you want it in the same file then
Code:
command > my_file 2>&1
where "2>&1" means "send standard error (stream 2) to the same place as standard output (stream 1) is going".
In case you want them in different files
Code:
command > my_file.stdout 2> my_file.stderr
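One subtlety worth adding: redirections are processed left to right, so the order matters. A small illustration (no_such_file is a deliberately nonexistent name, used only to make a command produce stderr; "|| true" just keeps the failing ls from mattering):

```shell
#!/bin/sh
# Both streams end up in out.log: stdout is moved first, then stderr follows it.
ls no_such_file > out.log 2>&1 || true

# Here 2>&1 runs while stdout still points at the terminal, so stderr goes
# to the terminal and out2.log ends up empty.
ls no_such_file 2>&1 > out2.log || true
```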
 
Old 04-28-2010, 06:52 AM   #15
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 723
> redirects stdout to a file.

2> redirects stderr to a file.

&> redirects both stdout and stderr to a file (a bash shortcut for "> file 2>&1").
 
  

