LinuxQuestions.org


CodyK479 03-25-2011 05:37 PM

Bash Sed Help
 
OK. I can use sed to read a line from a file, but what sed
command also cuts the fields in a tab-separated file?

There has to be one, but I can't find it. In the meantime I've
been trying something like this, which obviously does not work.
I think the first option would be better, but for future reference,
how could I make the following work? That is, how can you redirect output to sed rather than using a file?

sed -n 25p `cat myfile | cut -f1`

kurumi 03-25-2011 06:50 PM

Why do you want to use sed to cut fields? Use awk or cut for this job. They are more suitable.

CodyK479 03-25-2011 07:23 PM

I guess because I've rarely used awk. I guess I will try that. Thanks.

The question still remains, though: is there such a command for sed, and can you redirect
output to sed?

crts 03-25-2011 07:33 PM

It is a bit hard to guess what you are trying to achieve without some sample data.
Maybe something like this?
Code:

sed -n '25 s/\t.*//p' file

CodyK479 03-25-2011 08:00 PM

Let me explain again. I just want to pick a line from a file and then split that line up. It is a tab-separated file and there are three fields. My program will know what the line number is, and I will split up all 3 fields and place them into 3 separate variables.

So, let's say I want line number 25 and field 1:

Using awk and cut sounds nice. But how do you use awk to pick a specific line number? (other than reformatting my data file with line numbers and searching)

What about sed?
sed -n '25 s/\t.*//p' file

That looks like a search for \t.* or something. I want to search for field one with tab as a delimiter.

What about redirecting cut into sed?

crts 03-25-2011 08:28 PM

Quote:

Originally Posted by CodyK479 (Post 4303664)
Let me explain again. I just want to pick a line from a file and then split that line up. It is a tab-separated file and there are three fields. My program will know what the line number is, and I will split up all 3 fields and place them into 3 separate variables.

So, let's say I want line number 25 and field 1:

Using awk and cut sounds nice. But how do you use awk to pick a specific line number? (other than reformatting my data file with line numbers and searching)

What about sed?
sed -n '25 s/\t.*//p' file

That looks like a search for \t.* or something. I want to search for field one with tab as a delimiter.

What about redirecting cut into sed?

\t is the TAB character. The command above deletes everything from the first TAB to the end of the line, leaving only the part that corresponds to the field you would get with
cut -f1

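I am not 100% sure what you mean by field. The cut command treats every single TAB as a separator, so two TABs in a row produce an empty field, while awk with its default whitespace splitting treats a run of TABs as one separator. So what do you mean when you say 3 fields?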
Code:

col1      col2
or
Code:

col1      col2      col3
It probably can be done with sed but I would also recommend awk as the first choice of tool for this specific task.
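
To make the comparison concrete, here is a small sketch on a made-up two-line sample (the data and the variable names are assumptions for illustration, not the poster's file). It shows the sed substitution and cut -f1 returning the same text, and how a line with two TABs in a row is split by cut, by awk with -F'\t', and by awk's default whitespace splitting. A literal TAB is kept in a variable so the sketch does not depend on sed understanding \t.

```shell
# Hypothetical sample: line 2 has an empty middle field (two TABs in a row).
sample=$(mktemp)
printf 'a1\tb1\tc1\na2\t\tc2\n' > "$sample"
tab=$(printf '\t')   # literal TAB character

# Delete everything from the first TAB onward, leaving field 1.
via_sed=$(sed -n "2 s/${tab}.*//p" "$sample")
via_cut=$(sed -n 2p "$sample" | cut -f1)

# Where does "c2" end up on the line with the empty middle field?
cut_f3=$(sed -n 2p "$sample" | cut -f3)                 # cut counts the empty field
awk_tab_f3=$(awk -F'\t' 'NR==2 {print $3}' "$sample")   # so does awk with an explicit TAB
awk_ws_f2=$(awk 'NR==2 {print $2}' "$sample")           # default splitting merges the TABs

printf '%s %s %s %s %s\n' "$via_sed" "$via_cut" "$cut_f3" "$awk_tab_f3" "$awk_ws_f2"
rm -f "$sample"
```

So if fields can be empty, cut or awk -F'\t' keeps the column positions stable, while awk's default splitting does not.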

kurumi 03-25-2011 08:39 PM

Quote:

Originally Posted by CodyK479 (Post 4303664)

Using awk and cut sounds nice. But how do you use awk to pick a specific line number? (other than reformatting my data file with line numbers and searching)

You use NR (or FNR) for the line number, $1, $2, $3, ... for the fields, and -F to specify the field delimiter.
Code:

awk -F"\t" 'NR==3{print $3}' file
Isn't it easier than sed?
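
In case the difference between NR and FNR matters for your script, here is a minimal sketch with two made-up files (the file names and contents are assumptions). NR counts records across all input files, while FNR restarts at 1 for each file.

```shell
# Two hypothetical files of two lines each.
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\nb2\n' > "$dir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
by_nr=$(awk 'NR == 3' "$dir/a.txt" "$dir/b.txt")    # third line overall
by_fnr=$(awk 'FNR == 2' "$dir/a.txt" "$dir/b.txt")  # second line of every file

printf '%s\n---\n%s\n' "$by_nr" "$by_fnr"
rm -rf "$dir"
```

With a single input file, as in this thread, the two behave the same.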

CodyK479 03-25-2011 08:46 PM

Umm... OK. I didn't know that about cut. I was just saying I have a tab-delimited file with 3 fields, meaning there are at most two tabs in each record. Never mind, though.

I think this is it.

awk -F'\t' 'NR==25 {print $1}' myfile
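
Since the stated goal was to get all three fields into three separate variables, one possible sketch of that last step follows. The sample data, the line_no variable, and the names f1, f2, f3 are made up for illustration. It also assumes no field is empty: TAB counts as IFS whitespace, so read would merge two TABs in a row.

```shell
# Hypothetical three-column, TAB-separated data file.
myfile=$(mktemp)
printf 'one\ttwo\tthree\nred\tgreen\tblue\n' > "$myfile"

line_no=2   # the line number the program already knows

# awk prints the wanted line; read splits it on TABs into three variables.
IFS=$(printf '\t') read -r f1 f2 f3 <<EOF
$(awk -v n="$line_no" 'NR==n' "$myfile")
EOF

printf 'f1=%s f2=%s f3=%s\n' "$f1" "$f2" "$f3"
rm -f "$myfile"
```

This reads the file once instead of calling awk three times, once per field.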

CodyK479 03-25-2011 09:03 PM

thanks

David the H. 03-26-2011 05:28 AM

The others have given you direct solutions, but let me discuss the reasons. You need the right tool for the right job.


sed is the "stream editor", and works on a per-line basis. It takes one line at a time into its pattern buffer (as determined by newline characters), applies a series of editing commands to it, then clears the buffer and grabs the next line. Which lines it grabs can be controlled by conditions you give it, or else it goes through the whole file line by line.
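
As a small illustration of those conditions (sed calls them addresses), the sketch below selects lines by number, by range, and by pattern. The four-line input is made up.

```shell
# Made-up four-line input.
input=$(printf 'alpha\nbeta\ngamma\ndelta\n')

by_number=$(printf '%s\n' "$input" | sed -n '3p')       # only line 3
by_range=$(printf '%s\n' "$input" | sed -n '2,3p')      # lines 2 through 3
by_pattern=$(printf '%s\n' "$input" | sed -n '/^d/p')   # lines starting with "d"

printf '%s\n---\n%s\n---\n%s\n' "$by_number" "$by_range" "$by_pattern"
```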

sed only sees the pattern buffer contents as a single continuous text stream; it has no concept of "fields". Thus the only way to extract a part of a line is to use the s/// command and regular expression pattern matching.
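
For instance, pulling out the second TAB-separated field with sed means describing the whole line as a pattern and keeping one captured group. A rough sketch, using a made-up input line and a literal TAB held in a variable so it does not rely on sed's \t support:

```shell
tab=$(printf '\t')                 # literal TAB character
line=$(printf 'one\ttwo\tthree')   # hypothetical input line

# Match: leading non-TABs, one TAB, then capture the next run of non-TABs;
# the rest of the line is matched and dropped.
field2=$(printf '%s\n' "$line" | sed "s/^[^${tab}]*${tab}\([^${tab}]*\).*/\1/")

printf '%s\n' "$field2"
```

It works, but the pattern has to change for every field number, which is why a field-aware tool is usually the easier choice here.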

sed does also have some multi-line ability, but it's complex and awkward to use. Again, there are usually other tools that work better.


awk, on the other hand, is field based. You define the fields you want it to operate on, then you can print, edit, or reorganize them pretty much as you please.

It does, in fact, have two main input controls. First, the record separator determines how much of the file it will operate on in a single cycle. This is a single line by default, like sed, but it can be configured to do a paragraph at a time, for example, or even the whole file at once.

Then awk also has the field separator, which defines how the text fields inside each record are determined. The default is whitespace (any amount of contiguous spaces and/or tabs).
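
A short sketch of both separators, on made-up input: setting RS to the empty string puts awk in paragraph mode (records are separated by blank lines), and FS set to a newline then makes each line of the paragraph a field. The second command shows the default whitespace splitting.

```shell
# Made-up input: two blank-line-separated records, one field per line.
records=$(printf 'name: ann\ncity: oslo\n\nname: bob\ncity: rome\n')

# RS = "" selects paragraph mode; FS = "\n" makes each line a field.
second_city=$(printf '%s\n' "$records" |
    awk 'BEGIN { RS = ""; FS = "\n" } NR == 2 { print $2 }')

# Default separators: one line per record, runs of spaces/TABs between fields.
third_word=$(printf 'a  b \t c\n' | awk '{ print $3 }')

printf '%s\n%s\n' "$second_city" "$third_word"
```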

awk also has a much more comprehensive collection of functions available to it, and is able to apply complex conditions, loops, and mathematical operations to the input text.
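
To give a feel for that, here is a minimal sketch with a condition, some arithmetic, and a loop. The item/quantity/price data is invented for the example.

```shell
# Made-up TAB-separated data: item, quantity, unit price.
data=$(printf 'pen\t3\t2\nink\t1\t10\ncap\t5\t1\n')

# Condition plus arithmetic: sum quantity * price for rows with quantity > 1.
total=$(printf '%s\n' "$data" |
    awk -F'\t' '$2 > 1 { sum += $2 * $3 } END { print sum }')

# Loop: walk the fields of a line backwards and print them in reverse order.
reversed=$(printf 'a b c\n' |
    awk '{ out = $NF; for (i = NF - 1; i >= 1; i--) out = out " " $i; print out }')

printf '%s\n%s\n' "$total" "$reversed"
```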

This all means that sed is usually more convenient when doing simple single-line string extractions, insertions, alterations, and deletions, while awk is generally better for extracting individual fields from lines or blocks of text, and for doing more complex multi-line operations.


By the way, don't discount the convenience of chaining commands together either. cut is a single-purpose tool that does what it does quickly and efficiently. There's really nothing wrong with chaining sed and cut together to do what you want (except why use two programs when one is sufficient? ;)); you just need to do it properly. Either work from left to right and pipe the output of one command into the input of another, or use process substitution, or use command substitution with a here string to simulate input from a file.

And since $(..) is recommended over `..`, all of the following should work:
Code:

#using a pipe
#perhaps the most common method

cut -f1 myfile | sed -n 25p
sed -n 25p myfile | cut -f1

#using process substitution
#the space between the two <'s is required

sed -n 25p < <(cut -f1 myfile)
cut -f1 < <(sed -n 25p myfile)

#using a here string and command substitution
#quotes are needed here to protect newlines

sed -n 25p <<<"$(cut -f1 myfile)"
cut -f1 <<<"$(sed -n 25p myfile)"

The second instance in each pair is likely better, since it means cut only has to operate on a single line.
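
A related refinement, offered as a suggestion rather than something from the posts above: sed -n 25p still reads the rest of the file after printing line 25. Adding the q command makes sed stop right there, which may matter on large files. The 100-line test file below is generated just for the sketch.

```shell
# Made-up 100-line file; line N contains "N<TAB>xN".
myfile=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100; i++) printf "%d\tx%d\n", i, i }' > "$myfile"

# Print line 25, then quit instead of reading lines 26..100.
field1=$(sed -n '25{p;q;}' "$myfile" | cut -f1)
field2=$(sed -n '25{p;q;}' "$myfile" | cut -f2)

printf '%s %s\n' "$field1" "$field2"
rm -f "$myfile"
```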

Note also that it's not usually necessary to use cat, as the majority of tools such as sed and cut can accept filenames as input. Not to mention that your shell can do file redirects too.
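
A quick sketch of the three ways to feed a file to sed, using a throwaway three-line file made up for the example. All three give the same result; only the first starts an extra process.

```shell
# Throwaway sample file.
myfile=$(mktemp)
printf 'first\nsecond\nthird\n' > "$myfile"

with_cat=$(cat "$myfile" | sed -n 2p)     # works, but cat is not needed
with_arg=$(sed -n 2p "$myfile")           # sed opens the file itself
with_redirect=$(sed -n 2p < "$myfile")    # the shell opens the file for sed

printf '%s %s %s\n' "$with_cat" "$with_arg" "$with_redirect"
rm -f "$myfile"
```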

Here are a few useful sed and awk references.
The grymoire links go to highly-recommended tutorials:
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt
http://www.grymoire.com/Unix/Awk.html
http://www.gnu.org/software/gawk/man...ode/index.html

CodyK479 03-26-2011 10:46 AM

Thanks! I feel silly now because I had used that piping method and it didn't work. It is working now, though, so I must have done something stupid. But I think I am getting the hang of awk now.

David the H. 03-26-2011 11:03 AM

Interestingly, I've just discovered that you can also use cut to extract whole lines. You just have to use a literal newline as the field delimiter.
Code:

cut -d '
' -f25 myfile | cut -f1

#or

cut -d $'\n' -f25 myfile | cut -f1

$'..' is a special quoting pattern available in bash (and some other shells) that converts backslash escapes such as \n into the literal characters they stand for during parsing, similar to echo's -e option.

