LinuxQuestions.org - Can I use regexps in linux (sed awk?)


bioinformatics_guy

09-24-2008 08:05 AM

Can I use regexps in linux (sed awk?)

I have a program that prints its output to standard out. What I want to do is look at this data and grab pieces of information from it as it comes.

More specifically, if I redirected the output to a file, the last line would have the form:

Final graph has (\d+) nodes and n50 of (\d+) max (\d+)

and I want to grab these 3 numbers and put them in a file. Do I have to redirect the output to a file and then read through it line by line, or can I just look at the last line? (I'd like to know how to do both for future reference.) Or can I do it directly from standard output?
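A minimal sketch of the "directly from standard output" option, assuming the summary line looks exactly like the pattern above. `fake_program` is a hypothetical stand-in for the real program and its numbers are made up. Portable sed has no `\d`, so `[0-9]+` plays that role, and `-E` (extended regexes, supported by GNU and BSD sed) makes the parentheses act as capture groups.

```shell
#!/bin/sh
# Hypothetical stand-in for the real program (sample output, numbers made up).
fake_program() {
    echo "Reading input..."
    echo "Building graph..."
    echo "Final graph has 1234 nodes and n50 of 567 max 2890"
}

# -n: print nothing by default; the trailing p prints only lines where the
# substitution matched, replaced by the three captured numbers.
fake_program |
    sed -n -E 's/^Final graph has ([0-9]+) nodes and n50 of ([0-9]+) max ([0-9]+).*/\1 \2 \3/p' \
    > numbers.txt

cat numbers.txt
```

Because sed reads the stream line by line, no intermediate file of the full output is needed; only the extracted numbers end up in `numbers.txt`.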

chrism01

09-24-2008 08:13 AM

Either redirect the output to a file and then grep it, or just pipe it through grep, e.g.:

yourprog | grep somepattern

You can use awk or sed the same way; your choice.
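Both variants side by side, as a sketch. `yourprog` is a hypothetical function standing in for the real program, and the file names `output.log` and `summary.txt` are illustrative only.

```shell
#!/bin/sh
# Hypothetical stand-in for "yourprog" (sample output, numbers made up).
yourprog() {
    printf '%s\n' "Reading input..." \
        "Final graph has 1234 nodes and n50 of 567 max 2890"
}

# Option 1: pipe straight through grep; only matching lines are printed.
yourprog | grep '^Final graph has'

# Option 2: save everything to a file first, then search the file.
yourprog > output.log
grep '^Final graph has' output.log > summary.txt
cat summary.txt
```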

pixellany

09-24-2008 08:25 AM

Regexes are supported by a wide variety of Linux command-line tools, including sed and awk.

For grabbing numbers out of a line, you can use awk and tell it which field to print. In your example, it is not clear what the 3 numbers are, or what the notation \d+ means.

Here's a crude example with awk (first I created the file "fields"):

Code:

root@ath:/home/mherring/play# more fields
the numbers are 34 456 and 135
root@ath:/home/mherring/play# awk '{print $4, $5, $7}' fields
34 456 135
root@ath:/home/mherring/play#

This of course depends on the numbers always appearing in the same field position.
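Applying the same idea to the line from the question: a sketch that numbers every field so you can see which positions hold the numbers. The numbers in `line` are made up for illustration.

```shell
#!/bin/sh
# The summary line from the question, with made-up numbers.
line='Final graph has 1234 nodes and n50 of 567 max 2890'

# awk splits on runs of blanks by default, so the words are numbered:
#   $1=Final $2=graph $3=has $4=1234 $5=nodes $6=and
#   $7=n50   $8=of    $9=567 $10=max  $11=2890
echo "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'

# The three numbers are therefore fields 4, 9 and 11.
nums=$(echo "$line" | awk '{ print $4, $9, $11 }')
echo "$nums"
```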

To print the last line of a file:

sed -n '$p' filename
OR
tail -n1 filename
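A quick check that both commands return the same line, assuming a small sample file (its name and contents are made up here):

```shell
#!/bin/sh
# Build a small sample file (hypothetical contents).
printf '%s\n' "Reading input..." "Building graph..." \
    "Final graph has 1234 nodes and n50 of 567 max 2890" > filename

# In sed, $ addresses the last line; -n suppresses sed's automatic printing.
last_sed=$(sed -n '$p' filename)

# tail -n 1 prints just the final line.
last_tail=$(tail -n 1 filename)

echo "$last_sed"
echo "$last_tail"
```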

Go to http://tldp.org and get the "Bash Guide for Beginners" and maybe the "Advanced Bash-Scripting Guide" (ABS).

bioinformatics_guy

09-24-2008 10:05 AM

By \d+ I just meant a number of any magnitude (0 to infinity, but in practice it will be at most a few thousand).

So I'm thinking: redirect the output to a file, tail the file to get the last line, then run awk on it. Does that sound like a reasonable pipeline? If it works, I'll post the one-liner here.
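The proposed pipeline written out as a sketch. `yourprog` is a hypothetical stand-in for the real program; the file names `test` and `store` follow the ones used later in this thread. Note the explicit `print`: without it awk evaluates the fields but outputs nothing.

```shell
#!/bin/sh
# Hypothetical stand-in for the real program (numbers made up).
yourprog() {
    echo "Reading input..."
    echo "Final graph has 1234 nodes and n50 of 567 max 2890"
}

# Step 1: redirect the program's standard output to a file.
yourprog > test

# Steps 2 and 3: take the last line, then print fields 4, 9 and 11
# and save them in the file "store".
tail -n 1 test | awk '{ print $4, $9, $11 }' > store

cat store
```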

bioinformatics_guy

09-24-2008 10:16 AM

OK, it worked. I first redirected the output to a file named test, then ran the following command:

tail -n 1 test | awk '{$4 $9 $11}'

But the output has no spaces: if the numbers were 3, 45 and 89, it would print 34589. How can I add spaces?
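In awk, fields written side by side in a `print` statement are concatenated, while commas separate them with the output field separator `OFS` (a single space by default). A small sketch using the numbers from the post:

```shell
#!/bin/sh
line='Final graph has 3 nodes and n50 of 45 max 89'

# Side by side: awk concatenates the strings, giving 34589.
joined=$(echo "$line" | awk '{ print $4 $9 $11 }')

# With commas: awk inserts OFS (one space by default), giving "3 45 89".
spaced=$(echo "$line" | awk '{ print $4, $9, $11 }')

# OFS can be changed, e.g. to a comma for a CSV-style file: "3,45,89".
csv=$(echo "$line" | awk -v OFS=',' '{ print $4, $9, $11 }')

echo "$joined"
echo "$spaced"
echo "$csv"
```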

bioinformatics_guy

09-24-2008 10:29 AM

I also can't redirect the numbers to a file to store them:

tail -n 1 test | awk '{$4 $9 $11}' | echo $4 $9 $11 > store

It just prints "exit" and closes my shell?

jschiwal

09-24-2008 10:31 AM

From what I understand, you want to pass the data stream through to standard output unchanged while extracting the information you need from it. One way to do this is the tee command, which acts like a tap between two pipes (|). tee copies its standard input both to its standard output and to every file named as an argument. That file could be a FIFO (a named pipe, created with mkfifo), which a sed or awk command running in another script, or as a background process of the same script, then reads as its input file.

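# Create the FIFO (named pipe).
mkfifo nodes
# Background reader: extractnodes.sed is your own sed script (not shown here).
sed -f extractnodes.sed nodes > extracted_nodes.csv &
# tee copies the stream into the FIFO and passes it on unchanged.
<program generating your data stream> | tee nodes | <program processing your data stream>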
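A self-contained sketch of the tee/FIFO approach, under these assumptions: `yourprog` is a hypothetical stand-in for the real program, an inline sed expression replaces the `extractnodes.sed` script, and `cat > passthrough.log` stands in for whatever processes the stream downstream.

```shell
#!/bin/sh
# Work in a scratch directory so the FIFO and output files stay together.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the program generating the data stream.
yourprog() {
    echo "Reading input..."
    echo "Final graph has 1234 nodes and n50 of 567 max 2890"
}

mkfifo nodes

# Background reader: pulls the three numbers out of whatever arrives on the FIFO.
sed -n -E 's/^Final graph has ([0-9]+) nodes and n50 of ([0-9]+) max ([0-9]+).*/\1,\2,\3/p' \
    nodes > extracted_nodes.csv &

# tee writes every line into the FIFO and also passes it on unchanged.
yourprog | tee nodes | cat > passthrough.log

wait    # let the background sed finish writing before reading its output
cat extracted_nodes.csv
```

Opening a FIFO blocks until both a reader and a writer are present, which is why the reader is started in the background before the pipeline runs.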

pixellany

09-24-2008 11:14 AM

Quote:

Originally Posted by bioinformatics_guy (Post 3290418)

I also can't redirect the numbers to a file to store them:

tail -n 1 test | awk '{$4 $9 $11}' | echo $4 $9 $11 > store

It just prints "exit" and closes my shell?

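Within awk, "$4" means the contents of the 4th field of the current input line.
Outside of awk, "$4" is the shell's 4th positional parameter (the 4th argument passed to the script or shell), which is normally empty at an interactive prompt.
There's no relationship between the two.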
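A short demonstration of the two meanings of `$4`. The positional parameters are set with `set --` purely for this demo (the values are hypothetical). The single quotes around the awk program matter: they stop the shell from expanding `$4` before awk sees it.

```shell
#!/bin/sh
# Give the shell some positional parameters for this demo.
set -- first second third fourth

# Inside single quotes the shell leaves $4 alone, so awk prints
# the 4th field of its input line: 1234
from_awk=$(echo 'Final graph has 1234 nodes' | awk '{ print $4 }')

# Outside awk, $4 is the shell's 4th positional parameter: fourth
from_shell=$4

echo "awk \$4:   $from_awk"
echo "shell \$4: $from_shell"
```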

Further, you can't pipe the output of awk into echo: echo does not read standard input; it only prints its own arguments.
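If you do want the numbers in a shell variable (for example, to echo them yourself), command substitution `$(...)` is the usual tool. A sketch, assuming a sample `test` file with made-up contents:

```shell
#!/bin/sh
printf '%s\n' "Reading input..." \
    "Final graph has 1234 nodes and n50 of 567 max 2890" > test

# echo ignores its standard input, so this only prints an empty line.
tail -n 1 test | awk '{ print $4, $9, $11 }' | echo

# Capture awk's output in a shell variable instead...
nums=$(tail -n 1 test | awk '{ print $4, $9, $11 }')

# ...then echo the (quoted) variable into the file.
echo "$nums" > store
cat store
```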

How about:
tail -n 1 test | awk '{print $4, $9, $11}' > store
(You can ALWAYS redirect the output of a command to a file.)

(BTW, awk needs the "print" statement here: '{$4 $9 $11}' on its own just evaluates the fields and outputs nothing. The commas between the fields also make awk separate the numbers with spaces.)

All times are GMT -5.