LinuxQuestions.org

-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Need help stripping statement from text file, ksh: sed awk? (https://www.linuxquestions.org/questions/programming-9/need-help-stripping-statement-from-text-file-ksh-sed-awk-739194/)

austin881 07-10-2009 10:15 AM

Need help stripping statement from text file, ksh: sed awk?
 
This is my first time to this site, so go easy on me.

I'm writing a Korn shell script that clears out the NO_HW entries from an ioscan. The script takes the output of the ioscan and places it into a text file, which is then reduced to only the lines with "NO_HW" on them.

Example:
Code:

ctl        54  0/3/0/0/0/0.0.21.11.0    sctl          NO_HW      DEVICE      HP      260 SAS AJ940A
ctl        58  0/3/0/0/0/0.0.27.11.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        59  0/3/0/0/0/0.0.28.9.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        60  0/3/0/0/0/0.0.29.11.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        61  0/3/0/0/0/0.0.30.12.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        62  0/3/0/0/0/0.0.31.12.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        63  0/3/0/0/0/0.0.32.12.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A

I am trying to strip out EVERYTHING except the H/W path.

Example:
Code:

0/3/0/0/0/0.0.21.11.0

So that the text file will look like this:

Code:

0/3/0/0/0/0.0.21.11.0
0/3/0/0/0/0.0.27.11.0
0/3/0/0/0/0.0.28.9.0
0/3/0/0/0/0.0.29.11.0

...and so on. So I can use the rmsf -H 0/3/0/0/0/0.0.21.11.0 command on each H/W path and remove the device.

The command needs to be pretty versatile as the H/W path could be just about anything.

I was thinking this could probably be done with sed or maybe awk, but I'm not very proficient with either. Thanks.
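Putting the eventual goal together as a sketch (the sample data is taken from the question; "echo" stands in for the HP-UX-only rmsf command so the sketch is safe to run anywhere):

```shell
# Two sample lines from the question's ioscan output:
printf '%s\n' \
  'ctl        54  0/3/0/0/0/0.0.21.11.0    sctl          NO_HW      DEVICE      HP      260 SAS AJ940A' \
  'ctl        58  0/3/0/0/0/0.0.27.11.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A' \
  > ioscan.txt

# Print field 3 (the H/W path) of every line whose fifth field is NO_HW,
# then hand each path to rmsf.  "echo" stands in for the real command:
awk '$5 == "NO_HW" { print $3 }' ioscan.txt |
while read -r hwpath; do
    echo rmsf -H "$hwpath"
done
# -> rmsf -H 0/3/0/0/0/0.0.21.11.0
# -> rmsf -H 0/3/0/0/0/0.0.27.11.0
```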

David the H. 07-10-2009 10:32 AM

No need for anything as fancy as sed or awk.

cut -f3 file.txt

This assumes the fields are separated by tabs. If they're separated by spaces, you'll have to add '-d " "' and change the -f number to match the column you want.
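One caveat worth noting (a sketch, not from the thread): with -d " " every single space counts as a delimiter, so runs of spaces produce empty fields and the field numbers shift. Squeezing the repeats first with tr -s is one common workaround:

```shell
line='ctl        54  0/3/0/0/0/0.0.21.11.0    sctl'

# Every individual space is a delimiter, so "field 3" lands on one of
# the empty fields inside the first run of spaces:
printf '%s\n' "$line" | cut -d ' ' -f 3    # prints an empty line

# Squeeze repeated spaces down to one, and the field numbers line up:
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 3
# -> 0/3/0/0/0/0.0.21.11.0
```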

David1357 07-10-2009 11:46 AM

Quote:

Originally Posted by austin881 (Post 3603404)
I was thinking this could probably be done with sed or maybe awk but I'm not very proficient in those.

Although "David the H." has an easy method, it is fragile: cut's default delimiter is a single tab, so it breaks if the columns are padded with spaces or if the column number ever changes. A less fragile approach is
Code:

[user@machine:~]:awk '{ print $3 }' < blah.txt
0/3/0/0/0/0.0.21.11.0
0/3/0/0/0/0.0.27.11.0
0/3/0/0/0/0.0.28.9.0
0/3/0/0/0/0.0.29.11.0
0/3/0/0/0/0.0.30.12.0
0/3/0/0/0/0.0.31.12.0
0/3/0/0/0/0.0.32.12.0

awk is able to handle an arbitrary number of spaces between columns.

Another approach would be to use sed to replace the spaces with commas, and then use cut:
Code:

[user@machine:~]:sed -e 's/  */,/g' < blah.txt | cut -d ',' -f 3
0/3/0/0/0/0.0.21.11.0
0/3/0/0/0/0.0.27.11.0
0/3/0/0/0/0.0.28.9.0
0/3/0/0/0/0.0.29.11.0
0/3/0/0/0/0.0.30.12.0
0/3/0/0/0/0.0.31.12.0
0/3/0/0/0/0.0.32.12.0

NOTE: The sed expression has two spaces before the asterisk. The asterisk applies only to the second space, so the pattern matches one or more spaces.
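A quick way to convince yourself (a sketch): a literal space followed by "zero or more spaces" together means "one or more spaces", so every run of spaces collapses to a single comma:

```shell
# Three spaces, then two: both runs become one comma each.
printf 'a   b  c\n' | sed -e 's/  */,/g'
# -> a,b,c
```

The more explicit BRE equivalent is 's/ \{1,\}/,/g'.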

jan61 07-10-2009 01:55 PM

Hello,

Quote:

Originally Posted by David1357 (Post 3603503)
Code:

[user@machine:~]:awk '{ print $3 }' < blah.txt

Why do you use a stdin redirect? awk is able to read from files given as command-line arguments.

Quote:

Originally Posted by David1357 (Post 3603503)
Code:

[user@machine:~]:sed -e 's/  */,/g' < blah.txt | cut -d ',' -f 3

Again: why the redirect? sed, like the majority of Unix commands, is also able to read filenames from the command line.

In the second example I'd prefer to let sed do the whole job. No need for cut:
Code:

sed 's/[^ ]* *[^ ]* *\([^ ]*\).*/\1/' file.txt
If you want to catch spaces and tabs:
Code:

sed 's/[^ \t]*[ \t]*[^ \t]*[ \t]*\([^ \t]*\).*/\1/' file.txt
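Run against one of the question's lines, the first expression picks out field 3 as intended (a quick check). One caveat: \t inside a bracket expression is a GNU sed extension; a strictly POSIX sed would treat it as the two literal characters backslash and t, so the second command may need a literal tab typed instead.

```shell
# Skip two non-space runs (and the spaces after each), capture the third:
printf 'ctl        54  0/3/0/0/0/0.0.21.11.0    sctl   NO_HW\n' |
sed 's/[^ ]* *[^ ]* *\([^ ]*\).*/\1/'
# -> 0/3/0/0/0/0.0.21.11.0
```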
Jan

David1357 07-13-2009 09:53 AM

Quote:

Originally Posted by jan61 (Post 3603642)
Why do you use stdin redirect? awk is able to read from files given as command line argument.

Habit.

Quote:

Originally Posted by jan61 (Post 3603642)
Again: Why the redirect? The sed is also (like the majority of unix commands) able to retrieve filenames from the command line.

Again, habit.

Quote:

Originally Posted by jan61 (Post 3603642)
In the second example I'd prefer to let sed do the whole work. No need for cut:
Code:

sed 's/[^ ]* *[^ ]* *\([^ ]*\).*/\1/' file.txt

Not very readable.

Quote:

Originally Posted by jan61 (Post 3603642)
If you want to catch spaces and tabs:
Code:

sed 's/[^ \t]*[ \t]*[^ \t]*[ \t]*\([^ \t]*\).*/\1/' file.txt

Even less readable.

H_TeXMeX_H 07-13-2009 10:11 AM

Quote:

Originally Posted by David1357 (Post 3606124)
Not very readable.


Even less readable.

Yup, I fully agree, lines like that make me sick.

jan61 07-13-2009 04:09 PM

Hello,

*habit*, *not readable*, *makes me sick* - strange. I'd think that efficiency in scripting should be part of one's judgement too. Your awk solution is o.k.; your sed / cut one is not, it's a waste of resources. Your *habit* of using redirects makes things more difficult to understand (unnecessary redirects and cats are like an epidemic, in my opinion!).

I only wanted to show how sed can be used to do the whole job in the second example - if one doesn't understand the line, he can always ask. It's a simple basic regular expression, nothing highly sophisticated.

Jan

David1357 07-13-2009 04:31 PM

Quote:

Originally Posted by jan61 (Post 3606527)
I'd think, that efficiency in scripting should be a part of one's judgement too.

The overhead of reading a file from stdin versus opening the file directly is negligible (long.txt is 70007 lines):
Code:

[user@machine:~]:time awk '{ print $3 }' < long.txt >> /dev/null

real    0m0.076s
user    0m0.072s
sys    0m0.004s
[user@machine:~]:time awk '{ print $3 }' long.txt >> /dev/null

real    0m0.077s
user    0m0.060s
sys    0m0.016s

It actually took longer for the version that specified the file name, but I would have to run each command at least 100 times and average the results for us to do a fair comparison.
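Such an averaging run could be sketched like this (a hypothetical harness; a small generated file stands in for long.txt, and dividing the reported total by 100 gives the per-run average):

```shell
# Build a 1000-line stand-in for long.txt (the real one was 70007 lines):
rm -f long.txt
i=0
while [ $i -lt 1000 ]; do
    echo 'ctl 54 0/3/0/0/0/0.0.21.11.0 sctl NO_HW' >> long.txt
    i=$((i + 1))
done

# Time 100 runs under a single "time"; the total divided by 100 smooths
# out scheduler noise:
time sh -c '
    n=0
    while [ $n -lt 100 ]; do
        awk "{ print \$3 }" long.txt > /dev/null
        n=$((n + 1))
    done
'
```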

Quote:

Originally Posted by jan61 (Post 3606527)
Your awk solution is o.k., your sed / cut not, it's a waste of resources.

Let us see how long the sed-plus-cut solution takes:
Code:

[user@machine:~]:time sed -e 's/  */,/g' < long.txt | cut -d ',' -f 3 >> /dev/null

real    0m1.329s
user    0m1.304s
sys    0m0.032s

Much less efficient. So why would I suggest it? If the comma-separated data is saved, then cut can be reused. Let us see how well that works:
Code:

[user@machine:~]:time cut -d ',' -f 3 < longc.txt >> /dev/null

real    0m0.042s
user    0m0.040s
sys    0m0.000s

So cut is much more efficient than awk when given well-formatted input.
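In other words, the pattern being suggested is: pay the normalisation cost once, then reuse cheap cut calls (a sketch with hypothetical file names):

```shell
# One sample line with the padded columns from the question:
printf 'ctl   54  0/3/0/0/0/0.0.21.11.0   sctl\n' > blah.txt

sed -e 's/  */,/g' blah.txt > blahc.txt   # one-time conversion
cut -d ',' -f 3 blahc.txt                 # cheap repeated extraction
# -> 0/3/0/0/0/0.0.21.11.0
```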

Quote:

Originally Posted by jan61 (Post 3606527)
Your *habit* to use redirect makes things more difficult to understand (unnecessary redirects and cat's are like an epidemic in my opinion!).

Many would disagree with you. It really comes down to a religious debate.

Quote:

Originally Posted by jan61 (Post 3606527)
I only wanted to show, how sed can be used to do the whole work in the second example - if one doesn't understand the line, he should be able to ask. It's a simple basic regular expression, nothing high sophisticated.

The problem with those regular expressions you posted is that they are hard to decipher. Someone who writes a script using them may come back in a week and forget what they do. Sometimes script writers like to sacrifice efficiency for clarity.

