LinuxQuestions.org

-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Need help stripping statement from text file, ksh: sed awk? (https://www.linuxquestions.org/questions/programming-9/need-help-stripping-statement-from-text-file-ksh-sed-awk-739194/)

austin881 07-10-2009 10:15 AM

Need help stripping statement from text file, ksh: sed awk?
 
This is my first time to this site, so go easy on me.

I'm writing a Korn shell script that clears out the NO_HW entries from an ioscan. The script takes the output of the ioscan and places it into a text file, which is then reduced to only the lines with "NO_HW" on them.

Example:
Code:

ctl        54  0/3/0/0/0/0.0.21.11.0    sctl          NO_HW      DEVICE      HP      260 SAS AJ940A
ctl        58  0/3/0/0/0/0.0.27.11.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        59  0/3/0/0/0/0.0.28.9.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        60  0/3/0/0/0/0.0.29.11.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        61  0/3/0/0/0/0.0.30.12.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        62  0/3/0/0/0/0.0.31.12.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A
ctl        63  0/3/0/0/0/0.0.32.12.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A

I am trying to strip out EVERYTHING except the H/W path.

Example:
Code:

0/3/0/0/0/0.0.21.11.0

So that the text file will look like this:

Code:

0/3/0/0/0/0.0.21.11.0
0/3/0/0/0/0.0.27.11.0
0/3/0/0/0/0.0.28.9.0
0/3/0/0/0/0.0.29.11.0

...and so on. So I can use the rmsf -H 0/3/0/0/0/0.0.21.11.0 command on each H/W path and remove the device.

The command needs to be pretty versatile as the H/W path could be just about anything.

I was thinking this could probably be done with sed or maybe awk, but I'm not very proficient with either. Thanks.
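Putting the eventual goal together as a sketch (the sample data is taken from the question; "echo" stands in for the HP-UX-only rmsf command so the sketch is safe to run anywhere):

```shell
# Two sample lines from the question's ioscan output:
printf '%s\n' \
  'ctl        54  0/3/0/0/0/0.0.21.11.0    sctl          NO_HW      DEVICE      HP      260 SAS AJ940A' \
  'ctl        58  0/3/0/0/0/0.0.27.11.0    sctl          NO_HW      DEVICE      HP      270 SAS AJ941A' \
  > ioscan.txt

# Print field 3 (the H/W path) of every line whose fifth field is NO_HW,
# then hand each path to rmsf.  "echo" stands in for the real command:
awk '$5 == "NO_HW" { print $3 }' ioscan.txt |
while read -r hwpath; do
    echo rmsf -H "$hwpath"
done
# -> rmsf -H 0/3/0/0/0/0.0.21.11.0
# -> rmsf -H 0/3/0/0/0/0.0.27.11.0
```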

David the H. 07-10-2009 10:32 AM

No need for anything as fancy as sed or awk.

cut -f3 file.txt

This assumes the fields are separated by tabs. If they're separated by spaces, you'll have to add '-d " "' and change the -f number to match the column you want.
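One caveat worth noting (a sketch, not from the thread): with -d " " every single space counts as a delimiter, so runs of spaces produce empty fields and the field numbers shift. Squeezing the repeats first with tr -s is one common workaround:

```shell
line='ctl        54  0/3/0/0/0/0.0.21.11.0    sctl'

# Every individual space is a delimiter, so "field 3" lands on one of
# the empty fields inside the first run of spaces:
printf '%s\n' "$line" | cut -d ' ' -f 3    # prints an empty line

# Squeeze repeated spaces down to one, and the field numbers line up:
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 3
# -> 0/3/0/0/0/0.0.21.11.0
```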

David1357 07-10-2009 11:46 AM

Quote:

Originally Posted by austin881 (Post 3603404)
I was thinking this could probably be done with sed or maybe awk but I'm not very proficient in those.

Although "David the H." has an easy method, it is fragile: cut's default delimiter is a single tab, so it breaks if the columns are padded with spaces or if the column number ever changes. A less fragile approach is
Code:

[user@machine:~]:awk '{ print $3 }' < blah.txt
0/3/0/0/0/0.0.21.11.0
0/3/0/0/0/0.0.27.11.0
0/3/0/0/0/0.0.28.9.0
0/3/0/0/0/0.0.29.11.0
0/3/0/0/0/0.0.30.12.0
0/3/0/0/0/0.0.31.12.0
0/3/0/0/0/0.0.32.12.0

awk is able to handle an arbitrary number of spaces between columns.

Another approach would be to use sed to replace the spaces with commas, and then use cut:
Code:

[user@machine:~]:sed -e 's/  */,/g' < blah.txt | cut -d ',' -f 3
0/3/0/0/0/0.0.21.11.0
0/3/0/0/0/0.0.27.11.0
0/3/0/0/0/0.0.28.9.0
0/3/0/0/0/0.0.29.11.0
0/3/0/0/0/0.0.30.12.0
0/3/0/0/0/0.0.31.12.0
0/3/0/0/0/0.0.32.12.0

NOTE: The sed expression has two spaces before the asterisk. The asterisk applies only to the second space, so the pattern matches one or more spaces.
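A quick way to convince yourself (a sketch): a literal space followed by "zero or more spaces" together means "one or more spaces", so every run of spaces collapses to a single comma:

```shell
# Three spaces, then two: both runs become one comma each.
printf 'a   b  c\n' | sed -e 's/  */,/g'
# -> a,b,c
```

The more explicit BRE equivalent is 's/ \{1,\}/,/g'.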

jan61 07-10-2009 01:55 PM

Hello,

Quote:

Originally Posted by David1357 (Post 3603503)
Code:

[user@machine:~]:awk '{ print $3 }' < blah.txt

Why do you use a stdin redirect? awk is able to read from files given as command-line arguments.

Quote:

Originally Posted by David1357 (Post 3603503)
Code:

[user@machine:~]:sed -e 's/  */,/g' < blah.txt | cut -d ',' -f 3

Again: why the redirect? sed, like the majority of Unix commands, is also able to read filenames from the command line.

In the second example I'd prefer to let sed do the whole job. No need for cut:
Code:

sed 's/[^ ]* *[^ ]* *\([^ ]*\).*/\1/' file.txt
If you want to catch spaces and tabs:
Code:

sed 's/[^ \t]*[ \t]*[^ \t]*[ \t]*\([^ \t]*\).*/\1/' file.txt
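Run against one of the question's lines, the first expression picks out field 3 as intended (a quick check). One caveat: \t inside a bracket expression is a GNU sed extension; a strictly POSIX sed would treat it as the two literal characters backslash and t, so the second command may need a literal tab typed instead.

```shell
# Skip two non-space runs (and the spaces after each), capture the third:
printf 'ctl        54  0/3/0/0/0/0.0.21.11.0    sctl   NO_HW\n' |
sed 's/[^ ]* *[^ ]* *\([^ ]*\).*/\1/'
# -> 0/3/0/0/0/0.0.21.11.0
```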
Jan

David1357 07-13-2009 09:53 AM

Quote:

Originally Posted by jan61 (Post 3603642)
Why do you use stdin redirect? awk is able to read from files given as command line argument.

Habit.

Quote:

Originally Posted by jan61 (Post 3603642)
Again: Why the redirect? The sed is also (like the majority of unix commands) able to retrieve filenames from the command line.

Again, habit.

Quote:

Originally Posted by jan61 (Post 3603642)
In the second example I'd prefer to let sed do the whole work. No need for cut:
Code:

sed 's/[^ ]* *[^ ]* *\([^ ]*\).*/\1/' file.txt

Not very readable.

Quote:

Originally Posted by jan61 (Post 3603642)
If you want to catch spaces and tabs:
Code:

sed 's/[^ \t]*[ \t]*[^ \t]*[ \t]*\([^ \t]*\).*/\1/' file.txt

Even less readable.

H_TeXMeX_H 07-13-2009 10:11 AM

Quote:

Originally Posted by David1357 (Post 3606124)
Not very readable.


Even less readable.

Yup, I fully agree, lines like that make me sick.

jan61 07-13-2009 04:09 PM

Hello,

*habit*, *not readable*, *makes me sick* - strange. I'd think that efficiency in scripting should be part of one's judgement too. Your awk solution is o.k.; your sed / cut one is not, it's a waste of resources. Your *habit* of using redirects makes things more difficult to understand (unnecessary redirects and cats are like an epidemic, in my opinion!).

I only wanted to show how sed can be used to do the whole job in the second example - if one doesn't understand the line, he can always ask. It's a simple basic regular expression, nothing highly sophisticated.

Jan

David1357 07-13-2009 04:31 PM

Quote:

Originally Posted by jan61 (Post 3606527)
I'd think, that efficiency in scripting should be a part of one's judgement too.

The overhead of reading a file from stdin versus opening the file directly is negligible (long.txt is 70007 lines):
Code:

[user@machine:~]:time awk '{ print $3 }' < long.txt >> /dev/null

real    0m0.076s
user    0m0.072s
sys    0m0.004s
[user@machine:~]:time awk '{ print $3 }' long.txt >> /dev/null

real    0m0.077s
user    0m0.060s
sys    0m0.016s

It actually took longer for the version that specified the file name, but I would have to run each command at least 100 times and average the results for us to do a fair comparison.
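Such an averaging run could be sketched like this (a hypothetical harness; a small generated file stands in for long.txt, and dividing the reported total by 100 gives the per-run average):

```shell
# Build a 1000-line stand-in for long.txt (the real one was 70007 lines):
rm -f long.txt
i=0
while [ $i -lt 1000 ]; do
    echo 'ctl 54 0/3/0/0/0/0.0.21.11.0 sctl NO_HW' >> long.txt
    i=$((i + 1))
done

# Time 100 runs under a single "time"; the total divided by 100 smooths
# out scheduler noise:
time sh -c '
    n=0
    while [ $n -lt 100 ]; do
        awk "{ print \$3 }" long.txt > /dev/null
        n=$((n + 1))
    done
'
```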

Quote:

Originally Posted by jan61 (Post 3606527)
Your awk solution is o.k., your sed / cut not, it's a waste of resources.

Let us see how long the sed-plus-cut solution takes:
Code:

[user@machine:~]:time sed -e 's/  */,/g' < long.txt | cut -d ',' -f 3 >> /dev/null

real    0m1.329s
user    0m1.304s
sys    0m0.032s

Much less efficient. So why would I suggest it? If the comma-separated data is saved, then cut can be reused. Let us see how well that works:
Code:

[user@machine:~]:time cut -d ',' -f 3 < longc.txt >> /dev/null

real    0m0.042s
user    0m0.040s
sys    0m0.000s

So cut is much more efficient than awk when given well-formatted input.
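In other words, the pattern being suggested is: pay the normalisation cost once, then reuse cheap cut calls (a sketch with hypothetical file names):

```shell
# One sample line with the padded columns from the question:
printf 'ctl   54  0/3/0/0/0/0.0.21.11.0   sctl\n' > blah.txt

sed -e 's/  */,/g' blah.txt > blahc.txt   # one-time conversion
cut -d ',' -f 3 blahc.txt                 # cheap repeated extraction
# -> 0/3/0/0/0/0.0.21.11.0
```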

Quote:

Originally Posted by jan61 (Post 3606527)
Your *habit* to use redirect makes things more difficult to understand (unnecessary redirects and cat's are like an epidemic in my opinion!).

Many would disagree with you. It really comes down to a religious debate.

Quote:

Originally Posted by jan61 (Post 3606527)
I only wanted to show, how sed can be used to do the whole work in the second example - if one doesn't understand the line, he should be able to ask. It's a simple basic regular expression, nothing high sophisticated.

The problem with those regular expressions you posted is that they are hard to decipher. Someone who writes a script using them may come back in a week and forget what they do. Sometimes script writers like to sacrifice efficiency for clarity.

