How do I - not print - in awk?
I'm trying to clean up a fairly messy script and need a pointer in the right direction. Here's what I'm working with:

file:
gibberish 5 Monkey $gibberish $gibberish
gibberish 8 Santa Claus $gibberish $gibberish
gibberish 2 Evil Robot Army $gibberish $gibberish
gibberish 7 Global Thermal Nuclear War $gibberish $gibberish

I want to get rid of the gibberish (and $gibberish). Here's what I did (and it's messy):

Code:
for filename in *; do
    awk '$4 ~ /\$/' "$filename" | awk '$5 ~ /\$/ {print $3,$2}' > a
    awk '$5 ~ /\$/' "$filename" | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
    awk '$6 ~ /\$/' "$filename" | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
    awk '$7 ~ /\$/' "$filename" | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
    cat a b c d > "$filename"
done

In my case the awk list is actually much longer, because the gibberish extends out to 15 fields. My only saving grace is that the pattern is always the same: I don't want the first field or the last 2 fields. Is there a cleaner way to use awk so that it prints everything except the first and last 2 fields? Or am I stuck with this ugly mess?

Thanks,
Zim
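For reference, the loop above prints field 3 up to field NF-2, followed by field 2, for each line. A single awk pass can do the same for any number of fields by looping from 3 to NF-2 instead of hard-coding one command per line length. A minimal sketch; the function name strip_gibberish is hypothetical, and the four input lines are the sample from the post:

```shell
# Hypothetical helper: reads stdin, or the files given as arguments.
strip_gibberish() {
    awk '{
        out = ""
        # Keep field 3 through field NF-2, i.e. drop field 1 and the last two.
        for (i = 3; i <= NF - 2; i++)
            out = out $i " "
        # Then append field 2, in the same order the original loop prints it.
        print out $2
    }' "$@"
}

printf '%s\n' \
    'gibberish 5 Monkey $gibberish $gibberish' \
    'gibberish 8 Santa Claus $gibberish $gibberish' \
    'gibberish 2 Evil Robot Army $gibberish $gibberish' \
    'gibberish 7 Global Thermal Nuclear War $gibberish $gibberish' |
    strip_gibberish
```

Because every line is handled in one pass, the output stays in input order; the a/b/c/d approach groups the lines by field count instead.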
This might help: a Perl in-line edit. It's shorter, quicker, and cleaner.
Code:
perl -ne 's/(gibberish |\$gibberish)//g;print' filename
Or edit the file in place if you're confident; -i.bak keeps a backup of the original as filename.bak.
Code:
perl -pi.bak -e 's/(gibberish |\$gibberish)//g' filename
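awk has no portable equivalent of Perl's -i.bak switch (GNU awk 4.1 and later offers gawk -i inplace, but other awks do not). A minimal sketch of the same backup-then-rewrite idea in plain shell and awk; the helper name awk_inplace is hypothetical, and the gsub() pattern mirrors the Perl regex above:

```shell
# Hypothetical helper: awk_inplace 'awk program' file
# Mimics perl -pi.bak: keeps file.bak and rewrites file with the awk output.
awk_inplace() {
    prog=$1
    file=$2
    cp -p "$file" "$file.bak" || return 1   # back up the original first
    awk "$prog" "$file.bak" > "$file"       # read the backup, overwrite the original
}

# Example, using the same pattern as the Perl one-liner:
# awk_inplace '{ gsub(/\$gibberish|gibberish /, ""); print }' filename
```

If awk fails partway through, the untouched original is still available as file.bak.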
Hi.
How about something like:
Code:
$ awk '{ for (i=3;i<(NF-2);i++) { printf "%s ", $i }; if (i == (NF-2)) print $i }' /path/to/input/file
Dave
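The command above prints fields 3 through NF-2 (for the first sample line, just Monkey). A slightly more defensive variant, offered as an editorial sketch rather than Dave's own code, uses <= in the loop so the last field needs no special case and every input line still ends with a newline; middle_fields is a hypothetical name:

```shell
# Hypothetical helper: print fields 3 .. NF-2, separated by single spaces.
middle_fields() {
    awk '{
        for (i = 3; i <= NF - 2; i++)
            printf "%s%s", $i, (i < NF - 2 ? " " : "")   # no space after the last one
        printf "\n"                                      # always end the output line
    }' "$@"
}

echo 'gibberish 2 Evil Robot Army $gibberish $gibberish' | middle_fields
```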
Thanks for the replies.
ilikejam, I tried your code and it removed the first and last fields, but left the second-to-last field of gibberish in place. I'll try some tweaking. centosboy, I just don't know enough about Perl to go down that road yet. I'm still trying to get a handle on awk, so it will be a little while before I make that jump. Thank you both.
Uh, that's odd. Using the 'file' you gave in your original post, I get back:
Code:
Monkey
Dave
ilikejam, thanks again for your response. I seem to be having a few issues with this. The "file" I gave was obviously a generalization of my problem. To be more accurate, the files I'm trying to tidy up look like this:

2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60

The output needs to be the description, then the part number:

BOOT 81264
COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
BULB, PANEL LIGHT 60583203
COLLAR, SPLIT 2 PIECE SP16F

When I use the script you wrote, I get this for the first line:

BOOT 1983603 4/30/2007 1 $2.30

Fields 1 and 2 and the final field are removed. When I copied and pasted this here, I noticed there are spaces after the final number. I don't know whether that has anything to do with it. I do know that my lengthy code works. I wouldn't think the lines need to be in any particular order of length, or do they? Do the lines have to go sequentially by length for this to work (4 fields, 5, 6, 7, as they were in my post)?

Sorry for not making things clearer from the start. I was hoping to learn from your example and tweak it to suit my needs. Apparently my vagueness caused confusion. I'll keep trying.

Thanks again,
Zim
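One possible cause, an assumption not confirmed in the thread, is a DOS-style carriage return (\r) after the trailing spaces. awk's default field splitting treats only spaces, tabs, and newlines as separators, so a lone \r would become an extra last field. With NF one larger than expected, fields 3 through NF-2 of the first line are exactly BOOT 1983603 4/30/2007 1 $2.30. A sketch that strips trailing blanks and carriage returns before indexing from NF, then keeps fields 3 through NF-5 followed by field 2 to match the desired output above (parts_list is a hypothetical name):

```shell
# Hypothetical helper: clean up line endings, then print "description part-number".
parts_list() {
    awk '{
        sub(/[ \t\r]+$/, "")            # editing $0 makes awk re-split the fields and reset NF
        out = ""
        for (i = 3; i <= NF - 5; i++)   # description: fields 3 .. NF-5
            out = out $i " "
        print out $2                    # part number goes last
    }' "$@"
}

# A sample line with trailing spaces and a carriage return:
printf '2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30   \r\n' | parts_list
```

To check whether a file really contains carriage returns, od -c filename | head shows them as \r.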
Fair enough, but with my example you don't need to know much about Perl, just about regular expressions and what the extra Perl flags mean, which perl -h tells you anyway.
Ah. OK.
Try: Code:
awk '{ for (i=3;i<(NF-4);i++) { printf "%s ",$i }; print $2 }'
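To see how this handles lines of different lengths, here is ilikejam's command, unchanged, run against the four sample records from the thread. The sample_records helper is hypothetical and assumes single spaces between fields:

```shell
# Hypothetical helper: emit the four sample records, one per line.
sample_records() {
    printf '%s\n' \
        '2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30' \
        '2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60' \
        '266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50' \
        '1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60'
}

# The command from the post: fields 3 .. NF-5, then field 2.
sample_records | awk '{ for (i=3;i<(NF-4);i++) { printf "%s ",$i }; print $2 }'
```

Each line's NF is evaluated independently and the output keeps input order, so the lines do not need to be grouped or sorted by field count.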
When trying to solve these kinds of problems, it is very useful to express the requirements in terms of how you would perform the modifications if you were doing them manually. Describing the input in terms of fields is a good start, and describing how one might identify the specific fields of interest is also useful. For example, you might describe the input as "whitespace-delimited fields". In this case that is actually inaccurate, since one field (the description) evidently contains embedded whitespace.

The challenge, then, is to describe unambiguously how to break down the elements of the input. It looks like we can still use the concept of whitespace-delimited fields if we measure the location of the fields in different ways, and perhaps in terms of what we do not want. It looks like we do not want the first field. It looks like we do not want the last 5 fields. And finally, we want to print the result in a different order from the input data. If this is an accurate description of the problem, then the regular expressions and field-indexing gymnastics practically write themselves.

You have said that you aren't up to seeing this as a Perl problem, but I can tell you that if this description of the method is correct, Perl has some constructs that are particularly well suited to the problem.

So, is the description I presented accurate? Should we proceed to the solution?
--- rod.
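The description above (drop the first field, drop the last 5 fields, move field 2 after the rest) can also be expressed as a single regular expression. A sketch using sed -E, which is an editorial choice rather than anything proposed in the thread; it assumes single spaces between fields, optional trailing spaces, and the hypothetical helper name reorder_parts:

```shell
# Hypothetical helper. Pattern pieces, left to right:
#   ^[^ ]+ followed by a space : field 1 (discarded)
#   ([^ ]+)                    : \1, field 2, the part number
#   (.*)                       : \2, the description between field 2 and the tail
#   ( [^ ]+){5}                : the last five fields (discarded)
#   " *$"                      : optional trailing spaces
reorder_parts() {
    sed -E 's/^[^ ]+ ([^ ]+) (.*)( [^ ]+){5} *$/\2 \1/' "$@"
}

printf '%s\n' '2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30' | reorder_parts
```

Lines that do not match the pattern (for example, ones with fewer than 8 fields) are printed unchanged.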
ilikejam, thanks for your help.
theNbomr, I will fully admit that I'm learning as I go here. I was actually hoping for (as I said) a pointer, not for someone to spoon-feed me the code (though I can't deny that it saved me a lot of time). Even your own post gave me pointers: field indexing and regular expressions. So, to make the question clearer, I have this in a file:

2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60

I need to get to this:

BOOT 81264
COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
BULB, PANEL LIGHT 60583203
COLLAR, SPLIT 2 PIECE SP16F

This is the code I used. It is quite messy, and I would like a pointer or suggestion on how to make it less messy. I do not want someone to write the code for me; I would like a pointer in the right direction.

Code:
for filename in *; do
    awk '$4 ~ /\$/' "$filename" | awk '$5 ~ /\$/ {print $3,$2}' > a
    awk '$5 ~ /\$/' "$filename" | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
    awk '$6 ~ /\$/' "$filename" | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
    awk '$7 ~ /\$/' "$filename" | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
    cat a b c d > "$filename"
done

It appears that there are some extra spaces at the end of each line. I don't know if that makes any difference, but it might be helpful to know.

As best I can describe it, I want to remove the first field and the last 5 fields, then place the second field at the end of the line. To be more precise, the lines in the file range from 5 to 20 fields, so my awk list is very bulky. As I already stated, my code works; I would just like some pointers on how to clean it up. How should I proceed?

I'm rather new to Linux, so Perl is not on the table right now, but when the time comes I will be asking about it too.

Thanks for your help,
Zim
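The main pointers here are to loop over fields with for (i = 3; i <= NF - 5; i++) instead of writing one awk command per line length, and to handle each file in a single pass so that line order is preserved and the a/b/c/d temporaries disappear. Put together as a minimal sketch, assuming the field layout described above (clean_parts_file is a hypothetical name; it writes to a temporary file from mktemp and then copies the result back over the original):

```shell
# Hypothetical helper: clean_parts_file FILE
# Rewrites FILE so each line becomes "description part-number".
# Assumes: field 1 is junk, field 2 is the part number, the last five
# fields are junk, and everything in between is the description.
clean_parts_file() {
    tmp=$(mktemp) || return 1
    if awk '{
            sub(/[ \t\r]+$/, "")            # drop trailing blanks/CR; awk re-splits $0
            out = ""
            for (i = 3; i <= NF - 5; i++)   # description: fields 3 .. NF-5
                out = out $i " "
            print out $2                    # part number goes last
        }' "$1" > "$tmp"
    then
        cat "$tmp" > "$1"                   # copy back, keeping the original permissions
    fi
    rm -f "$tmp"
}

# One pass per file replaces the original loop, for example:
# for filename in *; do
#     [ -f "$filename" ] && clean_parts_file "$filename"
# done
```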
Code:
$NF
Code:
print $(NF-i)
Code:
for( i = 0; i < NF; i++ ){

While your code is somewhat 'messy', if it works, there's nothing wrong with it. It is good to try to improve upon your work and learn new things.
--- rod.
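The pointers above index fields from the end of the line: $NF is the last field, and $(NF-i) is the field i positions before it, so the five trailing fields are $(NF-4) through $NF. A small sketch that exercises both (from_the_end is a hypothetical name):

```shell
# Hypothetical helper: show $NF and walk the fields backwards with $(NF-i).
from_the_end() {
    awk '{
        printf "last field: %s\n", $NF
        printf "reversed:"
        for (i = 0; i < NF; i++)   # i = 0 gives $NF, i = NF-1 gives $1
            printf " %s", $(NF - i)
        printf "\n"
    }'
}

echo 'one two three four' | from_the_end
```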