08-20-2009, 09:10 AM
|
#1
|
LQ Newbie
Registered: Jun 2009
Posts: 13
Rep:
|
How do I - not print - in awk?
I'm trying to clean up a fairly messy script and need a pointer in the right direction. Here's what I'm working with...
file
gibberish 5 Monkey $gibberish $gibberish
gibberish 8 Santa Claus $gibberish $gibberish
gibberish 2 Evil Robot Army $gibberish $gibberish
gibberish 7 Global Thermal Nuclear War $gibberish $gibberish
I want to get rid of the gibberish, (and $gibberish). Here's what I did (and it's messy)
for filename in *; do
awk '$4 ~ /\$/' $filename | awk '$5 ~ /\$/ {print $3,$2}' > a
awk '$5 ~ /\$/' $filename | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
awk '$6 ~ /\$/' $filename | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
awk '$7 ~ /\$/' $filename | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
cat a b c d > $filename
done
In my case the awk list is actually much longer because the gibberish extends out to 15 fields. My only saving grace is that the pattern is the same, where I don't want the first, or last 2 fields. Is there a cleaner way to use awk so it prints everything but the first and last 2 fields? Or am I stuck with the ugly mess?
Thanks
Zim
Last edited by ZimMonkey; 08-20-2009 at 10:21 AM.
Reason: It needed it
|
|
|
08-20-2009, 09:22 AM
|
#2
|
Senior Member
Registered: May 2009
Location: london
Distribution: centos5
Posts: 1,137
Rep:
|
Quote:
Originally Posted by ZimMonkey
I thought I posted this yesterday, but it's been about 18 hours and the post hasn't shown up, so I guess i didn't hit send. If I'm on a delay for being a noob, then please delete my first question.
I'm trying to clean up a fairly messy script and need a pointer in the right direction. Here's what i'm working with...
file
gibberish 5 Monkey $gibberish $gibberish
gibberish 8 Santa Claus $gibberish $gibberish
gibberish 2 Evil Robot Army $gibberish $gibberish
gibberish 7 Global Thermal Nuclear War $gibberish $gibberish
I want to get rid of the gibberish, (and $gibberish). Here's what I did (and it's messy)
for filename in *; do
awk '$4 ~ /\$/' $filename | awk '$5 ~ /\$/ {print $3,$2}' > a
awk '$5 ~ /\$/' $filename | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
awk '$6 ~ /\$/' $filename | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
awk '$7 ~ /\$/' $filename | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
cat a b c d > $filename
done
In my case the awk list is actually much longer because the gibberish extends out to 15 fields. My only saving grace is that the pattern is the same, where I don't want the first, or last 2 fields. Is there a cleaner way to use awk so it prints everything but the first and last 2 fields? Or am I stuck with the ugly mess?
Thanks
Zim
|
This might help: a Perl in-line edit.
It is shorter, quicker, and cleaner.
Code:
cat filename | perl -ne's/(gibberish |\$gibberish)//g;print'
or, if you are confident, edit the file in place. The .bak suffix keeps a backup of the original.
Code:
perl -pi.bak -e 's/(gibberish |\$gibberish)//g' filename
Last edited by centosboy; 08-20-2009 at 09:24 AM.
|
|
|
08-20-2009, 09:27 AM
|
#3
|
Senior Member
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109
Rep:
|
Hi.
How about something like:
$ awk '{ for (i=3;i<(NF-2);i++) { printf "%s ", $i }; if (i == (NF-2)) print $i }' /path/to/input/file
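For readers following the loop, here is the same one-liner spread over several lines, as a sketch. The function name middle_fields and the inline sample data are made up for illustration; the awk program itself is unchanged from the line above.

```shell
# Hypothetical wrapper: print fields 3 .. NF-2 of each input line.
middle_fields() {
    awk '{
        # Every field from 3 up to (but not including) NF-2, each followed by a space.
        for (i = 3; i < (NF - 2); i++) {
            printf "%s ", $i
        }
        # Field NF-2 ends the line; nothing is printed when the line is too short.
        if (i == (NF - 2)) print $i
    }' "$@"
}

# Two sample lines from the first post (single quotes keep $gibberish literal).
printf '%s\n' \
    'gibberish 5 Monkey $gibberish $gibberish' \
    'gibberish 8 Santa Claus $gibberish $gibberish' | middle_fields
```

Note that this keeps only the middle fields; it does not move field 2 to the end of the line.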
Dave
Last edited by ilikejam; 08-20-2009 at 09:34 AM.
|
|
|
08-20-2009, 06:00 PM
|
#4
|
LQ Newbie
Registered: Jun 2009
Posts: 13
Original Poster
Rep:
|
Thanks for the replies.
ilikejam, I tried your code and it removed the first and last fields, leaving the second-to-last field of gibberish still there. I'll try to do some tweaking.
centosboy, I just don't know enough about perl to go down that road just yet. I'm still trying to get a handle on awk, so it will be a little while before I make that jump.
Thank you both.
|
|
|
08-20-2009, 06:37 PM
|
#5
|
Senior Member
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109
Rep:
|
Uh, that's odd. Using the 'file' you gave in your original post, I get back:
Code:
Monkey
Santa Claus
Evil Robot Army
Global Thermal Nuclear War
from the awk line I posted.
Dave
|
|
|
08-20-2009, 07:36 PM
|
#6
|
LQ Newbie
Registered: Jun 2009
Posts: 13
Original Poster
Rep:
|
ilikejam, thanks again for your response. I seem to be having a few issues with this. The "file" that I gave was obviously a generalization of the problem that I'm having. To be more accurate, the files that I'm trying to make neater look like this...
2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60
Where the outcome needs to be effectively description, then part #
BOOT 81264
COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
BULB, PANEL LIGHT 60583203
COLLAR, SPLIT 2 PIECE SP16F
When I use the script you wrote, (for the first one) I get
BOOT 1983603 4/30/2007 1 $2.30
Field 1, 2, and the final field are removed. When I did the copy paste to get this on here, I noticed there are spaces after the final number. I don't know if that has anything to do with anything. I do know that when I use my lengthy code it does work. I wouldn't think that the length of the lines needed to be in the correct order - or do they? Do the lines have to go sequentially in length for this to work - 4 fields, 5, 6, 7 as they were in my post?
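One way to see what awk makes of a line is to have it report its own field count, and to dump the raw bytes to expose invisible characters. This is only a sketch: show_fields is a made-up helper name, and the sample line is typed in here with trailing spaces added on purpose. With awk's default field splitting, trailing spaces alone do not add a field, and each line is processed independently, so the order and length of neighbouring lines should not matter. A stray character at the end of each line, such as a carriage return from a DOS-format file, is one possible culprit worth ruling out.

```shell
# Hypothetical helper: prefix each line with the number of fields awk sees.
show_fields() {
    awk '{ print NF ": " $0 }' "$@"
}

# First sample record, with trailing spaces added on purpose.
line='2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30   '

printf '%s\n' "$line" | show_fields

# od -c prints every byte, so spaces, tabs (\t) and carriage returns (\r) show up.
printf '%s\n' "$line" | od -c
```

If the reported count is higher than the number of visible fields, the od output should show which extra bytes are being treated as a field.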
Sorry for not making things more clear from the start, I was hoping to be able to learn from your example, and tweak it to suit my needs. Apparently my vagueness caused confusion. I'll keep on trying.
Thanks again,
Zim
|
|
|
08-21-2009, 05:12 AM
|
#7
|
Senior Member
Registered: May 2009
Location: london
Distribution: centos5
Posts: 1,137
Rep:
|
Quote:
Originally Posted by ZimMonkey
Thanks for the replies.
ilikejam, I tried your code and it removed the first and last fields, leaving the second-to-last field of gibberish still there. I'll try to do some tweaking.
centosboy, I just don't know enough about perl to go down that road just yet. I'm still trying to get a handle on awk, so it will be a little while before I make that jump.
Thank you both.
|
Fair enough, but for my example you don't need to know much about Perl itself, only about regular expressions and what the extra perl flags mean, which perl -h tells you anyway.
|
|
|
08-21-2009, 05:33 AM
|
#8
|
Senior Member
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109
Rep:
|
Ah. OK.
Try:
Code:
awk '{ for (i=3;i<(NF-4);i++) { printf "%s ",$i }; print $2 }'
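As a quick check, the line above can be run against the four sample records from post #6. This is a sketch: the function name desc_then_part and the here-document are made up for illustration, and it assumes each record really ends with five unwanted fields and contains no stray characters.

```shell
# Hypothetical wrapper around the one-liner above:
# print fields 3 .. NF-5 (the description), then field 2 (the part number).
desc_then_part() {
    awk '{ for (i = 3; i < (NF - 4); i++) { printf "%s ", $i }; print $2 }' "$@"
}

# The quoted EOF keeps the shell from expanding the $ signs in the data.
desc_then_part <<'EOF'
2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60
EOF
```

Because the function passes its arguments through to awk, it can also be given file names directly, for example desc_then_part /path/to/input/file.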
|
|
|
08-21-2009, 09:33 AM
|
#9
|
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
|
Quote:
Originally Posted by ZimMonkey
The "file" that I gave was obviously a generalization of the problem that I'm having. To be more accurate, the files that I'm trying to make neater look like this...
2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60
Where the outcome needs to be effectively description, then part #
BOOT 81264
COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
BULB, PANEL LIGHT 60583203
COLLAR, SPLIT 2 PIECE SP16F
When I use the script you wrote, (for the first one) I get
BOOT 1983603 4/30/2007 1 $2.30
|
Your edited sample may have been obvious to you, but to everyone else, it wasn't.
When trying to solve these kinds of problems, it is very useful to express the requirements in terms of how you might perform the modifications if you were doing it manually. Using language that describes the input in terms of fields is a good start, and describing how one might identify specific fields of interest is also useful.
So, for example, you might describe the input as "whitespace delimited fields". In this case, that is actually mostly inaccurate, since one field evidently has embedded whitespace. Now, then, the challenge is to unambiguously describe how to break down the elements of the input. It looks like we can still use the concept of whitespace-delimited fields if we measure the location of the fields in different ways, and perhaps in terms of what we do not want. It looks like we do not want the first field. It looks like we do not want the last 5 fields. And, finally we want to print the result in a different order from the input data.
If this is an accurate description of the problem, then the regular expressions and field-indexing gymnastics have practically written themselves. You have said that you aren't up to the task of seeing this as a Perl problem, but I can tell you that if the description of the method for solving the problem is correct, then Perl has some constructs and elements that are particularly well suited to this problem. So, is the description I presented accurate? Should we proceed on to the solution?
--- rod.
|
|
|
08-21-2009, 01:27 PM
|
#10
|
LQ Newbie
Registered: Jun 2009
Posts: 13
Original Poster
Rep:
|
ilikejam, thanks for your help.
theNbomr, I will fully admit that I'm learning as I go here. I was actually hoping for - as I said - a pointer, not someone to spoonfeed me the code (I can't deny that it saved me a lot of time). Even in your own post you gave me the pointers of field indexing, and regular expressions.
So to make the question more clear...
I have this in a file...
2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60
I need to get to this...
BOOT 81264
COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
BULB, PANEL LIGHT 60583203
COLLAR, SPLIT 2 PIECE SP16F
This is the code that I used, which is quite messy, and I would like a pointer or suggestion as to how to make it less messy. I do not want someone to write the code for me; I would like a pointer in the right direction.
for filename in *; do
awk '$4 ~ /\$/' $filename | awk '$5 ~ /\$/ {print $3,$2}' > a
awk '$5 ~ /\$/' $filename | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
awk '$6 ~ /\$/' $filename | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
awk '$7 ~ /\$/' $filename | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
cat a b c d > $filename
done
It appears that there are some extra spaces at the end of each line. I don't know if that makes any difference, but it might be helpful. So as best as I can describe, I want to remove the first, and last 5 fields, then place the second field at the end of the line. To be more accurate still, the file has a "field range" from 5 to 20. So my awk list is very bulky. As I already stated, my code works, I would like some pointers on how to clean it up. How should I proceed?
I'm rather new to linux so right now perl is not on the table, but when the time comes, I will be asking about it too.
Thanks for your help,
Zim
|
|
|
08-21-2009, 07:11 PM
|
#11
|
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
|
Quote:
So as best as I can describe, I want to remove the first, and last 5 fields, then place the second field at the end of the line. To be more accurate still, the file has a "field range" from 5 to 20.
|
Perfect. You have described in unambiguous terms what you wish to do. Now, you simply have to translate those terms to code (being unambiguous really helps with that part). The tricky part of your problem is that there are a variable number of fields, and you want to reference the fields numbering backward from the last field. You can index your fields using awk's builtin 'NF' variable. It holds the number of fields in the current line, so '$NF' is the last field. Indexing the second last, third last, etc. would involve indexing with 'NF-1', 'NF-2', etc. So, your fields are named like
Code:
$NF
or
$(NF-0)
or
$(NF-5)
Replacing the '0' with a variable 'i', you can print something like '$(NF-i)'.
Since i is a variable, you can modify it, such as using it as a loop counter:
Code:
for( i = 0; i < NF; i++ ){
}
I could put it all together for you, but there is enough there to point you in the right direction, and still leave plenty of room for learning.
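To make the backward indexing concrete without giving the whole answer away, here is a small sketch that labels the last few fields of a line, counting from the end. The helper name last_fields and the count variable n are inventions for this example. Note the parentheses: '$(NF-1)' is the second-to-last field, while '$NF-1' is the value of the last field minus one.

```shell
# Hypothetical helper: label the last n fields of each line read from stdin.
last_fields() {
    # n is passed in with -v; i counts back from the last field.
    awk -v n="$1" '{
        for (i = 0; i < n && i < NF; i++) {
            print "$(NF-" i ") = " $(NF - i)
        }
    }'
}

# Show the last three fields of the first sample record.
printf '%s\n' '2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30' | last_fields 3
```

The same loop counter can just as well run forward from a fixed starting field up to a limit expressed in terms of NF.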
While your code is somewhat 'messy', if it works, there's nothing wrong with it. It is good to try to improve upon your work, and learn new things.
--- rod.
|
|
|
All times are GMT -5. The time now is 06:32 PM.