LinuxQuestions.org
Old 08-20-2009, 09:10 AM   #1
ZimMonkey
LQ Newbie
 
Registered: Jun 2009
Posts: 13

Rep: Reputation: 0
How do I - not print - in awk?


I'm trying to clean up a fairly messy script and need a pointer in the right direction. Here's what I'm working with...

file
gibberish 5 Monkey $gibberish $gibberish
gibberish 8 Santa Claus $gibberish $gibberish
gibberish 2 Evil Robot Army $gibberish $gibberish
gibberish 7 Global Thermal Nuclear War $gibberish $gibberish

I want to get rid of the gibberish, (and $gibberish). Here's what I did (and it's messy)

for filename in *; do
awk '$4 ~ /\$/' $filename | awk '$5 ~ /\$/ {print $3,$2}' > a
awk '$5 ~ /\$/' $filename | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
awk '$6 ~ /\$/' $filename | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
awk '$7 ~ /\$/' $filename | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
cat a b c d > $filename
done

In my case the awk list is actually much longer because the gibberish extends out to 15 fields. My only saving grace is that the pattern is the same, where I don't want the first, or last 2 fields. Is there a cleaner way to use awk so it prints everything but the first and last 2 fields? Or am I stuck with the ugly mess?

Thanks

Zim

Last edited by ZimMonkey; 08-20-2009 at 10:21 AM. Reason: It needed it
 
Old 08-20-2009, 09:22 AM   #2
centosboy
Senior Member
 
Registered: May 2009
Location: london
Distribution: centos5
Posts: 1,137

Rep: Reputation: 116
Quote:
Originally Posted by ZimMonkey
I thought I posted this yesterday, but it's been about 18 hours and the post hasn't shown up, so I guess i didn't hit send. If I'm on a delay for being a noob, then please delete my first question.

I'm trying to clean up a fairly messy script and need a pointer in the right direction. Here's what i'm working with...

file
gibberish 5 Monkey $gibberish $gibberish
gibberish 8 Santa Claus $gibberish $gibberish
gibberish 2 Evil Robot Army $gibberish $gibberish
gibberish 7 Global Thermal Nuclear War $gibberish $gibberish

I want to get rid of the gibberish, (and $gibberish). Here's what I did (and it's messy)

for filename in *; do
awk '$4 ~ /\$/' $filename | awk '$5 ~ /\$/ {print $3,$2}' > a
awk '$5 ~ /\$/' $filename | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
awk '$6 ~ /\$/' $filename | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
awk '$7 ~ /\$/' $filename | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
cat a b c d > $filename
done

In my case the awk list is actually much longer because the gibberish extends out to 15 fields. My only saving grace is that the pattern is the same, where I don't want the first, or last 2 fields. Is there a cleaner way to use awk so it prints everything but the first and last 2 fields? Or am I stuck with the ugly mess?

Thanks

Zim

This might help: a Perl one-liner.
Shorter, quicker, cleaner.

Code:
cat filename | perl -ne's/(gibberish |\$gibberish)//g;print'

or

edit the file in place if you're confident; the .bak suffix keeps a backup of the original.

Code:
perl -pi.bak -e 's/(gibberish |\$gibberish)//g' filename

Last edited by centosboy; 08-20-2009 at 09:24 AM.
 
Old 08-20-2009, 09:27 AM   #3
ilikejam
Senior Member
 
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109

Rep: Reputation: 97
Hi.

How about something like:
$ awk '{ for (i=3;i<(NF-2);i++) { printf "%s ", $i }; if (i == (NF-2)) print $i }' /path/to/input/file

Dave

Last edited by ilikejam; 08-20-2009 at 09:34 AM.
 
Old 08-20-2009, 06:00 PM   #4
ZimMonkey
LQ Newbie
 
Registered: Jun 2009
Posts: 13

Original Poster
Rep: Reputation: 0
Thanks for the replies.

ilikejam, I tried your code and it removed the first and last fields, leaving the second-to-last field of gibberish still there. I'll try to do some tweaking.

centosboy, I just don't know enough about Perl to go down that road just yet. I'm still trying to get a handle on awk, so it will be a little while before I make that jump.

Thank you both.
 
Old 08-20-2009, 06:37 PM   #5
ilikejam
Senior Member
 
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109

Rep: Reputation: 97
Uh, that's odd. Using the 'file' you gave in your original post, I get back:
Code:
Monkey
Santa Claus
Evil Robot Army
Global Thermal Nuclear War
from the awk line I posted.

Dave
 
Old 08-20-2009, 07:36 PM   #6
ZimMonkey
LQ Newbie
 
Registered: Jun 2009
Posts: 13

Original Poster
Rep: Reputation: 0
ilikejam, thanks again for your response. I seem to be having a few issues with this. The "file" that I gave was obviously a generalization of the problem that I'm having. To be more accurate, the files that I'm trying to make neater look like this...

2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60

The output needs to be, in effect, the description followed by the part number:

BOOT 81264
COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
BULB, PANEL LIGHT 60583203
COLLAR, SPLIT 2 PIECE SP16F

When I use the script you wrote, (for the first one) I get

BOOT 1983603 4/30/2007 1 $2.30

Fields 1 and 2 and the final field are removed. When I did the copy and paste to get this on here, I noticed there are spaces after the final number. I don't know if that has anything to do with it. I do know that my lengthy code works. I wouldn't think the lines need to be in any particular order of length, or do they? Do the lines have to go sequentially in length for this to work (4 fields, 5, 6, 7, as they were in my post)?

Sorry for not making things more clear from the start, I was hoping to be able to learn from your example, and tweak it to suit my needs. Apparently my vagueness caused confusion. I'll keep on trying.

Thanks again,

Zim
 
Old 08-21-2009, 05:12 AM   #7
centosboy
Senior Member
 
Registered: May 2009
Location: london
Distribution: centos5
Posts: 1,137

Rep: Reputation: 116
Quote:
Originally Posted by ZimMonkey
Thanks for the replies.

ilikejam, I tried your code and it removed the first and last fields, leaving the second-to-last field of gibberish still there. I'll try to do some tweaking.

centosboy, I just don't know enough about Perl to go down that road just yet. I'm still trying to get a handle on awk, so it will be a little while before I make that jump.

Thank you both.


Fair enough, but for my example you don't need to know much Perl. You only need regular expressions and the meaning of the extra Perl flags, which perl -h lists anyway.
 
Old 08-21-2009, 05:33 AM   #8
ilikejam
Senior Member
 
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109

Rep: Reputation: 97
Ah. OK.

Try:
Code:
awk '{ for (i=3;i<(NF-4);i++) { printf "%s ",$i }; print $2 }'
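For anyone who wants to check this without touching real files, here is a small sketch that feeds the four sample lines from post #6 through that one-liner. The function name reorder is made up for illustration; the awk program itself is unchanged. The quoted 'EOF' matters: it stops the shell from expanding $2.30 and friends inside the here-document.

```shell
#!/bin/sh
# reorder: hypothetical wrapper name, not something from the thread.
# Prints fields 3 .. NF-5 (the description), then field 2 (the part number).
reorder() {
    awk '{ for (i=3;i<(NF-4);i++) { printf "%s ",$i }; print $2 }' "$@"
}

# Sample lines copied from post #6.
reorder <<'EOF'
2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60
EOF
# Output:
# BOOT 81264
# COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
# BULB, PANEL LIGHT 60583203
# COLLAR, SPLIT 2 PIECE SP16F
```

The loop stops before field NF-4, so the last five fields are never printed, and the final print $2 puts the part number at the end. It does assume every line really ends in exactly five unwanted fields.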
 
Old 08-21-2009, 09:33 AM   #9
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
Originally Posted by ZimMonkey
The "file" that I gave was obviously a generalization of the problem that I'm having. To be more accurate, the files that I'm trying to make neater look like this...

2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60

Where the outcome needs to be effectively description, then part #

BOOT 81264
COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
BULB, PANEL LIGHT 60583203
COLLAR, SPLIT 2 PIECE SP16F

When I use the script you wrote, (for the first one) I get

BOOT 1983603 4/30/2007 1 $2.30
Your edited sample may have been obvious to you, but to everyone else, it wasn't.

When trying to solve these kinds of problems, it is very useful to express the requirements in terms of how you might perform the modifications if you were doing it manually. Using language that describes the input in terms of fields is a good start, and describing how one might identify specific fields of interest is also useful.
So, for example, you might describe the input as "whitespace delimited fields". In this case, that is actually mostly inaccurate, since one field evidently has embedded whitespace. Now, then, the challenge is to unambiguously describe how to break down the elements of the input. It looks like we can still use the concept of whitespace-delimited fields if we measure the location of the fields in different ways, and perhaps in terms of what we do not want. It looks like we do not want the first field. It looks like we do not want the last 5 fields. And, finally we want to print the result in a different order from the input data.
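A quick way to see how awk itself breaks each line into fields is to print the field count in front of the line. This is only a rough sketch; count_fields is a made-up helper name, and the two input lines are taken from the samples earlier in the thread (the second one with a few trailing spaces added on purpose).

```shell
#!/bin/sh
# count_fields: hypothetical helper name. Prefixes each line with awk's field count (NF).
count_fields() {
    awk '{ print NF ": " $0 }' "$@"
}

# Single quotes keep the shell from expanding $gibberish and $2.30.
printf '%s\n' \
    'gibberish 5 Monkey $gibberish $gibberish' \
    '2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30   ' | count_fields
# The first line is reported with 5 fields, the second with 8:
# with awk's default field splitting, trailing spaces do not add a field.
```

If a count comes out one higher than expected on the real files, one possible cause (a guess, nothing in the thread confirms it) is a stray character such as a carriage return after the trailing spaces, which awk would treat as a field of its own.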

If this is an accurate description of the problem, then the regular expressions and field-indexing gymnastics have practically written themselves. You have said that you aren't up to the task of seeing this as a Perl problem, but I can tell you that if the description of the method for solving the problem is correct, then Perl has some constructs and elements that are particularly well suited to this problem. So, is the description I presented accurate? Should we proceed on to the solution?

--- rod.
 
Old 08-21-2009, 01:27 PM   #10
ZimMonkey
LQ Newbie
 
Registered: Jun 2009
Posts: 13

Original Poster
Rep: Reputation: 0
ilikejam, thanks for your help.

theNbomr, I will fully admit that I'm learning as I go here. I was actually hoping for - as I said - a pointer, not someone to spoonfeed me the code (I can't deny that it saved me a lot of time). Even in your own post you gave me the pointers of field indexing, and regular expressions.

So to make the question more clear...

I have this in a file...

2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
2612268 023031COUPLING COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 2032363 6/25/2007 1 $4.60 $4.60
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
1995423 SP16F COLLAR, SPLIT 2 PIECE 1935593 3/9/2007 2 $3.80 $7.60

I need to get to this...

BOOT 81264
COUPLING, SPLINED HYDRAULIC MOTOR BRIDGE 023031COUPLING
BULB, PANEL LIGHT 60583203
COLLAR, SPLIT 2 PIECE SP16F

This is the code that I used, which is quite messy, and I would like a pointer or suggestion as to how to make it less messy. I do not want someone to write the code for me; I would like a pointer in the right direction.

for filename in *; do
awk '$4 ~ /\$/' $filename | awk '$5 ~ /\$/ {print $3,$2}' > a
awk '$5 ~ /\$/' $filename | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
awk '$6 ~ /\$/' $filename | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
awk '$7 ~ /\$/' $filename | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
cat a b c d > $filename
done

It appears that there are some extra spaces at the end of each line. I don't know if that makes any difference, but it might be helpful. So as best as I can describe, I want to remove the first, and last 5 fields, then place the second field at the end of the line. To be more accurate still, the file has a "field range" from 5 to 20. So my awk list is very bulky. As I already stated, my code works, I would like some pointers on how to clean it up. How should I proceed?

I'm rather new to Linux, so right now Perl is not on the table, but when the time comes, I will be asking about it too.

Thanks for your help,

Zim
 
Old 08-21-2009, 07:11 PM   #11
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
So as best as I can describe, I want to remove the first, and last 5 fields, then place the second field at the end of the line. To be more accurate still, the file has a "field range" from 5 to 20.
Perfect. You have described in unambiguous terms what you wish to do. Now, you simply have to translate those terms to code (being unambiguous really helps with that part). The tricky part of your problem is that there is a variable number of fields, and you want to reference fields counting backward from the last one. You can index your fields using awk's builtin 'NF' variable. It holds the number of fields in the current line, so '$NF' is the last field. The second last, third last, etc. are then indexed with 'NF-1', 'NF-2', etc. So, your fields are named like
Code:
$NF
 or
$(NF-0)
 or
$(NF-5)
Replacing the '0' with a variable 'i', you can print something like
Code:
print $(NF-i)
Since i is a variable, you can modify it, such as using it as a loop counter:
Code:
for( i = 0; i < NF; i++ ){
    print $(NF-i)    # prints the fields one per line, from last to first
}
I could put it all together for you, but there is enough there to point you in the right direction, and still leave plenty of room for learning.

While your code is somewhat 'messy', if it works, there's nothing wrong with it. It is good to try to improve upon your work, and learn new things.
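For readers who land here later and want to see the pieces assembled, the following is one possible sketch, not the only way to do it. It assumes the layout described above: drop field 1, keep field 2 for the end, print everything up to the last five fields. The name strip_fields, the trailing-whitespace cleanup, and the decision to skip lines with fewer than 8 fields are all my own assumptions, not something stated in the thread.

```shell
#!/bin/sh
# strip_fields: hypothetical name. Prints the description (fields 3 .. NF-5)
# followed by the part number (field 2).
strip_fields() {
    awk '{
        sub(/[ \t\r]+$/, "")       # assumption: drop trailing blanks or a CR; changing $0 re-splits the fields
        if (NF < 8) next           # assumption: skip lines too short to hold a description
        out = $3                   # first word of the description
        for (i = 4; i <= NF - 5; i++)
            out = out " " $i       # remaining description words
        print out, $2              # description, then part number
    }' "$@"
}

# Demo on two of the sample lines; the quoted EOF stops $ expansion.
strip_fields <<'EOF'
2097772 81264 BOOT 1983603 4/30/2007 1 $2.30 $2.30
266586 60583203 BULB, PANEL LIGHT 2008627 5/29/2007 1 $0.50 $0.50
EOF
# Output:
# BOOT 81264
# BULB, PANEL LIGHT 60583203

# To rewrite files the way the original loop did, something like this could be
# used (left commented out because it overwrites every file in the directory):
# for filename in *; do
#     strip_fields "$filename" > "$filename.tmp" && mv "$filename.tmp" "$filename"
# done
```

Building the line in a variable avoids the trailing space that printf "%s " leaves behind, and one awk pass handles any number of description words, so there is no need for one rule per line length or for the temporary a, b, c, d files.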

--- rod.
 
  

