Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
I'm trying to clean up a fairly messy script and need a pointer in the right direction. Here's what I'm working with...
file
gibberish 5 Monkey $gibberish $gibberish
gibberish 8 Santa Claus $gibberish $gibberish
gibberish 2 Evil Robot Army $gibberish $gibberish
gibberish 7 Global Thermal Nuclear War $gibberish $gibberish
I want to get rid of the gibberish (and the $gibberish). Here's what I did (and it's messy):
for filename in *; do
awk '$4 ~ /\$/' $filename | awk '$5 ~ /\$/ {print $3,$2}' > a
awk '$5 ~ /\$/' $filename | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
awk '$6 ~ /\$/' $filename | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
awk '$7 ~ /\$/' $filename | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
cat a b c d > $filename
done
In my case the awk list is actually much longer because the gibberish extends out to 15 fields. My only saving grace is that the pattern is always the same: I don't want the first field or the last 2 fields. Is there a cleaner way to use awk so it prints everything but the first and last 2 fields? Or am I stuck with the ugly mess?
Thanks
Zim
Last edited by ZimMonkey; 08-20-2009 at 10:21 AM.
Reason: It needed it
I thought I posted this yesterday, but it's been about 18 hours and the post hasn't shown up, so I guess I didn't hit send. If I'm on a delay for being a noob, then please delete my first question.
this might help - perl in line edit.
shorter, quicker, cleaner
ilikejam, I tried your code and it removed the first and last fields, leaving the second-to-last field of gibberish still there. I'll try to do some tweaking.
centosboy, I just don't know enough about Perl to go down that road just yet. I'm still trying to get a handle on awk, so it will be a little while before I make that jump.
ilikejam, thanks again for your response. I seem to be having a few issues with this. The "file" that I gave was obviously a generalization of the problem that I'm having. To be more accurate, the files that I'm trying to make neater look like this...
When I use the script you wrote, (for the first one) I get
BOOT 1983603 4/30/2007 1 $2.30
Field 1, 2, and the final field are removed. When I did the copy paste to get this on here, I noticed there are spaces after the final number. I don't know if that has anything to do with anything. I do know that when I use my lengthy code it does work. I wouldn't think that the length of the lines needed to be in the correct order - or do they? Do the lines have to go sequentially in length for this to work - 4 fields, 5, 6, 7 as they were in my post?
Sorry for not making things more clear from the start, I was hoping to be able to learn from your example, and tweak it to suit my needs. Apparently my vagueness caused confusion. I'll keep on trying.
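On the question of the trailing spaces: as far as awk itself is concerned, the default field separator ignores leading and trailing blanks, so they should not change NF or $NF. A regex-based approach anchored on the end of the line could behave differently. The sketch below illustrates this; the helper names `count_fields` and `last_field` are made up for the example, and the trailing spaces on the sample line were added by hand.

```shell
# Sample line with trailing spaces appended for illustration.
line='BOOT 1983603 4/30/2007 1 $2.30   '

# Hypothetical helpers: report the field count and the last field of one line.
# With the default field separator, awk skips leading and trailing blanks.
count_fields() { printf '%s\n' "$1" | awk '{ print NF }'; }
last_field()   { printf '%s\n' "$1" | awk '{ print $NF }'; }

count_fields "$line"   # prints 5
last_field "$line"     # prints $2.30
```

If a custom separator is set with -F, this no longer holds in general, so the example only applies to default whitespace splitting.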
Thank you both.
Fair enough, but for my example you don't need to know much about Perl, only about regular expressions and what the extra perl flags mean, which perl -h tells you anyway.
Your edited sample may have been obvious to you, but to everyone else, it wasn't.
When trying to solve these kinds of problems, it is very useful to express the requirements in terms of how you might perform the modifications if you were doing it manually. Using language that describes the input in terms of fields is a good start, and describing how one might identify specific fields of interest is also useful.
So, for example, you might describe the input as "whitespace delimited fields". In this case, that is actually mostly inaccurate, since one field evidently has embedded whitespace. Now, then, the challenge is to unambiguously describe how to break down the elements of the input. It looks like we can still use the concept of whitespace-delimited fields if we measure the location of the fields in different ways, and perhaps in terms of what we do not want. It looks like we do not want the first field. It looks like we do not want the last 5 fields. And, finally we want to print the result in a different order from the input data.
If this is an accurate description of the problem, then the regular expressions and field-indexing gymnastics have practically written themselves. You have said that you aren't up to the task of seeing this as a Perl problem, but I can tell you that if the description of the method for solving the problem is correct, then Perl has some constructs and elements that are particularly well suited to this problem. So, is the description I presented accurate? Should we proceed on to the solution?
theNbomr, I will fully admit that I'm learning as I go here. I was actually hoping for - as I said - a pointer, not someone to spoonfeed me the code (I can't deny that it saved me a lot of time). Even in your own post you gave me the pointers of field indexing, and regular expressions.
This is the code that I used, which is quite messy, and I would like a pointer or suggestion as to how to make it less messy. I do not want someone to write the code for me; I would like a pointer in the right direction.
for filename in *; do
awk '$4 ~ /\$/' $filename | awk '$5 ~ /\$/ {print $3,$2}' > a
awk '$5 ~ /\$/' $filename | awk '$6 ~ /\$/ {print $3,$4,$2}' > b
awk '$6 ~ /\$/' $filename | awk '$7 ~ /\$/ {print $3,$4,$5,$2}' > c
awk '$7 ~ /\$/' $filename | awk '$8 ~ /\$/ {print $3,$4,$5,$6,$2}' > d
cat a b c d > $filename
done
It appears that there are some extra spaces at the end of each line. I don't know if that makes any difference, but it might be helpful. So as best as I can describe, I want to remove the first, and last 5 fields, then place the second field at the end of the line. To be more accurate still, the file has a "field range" from 5 to 20. So my awk list is very bulky. As I already stated, my code works, I would like some pointers on how to clean it up. How should I proceed?
I'm rather new to Linux, so right now Perl is not on the table, but when the time comes, I will be asking about it too.
Perfect. You have described in unambiguous terms what you wish to do. Now you simply have to translate those terms to code (being unambiguous really helps with that part). The tricky part of your problem is that there is a variable number of fields, and you want to reference the fields by counting backward from the last one. You can do that with awk's built-in 'NF' variable, which holds the number of fields in the current record, so $NF refers to the last field. The second-to-last, third-to-last, and so on are reached with 'NF-1', 'NF-2', etc. So, your fields are named like
Code:
$NF
or
$(NF-0)
or
$(NF-5)
Replacing the '0' with a variable 'i', you can print something like
Code:
print $(NF-i)
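To see what `$(NF-i)` selects, here is a small sketch on a made-up line. The function name `from_end` and the sample text are assumptions for illustration only; the offset is passed in with awk's -v option.

```shell
# Hypothetical input line, used only to show how NF-based indexing behaves.
sample='one two three four five six'

# Print the field that sits i places back from the last field.
# $1: the line, $2: the offset i
from_end() {
    printf '%s\n' "$1" | awk -v i="$2" '{ print $(NF - i) }'
}

from_end "$sample" 0   # six  (same as $NF)
from_end "$sample" 1   # five (second-to-last)
from_end "$sample" 5   # one  (NF is 6, so this is $1)
```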
Since i is a variable, you can modify it, such as using it as a loop counter:
Code:
for( i = 0; i < NF; i++ ){
}
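As one possible way to fill in the loop body, the sketch below walks i from 0 to NF-1 and prints the fields from last to first. It is only a demonstration of the loop counter driving `$(NF - i)`; the name `reverse_fields` is invented for the example.

```shell
# Print the fields of a line in reverse order, separated by single spaces.
reverse_fields() {
    printf '%s\n' "$1" | awk '{
        for (i = 0; i < NF; i++) {
            # $(NF - i) is the field i places back from the end;
            # emit a space between fields and a newline after the last one
            printf "%s%s", $(NF - i), (i < NF - 1 ? " " : "\n")
        }
    }'
}

reverse_fields 'one two three four'   # four three two one
```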
I could put it all together for you, but there is enough there to point you in the right direction, and still leave plenty of room for learning.
While your code is somewhat 'messy', if it works, there's nothing wrong with it. It is good to try to improve upon your work, and learn new things.
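For readers who want to compare notes afterwards, here is one hedged sketch of how the pieces might fit together. It assumes the requirement as stated in the thread: drop the first field, drop a fixed number of trailing fields, keep everything in between, and move the second field to the end. The function name `trim_fields` and the `drop` parameter are made up for this example, and lines with too few fields are simply skipped, which may or may not match what the original pattern tests did.

```shell
# Sketch only: keep fields 3 .. NF-drop, then append field 2.
# $1: number of trailing fields to drop; remaining args: input files (stdin if none)
trim_fields() {
    drop=$1; shift
    awk -v drop="$drop" '
        # Need field 1, field 2, at least one kept field, and the dropped tail
        NF >= drop + 3 {
            out = ""
            for (i = 3; i <= NF - drop; i++)
                out = out $i " "
            # Field 2 moves to the end of the line
            print out $2
        }' "$@"
}

printf '%s\n' 'gibberish 5 Monkey $gibberish $gibberish' | trim_fields 2
# prints: Monkey 5
```

The result goes to standard output rather than back into the input file, so nothing is overwritten while experimenting.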