Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
I wish to find lines in a file (e.g. filename1) whose second field ($2) matches a code (e.g. HIST), then use that as a key to replace field six ($6) on any of those lines that have 99999 in field six.
The rather clumsy awk I have below does this, except that when it hits a line that has HIST but also has PERI on it, it re-replaces field six with a new value, ending up with 1000 rather than 7352. My $2 match below doesn't seem to work.
Basically I want to force the awk to replace only on lines where field two = HIST (and about 385 other line matches).
So you will choose HIST (for example) in field 2, then you need to replace 99999 (for example) in field 6 with a new value, on all lines whose field 2 is HIST.
If my understanding is correct, you can try this script (where test.txt is your old file and test2.txt is the new file with the changes):
#!/bin/bash
# Rewrite field 6 of a comma-separated file when field 2 matches a key.
echo "Type what you want to put in field 6:"
read xy
echo "What do you need to find in field 2?"
read xz
echo "$xy will replace what?"
read xx
while IFS= read -r linha
do
    v1=$(echo "$linha" | cut -d, -f1)
    v2=$(echo "$linha" | cut -d, -f2)
    v3=$(echo "$linha" | cut -d, -f3)
    v4=$(echo "$linha" | cut -d, -f4)
    v5=$(echo "$linha" | cut -d, -f5)
    v6=$(echo "$linha" | cut -d, -f6)
    v7=$(echo "$linha" | cut -d, -f7)
    v8=$(echo "$linha" | cut -d, -f8)
    if test "$v2" = "$xz"
    then
        if test "$v6" = "$xx"
        then
            v6=$xy
        fi
    fi
    # Write all eight fields back out, not just the first six,
    # or fields 7 and 8 are silently lost.
    echo "$v1,$v2,$v3,$v4,$v5,$v6,$v7,$v8" >> test2.txt
done < test.txt
echo "finished"
Last edited by GioBarr; 05-10-2012 at 11:19 PM.
Reason: just an improvement in this script
Thanks GioBarr. But not quite.
I have a file with about 81198 lines in which I need to update field 6, depending on what is in field 2 and then in field 6. E.g. if field 2 has the code HIST in it (and it may not), then I will replace the contents of field 6 with a number, BUT only if it is equal to a certain number (in this case 99999).
e.g.
if field 2 is HIST and field 6 is 99999, then change 99999 to 7352.
if field 2 is HIST and field 6 is not 99999, then do not change field 6.
if field 2 is not HIST, it will be another alpha code to match, e.g. PERI.
if field 2 is PERI and field 6 is 99999, then change ... and so on ...
(for 385 alpha codes that could possibly be in field 2)
Unfortunately the alpha code in field 2 CAN also appear in field 7, hence the need to specifically match field 2 rather than the whole line.
This will be in a script I run every month over new data - this is just one of many little tweaks to the data in the source file.
Create an array with all the possible values, such as HIST, as the indexes, inside the BEGIN block. Then test with "$2 in ARRAY". In this case, the values stored in the associative array don't matter; only the keys do.
See section 8.1.2 for an explanation. http://www.gnu.org/software/gawk/man...wk.html#Arrays
Also, create an awk script instead of separate awk commands. Each line in the body of the awk script is run for each line of the input file. You were running individual awk commands repeatedly.
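As a sketch of the array technique described above (the array name `codes` and the sample data are my own illustrations, not from the thread):

```shell
# Load the valid field-2 codes as array keys in BEGIN, then use
# "$2 in codes" as the pattern. The assigned values are irrelevant;
# referencing codes["HIST"] is enough to create the key.
awk 'BEGIN {
         FS = OFS = ","
         codes["HIST"]; codes["PERI"]   # only the keys matter
     }
     $2 in codes && $6 == "99999" { $6 = "7352" }
     1' <<'EOF'
ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695
ETA3846,6445,1,120426,CHEP,99999,PERI,12H09695
EOF
```

With 385 codes, the BEGIN block would instead read the code list from a file or loop over a split string rather than listing each key by hand.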
Ok everybody, let's start by using [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.
The standard awk expression syntax is:
Code:
pattern { commands }
Both sides are optional. If you have only { commands }, then they will be applied to all input records. If you have only pattern, then whenever the pattern evaluates as true, it applies the default command, which is print.
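A minimal illustration of the two degenerate forms (sample data is mine):

```shell
# Pattern only: the default action (print) runs for matching records.
printf 'a 1\nb 2\n' | awk '$1 == "a"'       # prints: a 1
# Action only: the block runs for every record.
printf 'a 1\nb 2\n' | awk '{ print $2 }'    # prints: 1 then 2
```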
Each ";" delimits an expression. So the first '$2 == "HIST"' is just a pattern with no commands, which would normally print any line that contains "HIST" in field 2. But the problem here is that the FS field separator is still set to whitespace, which doesn't exist in the input (and would define the fields improperly if it did), so the whole line is counted as one field. Since there is no "$2" field to match, it never evaluates as true and never does anything.
The second expression has both pattern and command. But since we're working on the whole line, it has to kludge about trying to handle the commas. So the pattern matches any line containing ",HIST," and applies the sub function to it. But since the function has no specific target field set, it attempts to apply the pattern to the whole $0 line. This means that the first string that matches in the line is substituted, regardless of the pattern that activated it.
The final command on the line is simply "1", a pattern. This is an awker's shortcut for print. Since a non-zero number on its own always evaluates as "true", every line gets printed.
Now to write it correctly.
First, we set the FS to comma, to properly parse the line. We set the output field separator to comma too, so that the output looks the same as the input.
Next, we test field $2 for the value "HIST"
If it matches, we run the sub command on field $6 only.
Finally, we print every line (we can continue to use the "1" trick).
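The body of test.awk is not quoted in the thread, but following the four steps above it would look something like this sketch (the values HIST, 99999, and 7352 come from the OP's examples; the sample input stands in for his real file):

```shell
# Steps 1-4 in one awk program:
#   1. FS/OFS set to comma so fields parse and print correctly
#   2. pattern matches only lines where field 2 is exactly HIST
#   3. sub() is applied to field $6 only, not the whole line
#   4. the bare "1" prints every record
awk 'BEGIN { FS = OFS = "," }
     $2 == "HIST" { sub(/^99999$/, "7352", $6) }
     1' <<'EOF'
ETA3846,6445,1,120426,CHEP,3963,PERI,12H09695
ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695
EOF
```

Anchoring the regex with ^ and $ keeps a field like 199999 from being partially rewritten.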
Seriously though, it is neat, but probably way too advanced at this point. I can barely understand them myself. How about something with more of a traditional operational flow?
awk -f test.awk test
ETA3846,6445,1,120426,CHEP,3963,PERI,12H09695
ETA3846,6455,2,120426,CHEP,3963,PERI,12H09695
ETA3846,HIST,1,120426,CHEP,7352,PERI,12H09695
Your solution would have more lines in the BEGIN section containing all of the PATTERN value entries for $2 to test.
search is the name of the array. We don't know what your data represents. The array could use a better name, representing what the 2nd field is.
This performs a default change. Any line where $2 doesn't match a preset entry will have the $6 value changed to 3963. If this is what the OP wants, great, but I don't see anything in his descriptions that merit the assumption. It's only in the output code example that you see that number.
The actual need may instead be all numeric entries, or numbers within a certain range, or even specific numbers. The OP would have to expand on what conditions should apply.
@grail; Thanks for the updated version. I see that the only substantive difference between that and what I posted before is that it eliminates the sub function by moving the $6 evaluation to the pattern side. I wonder if there's any big difference in performance?
Your initial solutions are cool, BTW, now that I've had a chance to digest them. I'm going to have to start looking more closely at that ternary operator.
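grail's actual version is not quoted in the thread, but a variant that moves the $6 test to the pattern side, and the ternary form alluded to above, might look like this sketch (values are from the OP's examples):

```shell
# Pattern-side variant: both tests live in the pattern, so a plain
# assignment replaces the sub() call.
awk 'BEGIN { FS = OFS = "," }
     $2 == "HIST" && $6 == "99999" { $6 = "7352" }
     1' <<'EOF'
ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695
EOF

# Ternary form of the same edit: one expression decides field 6 per record.
awk 'BEGIN { FS = OFS = "," }
     { $6 = ($2 == "HIST" && $6 == "99999") ? "7352" : $6 } 1' <<'EOF'
ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695
EOF
```

Note the ternary form assigns $6 on every record, which forces awk to rebuild each line with OFS; that is harmless here because FS and OFS are both commas.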
I believe that is what the OP wants. I inferred it from these lines:
awk '$2 == "6445" ; /\,6445\,/{sub(/\,99999\,/, ",3963,")}; 1' filename2 > filename3
awk '$2 == "6455" ; /\,6455\,/{sub(/\,99999\,/, ",3963,")}; 1' filename3 > filename4
awk '$2 == "6465" ; /\,6465\,/{sub(/\,99999\,/, ",3963,")}; 1' filename4 > filename5
They don't have a text token in the second field, and field 6 is replaced with the same value.
I understood why you made the inference, and I do agree that it appears to be that way. But it's still just an assumption, since nothing in the stated requirements mentions it.
Therefore we should at the very least clearly point out what that line does and why it was included, so that the OP can alter or eliminate it if the actual conditions are different.
This is great stuff guys. I will take the lessons from all of you and try to do better next time.
David: For some reason my previous FS work had failed, due to semicolons I think. Yours, substituted in, worked immediately, thanks.
Jschiwal: I certainly agree with the array idea - very clean and tidy - I was just not capable yet so was attacking it at my basic level. David is right about the field 6 issue though.
Grail: I do love graceful solutions.
I am going to glean bit from all responses - thank you.