LinuxQuestions.org
Old 05-10-2012, 09:23 PM   #1
gafoleyo
LQ Newbie
 
Registered: May 2012
Posts: 3

Rep: Reputation: Disabled
awk - field substitutions


I want to find lines in a file (e.g. filename1) whose field two ($2) matches a key (e.g. HIST), and then replace field six ($6) on those lines whenever field six holds 99999.
The rather clumsy awk I have below does this, except when it hits a line that has HIST but also has PERI on it: it then re-replaces field six with a new value, ending up with 1000 rather than 7352. My $2 match below doesn't seem to work.

Basically I want to force awk to replace only on lines where field two is HIST (and likewise for about 385 other codes).

File/input extract: (n.b. ~80000 lines)
ETA3846,6445,1,120426,CHEP,99999,PERI,12H09695
ETA3846,6455,2,120426,CHEP,99999,PERI,12H09695
ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695

My clumsy Code: (n.b. ~385 lines)
awk '$2 == "HIST" ; /\,HIST\,/{sub(/\,99999\,/, ",7352,")}; 1' filename1 > filename2
awk '$2 == "6445" ; /\,6445\,/{sub(/\,99999\,/, ",3963,")}; 1' filename2 > filename3
awk '$2 == "6455" ; /\,6455\,/{sub(/\,99999\,/, ",3963,")}; 1' filename3 > filename4
awk '$2 == "6465" ; /\,6465\,/{sub(/\,99999\,/, ",3963,")}; 1' filename4 > filename5
awk '$2 == "PERI" ; /\,PERI\,/{sub(/\,99999\,/, ",1000,")}; 1' filename5 > filename6

Any help is greatly appreciated. Thanks, g
 
Old 05-10-2012, 11:13 PM   #2
GioBarr
LQ Newbie
 
Registered: May 2012
Location: Americana, Brazil
Distribution: Fedora 16 Verne i386
Posts: 11

Rep: Reputation: Disabled
Hi. Let me see if I understood what you need.

You choose a value (HIST, for example) in field 2, and then you need to replace 99999 (for example) in field 6 with a new value on every line where field 2 is HIST.

If my understanding is correct, you can try this script (where test.txt is your old file and test2.txt is the new file with the changes):



#!/bin/bash
echo "Type the value you want to put in field 6:"
read -r xy
echo "What value do you need to find in field 2?"
read -r xz
echo "Which field 6 value will $xy replace?"
read -r xx
# Split each comma-separated line into its eight fields.
while IFS=, read -r v1 v2 v3 v4 v5 v6 v7 v8
do
    # Replace field 6 only when both field 2 and field 6 match.
    if [ "$v2" = "$xz" ] && [ "$v6" = "$xx" ]
    then
        v6=$xy
    fi
    # Write all eight fields back out, comma-separated.
    echo "$v1,$v2,$v3,$v4,$v5,$v6,$v7,$v8" >> test2.txt
done < test.txt
echo "finished"

Last edited by GioBarr; 05-10-2012 at 11:19 PM. Reason: just a improvement in this script
 
Old 05-10-2012, 11:31 PM   #3
gafoleyo
LQ Newbie
 
Registered: May 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thanks GioBarr. But not quite.
I have a file with about 81198 lines in which I need to update field 6 depending on what is in field 2 and in field 6 itself. E.g. IF field 2 has the code HIST in it (and it may not), then I will replace the contents of field 6 with a number - BUT - only if field 6 is equal to a certain number (in this case 99999).

e.g.
if field 2 is HIST and field 6 is 99999 then change 99999 to 7352.
if field 2 is HIST and field 6 is not 99999 then do not change field 6
if field 2 is not HIST it will be another alpha code to match e.g. PERI
if field 2 is PERI and field 6 is 99999 then change ... and so on ...
(for 385 alpha codes that could possibly be in field 2)
Unfortunately the alpha code in field 2 CAN also be in field 7 - hence needing to specifically match field 2 rather than the whole line.

This will be in a script I run every month over new data - this is just one of many little tweaks to the data in the source file.

Hope this helps. Long winded I know ;-/

Last edited by gafoleyo; 05-10-2012 at 11:33 PM.
 
Old 05-11-2012, 12:52 AM   #4
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Inside the BEGIN block, create an array whose indexes are all the possible values, such as HIST. Then test for "$2 in ARRAY". In this case, the values stored in this associative array don't matter.
See section 8.1.2 for an explanation. http://www.gnu.org/software/gawk/man...wk.html#Arrays

Also, create an awk script instead of separate awk commands. Each rule in the body of the awk script is run for each line of the input file. You were running individual awk commands repeatedly, each making its own pass over the whole file.
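
A minimal sketch of the membership test, assuming only the two codes that appear in this thread (a real table would list all 385). The array name codes is just a placeholder:

```shell
# Only the indexes of the array are tested; the stored values (1) are arbitrary.
printf '%s\n' \
  'ETA3846,6445,1,120426,CHEP,99999,PERI,12H09695' \
  'ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695' |
awk -F, 'BEGIN { codes["HIST"] = 1; codes["PERI"] = 1 }
         $2 in codes { print "known code in field 2: " $2 }'
```

Only the second line is reported. The PERI in field 7 of both lines is ignored, because the test looks at field 2 alone.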

Last edited by jschiwal; 05-11-2012 at 01:06 AM.
 
Old 05-11-2012, 11:14 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
Ok everybody, let's start by using [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.


The standard awk expression syntax is:

Code:
pattern { commands }
Both sides are optional. If you have only { commands }, then they will be applied to all input records. If you have only pattern, then whenever the pattern evaluates as true, it applies the default command, which is print.
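
A small illustration of the two one-sided forms, using made-up input words rather than the OP's data:

```shell
# Pattern only: lines matching /t/ are printed by the default action.
printf '%s\n' one two three | awk '/t/'
# Action only: the commands run for every input line.
printf '%s\n' one two three | awk '{ print length($0) }'
```

The first command prints "two" and "three"; the second prints the length of each of the three lines.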

So let's look at one of the OP's commands.

Code:
awk '$2 == "HIST" ; /\,HIST\,/{sub(/\,99999\,/, ",7352,")}; 1' filename1 > filename2
Each ";" delimits an expression. So the first '$2 == "HIST"' is just a pattern with no commands, which would normally print any line that contains "HIST" in field 2. But the problem here is that the FS field separator is still set to whitespace, which doesn't exist in the input (and would define the fields improperly if it did), so the whole line is counted as one field. Since there is no "$2" field to match, it will never evaluate as true, and never do anything.
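
A quick way to check this for yourself, as a sketch using one of the OP's sample lines, is to print the field count and field 2 with and without the comma separator:

```shell
line='ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695'
# Default FS (whitespace): the whole line is field 1, so $2 is empty.
printf '%s\n' "$line" | awk '{ print NF, "[" $2 "]" }'
# FS set to a comma: eight fields, and $2 is HIST.
printf '%s\n' "$line" | awk -F, '{ print NF, "[" $2 "]" }'
```

The first command reports one field and an empty $2; the second reports eight fields with HIST in $2.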


The second expression has both pattern and command. But since we're working on the whole line, it has to kludge about trying to handle the commas. So the pattern matches any line containing ",HIST," and applies the sub function to it. But since the function has no specific target field set, it attempts to apply the pattern to the whole $0 line. This means that the first string that matches in the line is substituted, regardless of the pattern that activated it.
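
To see the effect, here is a sketch that feeds one of the OP's sample lines through only the PERI rule (with the unnecessary backslashes dropped from the regexes). Field 2 is 6445, yet the line still matches because PERI sits in field 7:

```shell
# The whole-line pattern matches on field 7, and sub() then rewrites field 6.
printf '%s\n' 'ETA3846,6445,1,120426,CHEP,99999,PERI,12H09695' |
awk '/,PERI,/ { sub(/,99999,/, ",1000,") } 1'
```

The 99999 becomes 1000 even though field 2 never contained PERI, which is exactly the kind of stray match the OP wants to avoid.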

The final command on the line is simply "1", a pattern. This is an awker's shortcut for print. Since a non-zero number on its own always evaluates as "true", every line gets printed.
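
As a small sketch with made-up input, these two commands do the same thing:

```shell
# A bare true pattern triggers the default action, print.
printf '%s\n' a b | awk '1'
# The explicit equivalent.
printf '%s\n' a b | awk '{ print }'
```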


Now to write it correctly.

First, we set the FS to comma, to properly parse the line. We set the output field separator to comma too, so that the output looks the same as the input.

Next, we test field $2 for the value "HIST"

If it matches, we run the sub command on field $6 only.

Finally, we print every line (we can continue to use the "1" trick).

Code:
awk -F',' -v OFS=',' '( $2 == "HIST" ) { sub(/^99999$/, "7352", $6) }1' filename1 > filename2
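
On why OFS matters: when a field is modified, awk rebuilds the line using the output field separator. A short sketch with made-up input shows the difference:

```shell
# Without OFS, the modified record is rejoined with spaces.
printf '%s\n' 'a,b,c' | awk -F, '{ $2 = "X" } 1'
# With OFS set to a comma, the output keeps the input's format.
printf '%s\n' 'a,b,c' | awk -F, -v OFS=, '{ $2 = "X" } 1'
```

The first command prints "a X c", the second "a,X,c". Lines that are not modified are printed untouched in both cases.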

Last edited by David the H.; 05-11-2012 at 11:42 AM. Reason: it was a totally messed up post
 
Old 05-11-2012, 12:20 PM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,411

Rep: Reputation: 1874
Well now that David has put the right information out there
Code:
awk -F, '$6 = ($2 == "HIST" && $6 == 99999)?7352:$6' OFS="," file
If we also follow jschiwal's advice for multiple solutions (untested):
Code:
awk 'BEGIN{OFS=FS=",";search["HIST"]=7352;search["PERI"]=1000}$6 = ($2 in search && $6 == 99999)?search[$2]:$6' file
Now the only mess is on how you wish to populate the array
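
One possible way to populate it, sketched under the assumption that the code/value pairs live in their own two-column file. The names codes.csv and data.csv are made up for the example, and only the two pairs mentioned in this thread are included:

```shell
# Hypothetical file names; everything happens in a throwaway temp directory.
dir=$(mktemp -d)
printf '%s\n' 'HIST,7352' 'PERI,1000' > "$dir/codes.csv"
printf '%s\n' \
  'ETA3846,6445,1,120426,CHEP,99999,PERI,12H09695' \
  'ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695' > "$dir/data.csv"

# NR == FNR is true only while the first file is being read, so that
# rule loads the lookup table; the remaining rules see only the data file.
awk -F, -v OFS=, '
    NR == FNR { search[$1] = $2; next }
    ($2 in search) && $6 == 99999 { $6 = search[$2] }
    1
' "$dir/codes.csv" "$dir/data.csv"
rm -r "$dir"
```

With this layout the monthly script would not need editing when a code changes; only the lookup file would. The 6445 line passes through unchanged here because 6445 is not in the lookup file.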
 
Old 05-11-2012, 12:30 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
All right grail, quit showing off.

Seriously though, they are neat, but probably way too advanced at this point. I can barely understand them myself. How about something with more of a traditional operational flow?
 
Old 05-11-2012, 01:49 PM   #8
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,411

Rep: Reputation: 1874
Quote:
How about something with more of a traditional operational flow?
No probs ... will leave the second for everyone to ponder, but here is the first in a more straightforward approach:
Code:
awk -F, '$2 == "HIST" && $6 == 99999{$6 = 7352}1' OFS="," file
 
Old 05-11-2012, 02:04 PM   #9
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Produce an awk program instead of trying to jam everything into one line.
Code:
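BEGIN {
   OFS = FS = ","
   # Replacement values for field 6, keyed by the code in field 2.
   search["HIST"] = 7352
   search["PERI"] = 1000
}
# Field 2 is a known code: use its replacement value.
($2 in search && $6 == 99999) { $6 = search[$2] }
# Default: any 99999 still left in field 6 becomes 3963.
$6 == 99999 { $6 = 3963 }
# Print every record.
{ print $0 }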
Code:
 awk -f test.awk test
ETA3846,6445,1,120426,CHEP,3963,PERI,12H09695
ETA3846,6455,2,120426,CHEP,3963,PERI,12H09695
ETA3846,HIST,1,120426,CHEP,7352,PERI,12H09695
Your solution would have more lines in the BEGIN section containing all of the PATTERN value entries for $2 to test.
search is the name of the array. We don't know what your data represents. The array could use a better name, representing what the 2nd field is.
 
Old 05-12-2012, 11:58 AM   #10
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
I think jschiwal has summed it up quite nicely.

One minor quibble though. This line:

Code:
$6 == 99999 { $6 = 3963 }
This performs a default change. Any line where $2 doesn't match a preset entry will have the $6 value changed to 3963. If this is what the OP wants, great, but I don't see anything in his descriptions that merits the assumption. It's only in the output code example that you see that number.

The actual need may instead be all numeric entries, or numbers within a certain range, or even specific numbers. The OP would have to expand on what conditions should apply.
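
As one illustration only, and purely an assumption about what the OP might want, the default could be limited to lines where field 2 is all digits. The code ABCD below is invented for the example:

```shell
printf '%s\n' \
  'ETA3846,6445,1,120426,CHEP,99999,PERI,12H09695' \
  'ETA3846,ABCD,1,120426,CHEP,99999,PERI,12H09695' |
awk -F, -v OFS=, '
    # Apply the 3963 default only when field 2 is made up entirely of digits.
    $2 ~ /^[0-9]+$/ && $6 == 99999 { $6 = 3963 }
    1
'
```

The 6445 line gets 3963, while the ABCD line keeps its 99999 so that an unknown alpha code is not silently overwritten.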

@grail; Thanks for the updated version. I see that the only substantive difference between that and what I posted before is that it eliminates the sub function by moving the $6 evaluation to the pattern side. I wonder if there's any big difference in performance?

Your initial solutions are cool, BTW, now that I've had a chance to digest them. I'm going to have to start looking more closely at that ternary operator.
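
For anyone else new to it, the ternary operator reads as condition ? value-if-true : value-if-false. Here is a sketch of the same assignment moved into an action block; the second input line, with 0 in field 6, is invented to show an edge case:

```shell
printf '%s\n' \
  'ETA3846,HIST,1,120426,CHEP,99999,PERI,12H09695' \
  'ETA3846,HIST,1,120426,CHEP,0,PERI,12H09695' |
awk -F, -v OFS=, '{ $6 = ($2 == "HIST" && $6 == 99999) ? 7352 : $6 } 1'
```

Both lines are printed, the first with 7352 in field 6. When the assignment itself is used as the pattern, as in the one-liner in post #6, its value decides whether the line is printed, so a line whose field 6 ends up as 0 or empty would be dropped. Keeping the assignment in an action and printing with "1" avoids that.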
 
Old 05-13-2012, 02:13 AM   #11
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
I believe that is what the OP wants. I inferred it from these lines:
awk '$2 == "6445" ; /\,6445\,/{sub(/\,99999\,/, ",3963,")}; 1' filename2 > filename3
awk '$2 == "6455" ; /\,6455\,/{sub(/\,99999\,/, ",3963,")}; 1' filename3 > filename4
awk '$2 == "6465" ; /\,6465\,/{sub(/\,99999\,/, ",3963,")}; 1' filename4 > filename5

They don't have a text token in the second field, and field 6 is replaced with the same value.
 
Old 05-13-2012, 05:57 AM   #12
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
I understood why you made the inference, and I do agree that it appears to be that way. But it's still just an assumption, since nothing in the stated requirements mentions it.

Therefore we should at the very least clearly point out what that line does and why it was included, so that the OP can alter or eliminate it if the actual conditions are different.
 
Old 05-13-2012, 04:29 PM   #13
gafoleyo
LQ Newbie
 
Registered: May 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
This is great stuff guys. I will take the lessons from all of you and try to do better next time.

David: For some reason my previous FS work had failed, due to semicolons I think. Your substitution worked immediately, thanks.
Jschiwal: I certainly agree with the array idea - very clean and tidy - I was just not capable yet so was attacking it at my basic level. David is right about the field 6 issue though.
Grail: I do love graceful solutions.

I am going to glean bits from all responses - thank you.
 
  

