LinuxQuestions.org

casperdaghost 10-17-2011 10:52 PM

internal awk variable Field separator
 
Suppose I have a file, myfile, which contains five words
delimited by both colons and spaces:

One Two:Three:4 Five

If I cat this file and pipe it to this awk script:

cat myfile | ./myawkprogram

I get 'Five' when I expected to get the third field, '4 Five'.

Here is the code.

Code:

#!/bin/awk -f
{
    if ($0 ~ /:/) {
        FS = ":";
    } else {
        FS = " ";
    }

    print $3
}

grail 10-18-2011 12:30 AM

Have a think of what you are asking here and at what point you are in the code when you ask it.

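By the time your awk script has something in $0, awk has already split it into fields, and since you do not set FS before that point, the default (whitespace) is used. A new value of FS only takes effect from the next record onward.
Also, awk can read a file directly, so cat is not required: ./myawkprogram myfile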
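A minimal sketch of the timing described above, assuming a POSIX-conforming awk such as gawk (some very old awks behaved differently). The helper name show_fs_timing is hypothetical. It feeds the thread's sample line in twice, so the FS assigned while record 1 is being processed can only show up when record 2 is split.

```shell
# show_fs_timing is a hypothetical helper name used only for this sketch.
# The sample line is fed in twice: record 1 has already been split with
# the default FS by the time the action runs, so the colon FS assigned
# there is only used when record 2 is read and split.
show_fs_timing() {
    printf 'One Two:Three:4 Five\nOne Two:Three:4 Five\n' |
        awk '{
            if ($0 ~ /:/) { FS = ":" } else { FS = " " }
            print NR ": " $3
        }'
}

show_fs_timing
```

With gawk this should print `1: Five` followed by `2: 4 Five`: the same line gives different third fields depending only on which record it is.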

casperdaghost 10-22-2011 03:05 PM

Yeah, when I add a BEGIN block and set the field separator to a space I get "Five" as a result; when I set the field separator to a colon I get
"4 Five".

I am not crystal clear on this, but I think the field separator is a parameter best set before awk starts iterating over the lines, not while it is doing so. I can do all of this with bash; I'm just on an awk kick and want to explore the language. Thanks.


This is the file: One Two:Three:4 Five

Code:

#!/usr/bin/gawk -f
BEGIN {
    FS = " ";
}
{
    if ($0 ~ /:/) {
        FS = ":";
    } else {
        FS = " ";
    }

    print $3
}
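
For comparison, here is a minimal sketch of setting FS before any record is read, which is where the BEGIN experiment above points. The function names are hypothetical; both rely only on standard awk features (a BEGIN block and the -F option). With a real file you would pass it as an argument instead, e.g. awk -F: '{ print $3 }' myfile.

```shell
# Hypothetical helper names; input is read from stdin.
third_field_begin() {
    # FS is assigned in BEGIN, before the first record is read and split.
    awk 'BEGIN { FS = ":" } { print $3 }'
}

third_field_option() {
    # -F sets FS from the command line, also before any input is read.
    awk -F: '{ print $3 }'
}

echo 'One Two:Three:4 Five' | third_field_begin
echo 'One Two:Three:4 Five' | third_field_option
```

Both calls should print `4 Five`, because the colon separator is already in place when the first line is split.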


PTrenholme 10-22-2011 04:30 PM

FS can be a regular expression: FS=/[ :]/ or FS=/( +)|:/ might be what you want.

<edit>
Or, more generally, FS=/([[:space:]]+)|:/ or FS=/[[:space:]]*[:[:space:]][[:space:]]*/

That last one says "zero or more white-space characters, followed by either a white-space character or a colon, followed by zero or more white-space characters"
</edit>

grail 10-22-2011 10:52 PM

hmmm ... not quite sure where you were going with this one, PT? Neither of your edited versions seems to return any output for the third field.

ahh ... just did a little test ... the issue is that FS is a computed (dynamic) regex, so it has to be given as a quoted string ("") rather than a slash-delimited regex constant (//).
Anyway, even with quotes you are not generating the desired output ("4 Five" as the third field).
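A small sketch of the difference, assuming gawk or another POSIX awk. A slash-delimited regex used as an ordinary value is shorthand for `$0 ~ /.../`, so `FS = /[ :]/` in BEGIN (where $0 is still empty) assigns the number 0, and the line is then split on the character "0", which it does not contain. The quoted form is used as a regular expression. The function names are hypothetical.

```shell
# Hypothetical helper names used only for this comparison.
fs_regex_constant() {
    # /[ :]/ here means ($0 ~ /[ :]/); $0 is empty in BEGIN, so FS = 0.
    printf '%s\n' 'One Two:Three:4 Five' |
        awk 'BEGIN { FS = /[ :]/ } { print FS, NF }'
}

fs_quoted_string() {
    # The quoted string is treated as a dynamic regex: split on space or colon.
    printf '%s\n' 'One Two:Three:4 Five' |
        awk 'BEGIN { FS = "[ :]" } { print NF, $3 }'
}

fs_regex_constant
fs_quoted_string
```

The first call should print `0 1` (FS is "0", one field) and the second `5 Three`. Even the quoted version yields Three rather than 4 Five as the third field.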

@OP: I believe the best solution is to use split() when you encounter a colon in the line, and FS the rest of the time.
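A minimal sketch of that suggestion, assuming the goal is the colon-separated third field when a line contains a colon and the whitespace-separated third field otherwise. third_field is a hypothetical name; split(string, array, separator) is a standard awk built-in that fills the array and returns the number of pieces.

```shell
# third_field is a hypothetical helper name; input is read from stdin.
third_field() {
    awk '{
        if ($0 ~ /:/) {
            # Colon present: split the whole record on colons into array f.
            split($0, f, ":")
            print f[3]
        } else {
            # No colon: rely on the default whitespace field splitting.
            print $3
        }
    }'
}

printf 'One Two:Three:4 Five\nalpha beta gamma\n' | third_field
```

This should print `4 Five` for the colon line and `gamma` for the plain line. FS is never reassigned, so there is no question of which record a new separator applies to.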

PTrenholme 10-23-2011 11:12 AM

Yes, of course, computed regular expressions should be in quoted strings:
Code:

$ echo "One Two:Three:4 Five" | gawk 'BEGIN {FS="[[:space:]]*[:[:space:]][[:space:]]*"} {print;for(i=1;i<=NF;++i) print "  $" i " = " $i}'
One Two:Three:4 Five
  $1 = One
  $2 = Two
  $3 = Three
  $4 = 4
  $5 = Five

<edit>
To show the reason for the "zero or more" stuff, consider this:
Code:

$ echo "One Two : Three:  4 Five" | gawk 'BEGIN {FS="[[:space:]]*" "[:[:space:]]" "[[:space:]]*"} {print;for(i=1;i<=NF;++i) print "  $" i " = \"" $i "\""}'
One Two : Three:  4 Five
  $1 = "One"
  $2 = "Two"
  $3 = "Three"
  $4 = "4"
  $5 = "Five"

</edit>

