LinuxQuestions.org

jewfro2 04-08-2015 03:34 AM

process substitution with awk, output splitting incorrectly
 
I have the following code that extracts two dates with awk (into the awk variables new and old) and then reads them into two shell variables.
Each date in the HTML page fetched by the curl request is in this format:
2015-04-06 09:40:37
Two of them are extracted.
However, the strings are being split on the whitespace inside the date strings. I tried changing OFS to ',', but they were still split incorrectly.
Code:

read dateStrNew dateStrOld < <(curl -k -q "$curl_call" | html2text | gawk '/Newest Sequence/ { new=$3" "$4 }/Oldest Sequence/ \
 {old=$3" "$4}END {OFS=","; print new,old }')  # new = date, old = date

The two parts of each date ($3 and $4) are assigned to each awk variable, with the space added back in between so that the string can be used afterwards with the date command.
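To look at the awk step on its own, here is a minimal sketch that feeds it two sample lines instead of the real page. The function sample_page and the line layout ("Newest Sequence <date> <time>") are assumptions made for illustration, since the thread never shows the actual html2text output; plain awk is used in place of gawk so the snippet runs where gawk is not installed.

```bash
# Hypothetical stand-in for: curl -k -q "$curl_call" | html2text
# The line layout is assumed, not taken from the real page.
sample_page() {
    printf '%s\n' \
        'Newest Sequence 2015-04-07 06:29:29' \
        'Oldest Sequence 2015-04-06 09:40:37'
}

# Same logic as the awk program in the post: rebuild each date from
# fields 3 and 4, then print both joined by OFS (a comma).
line=$(sample_page | awk '
    /Newest Sequence/ { new = $3 " " $4 }
    /Oldest Sequence/ { old = $3 " " $4 }
    END { OFS = ","; print new, old }
')

# Show exactly what awk hands over to the shell.
printf '[%s]\n' "$line"
```

Under these assumptions awk emits a single line with one comma between the two dates, so checking this intermediate output is a quick way to tell whether the splitting happens in awk or later in the shell.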

I just can't work out what is wrong, any help would be very much appreciated! Thanks!

millgates 04-08-2015 05:18 AM

Hi,
The culprit here is the shell (bash?), rather than awk. awk outputs a single line that contains spaces, and read splits that line on whitespace because the default IFS is space, tab and newline. To prevent that, you need to change the IFS variable in the shell:

Code:

IFS=, read dateStrNew dateStrOld < <(curl -k -q "$curl_call" | html2text | gawk '/Newest Sequence/ { new=$3" "$4 }/Oldest Sequence/ \
 {old=$3" "$4}END {OFS=","; print new,old }')  # new = date, old = date
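The effect can be tried without the server. This is a small sketch, assuming bash; emit_dates is a hypothetical stand-in for the curl | html2text | gawk pipeline and simply prints one comma-separated line.

```bash
# Hypothetical stand-in for the real pipeline: one line, two dates,
# separated by a comma.
emit_dates() {
    printf '%s\n' '2015-04-07 06:29:29,2015-04-06 09:40:37'
}

# Default IFS (space, tab, newline): read splits at the first space
# and puts everything that is left into the last variable.
read -r a b < <(emit_dates)
printf 'default IFS: a=[%s] b=[%s]\n' "$a" "$b"

# IFS=, applies to this one read only: the line is split at the comma
# and the spaces inside each date are left alone.
IFS=, read -r dateStrNew dateStrOld < <(emit_dates)
printf 'IFS=,      : new=[%s] old=[%s]\n' "$dateStrNew" "$dateStrOld"
```

The -r option is not in the original command; it only stops read from treating backslashes specially and does not change the splitting.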


jewfro2 04-08-2015 05:26 AM

Thanks! I did actually have IFS="," followed by the read command on a new line. So you are saying it all needs to be on the same line for it to work? I can't try it until the server is back up tomorrow, when we are testing. Will let you know how I go.

millgates 04-08-2015 06:10 AM

Do you mean something like this?

Code:

IFS=","
read dateStrNew dateStrOld < ...

That would also work. The difference is that in this case IFS stays set to "," for the rest of the current shell (until you change it again), whereas the former assignment changes IFS only for that single command. If that doesn't work for you, then there is some other problem, and it would help to show us the exact output of

Code:

curl -k -q "$curl_call" | html2text
and the exact results you got.
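The difference in scope between the two forms can be checked directly. A minimal sketch, assuming bash; the variable names (saved_ifs, x, y, p, q) and the input string are made up for the demonstration.

```bash
saved_ifs=$IFS   # remember the shell's current IFS

# Form 1: assignment as a prefix of the command.
# IFS is a comma only while this one read runs.
IFS=, read -r x y <<< 'left part,right part'
if [ "$IFS" = "$saved_ifs" ]; then
    echo 'after prefix form: IFS is unchanged'
fi

# Form 2: assignment on its own line.
# IFS stays a comma for every later command in this shell.
IFS=,
read -r p q <<< 'left part,right part'
if [ "$IFS" = "," ]; then
    echo 'after separate assignment: IFS is still a comma'
fi
ifs_after_form2=$IFS

# Put the original value back so later word splitting behaves as usual.
IFS=$saved_ifs
```

Both reads split the line the same way; only what happens to IFS afterwards differs, which matters for any unquoted expansion later in the script.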

jewfro2 04-08-2015 07:04 AM

Ah, well, it's good to know there is a way of setting IFS temporarily.
Yes, that is what I did.
Sample output:
newest ,
oldest
newest 2015-04-07 06
oldest 29:29,2015-04-06 09:40:37
newest ,
oldest
newest 2015-04-07 06
oldest 29:29,2015-04-06 09:40:37

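So you can see that the first variable gets just a comma and the second gets whitespace/nothing.
On the next read, the first variable gets the first date string up to its first colon (2015-04-07 06),
and the second gets the rest of the first date string plus the comma and all of the second date string (29:29,2015-04-06 09:40:37).
Then the same pattern repeats.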
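The thread does not establish a cause for this output. For what it is worth, the split in the sample falls on the first colon rather than on the comma, which is the pattern read produces when IFS is ':'. The sketch below only reproduces that pattern under the assumption IFS=':'; nothing in the thread shows what IFS actually was on the poster's machine.

```bash
# The comma-separated line awk would be expected to print for the sample dates.
line='2015-04-07 06:29:29,2015-04-06 09:40:37'

# Assumption for illustration only: IFS is a colon during the read.
IFS=: read -r newest oldest <<< "$line"
printf 'newest %s\noldest %s\n' "$newest" "$oldest"

# A line holding only a comma (what print new,old gives when both are empty)
# ends up entirely in the first variable.
IFS=: read -r newest_empty oldest_empty <<< ','
printf 'newest %s\noldest %s\n' "$newest_empty" "$oldest_empty"
```

If a real script shows this pattern, printing IFS just before the read, for example with printf '%q\n' "$IFS" in bash, shows which separator is actually in effect.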


All times are GMT -5.