LinuxQuestions.org

hedpe 08-05-2006 10:56 AM

awk messing up trying to split a unicode line by whitespace
 
Hey guys,

I'm not sure what's causing this. Whenever I need to split a line by whitespace, I've always used something like this:
Code:

awk -F" " '{print $2}'
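For reference, `-F" "` is redundant. A single space is awk's default field separator, and awk treats it specially: it splits on runs of blanks and tabs and ignores leading and trailing whitespace. The sketch below uses a made-up sample line and contrasts that with `-F"[ ]"`, a regex that matches exactly one space:

```shell
# Hypothetical sample input: leading/trailing spaces, a run of spaces, and a tab.
printf '  buy   one\ttwo  \n' | awk '{print $2}'          # default FS -> one
printf '  buy   one\ttwo  \n' | awk -F" " '{print $2}'    # same as the default -> one
printf '  buy   one\ttwo  \n' | awk -F"[ ]" '{print $2}'  # every single space separates -> empty line
```

So dropping `-F" "` would not change how awk splits the line.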
I've written a script to help me study Greek vocabulary. It reads lines from a file in this format:
Code:

english greek_present greek_past greek_future
In my script, I try to split each line like this:
Code:

    echo $line
    english=$(echo $line | awk -F" " '{print $1}')
    present=$(echo $line | awk -F" " '{print $2}')
    past=$(echo $line | awk -F" " '{print $3}')
    future=$(echo $line | awk -F" " '{print $4}')
    echo  $english $present $past $future
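A possible alternative, and not what the original script does: the shell's `read` builtin can split a line into all four fields in one step, without starting awk four times. This is a minimal sketch assuming the default IFS (space, tab, newline); the here-document stands in for the real vocabulary file, whose name isn't given in the post:

```shell
# Split each line into four fields with read instead of four awk calls.
# The here-document stands in for the vocabulary file (name not given in the post).
while read -r english present past future; do
    # Quoting each expansion passes the value through unchanged.
    printf '%s | %s | %s | %s\n' "$english" "$present" "$past" "$future"
done <<'EOF'
buy αγοράζω αγόρασα αγοράσω
EOF
```

This prints `buy | αγοράζω | αγόρασα | αγοράσω`. In a real script you would redirect from the file instead (e.g. `done < vocab.txt`, a hypothetical name). If a line has more than four words, `read` puts the remainder into the last variable, `future`.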

This is where things go wrong: the two test echos don't produce the same output. Here is an example:
Code:

buy αγοράζω αγόρασα αγοράσω
buy αγοράζω αγό αγοράσω

$past only contains part of the word, not the full word that "echo $line" shows it should be. I don't know whether this is an awk problem with Unicode or a bash variable problem. My guess is awk, since $line always contains the correct characters (I've tested that over and over); the problems only start once the line goes through awk.

Also note that it's not predictable: sometimes awk splits the line properly, sometimes it mangles $present, sometimes $future. It doesn't behave the same way from one run to the next.

Any ideas?

Thanks!
George

hedpe 08-05-2006 11:10 AM

I think putting quotes around the variables passed to echo, i.e. echo "$variables" instead of echo $variables, may have fixed it.
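The thread doesn't confirm the root cause, but quoting is standard shell practice: an unquoted `$line` undergoes field splitting on IFS and pathname expansion before `echo` sees it, while `"$line"` is passed through as one unchanged argument. A small sketch of the difference, using words from the vocabulary line above with extra spaces added for illustration:

```shell
line='buy   αγοράζω    αγόρασα'
# Unquoted: the value is split on IFS (default: space, tab, newline) into
# separate arguments, each also subject to pathname expansion.
printf '[%s]\n' $line
# Quoted: exactly one argument, with the original spacing intact.
printf '[%s]\n' "$line"
```

The unquoted call prints `[buy]`, `[αγοράζω]` and `[αγόρασα]` on separate lines; the quoted call prints `[buy   αγοράζω    αγόρασα]`.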


All times are GMT -5.