Hey guys,
I'm not sure what's causing this. Whenever I need to split a line by whitespace, I've always used something like this:
Code:
awk -F" " '{print $2}'
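For reference, a single space is also awk's default field separator, so as far as I know the -F" " part is optional: both forms split on runs of spaces and tabs. A minimal sketch, where the sample text in line is made up for illustration:

```shell
# Hypothetical sample input: two spaces, then a tab, between the words.
line=$(printf 'one  two\tthree')

# Explicit single-space separator, as used in this post.
a=$(printf '%s\n' "$line" | awk -F" " '{print $2}')

# No -F at all: awk falls back to its default separator.
b=$(printf '%s\n' "$line" | awk '{print $2}')

echo "$a $b"
```

Both variables should hold the second word, so the separator option itself is unlikely to be what differs between a good and a bad split.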
I've written a script to help me go through Greek vocabulary words. It reads a line from a file in this format:
Code:
english greek_present greek_past greek_future
In my script, I try to split the line like this:
Code:
echo $line
english=$(echo $line | awk -F" " '{print $1}')
present=$(echo $line | awk -F" " '{print $2}')
past=$(echo $line | awk -F" " '{print $3}')
future=$(echo $line | awk -F" " '{print $4}')
echo $english $present $past $future
This is where things start going wrong. The two echos, which are only there for testing, do not produce the same output. Here is an example:
Code:
buy αγοράζω αγόρασα αγοράσω
buy αγοράζω αγό αγοράσω
$past contains only part of the word, not the full word that "echo $line" shows it should be. I don't know whether this is an awk problem with Unicode or a bash variable problem. My guess is awk, since $line always contains the proper characters (I've tested that over and over) and the problems only begin once the line goes through awk.
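One check that might help narrow this down is to look at the raw bytes of the line rather than how the terminal renders it. This is only a sketch: the sample text is copied from the output above, whereas in the real script line would come from the file.

```shell
# Sample line copied from the post; the real script reads it from a file.
line='buy αγοράζω αγόρασα αγοράσω'

# Dump every byte as hex. "buy" is 62 75 79 and a plain space is 20;
# in UTF-8 the Greek letters take more than one byte each.
printf '%s' "$line" | od -An -tx1

# Same dump joined into one string, which is easier to compare between runs.
hex=$(printf '%s' "$line" | od -An -tx1 | tr -d ' \n')
echo "$hex"
```

If the dump of a line that splits badly shows anything other than 20 between the words, or stray bytes inside a word, that would point at the input data rather than at awk.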
Also, note that it's not predictable: awk will sometimes split the line properly, sometimes mangle $present, and sometimes $future. The result is not consistent from one run to the next.
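Another cross-check would be to do the same split without awk at all and see whether the fields still come out truncated. A minimal sketch, assuming the line has exactly four whitespace-separated words as in the format above, and again using the sample text in place of the real file:

```shell
# Sample line copied from the post; the real script reads it from a file.
line='buy αγοράζω αγόρασα αγοράσω'

# Let the shell split on whitespace (default IFS) instead of calling awk.
# A here-document is used so this also runs in plain sh, not only bash.
read -r english present past future <<EOF
$line
EOF

echo "$english $present $past $future"
```

If this version always prints the full words while the awk version does not, that would support the guess that the problem is on the awk side.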
Any ideas?
Thanks!
George