Hello
After a few hours of unsuccessful googling, I decided to ask the pros.
This is my problem:
The output of an annotation tool is saved as an XML file.
I need to extract some lines, which contain a certain pattern, from the many files in a folder.
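To make the goal concrete, here is a minimal sketch of the extraction I have in mind. The function name extract_lines and the folder argument are made up for illustration, and it assumes all the files end in .anvil and are in an encoding that grep can read directly.

```shell
# Hypothetical helper (name and arguments are illustrative only):
# print every line containing PATTERN from all .anvil files in DIR.
extract_lines() {
    pattern=$1
    dir=$2
    # -H prefixes each match with its file name, even if only one file matches;
    # -- stops option parsing so a pattern starting with "-" is still safe.
    grep -H -- "$pattern" "$dir"/*.anvil
}
```

It would be called as: extract_lines start ./annotations (where ./annotations is a placeholder for the real folder).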
Here is the catch: for some files it works, for others it doesn't. I tried to break the problem down.
My "detective" work (aka GOOGLE like crazy) has led me to the following conclusions:
1. The output of
Code:
grep "start" test.anvil
is:
Code:
<el index="0" start="0" end="0.93332">
<el index="1" start="0.93332" end="1.93331">
<el index="2" start="1.93331" end="3.1333">
So, my grep command works.
2. The output of
Code:
file -bi test.anvil
is:
3. The output of
Code:
grep "start" test_not_working.anvil
is:
nothing.
4. The output of
Code:
file -bi test_not_working.anvil
is:
Code:
text/plain; charset=utf-16
I tried iconv in every way I could think of, with no success. I get the error:
Code:
iconv: illegal input sequence at position 0.
I tried messing with the xml file itself. Nothing.
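For reference, this is a minimal sketch of the check and conversion I would expect to work. It assumes the file really is UTF-16 with a byte-order mark, as file reports; if the first two bytes turn out to be something else, the -f value would have to change. The helper names first_bytes and grep_utf16 are made up for illustration.

```shell
# Hypothetical helper: show the first two bytes of a file in hex.
# "ff fe" or "fe ff" would indicate a UTF-16 byte-order mark.
first_bytes() {
    head -c 2 "$1" | od -An -tx1
}

# Hypothetical helper: decode FILE as UTF-16, re-encode it as UTF-8,
# and only then search for PATTERN.
# Assumption: the input is UTF-16; adjust -f if first_bytes says otherwise.
grep_utf16() {
    pattern=$1
    file=$2
    iconv -f UTF-16 -t UTF-8 "$file" | grep -- "$pattern"
}
```

Usage would be: first_bytes test_not_working.anvil, followed by grep_utf16 start test_not_working.anvil.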
My only mistake was that when I saved the file in the annotation tool, I didn't choose ISO-8859-1 encoding from the start and left the default value, UTF-8.
I really don't know what else to try. Everything I found on Google was related to iconv, and nothing worked for me, not even the \\IGNORE option.
Any help is more than appreciated.
Thanks a lot