translating text with awk or sed
Hi all,
I would like to know how I can parse a line like this:
IT©Author«what is CP/M?*an OS
into something like:
Category: IT
Question: what is CP/M?
Answer: an OS
Author: Author
I have to parse a file with 60000 lines like that. Do you know how I can write an awk command to accomplish this task? Thanks in advance!
Try: http://www.grymoire.com/Unix/Awk.html
Ok, here I'll give you a hint as to how I might do it:
Code:
awk '{ printf("Category: %s\n", substr($0, 1, index($0, "©") - 1)); }' test
Why not first parse the file to get rid of all the strange characters? Then awk can just work on fields normally.
Code:
# awk -v FS='\302\251|\302\253|[*]' '{print $1,$2,$3,$4}' file
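To get from the split fields to the labelled output asked for in the first post, something along these lines might work. It is only a sketch: the function name `reformat_records` is made up for illustration, the output is printed one label per line (an assumption about the desired layout), and it assumes the file is UTF-8 and that ©, « and * never occur inside a field.

```shell
# Hypothetical helper: reformat one record per line from stdin.
# Assumes each line looks like  Category©Author«Question*Answer
# and that the three delimiters never appear inside a field.
reformat_records() {
    awk '
        BEGIN { FS = "©|«|[*]" }   # a multi-character FS is a regex; [*] is a literal asterisk
        NF >= 4 {
            # Fields arrive as category, author, question, answer;
            # the requested order puts the author last.
            printf "Category: %s\nQuestion: %s\nAnswer: %s\nAuthor: %s\n", $1, $3, $4, $2
        }
    '
}

printf '%s\n' 'IT©Author«what is CP/M?*an OS' | reformat_records
```

For the real file, `reformat_records < file > output.txt` would process all 60000 lines in one pass. Lines with fewer than four fields are silently skipped, so it may be worth counting the output lines afterwards.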
Thanks guys, I sorted out the issue by using sed to replace all those weird characters with commas, and then I used cut to extract the text to different files.
Thanks for your tips on awk anyway, I'll play around with that.
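For anyone finding this thread later, the sed and cut approach described above could look roughly like the following. The exact commands were not posted, so this is a guess at the idea rather than what was actually run; `to_csv` is a made-up name, and it assumes no field contains a comma of its own.

```shell
# Hypothetical helper: turn the three delimiters into commas.
# Assumes no field contains a comma, otherwise cut will split in the wrong place.
to_csv() {
    sed -e 's/©/,/' -e 's/«/,/' -e 's/[*]/,/'
}

# Example: pull the category (field 1) and the question (field 3).
line='IT©Author«what is CP/M?*an OS'
printf '%s\n' "$line" | to_csv | cut -d, -f1
printf '%s\n' "$line" | to_csv | cut -d, -f3
```

On the real data, each field can be sent to its own file, e.g. `to_csv < file | cut -d, -f3 > questions.txt`. If the questions or answers may contain commas, a delimiter that never appears in the text would be a safer choice.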