LinuxQuestions.org
11-12-2009, 11:52 AM   #1
mek
Member
 
Registered: Jul 2009
Location: Buenos Aires, Argentina
Distribution: fedora, slackware, *bsd
Posts: 42

translating text with awk or sed


Hi all,

I would like to know how I can parse a line like this:

IT©Author«what is CP/M?*an OS

into something like:

Category: IT
Question: what is CP/M?
Answer: an OS
Author: Author

I have to parse a file with 60000 lines like that.
Do you know how I can write an awk command to accomplish this task?

Thanks in advance!
 
11-12-2009, 12:52 PM   #2
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Quote:
Originally Posted by mek
Do you know how I can write an awk command to accomplish this task?

Yes, I do know how to accomplish this task. But why not take a stab at it yourself first? At least try.

Try:
http://www.grymoire.com/Unix/Awk.html
 
11-12-2009, 01:29 PM   #3
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Ok, here I'll give you a hint as to how I might do it:

Code:
awk '{ printf("Category: %s\n", substr($0, 1, index($0, "©") - 1)); }' test
Note that some trouble may come from the strange characters that are used.
 
11-12-2009, 01:40 PM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Why not first parse the file to get rid of all the strange characters? Then awk can just work on fields normally.
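
A rough sketch of that two-step idea, not something posted in the thread: sed swaps the three odd delimiters for one plain ASCII character, and awk then splits on it. It assumes the input is UTF-8, that `|` never occurs in the data, and that `*` only appears between question and answer. The variable names `line`, `cleaned` and `report` are made up for the example.

```shell
# Sample record in the format from the first post.
line='IT©Author«what is CP/M?*an OS'

# Step 1: normalise the three odd delimiters to a single "|".
cleaned=$(printf '%s\n' "$line" | sed -e 's/©/|/' -e 's/«/|/' -e 's/[*]/|/')

# Step 2: split on "|" and print the labelled fields in the requested order.
report=$(printf '%s\n' "$cleaned" | awk -F'|' '{
    print "Category: " $1
    print "Question: " $3
    print "Answer: " $4
    print "Author: " $2
}')

printf '%s\n' "$report"
```

For a real file, the same sed and awk commands can read the file directly instead of a single line held in a variable.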
 
11-12-2009, 06:32 PM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Code:
# awk -v FS='\302\251|\302\253|[*]' '{print $1,$2,$3,$4}' file
IT Author what is CP/M? an OS
Do the rest yourself.
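
For readers who want to see how the rest might look, here is a minimal sketch (not the poster's own solution). It assumes the input is UTF-8 and that `*` only appears as the separator between question and answer. It writes the delimiter characters literally in the field separator instead of as octal escapes; the variable names `line` and `result` are invented for the example.

```shell
# Sample record in the format from the first post.
line='IT©Author«what is CP/M?*an OS'

# Split on ©, « or a literal *; the fields arrive as
# $1=category, $2=author, $3=question, $4=answer.
result=$(printf '%s\n' "$line" | awk -F'©|«|[*]' '{
    printf "Category: %s\nQuestion: %s\nAnswer: %s\nAuthor: %s\n", $1, $3, $4, $2
}')

printf '%s\n' "$result"
```

The separator is written as an alternation rather than a bracket expression such as `[©«*]` because, in a byte-oriented locale, a bracket expression would match each byte of the multibyte characters separately and produce empty fields.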
 
11-13-2009, 07:57 AM   #6
mek
Member
 
Registered: Jul 2009
Location: Buenos Aires, Argentina
Distribution: fedora, slackware, *bsd
Posts: 42

Original Poster
Thanks guys, I sorted out the issue by using sed to replace all those weird characters with commas, and then I used cut to extract the text into different files.

Thanks for your awk tips anyway, I'll play around with that.
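
The exact commands were not posted, so the following is only a guess at what a sed-plus-cut approach could look like. All file names are hypothetical, the input is assumed to be UTF-8, and the sketch only works if the data itself contains no commas (a question with a comma in it would shift the columns).

```shell
# Work in a throwaway directory; every file name here is illustrative.
workdir=$(mktemp -d)
printf '%s\n' 'IT©Author«what is CP/M?*an OS' > "$workdir/quiz.txt"

# Replace the first ©, « and * on each line with a comma.
sed -e 's/©/,/' -e 's/«/,/' -e 's/[*]/,/' "$workdir/quiz.txt" > "$workdir/quiz.csv"

# Pull each comma-separated column into its own file.
cut -d, -f1 "$workdir/quiz.csv" > "$workdir/categories.txt"
cut -d, -f2 "$workdir/quiz.csv" > "$workdir/authors.txt"
cut -d, -f3 "$workdir/quiz.csv" > "$workdir/questions.txt"
cut -d, -f4 "$workdir/quiz.csv" > "$workdir/answers.txt"
```

Choosing a delimiter that cannot appear in the text, such as a tab or `|`, would make this less fragile than commas.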
 
  


All times are GMT -5.
