LinuxQuestions.org
Old 07-12-2007, 07:02 AM   #1
mauran
LQ Newbie
 
Registered: Dec 2005
Location: Sri Lanka
Distribution: unbuntu 7.04
Posts: 17

Rep: Reputation: 0
multiple character replacement by shell script


Hello,

I want to write a shell script that converts a text file containing phonetically written Tamil text into a Tamil Unicode text file.

It works like this:

First, search for every occurrence of a pattern,

say, 'aa',

then replace that pattern with the Unicode character "அ".

Please help me.
 
Old 07-12-2007, 07:38 AM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Have you read up on any of the standard utilities, e.g. sed and awk?

One common solution is the sed "substitute" command. Suppose you wanted to replace all instances of "aa" with "bb":
sed 's/aa/bb/g' <oldfile >newfile

I am sure there is a way to put in a hex byte instead of "bb", but I don't have it in front of me.

Really good sed, awk, and other tutorials here:
http://www.grymoire.com/Unix/
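
One way to supply the replacement as raw bytes, sketched here on the assumption of a POSIX shell whose printf understands octal escapes: build the UTF-8 bytes first, then splice them into the sed expression. The names tamil_a and replace_aa are made up for illustration; the three bytes are the UTF-8 encoding of U+0B85 (அ). GNU sed also accepts \xHH escapes directly in the replacement, but that is an extension and not used here.

```shell
# UTF-8 bytes of U+0B85 (TAMIL LETTER A) as octal escapes:
# E0 AE 85 in hex is \340 \256 \205 in octal.
tamil_a=$(printf '\340\256\205')

# Replace every "aa" with those bytes. Double quotes let the shell
# expand $tamil_a before sed sees the expression.
replace_aa() {
    sed "s/aa/${tamil_a}/g"
}

printf 'aa baa\n' | replace_aa
```

Typing the character directly into the sed expression works just as well when the editor and terminal both use UTF-8; the escape form is mainly useful when they do not.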
 
Old 07-12-2007, 01:31 PM   #3
mauran
LQ Newbie
 
Registered: Dec 2005
Location: Sri Lanka
Distribution: unbuntu 7.04
Posts: 17

Original Poster
Rep: Reputation: 0
pixellany,

Thank you very much.

I got the idea from your reply and wrote a small script to do a demo conversion.

Here is my script:

Quote:
sed -e 's/ma/ம/g' -e 's/yuu/யூ/g' -e 's/ra/ர/g' -e 's/n/ன்/g' < mauran >mauran1
But it gave me a new problem.

The output file shows up as a UTF-8 mess like this:

Quote:
மuரன் மuரன் மரm மரன்am
Does the > operator handle Unicode correctly?
 
Old 07-12-2007, 01:35 PM   #4
mauran
LQ Newbie
 
Registered: Dec 2005
Location: Sri Lanka
Distribution: unbuntu 7.04
Posts: 17

Original Poster
Rep: Reputation: 0
Update!

I saved the output file as file.html and opened it in Firefox. When I set Firefox's encoding to UTF-8, it shows the characters correctly.

So the problem is:

how to handle UTF-8 encoding in sed and >.
 
Old 07-12-2007, 02:15 PM   #5
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
The "<" ">" operators are for redirection and do not care what the data encoding is.

How did you get the special characters into the sed commands? You may need to run some experiments to see which characters get correctly handled by sed. (I've never seen anything about this in the various books on sed.)

Did you find anything on how to input raw hex bytes using sed?
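
The claim that redirection does not care about encoding can be checked directly. A small sketch, assuming mktemp, od and tr are available (the file name out.txt is hypothetical): write three UTF-8 bytes through ">" and dump what actually landed in the file.

```shell
tmpdir=$(mktemp -d)

# U+0BAE (TAMIL LETTER MA) is E0 AE AE in UTF-8; add a newline.
printf '\340\256\256\n' > "$tmpdir/out.txt"

# Read the file back through "<" and show its bytes in hex.
bytes_written=$(od -An -tx1 < "$tmpdir/out.txt" | tr -d ' \n')
echo "$bytes_written"   # e0aeae0a

rm -rf "$tmpdir"
```

The bytes come back unchanged, which suggests that a garbled display is more likely caused by the program viewing the file than by the redirection.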
 
Old 07-12-2007, 02:45 PM   #6
mauran
LQ Newbie
 
Registered: Dec 2005
Location: Sri Lanka
Distribution: unbuntu 7.04
Posts: 17

Original Poster
Rep: Reputation: 0
I type the Tamil characters directly into the bash script using the SCIM input method.

I've almost finished my code.

Now I'm redirecting the output to an HTML file.
An HTML file makes this encoding stuff easy to handle.

Here is my code:

Quote:
#/bin/bash!

filename=`zenity --file-selection`

sed -e 's/Xau/க்ஷௌ/g' -e 's/Xai/க்ஷை/g' -e 's/Xaa/க்ஷா/g' -e 's/XA/க்ஷா/g' -e 's/Xa/க்ஷ/g' -e 's/Xii/க்ஷீ/g' -e 's/Xi/க்ஷி/g' -e 's/XI/க்ஷீ/g' -e 's/Xuu/க்ஷூ/g' -e 's/Xu/க்ஷு/g' -e 's/XU/க்ஷூ/g' -e 's/Xee/க்ஷே/g' -e 's/Xe/க்ஷெ/g' -e 's/XE/க்ஷே/g' -e 's/Xoo/க்ஷோ/g' -e 's/Xo/க்ஷொ/g' -e 's/XO/க்ஷோ/g' -e 's/X/க்ஷ்/g' -e 's/njau/ஞௌ/g' -e 's/njai/ஞை/g' -e 's/njee/ஞே/g' -e 's/njoo/ஞோ/g' -e 's/njaa/ஞா/g' -e 's/njuu/ஞூ/g' -e 's/njii/ஞீ/g' -e 's/nja/ஞ/g' -e 's/nji/ஞி/g' -e 's/njI/ஞீ/g' -e 's/njA/ஞா/g' -e 's/nje/ஞெ/g' -e 's/njE/ஞே/g' -e 's/njo/ஞொ/g' -e 's/njO/ஞோ/g' -e 's/nju/ஞு/g' -e 's/njU/ஞூ/g' -e 's/nj/ஞ்/g' -e 's/ngau/ஙௌ/g' -e 's/ngai/ஙை/g' -e 's/ngee/ஙே/g' -e 's/ngoo/ஙோ/g' -e 's/ngaa/ஙா/g' -e 's/nguu/ஙூ/g' -e 's/ngii/ஙீ/g' -e 's/nga/ங/g' -e 's/ngi/ஙி/g' -e 's/ngI/ஙீ/g' -e 's/ngA/ஙா/g' -e 's/nge/ஙெ/g' -e 's/ngE/ஙே/g' -e 's/ngo/ஙொ/g' -e 's/ngO/ஙோ/g' -e 's/ngu/ஙு/g' -e 's/ngU/ஙூ/g' -e 's/ng/ங்/g' -e 's/shau/ஷௌ/g' -e 's/shai/ஷை/g' -e 's/shee/ஷே/g' -e 's/shoo/ஷோ/g' -e 's/shaa/ஷா/g' -e 's/shuu/ஷூ/g' -e 's/shii/ஷீ/g' -e 's/sha/ஷ/g' -e 's/shi/ஷி/g' -e 's/shI/ஷீ/g' -e 's/shA/ஷா/g' -e 's/she/ஷெ/g' -e 's/shE/ஷே/g' -e 's/sho/ஷொ/g' -e 's/shO/ஷோ/g' -e 's/shu/ஷு/g' -e 's/shU/ஷூ/g' -e 's/sh/ஷ்/g' -e 's/ nau/ நௌ/g' -e 's/ nai/ நை/g' -e 's/ nee/ நே/g' -e 's/ noo/ நோ/g' -e 's/ naa/ நா/g' -e 's/ nuu/ நூ/g' -e 's/ nii/ நீ/g' -e 's/ na/ ந/g' -e 's/ ni/ நி/g' -e 's/ nI/ நீ/g' -e 's/ nA/ நா/g' -e 's/ ne/ நெ/g' -e 's/ nE/ நே/g' -e 's/ no/ நொ/g' -e 's/ nO/ நோ/g' -e 's/ nu/ நு/g' -e 's/ nU/ நூ/g' -e 's/ nth/ ந்/g' -e 's/-nau/நௌ/g' -e 's/-nai/நை/g' -e 's/-nee/நே/g' -e 's/-noo/நோ/g' -e 's/-naa/நா/g' -e 's/-nuu/நூ/g' -e 's/-nii/நீ/g' -e 's/-na/ந/g' -e 's/-ni/நி/g' -e 's/-nI/நீ/g' -e 's/-nA/நா/g' -e 's/-ne/நெ/g' -e 's/-nE/நே/g' -e 's/-no/நொ/g' -e 's/-nO/நோ/g' -e 's/-nu/நு/g' -e 's/-nU/நூ/g' -e 's/n-au/நௌ/g' -e 's/n-ai/நை/g' -e 's/n -ee/நே/g' -e 's/n-oo/நோ/g' -e 's/n-aa/நா/g' -e 's/n-uu/நூ/g' -e 's/n-ii/நீ/g' -e 's/n-a/ந/g' -e 's/n-i/நி/g' -e 's/n-I/நீ/g' -e 's/n-A/நா/g' -e 's/n -e/நெ/g' -e 's/n 
-e/நே/g' -e 's/n-o/நொ/g' -e 's/n-O/நோ/g' -e 's/n-u/நு/g' -e 's/n-U/நூ/g' -e 's/wau/நௌ/g' -e 's/wai/நை/g' -e 's/wee/நே/g' -e 's/woo/நோ/g' -e 's/waa/நா/g' -e 's/wuu/நூ/g' -e 's/wii/நீ/g' -e 's/wa/ந/g' -e 's/wi/நி/g' -e 's/wI/நீ/g' -e 's/wA/நா/g' -e 's/we/நெ/g' -e 's/wE/நே/g' -e 's/wo/நொ/g' -e 's/wO/நோ/g' -e 's/wu/நு/g' -e 's/wU/நூ/g' -e 's/ n/ ந்/g' -e 's/n-/ந்/g' -e 's/-n/ந்/g' -e 's/w/ந்/g' -e 's/nthau/ந்தௌ/g' -e 's/nthai/ந்தை/g' -e 's/nthee/ந்தே/g' -e 's/nthoo/ந்தோ/g' -e 's/nthaa/ந்தா/g' -e 's/nthuu/ந்தூ/g' -e 's/nthii/ந்தீ/g' -e 's/ntha/ந்த/g' -e 's/nthi/ந்தி/g' -e 's/nthI/ந்தீ/g' -e 's/nthA/ந்தா/g' -e 's/nthe/ந்தெ/g' -e 's/nthE/ந்தே/g' -e 's/ntho/ந்தொ/g' -e 's/nthO/ந்தோ/g' -e 's/nthu/ந்து/g' -e 's/nthU/ந்தூ/g' -e 's/nth/ந்/g' -e 's/dhau/தௌ/g' -e 's/dhai/தை/g' -e 's/dhee/தே/g' -e 's/dhoo/தோ/g' -e 's/dhaa/தா/g' -e 's/dhuu/தூ/g' -e 's/dhii/தீ/g' -e 's/dha/த/g' -e 's/dhi/தி/g' -e 's/dhI/தீ/g' -e 's/dhA/தா/g' -e 's/dhe/தெ/g' -e 's/dhE/தே/g' -e 's/dho/தொ/g' -e 's/dhO/தோ/g' -e 's/dhu/து/g' -e 's/dhU/தூ/g' -e 's/dh/த்/g' -e 's/chau/சௌ/g' -e 's/chai/சை/g' -e 's/chee/சே/g' -e 's/choo/சோ/g' -e 's/chaa/சா/g' -e 's/chuu/சூ/g' -e 's/chii/சீ/g' -e 's/cha/ச/g' -e 's/chi/சி/g' -e 's/chI/சீ/g' -e 's/chA/சா/g' -e 's/che/செ/g' -e 's/chE/சே/g' -e 's/cho/சொ/g' -e 's/chO/சோ/g' -e 's/chu/சு/g' -e 's/chU/சூ/g' -e 's/ch/ச்/g' -e 's/zhau/ழௌ/g' -e 's/zhai/ழை/g' -e 's/zhee/ழே/g' -e 's/zhoo/ழோ/g' -e 's/zhaa/ழா/g' -e 's/zhuu/ழூ/g' -e 's/zhii/ழீ/g' -e 's/zha/ழ/g' -e 's/zhi/ழி/g' -e 's/zhI/ழீ/g' -e 's/zhA/ழா/g' -e 's/zhe/ழெ/g' -e 's/zhE/ழே/g' -e 's/zho/ழொ/g' -e 's/zhO/ழோ/g' -e 's/zhu/ழு/g' -e 's/zhU/ழூ/g' -e 's/zh/ழ்/g' -e 's/zau/ழௌ/g' -e 's/zai/ழை/g' -e 's/zee/ழே/g' -e 's/zoo/ழோ/g' -e 's/zaa/ழா/g' -e 's/zuu/ழூ/g' -e 's/zii/ழீ/g' -e 's/za/ழ/g' -e 's/zi/ழி/g' -e 's/zI/ழீ/g' -e 's/zA/ழா/g' -e 's/ze/ழெ/g' -e 's/zE/ழே/g' -e 's/zo/ழொ/g' -e 's/zO/ழோ/g' -e 's/zu/ழு/g' -e 's/zU/ழூ/g' -e 's/z/ழ்/g' -e 's/jau/ஜௌ/g' -e 's/jai/ஜை/g' -e 's/jee/ஜே/g' -e 's/joo/ஜோ/g' -e 's/jaa/ஜா/g' -e 's/juu/ஜூ/g' -e 
's/jii/ஜீ/g' -e 's/ja/ஜ/g' -e 's/ji/ஜி/g' -e 's/jI/ஜீ/g' -e 's/jA/ஜா/g' -e 's/je/ஜெ/g' -e 's/jE/ஜே/g' -e 's/jo/ஜொ/g' -e 's/jO/ஜோ/g' -e 's/ju/ஜு/g' -e 's/jU/ஜூ/g' -e 's/j/ஜ்/g' -e 's/thau/தௌ/g' -e 's/thai/தை/g' -e 's/thee/தே/g' -e 's/thoo/தோ/g' -e 's/thaa/தா/g' -e 's/thuu/தூ/g' -e 's/thii/தீ/g' -e 's/tha/த/g' -e 's/thi/தி/g' -e 's/thI/தீ/g' -e 's/thA/தா/g' -e 's/the/தெ/g' -e 's/thE/தே/g' -e 's/tho/தொ/g' -e 's/thO/தோ/g' -e 's/thu/து/g' -e 's/thU/தூ/g' -e 's/th/த்/g' -e 's/-hau/ஹௌ/g' -e 's/-hai/ஹை/g' -e 's/-hee/ஹே/g' -e 's/-hoo/ஹோ/g' -e 's/-haa/ஹா/g' -e 's/-huu/ஹூ/g' -e 's/-hii/ஹீ/g' -e 's/-ha/ஹ/g' -e 's/-hi/ஹி/g' -e 's/-hI/ஹீ/g' -e 's/-hA/ஹா/g' -e 's/-he/ஹெ/g' -e 's/-hE/ஹே/g' -e 's/-ho/ஹொ/g' -e 's/-hO/ஹோ/g' -e 's/-hu/ஹு/g' -e 's/-hU/ஹூ/g' -e 's/-h/ஹ்/g' -e 's/hau/கௌ/g' -e 's/hai/கை/g' -e 's/hee/கே/g' -e 's/hoo/கோ/g' -e 's/haa/கா/g' -e 's/huu/கூ/g' -e 's/hii/கீ/g' -e 's/ha/க/g' -e 's/hi/கி/g' -e 's/hI/கீ/g' -e 's/hA/கா/g' -e 's/he/கெ/g' -e 's/hE/கே/g' -e 's/ho/கொ/g' -e 's/hO/கோ/g' -e 's/hu/கு/g' -e 's/hU/கூ/g' -e 's/h/க்/g' -e 's/kau/கௌ/g' -e 's/kai/கை/g' -e 's/kee/கே/g' -e 's/koo/கோ/g' -e 's/kaa/கா/g' -e 's/kuu/கூ/g' -e 's/kii/கீ/g' -e 's/ka/க/g' -e 's/ki/கி/g' -e 's/kI/கீ/g' -e 's/kA/கா/g' -e 's/ke/கெ/g' -e 's/kE/கே/g' -e 's/ko/கொ/g' -e 's/kO/கோ/g' -e 's/ku/கு/g' -e 's/kU/கூ/g' -e 's/k/க்/g' -e 's/-sau/ஸௌ/g' -e 's/-sai/ஸை/g' -e 's/-see/ஸே/g' -e 's/-soo/ஸோ/g' -e 's/-saa/ஸா/g' -e 's/-suu/ஸூ/g' -e 's/-sii/ஸீ/g' -e 's/-sa/ஸ/g' -e 's/-si/ஸி/g' -e 's/-sI/ஸீ/g' -e 's/-sA/ஸா/g' -e 's/-se/ஸெ/g' -e 's/-sE/ஸே/g' -e 's/-so/ஸொ/g' -e 's/-sO/ஸோ/g' -e 's/-su/ஸு/g' -e 's/-sU/ஸூ/g' -e 's/-s/ஸ்/g' -e 's/Sau/ஸௌ/g' -e 's/Sai/ஸை/g' -e 's/See/ஸே/g' -e 's/Soo/ஸோ/g' -e 's/Saa/ஸா/g' -e 's/Suu/ஸூ/g' -e 's/Sii/ஸீ/g' -e 's/Sa/ஸ/g' -e 's/Si/ஸி/g' -e 's/SI/ஸீ/g' -e 's/SA/ஸா/g' -e 's/Se/ஸெ/g' -e 's/SE/ஸே/g' -e 's/So/ஸொ/g' -e 's/SO/ஸோ/g' -e 's/Su/ஸு/g' -e 's/SU/ஸூ/g' -e 's/S/ஸ்/g' -e 's/rau/ரௌ/g' -e 's/rai/ரை/g' -e 's/ree/ரே/g' -e 's/roo/ரோ/g' -e 's/raa/ரா/g' -e 's/ruu/ரூ/g' -e 's/rii/ரீ/g' -e 
's/ra/ர/g' -e 's/ri/ரி/g' -e 's/rI/ரீ/g' -e 's/rA/ரா/g' -e 's/re/ரெ/g' -e 's/rE/ரே/g' -e 's/ro/ரொ/g' -e 's/rO/ரோ/g' -e 's/ru/ரு/g' -e 's/rU/ரூ/g' -e 's/r/ர்/g' -e 's/Rau/றௌ/g' -e 's/Rai/றை/g' -e 's/Ree/றே/g' -e 's/Roo/றோ/g' -e 's/Raa/றா/g' -e 's/Ruu/றூ/g' -e 's/Rii/றீ/g' -e 's/Ra/ற/g' -e 's/Ri/றி/g' -e 's/RI/றீ/g' -e 's/RA/றா/g' -e 's/Re/றெ/g' -e 's/RE/றே/g' -e 's/Ro/றொ/g' -e 's/RO/றோ/g' -e 's/Ru/று/g' -e 's/RU/றூ/g' -e 's/R/ற்/g' -e 's/tau/டௌ/g' -e 's/tai/டை/g' -e 's/tee/டே/g' -e 's/too/டோ/g' -e 's/taa/டா/g' -e 's/tuu/டூ/g' -e 's/tii/டீ/g' -e 's/ta/ட/g' -e 's/ti/டி/g' -e 's/tI/டீ/g' -e 's/tA/டா/g' -e 's/te/டெ/g' -e 's/tE/டே/g' -e 's/to/டொ/g' -e 's/tO/டோ/g' -e 's/tu/டு/g' -e 's/tU/டூ/g' -e 's/t/ட்/g' -e 's/sau/சௌ/g' -e 's/sai/சை/g' -e 's/see/சே/g' -e 's/soo/சோ/g' -e 's/saa/சா/g' -e 's/suu/சூ/g' -e 's/sii/சீ/g' -e 's/sa/ச/g' -e 's/si/சி/g' -e 's/sI/சீ/g' -e 's/sA/சா/g' -e 's/se/செ/g' -e 's/sE/சே/g' -e 's/so/சொ/g' -e 's/sO/சோ/g' -e 's/su/சு/g' -e 's/sU/சூ/g' -e 's/s/ச்/g' -e 's/pau/பௌ/g' -e 's/pai/பை/g' -e 's/pee/பே/g' -e 's/poo/போ/g' -e 's/paa/பா/g' -e 's/puu/பூ/g' -e 's/pii/பீ/g' -e 's/pa/ப/g' -e 's/pi/பி/g' -e 's/pI/பீ/g' -e 's/pA/பா/g' -e 's/pe/பெ/g' -e 's/pE/பே/g' -e 's/po/பொ/g' -e 's/pO/போ/g' -e 's/pu/பு/g' -e 's/pU/பூ/g' -e 's/p/ப்/g' -e 's/bau/பௌ/g' -e 's/bai/பை/g' -e 's/bee/பே/g' -e 's/boo/போ/g' -e 's/baa/பா/g' -e 's/buu/பூ/g' -e 's/bii/பீ/g' -e 's/ba/ப/g' -e 's/bi/பி/g' -e 's/bI/பீ/g' -e 's/bA/பா/g' -e 's/be/பெ/g' -e 's/bE/பே/g' -e 's/bo/பொ/g' -e 's/bO/போ/g' -e 's/bu/பு/g' -e 's/bU/பூ/g' -e 's/b/ப்/g' -e 's/mau/மௌ/g' -e 's/mai/மை/g' -e 's/mee/மே/g' -e 's/moo/மோ/g' -e 's/maa/மா/g' -e 's/muu/மூ/g' -e 's/mii/மீ/g' -e 's/ma/ம/g' -e 's/mi/மி/g' -e 's/mI/மீ/g' -e 's/mA/மா/g' -e 's/me/மெ/g' -e 's/mE/மே/g' -e 's/mo/மொ/g' -e 's/mO/மோ/g' -e 's/mu/மு/g' -e 's/mU/மூ/g' -e 's/m/ம்/g' -e 's/yau/யௌ/g' -e 's/yai/யை/g' -e 's/yee/யே/g' -e 's/yoo/யோ/g' -e 's/yaa/யா/g' -e 's/yuu/யூ/g' -e 's/yii/யீ/g' -e 's/ya/ய/g' -e 's/yi/யி/g' -e 's/yI/யீ/g' -e 's/yA/யா/g' -e 's/ye/யெ/g' -e 
's/yE/யே/g' -e 's/yo/யொ/g' -e 's/yO/யோ/g' -e 's/yu/யு/g' -e 's/yU/யூ/g' -e 's/y/ய்/g' -e 's/dau/டௌ/g' -e 's/dai/டை/g' -e 's/dee/டே/g' -e 's/doo/டோ/g' -e 's/daa/டா/g' -e 's/duu/டூ/g' -e 's/dii/டீ/g' -e 's/da/ட/g' -e 's/di/டி/g' -e 's/dI/டீ/g' -e 's/dA/டா/g' -e 's/de/டெ/g' -e 's/dE/டே/g' -e 's/do/டொ/g' -e 's/dO/டோ/g' -e 's/du/டு/g' -e 's/dU/டூ/g' -e 's/d/ட்/g' -e 's/nau/னௌ/g' -e 's/nai/னை/g' -e 's/nee/னே/g' -e 's/noo/னோ/g' -e 's/naa/னா/g' -e 's/nuu/னூ/g' -e 's/nii/னீ/g' -e 's/na/ன/g' -e 's/ni/னி/g' -e 's/nI/னீ/g' -e 's/nA/னா/g' -e 's/ne/னெ/g' -e 's/nE/னே/g' -e 's/no/னொ/g' -e 's/nO/னோ/g' -e 's/nu/னு/g' -e 's/nU/னூ/g' -e 's/n/ன்/g' -e 's/Nau/ணௌ/g' -e 's/Nai/ணை/g' -e 's/Nee/ணே/g' -e 's/Noo/ணோ/g' -e 's/Naa/ணா/g' -e 's/Nuu/ணூ/g' -e 's/Nii/ணீ/g' -e 's/Na/ண/g' -e 's/Ni/ணி/g' -e 's/NI/ணீ/g' -e 's/NA/ணா/g' -e 's/Ne/ணெ/g' -e 's/NE/ணே/g' -e 's/No/ணொ/g' -e 's/NO/ணோ/g' -e 's/Nu/ணு/g' -e 's/NU/ணூ/g' -e 's/N/ண்/g' -e 's/lau/லௌ/g' -e 's/lai/லை/g' -e 's/lee/லே/g' -e 's/loo/லோ/g' -e 's/laa/லா/g' -e 's/luu/லூ/g' -e 's/lii/லீ/g' -e 's/la/ல/g' -e 's/li/லி/g' -e 's/lI/லீ/g' -e 's/lA/லா/g' -e 's/le/லெ/g' -e 's/lE/லே/g' -e 's/lo/லொ/g' -e 's/lO/லோ/g' -e 's/lu/லு/g' -e 's/lU/லூ/g' -e 's/l/ல்/g' -e 's/Lau/ளௌ/g' -e 's/Lai/ளை/g' -e 's/Lee/ளே/g' -e 's/Loo/ளோ/g' -e 's/Laa/ளா/g' -e 's/Luu/ளூ/g' -e 's/Lii/ளீ/g' -e 's/La/ள/g' -e 's/Li/ளி/g' -e 's/LI/ளீ/g' -e 's/LA/ளா/g' -e 's/Le/ளெ/g' -e 's/LE/ளே/g' -e 's/Lo/ளொ/g' -e 's/LO/ளோ/g' -e 's/Lu/ளு/g' -e 's/LU/ளூ/g' -e 's/L/ள்/g' -e 's/vau/வௌ/g' -e 's/vai/வை/g' -e 's/vee/வே/g' -e 's/voo/வோ/g' -e 's/vaa/வா/g' -e 's/vuu/வூ/g' -e 's/vii/வீ/g' -e 's/va/வ/g' -e 's/vi/வி/g' -e 's/vI/வீ/g' -e 's/vA/வா/g' -e 's/ve/வெ/g' -e 's/vE/வே/g' -e 's/vo/வொ/g' -e 's/vO/வோ/g' -e 's/vu/வு/g' -e 's/vU/வூ/g' -e 's/v/வ்/g' -e 's/gau/கௌ/g' -e 's/gai/கை/g' -e 's/gee/கே/g' -e 's/goo/கோ/g' -e 's/gaa/கா/g' -e 's/guu/கூ/g' -e 's/gii/கீ/g' -e 's/ga/க/g' -e 's/gi/கி/g' -e 's/gI/கீ/g' -e 's/gA/கா/g' -e 's/ge/கெ/g' -e 's/gE/கே/g' -e 's/go/கொ/g' -e 's/gO/கோ/g' -e 's/gu/கு/g' -e 's/gU/கூ/g' -e 
's/g/க்/g' -e 's/au/ஔ/g' -e 's/ai/ஐ/g' -e 's/aa/ஆ/g' -e 's/ee/ஏ/g' -e 's/ii/ஈ/g' -e 's/uu/ஊ/g' -e 's/oo/ஓ/g' -e 's/-1000/௲/g' -e 's/-100/௱/g' -e 's/-10/௰/g' -e 's/-1/௧/g' -e 's/-2/௨/g' -e 's/-3/௩/g' -e 's/-4/௪/g' -e 's/-5/௫/g' -e 's/-6/௬/g' -e 's/-7/௭/g' -e 's/-8/௮/g' -e 's/-9/௯/g' -e 's/i/இ/g' -e 's/I/ஈ/g' -e 's/a/அ/g' -e 's/A/ஆ/g' -e 's/e/எ/g' -e 's/E/ஏ/g' -e 's/i/இ/g' -e 's/I/ஈ/g' -e 's/u/உ/g' -e 's/U/ஊ/g' -e 's/o/ஒ/g' -e 's/O/ஓ/g' -e 's/q/ஃ/g' < $filename > $filename-converted.html
 
Old 07-12-2007, 03:50 PM   #7
osvaldomarques
Member
 
Registered: Jul 2004
Location: Rio de Janeiro - Brazil
Distribution: Conectiva 10 - Conectiva 8 - Slackware 9 - starting with LFS
Posts: 519

Rep: Reputation: 34
Hi mauran,

I guess you should look for "iconv", which is the tool to translate from one character set to another.
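
A minimal sketch of how iconv is invoked, using its -f (from) and -t (to) options. The thread does not say which conversion was eventually run, so the encodings below are only an illustration, and the function name utf8_to_utf16be_hex is made up.

```shell
# Convert the UTF-8 bytes of U+0B85 (அ) to UTF-16BE and show them in hex.
# In UTF-16BE the character is simply its code point: 0b 85.
utf8_to_utf16be_hex() {
    printf '\340\256\205' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n'
}

utf8_to_utf16be_hex   # prints 0b85
echo
```

Note that iconv converts between encodings of the same characters; mapping romanized text such as "aa" to Tamil letters still needs a substitution step like the sed command.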
 
Old 07-12-2007, 04:03 PM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by mauran
I type the Tamil characters directly into the bash script using the SCIM input method.

I've almost finished my code.

Now I'm redirecting the output to an HTML file.
An HTML file makes this encoding stuff easy to handle.

Here is my code:
Good Grief!!!
I am tempted to tell you that I spotted an error on line 76, but I think you would know better.

Actually, that printout might make a neat desktop background.....
 
Old 07-12-2007, 11:18 PM   #9
mauran
LQ Newbie
 
Registered: Dec 2005
Location: Sri Lanka
Distribution: unbuntu 7.04
Posts: 17

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by osvaldomarques
Hi mauran,

I guess you should look for "iconv", which is the tool to translate from one character set to another.

Thanks!!

That worked.

Now there is no need to redirect to HTML. :-)
 
Old 07-12-2007, 11:25 PM   #10
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Quote:
#/bin/bash!
Change the first line to "!#/bin/bash"
 
Old 07-12-2007, 11:27 PM   #11
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
If you are using GNU sed, you can use the form:
Code:
sed 's/Xau/க்ஷௌ/g;s/Xai/க்ஷை/g;s/Xaa/க்ஷா/g'

which is the same as 

sed -e 's/Xau/க்ஷௌ/g' -e 's/Xai/க்ஷை/g' -e 's/Xaa/க்ஷா/g'
But for such a long list of substitutions, you might want to put the commands in a sed script file and pass that file to the -f option.

Last edited by jschiwal; 07-12-2007 at 11:28 PM.
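
A sketch of the -f approach, using only the three substitutions shown in the post above. The file name tamil.sed and the temporary directory are assumptions for illustration; the full table from the earlier post would go in the same file, one command per line and in the same order.

```shell
workdir=$(mktemp -d)

# One substitution per line; sed applies them from top to bottom.
cat > "$workdir/tamil.sed" <<'EOF'
s/Xau/க்ஷௌ/g
s/Xai/க்ஷை/g
s/Xaa/க்ஷா/g
EOF

# -f reads the commands from the file instead of the command line.
converted=$(printf 'Xaa Xai\n' | sed -f "$workdir/tamil.sed")
echo "$converted"

rm -rf "$workdir"
```

Keeping the table in its own file makes it easier to review and reorder, and the calling script shrinks to a single line such as sed -f tamil.sed < "$filename" > "$filename-converted.html" (with the variable quoted so that paths containing spaces still work).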
 
Old 07-12-2007, 11:37 PM   #12
mauran
LQ Newbie
 
Registered: Dec 2005
Location: Sri Lanka
Distribution: unbuntu 7.04
Posts: 17

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jschiwal
Change the first line to "!#/bin/bash"
May I know the reason for this?

#/bin/bash! is working for me.

And

thank you for the short form.
 
Old 07-12-2007, 11:56 PM   #13
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Actually, the books say: "#!/bin/bash"

But your version also works on my machine. However, my script also works if the line is completely deleted. Obviously, bash is the default.

Note that "#/bin/bash!" is likely just being seen as a comment.
 
Old 07-12-2007, 11:59 PM   #14
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
#! are two magic characters that the kernel looks for at the very start of the file. If they are present, the rest of the line is taken as the interpreter to run the script with.
#/bin/bash! is just plain wrong. Your script may work only because /bin/bash is already your default shell. Someone running your script from ksh or csh would not be as lucky, unless the rest of your script works in both shells.
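
To see that the interpreter line really is honoured by the system and is not just a comment, here is a small sketch. It assumes /bin/cat exists and that files in the temporary directory may be executed; the script name show.sh is hypothetical. With cat named as the "interpreter", running the script prints the file instead of executing it.

```shell
demo_dir=$(mktemp -d)

# The file starts with "#!", so the kernel runs /bin/cat with the
# script's path as its argument instead of handing the file to a shell.
printf '#!/bin/cat\nthis line is printed by cat, not run by a shell\n' > "$demo_dir/show.sh"
chmod +x "$demo_dir/show.sh"

shebang_output=$("$demo_dir/show.sh")
echo "$shebang_output"

rm -rf "$demo_dir"
```

With #!/bin/bash as the first line, the same mechanism hands the file to bash regardless of which shell the user happens to be typing in.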
 
Old 07-13-2007, 02:46 AM   #15
mauran
LQ Newbie
 
Registered: Dec 2005
Location: Sri Lanka
Distribution: unbuntu 7.04
Posts: 17

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jschiwal
Change the first line to "!#/bin/bash"

It gives this error:

Quote:
./roman.sh: line 1: !#/bin/bash: No such file or directory
:-(
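
The error is consistent with the two characters being in the wrong order: the file does not start with "#!", so a shell reads it and tries to run "!#/bin/bash" as a command. A sketch that reproduces this, assuming a POSIX sh (the directory and the err.txt file are hypothetical, and the exact wording of the message differs between shells):

```shell
repro_dir=$(mktemp -d)

# Line 1 is NOT a valid interpreter line, so the shell treats it as a command.
printf '!#/bin/bash\necho "rest of script still runs"\n' > "$repro_dir/roman.sh"

# Run the file with sh, keeping stdout and stderr apart.
repro_stdout=$(sh "$repro_dir/roman.sh" 2> "$repro_dir/err.txt")
repro_stderr=$(cat "$repro_dir/err.txt")

echo "stdout: $repro_stdout"
echo "stderr: $repro_stderr"

rm -rf "$repro_dir"
```

The order given earlier in the thread by pixellany, "#!/bin/bash", is the one that works.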
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration