LinuxQuestions.org
Other *NIX forum
04-04-2012, 01:54 PM   #1
K-Veikko
sed space between capitalized letters


I am using sed to feed text to the Festival TTS.

Is there a way to use sed to write a space [ ] between the letters of a CAPITALIZED word of any length, when all of its letters are capitals?

USA -> U S A
USA. -> U S A.
USA. -> U S A (it is also OK to lose dots etc., because the result is only listened to)
XXXIIV -> X X X I I V
Word -> Word

- How can I avoid adding spaces when only the first letter is capitalized?
 
04-05-2012, 08:20 AM   #2
druuna
Hi,

If I assume your example is relevant (there's not much info in your post) then this works:
Code:
sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
The /\<[A-Z]+\>/ address selects lines that contain a word made up only of capital letters. The \< and \> anchor the match to word boundaries, so a word like Word is not selected.
The s/./& /g command appends a space to every character on the selected line.
The s/\.// command removes the (first) dot.

Here's an example run:
Code:
$ cat infile
USA
USA.
XXXIIV
Word

$ sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
U S A 
U S A  
X X X I I V 
Word
BTW: the above will not work when there are multiple words on one line, because s/./& /g then spaces out every character on the line, not just the capitalized word.

Hope this helps.
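To handle several words per line, one option is a loop in GNU sed. The sketch below (spell_caps is a hypothetical helper name, not from the thread) inserts a space between a pair of adjacent capital letters and repeats until no such pair is left, so no trailing blanks are produced and Word is left alone. It uses [[:upper:]] instead of [A-Z]; in a UTF-8 locale this should also cover Finnish capitals such as Ä and Ö. One limitation: a mixed-case word with two adjacent capitals (for example ABc) would be split as well.

```shell
#!/bin/sh
# spell_caps is a hypothetical helper name, not from the thread.
# Assumes GNU sed (-r for extended regexes, :label and t for looping).
spell_caps() {
  # :a   defines a label
  # s/// inserts a space into the first pair of adjacent capitals
  # ta   jumps back to :a if that substitution succeeded, so the loop
  #      ends once no two capitals are next to each other
  sed -r ':a;s/([[:upper:]])([[:upper:]])/\1 \2/;ta'
}

# Several words per line are fine; "Word" is left unchanged.
printf '%s\n' 'USA.' 'XXXIIV' 'Word' 'The USA and the EU.' | spell_caps
```

This prints U S A., X X X I I V, Word and The U S A and the E U. on separate lines; the dots are kept, which the original question says is acceptable.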
 
04-05-2012, 08:32 AM   #3
colucix
Here is a solution in awk:
Code:
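awk '
{
  for (i = 1; i <= NF; i++)
    if ($i == toupper($i)) {        # field contains no lower-case letters
      gsub(/[[:punct:]]/, "", $i)   # remove punctuation such as "."
      gsub(/./, "& ", $i)           # add a space after every character
      gsub(/ +$/, "", $i)           # strip the trailing space
    }
}
1                                   # print every line, modified or not
' infile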
The first gsub removes punctuation, the second adds a space after each character, and the third removes the extra blank at the end of the word. Since you've posted in the Other *NIX forum, though, it might not work on your system. Which system are you running, and which version of sed or awk/nawk/gawk do you have?
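colucix's program can also be written on a single command line. The sketch below wraps it in a shell function (spell_caps_awk is a hypothetical name) and adds one condition that is not in the original: the field must contain at least one capital letter. Without it, an all-digit field such as 12 also equals its toupper() result and would be read out as 1 2, and a lone - would be emptied. Because a field is modified, awk also rebuilds the line with single spaces between fields. On Ubuntu, the default awk of that era was mawk 1.3.3, which as far as I know does not support POSIX classes such as [[:punct:]]; if they do not seem to match, try gawk.

```shell
#!/bin/sh
# spell_caps_awk is a hypothetical name. The awk program is colucix's,
# on one line, plus an extra test: the field must contain a capital
# letter, so numbers such as "12" or a lone "-" are left alone.
spell_caps_awk() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /[[:upper:]]/ && $i == toupper($i)) { gsub(/[[:punct:]]/, "", $i); gsub(/./, "& ", $i); sub(/ $/, "", $i) } } 1'
}

# Prints: Word klo 12 U S A
printf 'Word klo 12 USA.\n' | spell_caps_awk
```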
 
04-07-2012, 07:52 AM   #4
K-Veikko
Quote:
Originally Posted by druuna
Hi,

If I assume your example is relevant (there's not much info in your post) then this works:
Code:
sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
I think I can use this, because Festival handles the words one by one anyway. I just didn't think of writing one word per line.

My current "script" is in the next comment.
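The one-word-per-line idea can be sketched by putting tr in front of druuna's command. words_caps is a hypothetical name; the sketch assumes GNU sed (for -r, \< and \>) and uses the sed command from the thread unchanged, so words that are only partly capitalized stay untouched, while spelled-out words keep a trailing blank, which should not matter for speech. Note that tr -s also squeezes away blank lines, so if the blank lines produced by the punctuation step matter, run this step before that sed rather than after it.

```shell
#!/bin/sh
# words_caps is a hypothetical helper name; assumes GNU sed.
words_caps() {
  # tr:  turn each run of whitespace into one newline (one word per line).
  # sed: druuna's command, unchanged: on a line holding an all-capitals
  #      word, put a space after every character, then drop the first dot.
  tr -s '[:space:]' '\n' | sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}'
}

printf 'The USA won.\n' | words_caps
```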
 
04-07-2012, 07:53 AM   #5
K-Veikko
Quote:
Originally Posted by colucix
Here is a solution in awk:
It might work, but I don't know how to write it as a single line.

I am using Ubuntu 11.04, and:
Code:
$ sed --version
GNU sed version 4.2.1

My current "script" is:

Code:
sed -e 's/\-\{1,\}\|\–\{1,\}\|\?\{1,\}\|\!\{1,\}\|\;\{1,\}\|\:\{1,\}\|\,\{1,\}\|\.\{1,\}\|\^\{1,\}\|\"\{1,\}\|\/\{1,\}\|\«\{1,\}\|\»\{1,\}/\n\n/g' \
    -e 's/\§/pykälä/g' \
    -e 's/klo /kello /g' input.txt |
  iconv -f UTF-8 -t ISO8859-1 -c |
  text2wave -otype wav -eval '(language_finnish)' -o - |
  lame - output.mp3
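A capitals step could be slotted into this pipeline before iconv, while the text is still UTF-8: in a UTF-8 locale, [[:upper:]] can then match Ä and Ö, whereas after conversion to ISO8859-1 those letters would probably no longer be recognized. The sketch below assumes GNU sed; normalize_fi and speak_file are hypothetical names, and the punctuation-splitting sed from the script above is left out for brevity (it can be added as one more stage).

```shell
#!/bin/sh
# Sketch only: normalize_fi and speak_file are hypothetical names.
# Assumes GNU sed and, for Finnish capitals, a UTF-8 locale.

# Text clean-up done while the text is still UTF-8 (before iconv).
normalize_fi() {
  # 1) Spell out runs of capitals (USA -> U S A), several per line if needed.
  # 2) Apply the two substitutions from the original script.
  sed -r -e ':a;s/([[:upper:]])([[:upper:]])/\1 \2/;ta' \
         -e 's/§/pykälä/g' \
         -e 's/klo /kello /g'
}

# Whole pipeline: text file in $1, MP3 file written to $2.
# Needs iconv, Festival's text2wave and lame installed.
speak_file() {
  normalize_fi < "$1" |
    iconv -f UTF-8 -t ISO8859-1 -c |
    text2wave -otype wav -eval '(language_finnish)' -o - |
    lame - "$2"
}
```

Usage would then be along the lines of speak_file input.txt output.mp3.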
 
05-15-2012, 11:42 AM   #6
fakie_flip
Quote:
Originally Posted by K-Veikko
I am using sed to feed festival TTS.
http://linuxinnovations.blogspot.com...to-speach.html
 
  


All times are GMT -5.