LinuxQuestions.org
Other *NIX: This forum is for the discussion of any UNIX platform that does not have its own forum. Examples would include HP-UX, IRIX, Darwin, Tru64 and OS X.

04-04-2012, 01:54 PM   #1
K-Veikko
sed space between capital letters


I am using sed to feed festival TTS.

Is there a way to use sed to insert a space between the letters of a word of any length when all of its letters are capitals?

USA -> U S A
USA. -> U S A.
USA. -> U S A (losing dots etc. is also OK, because the output is only listened to)
XXXIIV -> X X X I I V
Word -> Word

- How do I avoid adding spaces when only the first letter is a capital?
 
04-05-2012, 08:20 AM   #2
druuna
Hi,

If I assume your example is representative (there's not much info in your post), then this works:
Code:
sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
The /\<[A-Z]+\>/ address selects lines containing a word written in capitals only. The \< and \> anchors make sure that whole words are matched.
s/./& /g replaces every character with that character followed by a space.
s/\.// removes the dot (only the first one on the line).

Here's an example run:
Code:
$ cat infile
USA
USA.
XXXIIV
Word

$ sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
U S A 
U S A  
X X X I I V 
Word
BTW: the above will not work when there are multiple words on one line: if any word on the line is all capitals, the whole line gets spaced out, mixed-case words included.

Hope this helps.
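For lines with several words, one possible variant is to loop inside sed so that only the all-capitals words are touched. A minimal sketch, assuming GNU sed (for -r and the \< \> word anchors); space_caps is a hypothetical name, and [[:upper:]] is used instead of [A-Z] so that the current locale decides what counts as a capital:

```shell
# Hypothetical helper: space out every word made only of capital letters,
# leaving mixed-case words alone. Assumes GNU sed (-r, \< and \>).
space_caps() {
  # :a   loop label
  # s    split the first letter off a word of two or more capitals
  # ta   jump back to :a as long as the substitution keeps succeeding
  sed -r -e ':a' -e 's/\<([[:upper:]])([[:upper:]]+)\>/\1 \2/' -e 'ta' "$@"
}

printf 'USA.\nXXXIIV\nWord\nNASA and USA.\n' | space_caps
# U S A.
# X X X I I V
# Word
# N A S A and U S A.
```

Each pass peels one letter off the front of a capitals-only word, so the loop ends once every such word is down to single letters. Unlike the command above, it keeps the dot after USA, which the first post said is acceptable.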
 
04-05-2012, 08:32 AM   #3
colucix
Here is a solution in awk:
Code:
{
  for (i = 1; i <= NF; i++)
    if ( $i == toupper($i)) {  # field contains no lowercase letters
      gsub(/[[:punct:]]/,"",$i)
      gsub(/./,"& ",$i)
      gsub(/ +$/,"",$i)
    }
}
1  # always-true pattern: print every line, modified or not
The first gsub removes punctuation, the second adds a space after each character, and the third removes the extra space at the end of the word. However, since you posted in the Other *NIX forum, this might not work on your system. Which system are you running, and which version of sed or awk/nawk/gawk do you have?
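The program above is an awk script body, so one way to run it without a separate file is to pass it inline as a single quoted argument, with semicolons between the statements. A sketch, wrapped in a hypothetical shell function spell_caps; it assumes an awk that supports POSIX character classes such as [[:punct:]] (gawk does; some older awk implementations may not):

```shell
# Hypothetical wrapper: the same awk program written on one line.
spell_caps() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == toupper($i)) { gsub(/[[:punct:]]/, "", $i); gsub(/./, "& ", $i); gsub(/ +$/, "", $i) } } 1' "$@"
}

echo 'USA. Word XXXIIV' | spell_caps
# U S A Word X X X I I V
```

Because a field such as 2012 contains no lowercase letters, it also passes the toupper() test and would come out as 2 0 1 2, which changes how a number is read aloud.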
 
04-07-2012, 07:52 AM   #4
K-Veikko
Original Poster
Quote:
Originally Posted by druuna
Hi,

If I assume your example is representative (there's not much info in your post), then this works:
Code:
sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
I think I can use this, because Festival handles the words one by one anyway. I just hadn't thought of writing one word per line.

My current "script" is in the next comment.
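The one-word-per-line idea can be sketched as follows, assuming GNU sed; one_word_per_line is a hypothetical name. The first sed puts each blank-separated word on its own line, and the second is druuna's command unchanged:

```shell
# Hypothetical helper: put each word on its own line, then apply
# druuna's sed command from post #2 to every line.
#  1st sed: replace each run of spaces/tabs with a newline (GNU sed: \n)
#  2nd sed: space out lines containing an all-capitals word, drop the first dot
one_word_per_line() {
  sed -r 's/[[:blank:]]+/\n/g' "$@" | sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}'
}

echo 'Visit the USA. Word XXXIIV' | one_word_per_line
```

As in druuna's example run, the spaced-out lines keep trailing spaces; adding s/ +$// inside the braces would trim them.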
 
04-07-2012, 07:53 AM   #5
K-Veikko
Original Poster
Quote:
Originally Posted by colucix
Here is a solution in awk:
Code:
{
  for (i = 1; i <= NF; i++)
    if ( $i == toupper($i)) {
      gsub(/[[:punct:]]/,"",$i)
      gsub(/./,"& ",$i)
      gsub(/ +$/,"",$i)
    }     
}
1
The first gsub removes punctuation, the second adds a space after each character, and the third removes the extra space at the end of the word. However, since you posted in the Other *NIX forum, this might not work on your system. Which system are you running, and which version of sed or awk/nawk/gawk do you have?
It might work, but I don't know how to write it as a single line.

I am using Ubuntu 11.04:
$ sed --version
GNU sed version 4.2.1

My current "script" is:

Code:
cat input.txt | sed 's/\-\{1,\}\|\–\{1,\}\|\?\{1,\}\|\!\{1,\}\|\;\{1,\}\|\:\{1,\}\|\,\{1,\}\|\.\{1,\}\|\^\{1,\}\|\"\{1,\}\|\/\{1,\}\|\«\{1,\}\|\»\{1,\}/\n\n/g' | sed 's/\§/pykälä/g'    | sed 's/klo /kello /g'   | iconv -f UTF-8 -t ISO8859-1 -c    | text2wave -otype wav -eval '(language_finnish)' -o - | lame - output.mp3
 
05-15-2012, 11:42 AM   #6
fakie_flip
Quote:
Originally Posted by K-Veikko
I am using sed to feed festival TTS.
http://linuxinnovations.blogspot.com...to-speach.html
All times are GMT -5.