LinuxQuestions.org

K-Veikko 04-04-2012 01:54 PM

sed space between captioned letters
 
I am using sed to feed festival TTS.

Is there a way to use sed to insert a space [ ] between the letters of a CAPITALIZED word of any length, when all of its letters are capitals?

USA -> U S A
USA. -> U S A.
USA. -> U S A (it is also OK to lose dots etc., because the output is only listened to)
XXXIIV -> X X X I I V
Word -> Word

- How do I avoid adding spaces when only the first letter is capitalized?

druuna 04-05-2012 08:20 AM

Hi,

If I assume your example is relevant (there's not much info in your post) then this works:
Code:

sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
The /\<[A-Z]+\>/ address selects lines that contain an all-caps word. The \< and \> anchors make sure that only whole words are matched.
The s/./& /g part replaces every character with that character followed by a space.
The s/\.// part removes the trailing dot.

Here's an example run:
Code:

$ cat infile
USA
USA.
XXXIIV
Word

$ sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
U S A
U S A 
X X X I I V
Word

BTW: The above will not work when there are multiple words on one line, because s/./& /g spaces out the whole line as soon as it contains one all-caps word.
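
For lines with several words, one possible approach is a substitution loop that only ever touches whole all-caps words. This is a minimal sketch, not the command posted above: it assumes GNU sed (for -r, the \< and \> anchors, and ";" after a label), the function name spell_caps is made up for illustration, and [[:upper:]] is swapped in for [A-Z] so the match does not depend on the locale's collation order.

```shell
# Hypothetical helper name; assumes GNU sed.
spell_caps() {
  # Each pass splits the last letter off an all-caps word ("USA" -> "US A").
  # "ta" jumps back to label "a" while a substitution succeeded, so the loop
  # ends once only single capital letters are left.
  sed -r ':a; s/\<([[:upper:]]+)([[:upper:]])\>/\1 \2/; ta'
}

printf 'The USA. and XXXIIV are not a Word\n' | spell_caps
# prints: The U S A. and X X X I I V are not a Word
```

Because the pattern needs at least two capitals between \< and \>, a word such as "Word" or a single capital letter is left alone, and punctuation after the word is kept.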

Hope this helps.

colucix 04-05-2012 08:32 AM

Here is a solution in awk:
Code:

{
  for (i = 1; i <= NF; i++)
    if ( $i == toupper($i)) {
      gsub(/[[:punct:]]/,"",$i)
      gsub(/./,"& ",$i)
      gsub(/ +$/,"",$i)
    }   
}
1

The first gsub removes punctuation, the second adds a space after each character, and the third removes the extra blank at the end of the word. That said, since you posted in the Other *NIX forum, it might not work for you. Which system are you running? And which version of sed or awk/nawk/gawk do you have?
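
For use in a pipeline, the same program can be given to awk on a single line. This is only a sketch: the wrapper name caps_awk is made up for illustration, and it assumes an awk that understands [[:punct:]] (gawk does; some older awk builds may not).

```shell
# Hypothetical wrapper name; the awk program is the one above, joined onto one line.
caps_awk() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == toupper($i)) { gsub(/[[:punct:]]/, "", $i); gsub(/./, "& ", $i); gsub(/ +$/, "", $i) } } 1'
}

printf 'USA. is a Word\n' | caps_awk
# prints: U S A is a Word
```

Note that the test $i == toupper($i) is also true for fields without any letters, so a number such as 2012 would be spaced out as well.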

K-Veikko 04-07-2012 07:52 AM

Quote:

Originally Posted by druuna (Post 4645422)
Hi,

If I assume your example is relevant (there's not much info in your post) then this works:
Code:

sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile

I think I can use this, because festival handles the words one by one anyway. I just had not thought of the possibility of writing one word per line.

My current "script" is in the next comment.

K-Veikko 04-07-2012 07:53 AM

Quote:

Originally Posted by colucix (Post 4645435)
Here is a solution in awk:
Code:

{
  for (i = 1; i <= NF; i++)
    if ( $i == toupper($i)) {
      gsub(/[[:punct:]]/,"",$i)
      gsub(/./,"& ",$i)
      gsub(/ +$/,"",$i)
    }   
}
1

The first gsub removes punctuation, the second adds a space after each character, and the third removes the extra blank at the end of the word. That said, since you posted in the Other *NIX forum, it might not work for you. Which system are you running? And which version of sed or awk/nawk/gawk do you have?

It might work, but I don't know how to write it on a single line.

I am using Ubuntu 11.04 and
sed --version
GNU sed version 4.2.1

My current "script" is:

Code:

cat input.txt | sed 's/\-\{1,\}\|\–\{1,\}\|\?\{1,\}\|\!\{1,\}\|\;\{1,\}\|\:\{1,\}\|\,\{1,\}\|\.\{1,\}\|\^\{1,\}\|\"\{1,\}\|\/\{1,\}\|\«\{1,\}\|\»\{1,\}/\n\n/g' | sed 's/\§/pykälä/g'    | sed 's/klo /kello /g'  | iconv -f UTF-8 -t ISO8859-1 -c    | text2wave -otype wav -eval '(language_finnish)' -o - | lame - output.mp3

fakie_flip 05-15-2012 11:42 AM

Quote:

Originally Posted by K-Veikko (Post 4644766)
I am using sed to feed festival TTS.

http://linuxinnovations.blogspot.com...to-speach.html


All times are GMT -5.