LinuxQuestions.org - sed space between captioned letters


I am using sed to feed festival TTS.

Is there a way to use sed to insert a space between the letters of a word of any length when all of its letters are capitals?

USA -> U S A
USA. -> U S A.
USA. -> U S A (it is also OK to lose the dots etc., because the output is only listened to)
XXXIIV -> X X X I I V
Word -> Word

- How can I avoid adding spaces when only the first letter is a capital?

Hi,

If I assume your example is representative (there's not much info in your post), then this works:

Code:

sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile

The address /\<[A-Z]+\>/ selects lines that contain a word written entirely in capitals. The \< and \> make sure that only whole words are matched, so Word is skipped.
The s/./& /g part changes each character into that character followed by a space.
The s/\.// part removes the dot (only the first one on the line, since there is no g flag).

Here's an example run:

Code:

$ cat infile
USA
USA.
XXXIIV
Word

$ sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile
U S A 
U S A  
X X X I I V 
Word

BTW: The above will not work when there are multiple words on one line: as soon as a line contains one all-capital word, every character on that line gets a space after it.
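A possible workaround, sketched here under the assumption that GNU sed is used (-r, \< and \> are GNU extensions), is to split one capital off an all-capital word at a time and loop back to a label until nothing changes, instead of spacing out the whole line. The function name space_caps is only for illustration, and [[:upper:]] is used in place of [A-Z] so that, in a UTF-8 locale, Finnish capitals such as Ä and Ö should match as well.

```shell
#!/bin/sh
# Sketch, assuming GNU sed: put a space between the letters of every
# word written only in capitals, leaving mixed-case words alone.
# space_caps is a hypothetical helper name; reads stdin, writes stdout.
space_caps() {
  # :a     label to jump back to
  # s/...  split the first capital off an all-capital word of 2+ letters
  #        (\< and \> anchor the match to whole words)
  # ta     if that substitution succeeded, jump back to :a and try again
  sed -r ':a; s/\<([[:upper:]])([[:upper:]]+)\>/\1 \2/; ta'
}

printf '%s\n' 'USA' 'USA.' 'XXXIIV' 'Word' 'the USA and the EU' | space_caps
```

This prints U S A, U S A., X X X I I V, Word and "the U S A and the E U", one per line, so several words per line are handled. The dots are kept; the question says losing them is fine too.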

Hope this helps.

Here is a solution in awk:

Code:

{
  for (i = 1; i <= NF; i++)
    if ($i == toupper($i)) {
      gsub(/[[:punct:]]/, "", $i)
      gsub(/./, "& ", $i)
      gsub(/ +$/, "", $i)
    }
}
# "1" is an always-true pattern: print every line, modified or not
1

The first gsub removes punctuation, the second one adds a space after each character, and the third one removes the extra space at the end of the word. Note, though, that since you posted in the Other *NIX forum, this might not work on your system. Which system are you running on? And which version of sed or awk/nawk/gawk do you have?
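To use it directly in a pipeline, the same program can be written on one line. The sketch below wraps it in a shell function with the hypothetical name space_caps_awk. One condition is an addition that is not in the program above: $i ~ /[[:upper:]]/ skips fields without any capital letter, so that numbers such as 2012 (which are equal to their own toupper()) are not spelled out digit by digit. It assumes an awk that supports POSIX character classes, such as gawk; older mawk releases may not.

```shell
#!/bin/sh
# The awk program above on one line, plus one extra test
# ($i ~ /[[:upper:]]/, an addition) so fields without capitals are skipped.
# space_caps_awk is a hypothetical helper name; reads stdin, writes stdout.
space_caps_awk() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /[[:upper:]]/ && $i == toupper($i)) { gsub(/[[:punct:]]/, "", $i); gsub(/./, "& ", $i); gsub(/ +$/, "", $i) } } 1'
}

printf '%s\n' 'USA.' 'XXXIIV' 'Word' 'the USA and the EU' 'in 2012' | space_caps_awk
```

When a field is changed, awk rebuilds the whole line with single spaces between fields, so runs of blanks are collapsed; for text that is only going to be spoken this should not matter.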

Quote:

Originally Posted by druuna (Post 4645422)

Hi,

If I assume your example is relevant (there's not much info in your post) then this works:

Code:

sed -r '/\<[A-Z]+\>/{s/./& /g;s/\.//}' infile

I think I can use this, because festival handles the words one by one anyway. I just hadn't thought of putting one word per line.

My current "script" is in the next comment.

Quote:

Originally Posted by colucix (Post 4645435)

Here is a solution in awk:


It might work, but I don't know how to write it as a single line.

I am using Ubuntu 11.04, and sed --version reports:
GNU sed version 4.2.1

My current "script" is:

Code:

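cat input.txt |
  # replace each run of punctuation with two newlines
  sed 's/\-\{1,\}\|\–\{1,\}\|\?\{1,\}\|\!\{1,\}\|\;\{1,\}\|\:\{1,\}\|\,\{1,\}\|\.\{1,\}\|\^\{1,\}\|\"\{1,\}\|\/\{1,\}\|\«\{1,\}\|\»\{1,\}/\n\n/g' |
  # read "§" as "pykälä" (Finnish for "section")
  sed 's/\§/pykälä/g' |
  # expand the abbreviation "klo" to "kello" ("o'clock")
  sed 's/klo /kello /g' |
  # convert to ISO-8859-1; -c drops characters that cannot be converted
  iconv -f UTF-8 -t ISO8859-1 -c |
  text2wave -otype wav -eval '(language_finnish)' -o - |
  lame - output.mp3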
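The capital-letter spacing could go into this script as one more sed stage, placed before iconv so that sed still sees UTF-8 text and, in a UTF-8 locale, [[:upper:]] can match Ä and Ö. Below is a sketch, assuming GNU sed, that folds it together with the pykälä and kello substitutions into a single sed call. The helper name prepare_text is hypothetical, the long punctuation-splitting sed is left out only for brevity (it can remain the first stage), and the festival/lame part is shown as a comment because it needs those programs installed.

```shell
#!/bin/sh
# Sketch, assuming GNU sed: the text-cleanup stages of the script,
# with all-capital words spaced out (USA -> U S A) first.
# prepare_text is a hypothetical helper name; reads stdin, writes stdout.
prepare_text() {
  sed -r \
    -e ':a' \
    -e 's/\<([[:upper:]])([[:upper:]]+)\>/\1 \2/' \
    -e 'ta' \
    -e 's/§/pykälä/g' \
    -e 's/klo /kello /g'
}

# Full pipeline (needs iconv, festival and lame; not run here):
# prepare_text < input.txt |
#   iconv -f UTF-8 -t ISO8859-1 -c |
#   text2wave -otype wav -eval '(language_finnish)' -o - |
#   lame - output.mp3

printf '%s\n' 'klo 5 USA' 'Word' | prepare_text
```

Using one sed with several -e expressions gives the same result as chaining separate sed commands, since every expression works on the same line; it just starts one process instead of three.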

Quote:

Originally Posted by K-Veikko (Post 4644766)

I am using sed to feed festival TTS.

http://linuxinnovations.blogspot.com...to-speach.html