speex speexenc does not encode an audio file

felipe1982 · 12-10-2009, 01:03 AM

I have been messing around with speex (speexenc) for a few days, and have been replacing MP3 voice recordings with SPX ones, which makes the files about 20% smaller. All was well. I've been running:

Code:

lame --quiet --decode file.mp3 - | speexenc - file.spx

I tried today to decode/encode an mp3 into spx, but I get a strange error onscreen:

Quote:

Only 8 kHz (narrowband) and 16 kHz (wideband) supported (plus 11.025 kHz and 22.05 kHz, but your mileage may vary)

I haven't seen that error before. These are the flags I tried, mixing and matching them as best I could while retaining some sanity:
* --le
* --be
* --8bit
* --16bit
* -w
* -u
* -n
* --vad
* --vbr

All gave the SAME error. I am *still* able to modify *other* mp3 files without a problem.

Next, I tried decoding to WAV with lame and then running speexenc on the result. Still no luck.

Code:

lame --decode voice.mp3 voice.wav

And I made sure the file is indeed WAV.

Code:

~$ file voice.wav
 RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 24000 Hz

I've searched Google a bit, and the speex mailing lists aren't helpful. The manual doesn't show anything helpful either. Any thoughts?

EDIT: The output for the lame decoding shows

Quote:

input: voice.mp3
(24 kHz, 2 channels, MPEG-2 Layer III)
output: ../WAV/voice.wav
(16 bit, Microsoft WAVE)
skipping initial 1105 samples (encoder+decoder delay)
skipping final 632 samples (encoder padding-decoder delay)
Frame#135030/135030 64 kbps MS

But *other* mp3 files show this when decoding to WAV

Quote:

input: other.mp3
(44.1 kHz, 1 channel, MPEG-1 Layer III)
output: ../WAV/other.wav
(16 bit, Microsoft WAVE)
skipping initial 1105 samples (encoder+decoder delay)
Frame#124035/124067 64 kbps

Do these differences matter? Why can't I encode to speex?

felipe1982 · 12-10-2009, 12:34 PM

Any ideas, guys, why I can't convert this mp3 to speex?

felipe1982 · 12-21-2009, 01:18 AM

Quote:

Only 8 kHz (narrowband) and 16 kHz (wideband) supported (plus 11.025 kHz and 22.05 kHz, but your mileage may vary)

What does this error mean?

rmallins · 06-08-2010, 08:26 PM

The reason speexenc is complaining is that the native sample rate of your mp3 file, 24 kHz, is not one of the recommended sample rates for speex. I was just able to speex-encode a more typical mp3 (sampled at 44.1 kHz, i.e. 44100 Hz) using your command, though I still got the same warning message, so it looks like speexenc is just rejecting the 24 kHz sample rate for some reason.

You can work around the problem by resampling the audio, as I'll demonstrate further down.

Audio mp3s typically use the audio CD sample rate of 44100 samples per second. DVD audio (whether encoded as mp3 or AAC) is typically sampled at 48 kHz (the same as the highest quality setting for DAT tapes). The sample rate limits the highest frequency you can reproduce (look up "Nyquist frequency" for more info). Speex is designed for human speech, which usually doesn't contain much high-frequency information. Since reducing the sample rate reduces the required data rate, speex uses lower sample rates than are typically used for music files.
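To put numbers on the Nyquist point: the highest frequency a given sample rate can represent is half that rate. A quick sketch of the arithmetic in shell (nyquist_hz is just an illustrative helper name, not part of any tool mentioned here):

```shell
# Nyquist frequency: half the sample rate, in Hz.
nyquist_hz() {
    echo $(( $1 / 2 ))
}

# Compare the speech-oriented rates with the music/DVD rates mentioned above.
for rate in 8000 16000 24000 32000 44100 48000; do
    echo "$rate Hz sample rate -> frequencies up to $(nyquist_hz "$rate") Hz"
done
```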

Speex is designed to encode/decode at 8k, 16k or 32k samples per second, so the encoder complains when it's not fed audio to this spec.
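If you want to choose a target rate automatically, one possible rule is "the highest of 8000, 16000 or 32000 Hz that does not exceed the input rate". The helper below is a hypothetical sketch of that rule (pick_speex_rate is my own name, and the rule itself is an assumption, not something speexenc does for you):

```shell
# pick_speex_rate RATE_HZ: choose a resampling target for speexenc.
# Assumption: prefer the highest Speex rate that does not exceed the input;
# inputs below 8000 Hz fall back to 8000.
pick_speex_rate() {
    if [ "$1" -ge 32000 ]; then
        echo 32000
    elif [ "$1" -ge 16000 ]; then
        echo 16000
    else
        echo 8000
    fi
}

pick_speex_rate 24000   # the problem file in this thread: prints 16000
pick_speex_rate 44100   # a typical music mp3: prints 32000
```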

mplayer can resample audio, but can't easily output to stdout, so here's how to use mplayer and speexenc tied together with a named pipe/named fifo:

The input file I'm using is 128kbps mp3 (44100Hz sample rate), as shown by file:

rmallins@rmallins-desktop:~/tmp$ file audio_book_disk_1.mp3
audio_book_disk_1.mp3: Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, JntStereo
rmallins@rmallins-desktop:~/tmp$

1) create a named fifo and tell mplayer to resample the audio to 16 kHz PCM (as used in WAV files) and write it to the fifo:

rmallins@rmallins-desktop:~/tmp$ mkfifo named_fifo
rmallins@rmallins-desktop:~/tmp$ mplayer -srate 16000 -ao pcm:file=named_fifo audio_book_disk_1.mp3

(You could equally well use an -srate of 32000 or 8000 here for speex, and get better or worse final sound quality respectively.)
mplayer will fill the fifo almost immediately and then block, waiting to write more data, so we need a second process running speexenc to read data from the fifo:

2) start a new shell in the same directory, and make speexenc encode data from the fifo:

rmallins@rmallins-desktop:~/tmp$ speexenc named_fifo audio_book_disk_1_16kHz.spx

After hitting return you'll notice activity in both shells, as mplayer is now able to write to the fifo as speexenc reads data from it.

Afterwards you'll find your speex encoded (and dramatically smaller) file:

rmallins@rmallins-desktop:~/tmp$ ll -h
total 83M
-rw-r--r-- 1 rmallins rmallins 16M 2010-06-09 02:16 audio_book_disk_1_16kHz.spx
-rw-r--r-- 1 rmallins rmallins 67M 2010-06-09 01:27 audio_book_disk_1.mp3
prw-r--r-- 1 rmallins rmallins 0 2010-06-09 02:16 named_fifo
rmallins@rmallins-desktop:~/tmp$

Now the named_fifo can be deleted.
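The two-shell procedure above can probably also be driven from a single shell by putting mplayer in the background. A rough sketch, assuming mplayer and speexenc are both on your PATH; the function name mp3_to_spx and the temporary directory handling are my own invention, and the mplayer options are the ones from step 1:

```shell
# mp3_to_spx INPUT OUTPUT [RATE]   (hypothetical helper, default rate 16000)
mp3_to_spx() {
    src=$1 dst=$2 rate=${3:-16000}

    # Fail early: if nothing ever writes to the fifo, speexenc would wait forever.
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 2; }
    command -v mplayer >/dev/null 2>&1 || { echo "mplayer not found" >&2; return 3; }
    command -v speexenc >/dev/null 2>&1 || { echo "speexenc not found" >&2; return 3; }

    tmpdir=$(mktemp -d) || return 1
    fifo=$tmpdir/audio.fifo
    mkfifo "$fifo" || { rm -rf "$tmpdir"; return 1; }

    # Writer: same mplayer options as in step 1, run in the background.
    mplayer -srate "$rate" -ao pcm:file="$fifo" "$src" </dev/null >/dev/null 2>&1 &
    writer=$!

    # Reader: speexenc drains the fifo until mplayer closes its end.
    speexenc "$fifo" "$dst"
    status=$?

    wait "$writer"
    rm -rf "$tmpdir"    # removes the fifo as well
    return "$status"
}
```

Usage would be something like `mp3_to_spx audio_book_disk_1.mp3 audio_book_disk_1_16kHz.spx 16000`. One caveat: if mplayer exits before it opens the fifo (a corrupt input file, say), speexenc will sit there waiting and you'll need to interrupt it.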

felipe1982 · 06-08-2010, 10:19 PM

Hey, thanks for answering this ancient post. I had actually forgotten about it and written it off. I shall try your suggestions. Thank you.

Ken_Fallon · 06-30-2010, 10:26 AM

Thanks rmallins for the excellent post.

Can I suggest using sox instead of mplayer? It will allow you to re-sample and re-encode in one operation.

Code:

sox audio_book_disk_1.mp3 -t wav -r 16000 - | speexenc - audio_book_disk_1_16kHz.spx

The sox flags are "-t wav" so it knows to output in wav format and "-r 16000" which is the actual re-sample.

The use of the dash "-" tells sox to send the output to standard out and tells speexenc to read from standard in.
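The same pipeline can be wrapped in a loop to convert a whole directory. This is only a sketch: it assumes your sox build can read mp3 files, and spx_name and convert_all are hypothetical helper names that follow the output naming used earlier in this thread:

```shell
# spx_name FILE RATE: derive the output name,
# e.g. audio_book_disk_1.mp3 at 16000 Hz -> audio_book_disk_1_16kHz.spx
spx_name() {
    echo "${1%.mp3}_$(( $2 / 1000 ))kHz.spx"
}

# convert_all [RATE]: resample and encode every .mp3 in the current directory.
convert_all() {
    rate=${1:-16000}
    for f in ./*.mp3; do
        [ -e "$f" ] || continue    # no .mp3 files: the glob stays literal
        sox "$f" -t wav -r "$rate" - | speexenc - "$(spx_name "$f" "$rate")"
    done
}
```

Running `convert_all 16000` in the directory with the mp3 files should leave a matching .spx next to each one.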