I’ve recently started a job that requires a one-hour commute each way, so I decided to make the most of it by listening to audio books. I’ve finished about three so far, but I realize that I will soon run out of interesting content to listen to (I’ve been listening to LibriVox’s free public domain audio books).
Exploring the world of Text To Speech (TTS) software led me to first examine espeak, which had too much of a robotic tone for my liking. I then stumbled upon Pico TTS on my cheap Android tablet, which sounded too good to be true. Looking around, I found a Linux project that uses it, PicoSpeaker. Pico is a TTS solution from the company SVOX Mobile Voices, which apparently specializes in text to speech solutions for devices. I’m not sure how the product ended up in Linux as the packages sox and libttspico0, but they are there, and they work reasonably well. The frustrating problem I found was that PicoSpeaker didn’t accept large files. It was frustrating enough that I kept looking around at alternatives.
I then checked out Festival, installed better voices (I tried out the MBROLA and CMU Arctic voices, samples here), and still found the quality lacking in comparison to Pico TTS. I played with the gain, rate, and pitch to make the different voices sound better to me, but it failed to make a difference. Even though I could convert a complete file with these, they didn’t sound as good to my subjective ears.
To cut a long story short, much of my Saturday was spent on getting a TTS solution that would help me convert text books into audio books. To work around the file size limitation, I split the file into 100-line parts with:
split -l 100 -d -a 4 Text_To_Convert.txt Ebook_
This creates a set of text files with no extension, starting at Ebook_0000. Next I created the following script, which I named convert.sh:
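To see what the split step produces, here is a small sketch that runs it on made-up sample data. It assumes GNU split (the -d flag for numeric suffixes is a GNU option), which takes the input file first and the output prefix second; the temporary directory and the 250-line sample file are only for illustration.

```shell
#!/bin/sh
# Sketch with made-up sample data: build a 250-line file, then split it
# into 100-line parts with 4-digit numeric suffixes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 250 > Text_To_Convert.txt

# Input file first, output prefix second.
split -l 100 -d -a 4 Text_To_Convert.txt Ebook_

ls Ebook_*              # lists Ebook_0000, Ebook_0001 and Ebook_0002
wc -l < Ebook_0002      # the last part holds the remaining 50 lines
```

The last part is simply shorter than the others, so nothing is lost when the line count is not a multiple of 100.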
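#!/bin/bash
# With no arguments, prompt for the details; otherwise take them from the command line.
if [ $# -eq 0 ]; then
    echo "Type the base name of the file to convert, followed by enter:"
    read -r name
    echo "Type name of author: "
    read -r author
    echo "Type name of book: "
    read -r book
else
    name="$1"; author="$2"; book="$3"
fi
for f in "$name"*; do
    echo "Converting $f .."
    cat "$f" | ./picospeaker -o "$f.ogg"
    echo "Now adding tag information"
    lltag --yes --clear -a "$author" -A "$book" -t "$f" "$f.ogg"
done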
I run this script by making it executable (chmod +x convert.sh) and providing it with the base name (Ebook_ in this case), the name of the author (“Henry Thoreau” for example), and the title of the book. Note that if any of those contain spaces, you need to put them in quotes.
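The quoting matters because the shell splits unquoted words into separate arguments before the script ever sees them. A minimal sketch of the effect, using a hypothetical helper called count_args that is not part of convert.sh:

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Unquoted: "Henry" and "Thoreau" arrive as two separate arguments.
count_args Ebook_ Henry Thoreau Walden        # prints 4

# Quoted: the author's name stays together as one argument.
count_args Ebook_ "Henry Thoreau" "Walden"    # prints 3
```

So a call would look something like ./convert.sh Ebook_ "Henry Thoreau" "Walden", where the book title is just an example value.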
The end result is a pretty decent sounding audio book that I can actually play at 120% speed (with the -r 20 flag provided to picospeaker) with all of the words intelligible. Here is a 6 minute sample of the audio, uploaded on Picosong (Picosong seems to be like the Imgur of audio links, pretty nice service). This is a sample of it as I like to listen to it.
You may need an additional step to convert the audio into mp3 format. To do that, add the following line before the lltag line:
ffmpeg -i "$f.ogg" -b:a 128k "$f.mp3"
Note that this creates a larger file than the ogg. I’m not sure which settings would make it better, but for now it will work. Better to ship something working than nothing at all.
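On the file size question, one option worth trying is ffmpeg's variable bitrate mode for mp3, which may give smaller files for speech than a fixed 128k. The sketch below is untested against these particular files and assumes an ffmpeg build that includes the libmp3lame encoder; to_mp3 and DRY_RUN are hypothetical names, and the quality value 6 is only a starting point (lower numbers mean higher quality and bigger files).

```shell
#!/bin/sh
# Sketch: convert every .ogg chunk with the given base name to a VBR mp3.
# to_mp3 is a hypothetical helper; set DRY_RUN=1 to print the commands
# instead of running them.
to_mp3() {
    for ogg in "$1"*.ogg; do
        [ -e "$ogg" ] || continue        # nothing matched the pattern, skip
        mp3="${ogg%.ogg}.mp3"            # Ebook_0000.ogg -> Ebook_0000.mp3
        if [ -n "$DRY_RUN" ]; then
            echo "ffmpeg -i $ogg -codec:a libmp3lame -q:a 6 $mp3"
        else
            ffmpeg -i "$ogg" -codec:a libmp3lame -q:a 6 "$mp3"
        fi
    done
}
```

Running DRY_RUN=1 to_mp3 Ebook_ first shows which files would be converted before committing to a long encode.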