I think I fell asleep just reading that title. And yet, this new piece of research straight from Google shows us some amazing new AI capabilities.
It also makes for an entertaining new game!
Meet Tacotron 2, a second-iteration neural network built by Google machine learning engineers to synthesise ordinary written text into natural spoken words. This new program takes regular old sentences such as “This is your personal assistant Google Home.” and turns them into speech like the sample below.
As you can imagine, this has huge benefits across the board, from aiding the blind, to giving AIs a human voice, and even allowing feedback via smart home speakers. Google has long worked on Text-to-Speech (TTS) and has recently seriously stepped up its game in making it more natural and human sounding.
New Kid On The Block
Despite Google’s recent upgrade in making synthesised speech sound more natural, I think this new version takes it pretty much to its conclusion: an indistinguishable machine voice.
Go ahead, try it out for yourself. Below are a few sentences that have been both spoken by a real person and generated by the Tacotron 2 neural network. Listen to both and see if you can tell which audio file belongs to which. It’s the game of “Tacotron 2 or Human?”
“That girl did a video about Star Wars lipstick.”
“She earned a doctorate in sociology at Columbia University.”
“George Washington was the first President of the United States.”
Would you like to know the answers?
I’d happily tell you… if I knew myself. And that’s the point. I can’t tell. According to their research, most people can’t tell the difference either.
Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech.
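If you’re wondering what a MOS actually is: it’s just the average of the scores that human listeners give to audio clips, conventionally on a scale of 1 (bad) to 5 (excellent). Here is a minimal sketch of that arithmetic. The function name and the ratings are made up for illustration; they are not Google’s evaluation code or data from the paper.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the average of listener ratings on a 1-to-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return mean(ratings)


# Hypothetical ratings, invented purely to show the calculation.
synthetic_ratings = [4.5, 5, 4, 4.5, 5]
human_ratings = [5, 4.5, 4.5, 5, 4.5]

print(round(mean_opinion_score(synthetic_ratings), 2))
print(round(mean_opinion_score(human_ratings), 2))
```

The closer the two averages are, the harder it is for listeners to hear a difference in quality, which is why 4.53 versus 4.58 is such a big deal.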
Next up I think I’d like to see more emotion and personality built into TTS. Kind of like an emotional equaliser.
They could also enable it to evolve over time or even react to your moods. Maybe the next Amazon Echo will learn that, you know, in the morning… you don’t really want someone super cheerful talking to you. You want its voice to be soft, generally neutral and to the point. After all, you did just wake up! Then when you get home on a Friday night, it adapts to being more upbeat, excited and happy, and even cracks jokes.
Whatever it ends up developing into, it’s great to finally see TTS rise to the level of natural human speech. It’s been a long road from the original “robot voice” you’d hear generated back in the 1980s. A hearty congratulations to the entire team who contributed to this achievement!
If you’d like to hear some more samples or even read the full paper (which is available for free), head on over here. Also, Merry Christmas!