{"id":1375,"date":"2017-12-25T23:59:28","date_gmt":"2017-12-25T12:59:28","guid":{"rendered":"https:\/\/www.alexshoolman.com\/blog\/?p=1375"},"modified":"2024-08-08T10:27:54","modified_gmt":"2024-08-08T00:27:54","slug":"can-you-hear-the-difference-between-human-and-ai","status":"publish","type":"post","link":"https:\/\/www.alexshoolman.com\/blog\/2017\/12\/25\/can-you-hear-the-difference-between-human-and-ai\/","title":{"rendered":"Can You Tell The Difference Between Human And AI?"},"content":{"rendered":"<p><a href=\"https:\/\/arxiv.org\/abs\/1712.05884\" target=\"_blank\" rel=\"noopener\">Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions<\/a>.<\/p>\n<p>I think I fell asleep just reading that title. And yet, this new piece of research straight from Google shows us some amazing new AI capabilities.<\/p>\n<p>It also makes for an entertaining new game!<\/p>\n<p>Meet Tacotron 2. A second-iteration neural network built by Google machine learning engineers to synthesise ordinary written text into natural spoken words. This new program takes regular old sentences such as \u201c<em>This is your personal assistant Google Home.\u201d<\/em> and turns them into speech like the sample below.<\/p>\n<audio class=\"wp-audio-shortcode\" id=\"audio-1375-1\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/ghome_nocomma.wav?_=1\" \/><a href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/ghome_nocomma.wav\" target=\"_blank\" rel=\"noopener\">https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/ghome_nocomma.wav<\/a><\/audio>\n<p>As you can imagine, this has huge benefits across the board. From aiding the blind to giving AIs a human voice and even allowing feedback via <a href=\"https:\/\/www.alexshoolman.com\/blog\/2017\/12\/11\/smart-homes-are-the-new-smartphones\/\">smart home speakers<\/a>. 
Google has long worked on Text-to-Speech (TTS) and has recently seriously stepped up its game in making it more natural and human-sounding.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Google-Home-1024x576.jpeg\" alt=\"\" width=\"750\" height=\"422\" class=\"aligncenter size-large wp-image-1358\" title=\"\" srcset=\"https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Google-Home-1024x576.jpeg 1024w, https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Google-Home-300x169.jpeg 300w, https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Google-Home-768x432.jpeg 768w, https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Google-Home.jpeg 1600w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/p>\n<h3>New Kid On The Block<\/h3>\n<p>Even after Google&#8217;s recent upgrade in making synthesised speech sound more natural, this new version, I think, takes it pretty much to its conclusion: a machine voice indistinguishable from a human one.<\/p>\n<p>Go ahead, try it out for yourself. Below are a few sentences that have each been spoken by a real person and also generated by the\u00a0Tacotron 2 neural network. Listen to both and see if you can tell which audio file belongs to which. 
The game of &#8220;Tacotron 2 or Human?&#8221;<\/p>\n<p><em> \u201cThat girl did a video about Star Wars lipstick.\u201d<\/em><\/p>\n<audio class=\"wp-audio-shortcode\" id=\"audio-1375-2\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/lipstick_gt.wav?_=2\" \/><a href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/lipstick_gt.wav\" target=\"_blank\" rel=\"noopener\">https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/lipstick_gt.wav<\/a><\/audio>\n<audio class=\"wp-audio-shortcode\" id=\"audio-1375-3\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/lipstick_gen.wav?_=3\" \/><a href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/lipstick_gen.wav\" target=\"_blank\" rel=\"noopener\">https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/lipstick_gen.wav<\/a><\/audio>\n<p><em><br \/>\n\u201cShe earned a doctorate in sociology at Columbia University.\u201d<\/em><\/p>\n<audio class=\"wp-audio-shortcode\" id=\"audio-1375-4\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/columbia_gen.wav?_=4\" \/><a href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/columbia_gen.wav\" target=\"_blank\" rel=\"noopener\">https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/columbia_gen.wav<\/a><\/audio>\n<audio class=\"wp-audio-shortcode\" id=\"audio-1375-5\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/columbia_gt.wav?_=5\" \/><a 
href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/columbia_gt.wav\" target=\"_blank\" rel=\"noopener\">https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/columbia_gt.wav<\/a><\/audio>\n<p><em><br \/>\n\u201cGeorge Washington was the first President of the United States.\u201d<\/em><\/p>\n<audio class=\"wp-audio-shortcode\" id=\"audio-1375-6\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/washington_gen.wav?_=6\" \/><a href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/washington_gen.wav\" target=\"_blank\" rel=\"noopener\">https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/washington_gen.wav<\/a><\/audio>\n<audio class=\"wp-audio-shortcode\" id=\"audio-1375-7\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/washington_gt.wav?_=7\" \/><a href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/washington_gt.wav\" target=\"_blank\" rel=\"noopener\">https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/demos\/washington_gt.wav<\/a><\/audio>\n<p>Would you like to know the answers?<\/p>\n<p>I&#8217;d happily tell you&#8230; if I knew myself. And that&#8217;s the point. I can&#8217;t tell. According to their research, most people can&#8217;t tell the difference either.<\/p>\n<blockquote><p><quoteblock>Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech.<\/quoteblock><\/p><\/blockquote>\n<h3>The Future<\/h3>\n<p>Next up I think I&#8217;d like to see more emotion and personality built into TTS. 
Kind of like an emotional equaliser.<\/p>\n<div id=\"attachment_1380\" style=\"width: 760px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1380\" src=\"https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Emotional-EQ-1024x618.jpg\" alt=\"\" width=\"750\" height=\"453\" class=\"size-large wp-image-1380\" title=\"\" srcset=\"https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Emotional-EQ-1024x618.jpg 1024w, https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Emotional-EQ-300x181.jpg 300w, https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Emotional-EQ-768x463.jpg 768w, https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/12\/Emotional-EQ.jpg 1119w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><p id=\"caption-attachment-1380\" class=\"wp-caption-text\">Would you prefer a happy-go-lucky AI voice? A sexy lady? Or someone who speaks softly and kindly?<\/p><\/div>\n<p>They could also enable it to evolve over time or even react to your moods. Maybe the next Amazon Echo will learn that, you know, in the morning&#8230; you don&#8217;t really want someone\u00a0<em><strong>super cheerful\u00a0<\/strong><\/em>talking to you. You want its voice to be soft, generally neutral and to the point. After all, you <em><strong>did\u00a0<\/strong><\/em>just wake up! Then when you get home on a Friday night it adapts to being more upbeat, excited and happy, and even cracks jokes.<\/p>\n<p>Whatever it ends up developing into, it&#8217;s great to finally see TTS rise to the level of natural human speech. It&#8217;s been a long road from the original &#8220;robot voice&#8221; you&#8217;d hear generated back in the 1980s. 
A hearty congratulations to the entire team who contributed to this achievement!<\/p>\n<p>If you&#8217;d like to hear some more samples or even read the full paper (which is available for free), head on <a href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/index.html\" target=\"_blank\" rel=\"noopener\">over here<\/a>. Also, Merry Christmas!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Meet Tacotron 2. A second-iteration neural network built by Google machine learning engineers to synthesise ordinary written text into natural spoken words.<\/p>\n","protected":false},"author":1,"featured_media":222,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[40,1],"tags":[26,25,55],"class_list":["post-1375","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-robotics","category-technology","tag-computers","tag-machine-learning","tag-smart-homes"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/www.alexshoolman.com\/blog\/wp-content\/uploads\/2017\/08\/Big-Sur-ML.jpg","jetpack_shortlink":"https:\/\/wp.me\/p92j6e-mb","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/posts\/1375","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.alex
shoolman.com\/blog\/wp-json\/wp\/v2\/comments?post=1375"}],"version-history":[{"count":10,"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/posts\/1375\/revisions"}],"predecessor-version":[{"id":4888,"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/posts\/1375\/revisions\/4888"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/media\/222"}],"wp:attachment":[{"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/media?parent=1375"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/categories?post=1375"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.alexshoolman.com\/blog\/wp-json\/wp\/v2\/tags?post=1375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}