
Good morning everyone. My name is Tim Tuttle; I am the CEO and founder of Expect Labs. Today I am going to talk about how some of the advances we have seen in ASR recently are going to enable a new generation of voice-driven applications over the next few years, applications that are going to fundamentally change how we interact with computing devices. I am also going to show you a demonstration of a set of technologies that now make it dramatically easier for people to create very good, intelligent voice-driven applications for the first time.

As I said, I am the CEO of Expect Labs. We are a technology company based in San Francisco, and our primary product is a developer platform called MindMeld. MindMeld is a platform that lets companies create really great voice-driven interfaces customized around their own content domain. This is a product we launched less than a year ago, and we already have about a thousand companies using it to build a wide variety of voice-driven applications. Based on this traction we just announced a large financing round, and our backers include companies like Google, Samsung, Intel, Telefónica, Liberty Global, and others.

The opportunity that everyone is excited about, including some folks in this room and certainly the people involved in our company, is really captured in this graph. Specifically, there have been breakthroughs, advances in ASR, that look to solve this problem for the first time since the beginning of AI. My background is in AI research; I originally started as an academic, on the research staff at MIT as well as Bell Labs. If you had told me ten years ago that we would have solved the speech recognition problem, I would have thought you were crazy, and I think that sentiment would have been shared by a lot of other AI researchers at the time. This is, in our opinion, transformative. What we have seen in the past two to two-and-a-half years is an improvement in the accuracy of speech recognition systems of thirty percent or more, and that dwarfs all the improvements we have seen over the past two decades. This is a big deal. Most researchers believe that, certainly for English speech, machine speech recognition will soon be better than humans for the first time. Most people outside of this room are not aware of these breakthroughs, and this technology is poised to transform how we all interact with devices.

This does not come a moment too soon, because in a few years we are going to live in a world where we are surrounded by three billion different computing devices. All of these devices have really great omnidirectional microphones, but fewer than five percent have physical keyboards. In that world, if you want to continue to have rich interactions with information through your computing devices, voice becomes critical.

So what does this mean? It means that over the next few years you are going to start to see voice become a feature in many, many different applications. Users are going to notice that products like Siri and Google Voice Search are getting surprisingly good at understanding them, and they are going to expect that same type of convenience in every application and every device they use. So over the next three years we are going to see voice appear everywhere. But this is not where it stops. As this technology gets better and better, the same advances in deep learning that are solving speech recognition are going to be applied to solve language understanding more generally, and as that happens, the world we live in is going to look a lot more like the things we see in science fiction movies. This is not going to be fifty years out; this is going to be a decade or less. In that world, for many applications, voice becomes the primary interface and touch becomes secondary. The reason this will occur is that if voice is the primary interface, anybody can walk up to a device or an application and automatically know how to use it, without having to learn a new UI, without having to read a manual. You can just interact with any device or app the same way you would with another human. That makes this technology far more accessible than any other type of UI we have ever invented. This is really exciting. Obviously the people at my company and I think this is probably the most exciting area of computer science right now, certainly one of them, and it will be transformative over the next ten years. The way that everybody builds applications, the way that developers create applications, the way that you interact with applications will fundamentally change. For the people in this room and the people in Silicon Valley, that represents a huge opportunity: to create a new set of tools to build all of these great new apps we have all seen in science fiction movies.

This also represents a big challenge for developers today. Just because we have solved, or are on the brink of solving, the ASR problem does not mean it is easy to create applications that do a very good job of understanding people. In fact, if you are a developer today and you want to create an application that can listen to and understand what people are saying to a high degree of accuracy, you still have to solve a number of other very hard problems. First, you have to create a knowledge graph that captures all the concepts that matter in your domain. If you are trying to build a very good voice-driven medical assistant, you need to capture all of the millions of concepts that matter for diagnosing diseases and choosing treatment options, and that is no small task. Once you have that knowledge graph, you have to use it to power the features in your language models so that you can understand what a user is saying; today you generally need a room full of PhDs to get that right, and it will take you six months to a year or more. Once you have a sense of what the user is asking for, you then need to solve the search problem: you have to go out and find the exact right answer from potentially thousands or millions of candidate answers. That is a big statistical search and ranking problem, and that expertise generally lives only in companies like Google, Microsoft, Apple, and a handful of others. And last, if you want a really great voice experience that is intuitive for the user, these systems generally require a new type of dynamic, adaptive user interface that shows users, as they are talking, what the system understands, getting better with every progressive word. That is very different from web or mobile development today, and most front-end developers do not have expertise in this area. So today, if you are looking to build one of these applications, you are talking about investing a year or two, and you need an extremely smart team spending a lot of time and resources. That is why you do not see a whole lot of really great voice apps today: it is still very hard.
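To make the search-and-ranking problem concrete, here is a minimal sketch of statistical ranking over a tiny movie corpus. This is a toy TF-IDF-style scorer of my own, purely for illustration; it is not anything from the MindMeld platform, and real systems use far richer signals:

```python
import math
from collections import Counter

def rank(query, docs):
    """Order document ids by a TF-IDF-style relevance score for the query."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for terms in tokenized.values():
        doc_freq.update(set(terms))

    def score(doc_id):
        tf = Counter(tokenized[doc_id])
        # Sum term frequency * inverse document frequency over query terms.
        return sum(tf[t] * math.log(len(docs) / (1 + doc_freq[t]))
                   for t in query.lower().split() if tf[t])

    return sorted(docs, key=score, reverse=True)

# Invented example data, standing in for a crawled movie knowledge graph.
movies = {
    "Captain Phillips": "tom hanks cargo ship hijacked by somali pirates",
    "The Departed": "leonardo dicaprio undercover cop in the boston police",
    "The Aviator": "leonardo dicaprio as howard hughes directed by scorsese",
}
print(rank("undercover cop boston", movies)[0])  # The Departed
```

Even this toy version shows why the problem is statistical: the ranking depends on corpus-wide term statistics, not on any one document in isolation.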
So what we are doing as a company, and what other companies are doing, is taking all of this complex technology that sits on top of really great speech recognition and putting it into easy-to-use services that mere mortals can use to create a really great voice app. Our company is taking steps in this direction, and the product we have, MindMeld, is just that: a developer platform and cloud service that tries to make it effortless to create a really good, intelligent voice UI. Any of you can go to our website, mindmeld.com, sign up for an account, build some small apps, and see how it works. Generally what our customers do is sign up for an account and then tell us where their content lives: they either give us access to a database or point us at their website, and we have automated crawling processes that will analyze their content and create the models that then power the language understanding. The second thing our customers do is drop in our client libraries, which work across all platforms, and that creates a really great dynamic voice UI that allows a user to ask questions and get really good answers. One example is an application we built for tablets that allows you to do really good voice-driven content discovery; it is one of many applications we are supporting. But rather than show you a video, I will give you an actual demo of the product: I will build a simple application and show you how simple it can be to build a basic voice-driven application. Any of you can try this out as well on our website.

All right. First I will go to our developer center, at mindmeld.com, which is where you can sign up for an account, read extensive documentation, and get access to sample code. I will log in to my existing account, and what you see when you log in is your developer console, which is where you manage your apps. I will create a new app, one that will be able to answer questions about movies: you will be able to ask questions about Hollywood movies, actors, or celebrities, and it should give you helpful information. I will call this app Movie Genius; it will know everything about movies. As I mentioned before, if the content you want in your knowledge graph is on your own website, we have tools that make it easy to convert that website into a structured data representation that powers the language models. In particular, we have a tool called the crawl manager, which I can point at my own website, or at a site that has lots of information about movies, and click go. What happens on the backend is that our system spins up a bunch of what we call semantic crawler processes. These crawlers go to the website I specified and look for things like semantic markup tags, or any other metadata that gives an indication of the type of content on those pages; generally, if you have a site that has already been optimized for search engines or for Facebook distribution, this tool works very well. Once the crawlers have an idea of the pattern of objects they are looking for, in this case movies, they go from page to page looking for objects that match that pattern, and every time they find one they add it as a new node to the knowledge graph. That is what you see on the right-hand side of this console: each one of these objects coming in is a new structured data object, a new node in the knowledge graph, that captures a unique movie. So without any programming, you can build up a database that contains potentially millions of unique concepts, or nodes, in a knowledge graph, and that becomes the foundation you use to build your intelligent voice search interface. That is step one. Now let me show you step two: the front end.
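The semantic-crawling step can be pictured in miniature like this: pull the schema.org JSON-LD markup out of a page and turn every Movie object into a node for the knowledge graph. This is a rough sketch using only the Python standard library; the names and the approach here are my own invention for illustration, not MindMeld's actual crawler code:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.objects = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.objects.append(json.loads(data))

def extract_movie_nodes(html):
    """Return a knowledge-graph node for every schema.org Movie on the page."""
    parser = JSONLDExtractor()
    parser.feed(html)
    return [obj for obj in parser.objects if obj.get("@type") == "Movie"]

# A toy page with semantic markup, as a search-engine-optimized site might have.
page = """
<html><head>
<script type="application/ld+json">
{"@type": "Movie", "name": "The Departed", "director": "Martin Scorsese"}
</script>
</head><body>...</body></html>
"""
nodes = extract_movie_nodes(page)
print(nodes[0]["name"])  # The Departed
```

This is why sites already optimized for search engines tend to crawl well: the structured markup is the pattern the crawler keys on.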
Creating a really good voice front end is a tricky task to get right. You have all used a lot of bad voice experiences, and I am sure you know what a bad one is; there are really good ones out there, and they require a really great UI. We have taken a lot of that heavy lifting and simplified it by creating developer SDKs that run on iOS, Android, and in browsers, and it is literally as simple as dropping a few lines of code into your application to get a really great voice experience up and running. If, for example, you want to create a simple browser-based app that allows a user to speak, ask questions, and get really great information, these are the ten lines of code you have to copy into your web application to set up a really great default UI. I have dropped this code into this browser app, and I will show you what the basic UI looks like so you can get a sense of how it works.

Let me run a few examples to show you the type of accuracy you can expect. "What's that movie where Tom Hanks gets attacked by the Somali pirates?" Obviously I am looking for Captain Phillips, and just like that, it shows up in the search results. You could show just that one answer if that is what your application wanted to do; the API lets you do that. But without writing a single line of code, you have a very good voice-driven interface that allows you to find content from your domain with very good accuracy. Let me do one more example so you can get a sense of how it works. "Show me movies that are directed by Martin Scorsese." "Just show me the ones that star Leonardo DiCaprio." "What's that one where he plays the undercover cop in the Boston police force?" Obviously I am looking for The Departed, and after I ask a few questions it shows up immediately. Think how useful that would be if you are sitting on the couch trying to find a video to watch, or if you are on your tablet and do not want to type a bunch of searches. All of this was done in a matter of minutes, without having to write any code. This is something you can check out yourself: please go to mindmeld.com and play around with it for free.
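The follow-up questions in that demo behave like constraints that accumulate across conversational turns, each turn narrowing the previous result set. Here is a toy sketch of that accumulation, with invented data and function names; it is my own illustration of the idea, not the actual MindMeld API:

```python
# Invented movie records, standing in for knowledge-graph nodes.
movies = [
    {"title": "The Departed", "director": "Martin Scorsese",
     "cast": ["Leonardo DiCaprio", "Matt Damon"]},
    {"title": "Goodfellas", "director": "Martin Scorsese",
     "cast": ["Robert De Niro", "Ray Liotta"]},
    {"title": "Inception", "director": "Christopher Nolan",
     "cast": ["Leonardo DiCaprio"]},
]

def refine(candidates, key, value):
    """Keep candidates whose `key` field equals or contains `value`."""
    return [m for m in candidates
            if m.get(key) == value or value in m.get(key, ())]

# Turn 1: "show me movies directed by Martin Scorsese"
results = refine(movies, "director", "Martin Scorsese")
# Turn 2: "just show me the ones that star Leonardo DiCaprio"
results = refine(results, "cast", "Leonardo DiCaprio")
print([m["title"] for m in results])  # ['The Departed']
```

A real system would of course also resolve phrases like "the undercover cop one" against plot metadata, but the state carried between turns is the same idea: the current candidate set.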
If you want more information, we have some people who will be outside and happy to answer any questions for you. And that is it; I am happy to answer any questions now. Any questions? Okay, there is one in the front.

"What about mood-based movies? Here you are searching on specific words around the movie, the director, et cetera, but sometimes you might be in a mood, right? It is raining outside, you have got your hot chocolate, and you want a certain genre or mood of movie. How does it deal with that?" Well, we have an adapter that plugs into your central nervous system and can actually read your mood... no. Our system is a voice-driven content discovery platform, and it relies on having good metadata available that characterizes the movies themselves. There are many movie databases that describe what the genre of a movie is, or what its mood is, and certainly if you then express that through voice, we would be able to find movies that are, say, comedies or dramas or things like that. But being able to read your emotions is not something our technology does, though I have heard some other talks, and I think some other people are working on that. Anything else? Any last questions? There are a couple of them; we will run here.

"When you train your language model, where did you get the labels for training?" So currently the way we are doing that is with a set of training data we have generated internally, largely through crowdsourcing, which we use to train. But as we get more and more user data, we are going to use that as the source of our training data. There is another one in the front, and then we will get to the one in the back.

"Do you plan to support other languages?" Yes. Right now we support eight languages, which are mostly Western European languages, and by the end of this year we expect to support, in addition, Korean, Japanese, and Chinese. But I will say in advance that the ASR and language understanding models with the best accuracy are generally English right now, though we expect the same improvement to happen with other languages over the next one to two years.

"What possibilities are there for some degree of customization? Say I have an engineering app, and so I am going to have specialized language around that. Or if I wanted to add mood, and it was my own database being crawled, I could add those tags in. Are those real possibilities here?" So the question is, what room is there for customization. If you go to our developer documentation, you will see that it is highly customizable, not just on the front end, where you can create your own custom UI, but on the backend, which is really designed for you to control, manipulate, and manage your own data as well as your own ranking, so that you can create a custom app. That really is our business model: if you want a one-size-fits-all voice interface, you can use Siri or Cortana or Google Now, but our customers generally want to create their own experience in their own app, and our platform is designed to be very flexible to allow that. Maybe we have time for one more question. I think there is one. Oh, yes.

"Can you give us a sense of the upper bound on the size of the knowledge graph, or the corpus of information, that you have been able to practically process?" In terms of the size of the knowledge graph, the sweet spot for our technology is generally hundreds of thousands to tens of millions of objects; that is really what we size our system to cover. That generally applies to the thousands of companies that have large data sets, but not web-wide data sets. If you want to go into the hundreds of millions or billions, you are talking about web-wide search, and that is not the area we are focused on. We are really trying to make it possible for companies that have millions of data objects, or concepts, to have really great, accurate natural language search around those specific content domains.

Okay, I think that is all the time we have. Thank you very much.