
So, hi everyone, I hope everybody had a good lunch. I'm Sarah Sirajuddin, an engineer on the TensorFlow Lite team, and I have my colleague Andrew Selle, also on the same team. We're really excited today to talk to you about all the work that we've been doing to bring machine learning to mobile devices. In today's talk we'll cover three areas: first, we'll talk about how machine learning on device is different and important; then we'll talk about all the work that we've been doing on TensorFlow Lite; and lastly we'll talk about how you can use it in your own apps.

Before that, let's talk about devices for a little bit. Usually when we say device we mean a mobile device, basically a phone. Our phones are with us all the time these days, they have lots of sensors which give us rich data about the physical world around us, and we use our phones constantly. Another category of devices is edge devices, and this industry has seen huge growth in the last few years: by some estimates there are right now 23 billion connected devices, smart speakers, smart watches, smart sensors, what have you. An interesting trend to notice here is that technology which only used to be available on the more expensive devices is now available on the cheaper ones, and this rapid increase in the availability of more and more capable devices has opened up many opportunities for doing machine learning on device.

In addition to that, though, there are several other reasons why you may consider doing on-device machine learning. Probably the most important one is latency: if you're processing streaming data such as audio or video, then you don't want to be making calls back and forth to a server. Other reasons are that your processing can happen even when your device is offline, sensitive data can stay on device, it's more power efficient because the device is not sending data back and forth, and lastly we are in a position to take advantage of all the sensor data that is already present on the device.

So all of that is great, but there's also a catch, and the catch is that on-device machine learning is hard. The reason it is hard is that many of these devices have some pretty tight constraints: small batteries, low compute power, tight memory. TensorFlow wasn't a great fit for this, and that is the reason we built TensorFlow Lite, which is a lightweight library and set of tools for doing machine learning on embedded and small platforms. We launched TensorFlow Lite late last year in developer preview, and since then we've been working on adding more features and support to it.

I'll just walk you through the high-level design of the system. We have the TensorFlow Lite format, which is different from what TensorFlow uses; we had to do so for reasons of efficiency. Then there's the interpreter, which runs on device; a set of optimized kernels; and interfaces which you can use to take advantage of hardware acceleration when it is available. It's cross-platform, so it supports Android and iOS, and I'm really happy to say today that we also have support for Raspberry Pi and pretty much most other devices which run Linux.

The developer workflow, roughly, is that you take a trained TensorFlow model, convert it to the TensorFlow Lite format using a converter, and then update your apps to invoke the interpreter using the Java or C++ API. One other thing I want to call out here is that iOS developers have another option: they can convert their trained TensorFlow graph into the Core ML format and use the Core ML runtime directly. This TensorFlow-to-Core ML converter is something that we worked on together with the folks who built Core ML.

There are some questions that are top of mind for people every time we talk about TensorFlow Lite, and the two most common ones are: is it small in size, and is it fast? So let's talk about the first one.
Keeping TensorFlow Lite small was a key goal for us when we started building this. The size of our interpreter is only 75 kilobytes, and when you include all the supported ops it is 400 kilobytes. Another thing worth noting here is a feature called selective registration: developers have the option to include only the ops that their models need and link just those, and thereby keep the footprint small.

How did we do this? First of all, we've been pretty careful in terms of which dependencies we include, and second, TensorFlow Lite uses FlatBuffers, which are more memory efficient than protocol buffers are.

Moving on to the next question, which is performance. Performance was a super important goal for us, and we made design choices throughout the system to make it so. Let's look at the first thing, which is the TensorFlow Lite format. We use FlatBuffers to represent models. FlatBuffers is a cross-platform serialization library that was originally developed for game development use cases, and since then it's been used in other performance-sensitive applications.
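The selective-registration idea mentioned above can be illustrated with a toy sketch. This is plain Python with invented names, not the real TensorFlow Lite API: the point is just that a resolver links only the kernels a given model needs, so unused ops never make it into the build.

```python
# Toy illustration of selective registration (invented names, not the
# real TensorFlow Lite API): a resolver maps op names to kernels, and a
# build includes only the ops a given model needs.

KERNELS = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
    "RELU": lambda a: max(a, 0),
}

def build_resolver(ops_needed):
    """Link only the kernels the model actually uses."""
    missing = set(ops_needed) - KERNELS.keys()
    if missing:
        raise KeyError(f"unsupported ops: {missing}")
    return {name: KERNELS[name] for name in ops_needed}

# A model that only uses ADD and RELU never links MUL at all.
resolver = build_resolver(["ADD", "RELU"])
```

In the real library the same trade-off shows up as choosing which op implementations get compiled and linked into the binary.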

The advantage of using FlatBuffers is that we are able to directly access data without going through heavyweight parsing or unpacking steps for the large files which contain the weights. Another thing that we do at conversion time is pre-fuse the activations and biases, which allows us to execute faster later on. The TensorFlow Lite interpreter uses static memory and a static execution plan, which allows it to load up faster, and there is a set of kernels which have been optimized to run fast using NEON on ARM platforms.

We wanted to build TensorFlow Lite so that we can take advantage of all the innovations that are happening in silicon for these devices. The first thing here is that TensorFlow Lite supports the Android Neural Networks API. The Qualcomm Hexagon DSP driver is going to be coming out soon as part of the Android P developer release, and Huawei and MediaTek are a couple of hardware vendors who have also announced their integration with the Android Neural Networks API, so we should be seeing those in the coming months as well. The second thing here is that we have also been working on adding direct GPU acceleration: we are using OpenGL on Android and Metal on iOS.

Quantization is the last bit that I want to talk about in the context of performance. Roughly speaking, quantization is a technique to store numbers, and perform calculations on them, in representations which are more compact than 32-bit floating-point numbers. This is important for two reasons: one, the smaller the model, the better it is for these small devices; and two, many processors have specialized SIMD instruction sets which process fixed-point operands much faster than they process floating-point numbers. A very naive way to do quantization would be to simply shrink the weights and activations after you're done training, but that leads to suboptimal accuracies. So we have been working on doing quantization at training time, and we have recently released a script which does this.
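The kind of affine quantization described here can be sketched in a few lines of NumPy. This is an illustrative sketch of the general scale/zero-point idea, not TensorFlow Lite's exact implementation: floats are mapped to 8-bit codes so that zero is represented exactly, and the round-trip error is bounded by half the scale.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine (asymmetric) quantization of a float array to uint8."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0 so that zero maps exactly.
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map uint8 codes back to approximate floats."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize(w)
w_hat = dequantize(q, scale, zp)
```

Shrinking weights this way after training is exactly the "naive" approach; quantization-aware training instead simulates this rounding during training so the model learns to tolerate it.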
What we see is that for architectures like MobileNet and Inception, we are able to get accuracies which are fairly similar to their floating-point counterparts, while seeing pretty impressive gains in latency.

I've talked about a bunch of different performance optimizations; now let's see what all of these translate to together in terms of numbers. These are two models that we benchmarked, run on an Android Pixel 2 phone with four threads, using all four large cores of the Pixel 2. What you see is that these quantized models run more than three times faster on TensorFlow Lite than their floating-point counterparts run on TensorFlow.

I'll move on now and talk about what is supported in TensorFlow Lite. Currently it is limited to inference only, although we are going to be working on supporting training in the future. We support 50 commonly used operations which developers can use in their own models; in addition, they can use any of the popular open source models that we support. One thing to note here is that we have an extensible design, so if a developer is trying to use a model which has an op that is not currently supported, they have the option to write what we call a custom op and use that. Later in this talk we will show you some code snippets on how you can do that yourself.

That's all theory about TensorFlow Lite, so let me show you a quick video of TensorFlow Lite in practice. We took the simple MobileNet model and retrained it on some common objects that we could find around our office, and this is our demo classification app, which is already open sourced. As you can see, it is able to classify these objects.

That was the demo; now let's talk about production. I'm very excited to say that we have been working with other teams in Google to bring TensorFlow Lite to Google apps: the portrait mode on the Android camera, "Hey Google" on the Google Assistant, and Smart Reply on Wear OS are some examples of apps which are going to be powered by TensorFlow Lite in the near future.
With that, I'm going to hand off to Andrew, who will talk to you about how you can use TensorFlow Lite.

Thanks for the introduction. So now that we've seen what TensorFlow Lite is, let's find out how to use it; let's jump into the code. The first step of the four-step process is to get a model: you can download one off the internet, or you can train it yourself. Once you have a model, you need to convert it into the TensorFlow Lite format using our converter. Once that's done, there might be some ops that you want to spot-optimize using special intrinsics or special hardware that is specific to your application, or there might be ops that we don't yet support; you need to write custom ops for those. Once all that's done, you can go to your app and write it using the client API of your choice.

Let's look at the conversion process. We support input of a SavedModel or a frozen GraphDef, and you can use our Python API or a command-line tool. Here we're showing the Python interface: we give it the directory of a SavedModel, and it gives us a TFLite FlatBuffer out. Before that, there might be some things you need to do to make this work better. First, you need to use a frozen GraphDef or a SavedModel; a SavedModel is actually easier because it doesn't need to be frozen. Second, you need to avoid unsupported operators: a lot of the time a training graph will have conditional logic or checks that are really not necessary for inference, so sometimes it's useful to create a special inference script. Lastly, if you need to look at what your model is doing, the TensorBoard visualizer is useful for the graph, but we also have a TensorFlow Lite visualizer to look at the FlatBuffers, and comparing the two can help you. If you find any issues, please file them on GitHub and we will respond to them.
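The Python conversion flow described here looks roughly like the following sketch. It uses the current `tf.lite.TFLiteConverter` API (the module path has moved across TensorFlow versions), and the tiny model below is just a stand-in for your own trained SavedModel.

```python
import tempfile
import tensorflow as tf

# Stand-in for a real trained model: a module with one traced signature.
class TinyModel(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def __call__(self, x):
        return tf.nn.relu(x) * 2.0

saved_dir = tempfile.mkdtemp()
model = TinyModel()
# Export a SavedModel with a serving signature the converter can find.
tf.saved_model.save(model, saved_dir,
                    signatures=model.__call__.get_concrete_function())

# Point the converter at the SavedModel directory; it emits a FlatBuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_dir)
tflite_model = converter.convert()

with open(saved_dir + "/model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting bytes are the TFLite FlatBuffer that the on-device interpreter loads directly, without a parsing step.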
We will respond to them as needed.

Lastly, write custom operators; let's see how to do that. Writing a custom operator in TensorFlow Lite is relatively simple, and it involves defining four functions. The main one is invoke: here I've defined an operator that returns pi, just a single scalar, which is really useful for illustration. Once you've done that, you need to register these new operations, and there are a number of ways to do that. If you don't have any custom ops and you don't need to do any overriding, you can use the built-in op resolver. But sometimes you might want to ship a TensorFlow Lite binary that's much smaller, so you might want to do selective registration; in that case you should ship a needed-ops resolver, and include your custom ops in that same thing. Once you have that op set, you just plug it into the interpreter.

Okay, so now that we've talked about custom operations, let's see how we actually put this into practice in the Java API. In Java, you give your TensorFlow Lite FlatBuffer to the constructor of Interpreter. Once you've done that, you fill in your inputs and outputs, and lastly you call run, which will populate the outputs with the results of the inference. Really simple.

Next, how do I actually include this? Do I need to compile a bunch of code? We're working really hard to make it so that you can use TensorFlow from pip, do all your training, and then not need to compile TensorFlow at all. One of the things we provide is an Android Gradle file that you can include, so you don't need to compile your own TF Lite for an Android app; we have a similar thing with CocoaPods to run TensorFlow Lite on iOS. This will make it a lot easier.

So now that we know how to use TensorFlow Lite, let's look at our roadmap. As we move forward, we're going to add more ops, so we can support more and more TensorFlow models out of the box. Second, we're going to add on-device training, and we're going to look at hybrid training, so you can do some of it on the server and some of it on your device, wherever it makes sense; that should be an option for you. And third, we're going to continue to improve the tooling, to analyze graphs better and to do more optimizations.
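The Java flow described above (construct an Interpreter from the FlatBuffer, fill in inputs and outputs, call run) has a direct analogue in the TensorFlow Lite Python API. A sketch, again using a toy model as a stand-in for a real `.tflite` file:

```python
import numpy as np
import tensorflow as tf

# Build and convert a toy model, standing in for a real .tflite file.
class TinyModel(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def __call__(self, x):
        return tf.nn.relu(x) * 2.0

model = TinyModel()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.__call__.get_concrete_function()])
tflite_model = converter.convert()

# Mirror the Java flow: construct the interpreter from the FlatBuffer...
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# ...fill in the input tensor...
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
x = np.array([[1.0, -2.0, 3.0, 0.0]], dtype=np.float32)
interpreter.set_tensor(inp["index"], x)

# ...and run, which populates the output.
interpreter.invoke()
y = interpreter.get_tensor(out["index"])   # relu(x) * 2
```

On Android or iOS the same three steps happen through the Java or C++ (or, on iOS, Core ML) interfaces instead.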
We have more that we're working on than we can talk about here, but we hope that this small peek at what we're doing, and what our basis is, will make you interested and excited to try it.

There's one remaining question, which is: should I use TensorFlow Mobile or TensorFlow Lite? TensorFlow Mobile is a stripped-down version of TensorFlow that includes a subset of the ops. Moving forward, we're going to put all of our effort into improving TensorFlow Lite and its ability to map to custom hardware, so we recommend that you target TensorFlow Lite as soon as possible, if it's possible for you. If there's some functionality you need that still requires TensorFlow Mobile, let us know, and we'll work to improve TensorFlow Lite in a commensurate way.

Okay, demo time; nothing like a live demo, right? Let's switch over to the demo feed and we'll talk about it. We saw some mobile phones, and mobile phones are really exciting because everybody has them, but another thing that's happening is these edge computing devices. One of the most popular ones to emerge for hobbyists is the Raspberry Pi, so what I've done is build some hardware around the Raspberry Pi. If we zoom in, we can see the Raspberry Pi board, which is just a system-on-chip similar to a cell phone chip. One of the great things about Raspberry Pis is that they're really cheap; another great thing is that they can interface with hardware. Here we're interfaced to a microcontroller that allows us to control these motors: they're servo motors, common in RC cars, and they allow us to move the camera left and right and up and down, essentially a camera gimbal. Then we have it connected to a Raspberry Pi compatible camera.

So what are we going to do with this? Well, we showed the classification demo before; let's look at an SSD example. This is single-shot detection, and what it can do is identify bounding boxes in an image.
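As a toy illustration of how detections like these can drive a pan/tilt camera (plain Python; the function names, gain, and box format here are invented for illustration, not taken from the demo code): compute the box center's offset from the frame center and apply a proportional correction to each servo.

```python
def center_offset(box, frame_w, frame_h):
    """box = (xmin, ymin, xmax, ymax) in pixels; returns the box center's
    offset from the frame center, normalized to [-1, 1] per axis."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    return ((cx - frame_w / 2.0) / (frame_w / 2.0),
            (cy - frame_h / 2.0) / (frame_h / 2.0))

def servo_step(offset, gain=5.0, deadband=0.05):
    """Proportional control: ignore small offsets, otherwise move the
    servo (in degrees) against the offset to re-center the object."""
    return 0.0 if abs(offset) < deadband else -gain * offset

# Example: an object detected left of and above center in a 640x480 frame.
dx, dy = center_offset((200, 100, 300, 200), frame_w=640, frame_h=480)
pan, tilt = servo_step(dx), servo_step(dy)
```

Run once per detected frame, this is the whole control loop: the model supplies boxes, and the servos chase the chosen box toward the center.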
Given an image, I get multiple bounding boxes; for example, we have an apple here, and it identifies an apple. Now, the really cool thing we can do with this is that, since we have the motors, we can tell it to center the apple. I've turned on the motors now, so as I move it around, it's going to keep the apple as centered as possible: if I go up, it'll go up; if I go down, it will go down. This is really fun. What could you use this for? Well, if you're a person, it can also identify you; currently I have that filtered out, but if I stand back, it's going to center on me. So I could use this as a sort of virtual videographer: imagine a professor who wants to tape their lecture but doesn't have a camera person; this would be a great way to do that. I'm sure that all the hobbyists out there, now that they can use TensorFlow in a really simple way, can come up with many better applications than what I'm showing here, but I find it fun, and I hope that you do too. And I'm not an electrical engineer or a mechanical engineer, so you can do this too.

All right, thanks for watching the demo; let's go back to the slides, please. I had a backup video just in case it didn't work, which is always a good plan, but we didn't need it, so that's great.

So let's summarize. What should you do? I'm sure you all want to use TensorFlow Lite; where do you get it? You can get it on GitHub, right inside the TensorFlow repository. You can find out about it by looking at the TensorFlow Lite documentation on the tensorflow.org website. In addition, we're creating a mailing list, tflite@tensorflow.org, that you can email with your issues and your ideas; we're also excited to hear about what you use TensorFlow Lite for. With that, I hope to hear from all of you, one at a time please. I want to thank everybody: thanks to Sarah for her presentation, thank you for your attendance, and everybody around the world listening to this. In addition, this was work that we did together with other members of the Google team, lots of different teams, so there's a lot of work that went into this.

Thanks a lot. [Applause]