
Thank you; you're right, I do have a very cool NASA button here, so that's about as close to the space station as it gets. And I've got to say, I followed this gentleman last year; he gives away iPads and t-shirts, so it's a tough act to follow, but I will try; we can't afford that, budgets are tough. Anyway, thank you for joining us. The topic is virtualized InfiniBand. I actually gave a talk on this topic last year, focusing exclusively on some virtualized IB testing I was doing on a synthetic cluster. We've expanded our testing a little bit since then to cover both the network and where we're taking it from a true application point of view.

So let me tell you a little bit about where I come from. It's NASA Goddard in Greenbelt, Maryland, right outside of DC. It's called the NASA Goddard Space Flight Center, but we actually run what's called the NASA Center for Climate Simulation. Did anybody see the keynote this morning, when in the invited talks the gentleman from NCAR talked about the IPCC? We run weather and climate models that contribute to that IPCC report and the global warming studies that are now going on; we're part of that community. Climate is different from weather, which is different from a forecast, and we're not being picky: weather is what it's doing out there right now, a forecast is tomorrow or next week, and with climate we're talking 20, 30, 50 year kinds of trends. Locally we'll run a variety of simulations down to three and a half kilometer resolution. If you want to see some results of our simulations, go to the NASA booth; there's a very cool simulation of Hurricane Sandy, and we'll review some of that. We also do reanalyses over 30-year periods, and we have databases that allow you to track temperature and pressure at various levels above sea level, at every spot on the earth; so if we want to ask what the historical climate has been, we have those kinds of resources as well. And as I said, we contribute to the IPCC.
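Those 20-to-50-year trend questions over a 30-year reanalysis database come down to fitting a trend through a long time series. Here is a toy sketch of that kind of calculation, using synthetic data rather than any real reanalysis product (every number below is made up for illustration):

```python
# Toy sketch (not NCCS code): a decadal-trend fit of the sort a
# multi-decade reanalysis database supports. Synthetic temperature
# anomalies stand in for real reanalysis output.
import numpy as np

def decadal_trend(years, temps):
    """Least-squares linear trend, in degrees per decade."""
    slope, _intercept = np.polyfit(years, temps, 1)
    return slope * 10.0

# Synthetic 30-year series: 0.02 deg/year warming plus noise.
rng = np.random.default_rng(0)
years = np.arange(1984, 2014, dtype=float)
temps = 0.02 * (years - years[0]) + rng.normal(0.0, 0.05, years.size)

print(f"trend: {decadal_trend(years, temps):+.2f} deg/decade")
```

A real workflow would pull gridded temperature or pressure fields out of the reanalysis archive, but the trend arithmetic is the same.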
We run a very traditional cluster to do this kind of work: a 3,000-plus physical node cluster, dual-socket nodes, some 40 to 50 thousand cores, DDR, QDR, and FDR Mellanox InfiniBand plus a management network, 17 petabytes of rotating storage for our shared file system, and a tape archive of some 30 petabytes. So, a pretty traditional architecture for that kind of computing. No presentation is complete without a depiction of our growth: as you can see, we've grown pretty well over the last seven years, and we're over a petaflop in calculated performance; granted, that's peak, not actualized, but it gives you some benchmark to go by. Last year we clocked in at number 53 with one of our systems, which actually introduced something new to our architecture, the Xeon Phi. Is everybody familiar with the Phi? It's sort of Intel's answer to NVIDIA's GPUs, and we've invested quite heavily in that technology as an adjunct to our normal Xeon processors. Right now we're a mix of Sandy Bridge and Westmere cores in conjunction with these Phis, and we're bringing in some Ivy Bridges at the end of the year.

But really what I want to talk to you about is, I'm sure you haven't heard of this technology before, cloud; it's probably pretty new, I realize, but guess what, we're building one too. Actually, if you look back in time, NASA built a cloud about three years ago called Nebula, and we were one of the original contributors to OpenStack. That cloud didn't work out; it was basically given back to the community. But now we're coming at it again, and we're starting, from more of a grassroots perspective, to build what we're calling, as I said, our science cloud, on a localized Goddard basis. Why are we doing it? We see it as an adjunct to our Discover cluster, that cluster I described: running specialized debug jobs, regression testing, being a place to burst out to if Discover gets oversubscribed. Plus we see it as a way of doing temporal processing campaigns. Believe it or not, we supported an effort
called IFloodS, the Iowa Flood Studies: they were looking at the effect of heavy rain storms in Iowa and the flooding aspects of that problem, so we supported what was a three or four month mission, and that was very nicely contained on the cloud. Another one is mission support for a new satellite called SMAP, a soil moisture sensing mission, where the processing would be a background process that could potentially come to us. Now, these aren't up yet, guys; I don't want to give you the misimpression that the science cloud is fully real. This is our goal, to satisfy these kinds of situations. But what takes us to virtualized InfiniBand? When we do this, we don't want to lose anything when we virtualize: we want our VMs to run at one hundred percent of bare metal, and we want node-to-node cluster activities to not lose anything either. Mellanox last year introduced me to virtualized InfiniBand, and we went about proving that virtualized IB would allow us to bring HPC into a cloud network; that's what this is all about, and like I said, we don't want any loss as we move into that space. So this is sort of

a summary of the testing that I did before last year and have since repeated; actually, last year's testing was done on RHEL 6.3, I redid it on 6.4, and the results got even better. This effort was quite a collaboration: me working with Mellanox, plus the folks over at Red Hat to do the tuning. What I did was set up an eight-node test environment. There's one more point I want to make here, I apologize: when we go to build this cloud, virtualized IB is in the middle, but there are really three things we're trying to bring together: that node-to-node piece I was speaking of, a high-performance file system for the cloud, and a management structure to build and extend instances and clusters in this cloud. That's what I wanted to lay out before the next slide.

So, focusing in on that node-to-node communication first. Again, here's a recap of what I did last year and redid before Red Hat Summit this year. I stood up an eight-node test and ran benchmarks against a bare metal eight-node cluster; then I stood up an eight-node VM cluster on the same infrastructure, using virtualized IB to tie the nodes together, and ran the same benchmarks. I ran the memory benchmark STREAM, which is out of UVA; the OSU benchmarks for node-to-node bandwidth and latency; LINPACK, because you've got to run LINPACK; and then some benchmarks actually written by NASA, called the NAS Parallel Benchmarks, that mimic CFD applications. Working with Red Hat, they turned me on to huge pages and various NUMA techniques that would allow me to minimize the loss going into the VM space. The virtualized IB was done using SR-IOV; is everyone familiar with SR-IOV? OK. Most virtualization is done with software, with paravirtualization, but SR-IOV actually exists both in the traditional NIC space and here in the IB space: through a combination of motherboard BIOS support and new firmware on the IB cards, you can create what's
called virtual functions at the physical level. So if you were to bring up your shell and do an lspci, you would see the Mellanox card and some number of Mellanox virtual functions. Then you go over to your VM, fire up virt-manager or however you bring up your VM, and you can map these virtual functions to the VM. In essence, what you've done is map the physical hardware into the VM, so it gets all the characteristics as if it were running on the bare metal. I have data, that I can perhaps share offline, showing that when you run the OSU benchmarks for bandwidth and latency, the bandwidth and latency curves for bare metal and VMs basically lay on top of each other; the only difference is that at very small packet sizes there's about a half microsecond difference in the latency. So, not much loss; that was pretty encouraging.

If you look at the results we got, from a summary perspective: VM bandwidth exceeded bare metal after some of the tuning techniques Red Hat gave me; as I just said, the VM bandwidth between nodes and the latency between nodes basically matched bare metal; and, very important to us, it scaled: it scaled out to eight nodes with very little degradation as we went to the eight nodes. Now, we've run a bunch of tests using a 10GigE interconnect, and as you start to scale out with our kinds of calculations you start to lose performance; we're not seeing that with the eight nodes of virtualized IB. Here are the high-level results. These are what I call percent efficiency, versus bare metal: the eight-node VM configuration ran at eighty-eight percent of bare metal, and if you look across these numbers, they range in the 90s for the NAS Parallel Benchmark kernels, and from 92 down to a little less than 90 for the NAS Parallel Benchmark pseudo-applications. So roughly we're in the ten to twelve percent loss range going into the VM space versus running on bare metal; as a new technology, on a first try, that's not bad. Our goal is zero, and we think
we can get there working with Mellanox and working with Red Hat. So now I'm going to switch gears a little bit and talk about that scaling question. All of that was done on Westmere chips; we have since gotten Sandy Bridge machines, and we're redoing the testing on Sandy Bridge to see how it works. One little sidebar: I don't know if you guys are familiar with the architectural difference between Sandy Bridge and Westmere, but the PCI bus used to hang off a bridge chip on Westmere, so each socket basically had equal access to the PCI bus; the PCIe controller is now pulled up onto the socket. So if you have a two-socket system and you want to communicate, one socket has the I/O card, and if you're running a process over here while the other socket's got the I/O, you've got to go across the QPI link to reach your I/O. So one of the things we're looking at is: what is the effect of that new I/O structure on these VMs in our tests? But let's put that to the side; it's just some interesting background.

So on Sandy Bridge we've run some of those same LINPACKs over bare metal and VM, and we also ran tests on Amazon; Amazon's high-performance nodes are interconnected with 10GigE, a more traditional cloud architecture. What I'd like you to take away, as you look at this, is that the AWS numbers start to drop off even at eight nodes in terms of scaling; the VMs are dropping a little, but not at the same level. What we're going to be doing in the future is scaling out to see if this pattern holds up. The scaling is critical to making this HPC-in-a-cloud work, and we remain confident that this virtualized InfiniBand with SR-IOV is the way to go.

OK, now, I talked about a cluster file system as being our shared file system of choice, and virtualized IB ties into this too. This is hot off the presses, so I can't give you benchmark numbers, but what we've done here is take the same concept: we recently got in 960 terabytes of storage, orchestrated as four I/O servers with a 60-bay JBOD each, tied it into an InfiniBand switch along with
three servers, and we brought up VMs on those servers; we have now brought up virtualized IB in those VMs and, over that IB, mounted the cluster file system. So we now have brought up a high-performance file system into the VMs over virtualized IB doing RDMA. We don't have the bandwidth numbers yet, but our goal is again to characterize this infrastructure and prove that we can get the same level of bandwidth as bare metal, the same file system performance; so, testing to follow on this.

An interesting sidebar: most of this technology for virtualized IB is only available in the RHEL 6 and CentOS kernels. We actually have a customer, that I'll tell you about in a minute, whose VMs were all built on Debian. So I went through the whole trail of bringing up virtualized IB in this Debian VM, and even though I did an lspci and the Mellanox part showed up in the VM, the kernel wouldn't recognize it. So I worked with a guy much smarter than me who implanted a RHEL 6.4 kernel into the Debian VM, and we were running these Franken-VMs for the virtualized IB testing; you keep running into that kind of thing.

Next is management. We've got the virtualized IB in both the node-to-node communication and the file system, and most recently we brought up Red Hat OpenStack, the Havana release, as a small little test system. There are two things we're doing here: we've got to get familiar with this infrastructure, but we've also got to get the virtualized IB constructs into the OpenStack release, so you can bring up VMs with these virtualized IB interfaces; we're going to be working with the OpenStack community to do that. The other thing we've got to do is have a quick way to instantiate a cluster in an OpenStack environment. Most cloud frameworks bring up a single instance, which is essentially a server; we want to bring up an interconnected cluster with a common namespace, shared home directories, etc., and that's going to require work. MIT is doing something called StarCluster; we're going to leverage that kind of activity and bring it in. And then we believe we have the seed for our science cloud, bringing those three pieces
together. Now, I told you we had a use case. The Nebula cloud that NASA first built was sort of built on the premise of "gosh, if we build this really quite cool cloud, everybody's going to come use it"; guess what, everybody didn't. We're going to start from the other direction and find customers that have a need, and grow from there. So there's this guy with a poster at a conference in San Francisco, AGU, who had to process a lot of Landsat data and do a study essentially over three decades, slicing out three years per decade; basically what he's looking for is change in water area over that period of time. Excuse me, it's called ABoVE, the Arctic-Boreal Vulnerability Experiment. He had three decades worth of data, three years per decade, and he had to knock out some 25,000 scenes; he came to us close to his deadline: "gosh, I'm trying to do this on my desktop; could you spin me up a VM?" So we spun up a VM, then he asked for another, and a few VMs later, after about five days worth of processing, we had knocked out his 25,000 scenes. He now has his material for his presentation at AGU in San Francisco. So this is the kind of use case for our science cloud, and we want to take it to where we're not spinning up the VMs for him: he's got an OpenStack dashboard, bringing up his own VMs, launching his own instances, and taking care of his own scene processing.

All right, so what are we doing? Testing is in progress; again, our goal is zero virtualization overhead, both in the network and at the VM performance level. All our testing to date has been one VM on a node; we want to go to multiple VMs per node, and we may want to put a VM per socket, with an IB card on each socket, to avoid that Sandy Bridge I/O hop I talked about, and play with different VM configurations to see what we can do. Like I said, we've stood up an eight-node Sandy Bridge system to repeat the testing there. But here's what I'm really excited about (not that I'm not excited about the rest): at the end of December we have
an 80-node Ivy Bridge system coming. It's actually a warm-water cooled system; we're going to pump in 45 degrees centigrade water and cool with it, as a PUE test bed. But the side benefit for me is that I get 80 nodes of virtualized InfiniBand to test with, to see how we scale out beyond that original eight nodes, so stay tuned for those results. As I said, we've also got a lot of cluster file system work to do, to run the right benchmarks up there and see how VMs compare to bare metal; there are lots of knobs in a cluster file system that you can play with, different configurations, and so on. And we'll work with the OpenStack and Red Hat guys to grow our test cluster into the science cloud, and, like I said, get users on it directly and keep them in the loop. So again: three circles, with virtualized IB from Mellanox in the middle making the pieces come together. Thanks to Red Hat, obviously Mellanox, and the OSU guys there in the other corner of the booth here for their MPI; we ran their 24-hour MPI testing going on there and got t-shirts, no iPads. But with that, we'll open it up to questions.
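One closing note on the metric that ran through all of these comparisons: the percent-efficiency numbers quoted earlier (the 88s and 90s) are just a VM result expressed as a percentage of the bare-metal result. A minimal sketch, with placeholder benchmark numbers rather than our measured data:

```python
# Illustrative only: the percent-efficiency metric used for the
# VM-versus-bare-metal results in this talk. Inputs are placeholders.
def pct_efficiency(vm_result, bare_metal_result):
    """VM performance as a percentage of bare metal (higher is better)."""
    return 100.0 * vm_result / bare_metal_result

# Hypothetical throughput results (e.g. GFLOP/s) for one benchmark:
bare, vm = 1000.0, 880.0
print(f"{pct_efficiency(vm, bare):.0f}% of bare metal")  # prints "88% of bare metal"
```

The same ratio applies whether the underlying number is LINPACK GFLOP/s, STREAM bandwidth, or an NPB kernel rate; zero overhead means this function returning 100.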