Uğur and I are really apologetic that we couldn't come to Korea to be with you in person, and thank you for the 10-year test-of-time award. Our paper in 2005 was "One Size Fits All: An Idea Whose Time Has Come and Gone." The context for that paper was the previous 20 or 30 years: our whole community was pretty much about making relational DBMSs the answer to any question, making them as broad in scope as possible. During the 80s and 90s lots of stuff was added to our DBMSs: abstract data types, referential integrity, etc., etc. The SQL spec grew from a hundred pages to about 1,500 pages, and all of this was an attempt to make relational database systems the one size that fits all.

By 2005 we sort of realized that was never going to work. Why we came to that conclusion was that we were working on Aurora, which was a streaming DBMS, and it looked nothing like the traditional RDBMSs of the time; it just had no relationship to those implementations. We had also started working on C-Store, which was a column store. Nothing had been published by then, but it was already clear to us that a column store looked nothing like a row store. The paper in 2005 also talks a little bit about text, and it was evident, just based on a sample of three, that one size didn't fit all. That was pretty much what our paper said in 2005.

The team that worked in this area, on this paper, pretty much was Uğur and I, but there were a bunch of other actors. Stan Zdonik and Mitch Cherniack were helping with the streaming stuff, Sam Madden was helping with C-Store, and Nesime Tatbul was also helping with the streaming stuff. So I want to acknowledge the rest of the people in the Brown-MIT universe who were responsible for these ideas.

Ten years later, not only does one size not fit all but, in my opinion, one size fits none. The traditional row stores that are sold by the legacy vendors, systems like DB2, Oracle and SQL Server, in my opinion are obsolete and are good at nothing. So 10 years ago they weren't good for everything; 10 years later they're good for nothing. I want to go through a bunch of markets and explain why these systems are basically not good for anything.

Let's start with the warehouse market. In data warehouses, all the major vendors have now, or soon will have, a column store. Column stores are two orders of magnitude faster than row stores; there is a ton of evidence to this effect. So the data warehouse market is going to be entirely column stores. It may take a decade for all the row-store implementations to disappear, but there's just compelling evidence that column stores are faster, and all the new installations these days are column stores.

Let's look at another major market, transaction processing. The neat thing about transaction processing is that the databases aren't all that big. A terabyte is a really big OLTP database, and you can buy a terabyte of main memory for maybe thirty thousand dollars. So if you're interested in going fast, buy enough main memory and put all of your data in main memory. If you do that, you've got a very, very different implementation, because the traditional transaction processing algorithms that are in the legacy vendors' row stores are just way too heavyweight. So all the new guys, and new guys means Microsoft Hekaton, SAP HANA and a bunch of startups, all have very lightweight transaction systems that use totally different techniques than the traditional legacy
row-store vendors. So in the data warehouse market you're seeing a different SQL implementation; in the OLTP market you're seeing a different SQL implementation.

In the last ten years, I don't have to tell you, you can just listen to the news: there is a NoSQL market. There's a hundred or so vendors, so there are a couple of dominant vendors and a lot of other folks. They have a potpourri of data models and architectures; there are no standards whatsoever. There's a bunch of key-value stores, there's a bunch of record stores, there's a bunch of BigTable clones, there's a bunch of Amazon Dynamo clones, there's a bunch of JSON stores. These things are alive and well. We'll see how big this market will actually end up being, but in the meantime there's a ton of ideas, and none of them end up looking like the traditional row stores from the legacy vendors.

Let's move on to some other topic. So far the analytics market has been what's called business intelligence. Business analysts use data warehouses and they run business intelligence products, things like Cognos, things like Business Objects. These are nice graphical front ends to SQL aggregates, so you can do COUNT, SUM, MAX, MIN, AVERAGE, GROUP BY; you do standard SQL analytics. What's gonna happen off into the future is that data scientists are going to replace business analysts. It may take a decade, because we will have to train a bunch of data scientists, but it is going to happen. And data scientists don't want to run SQL analytics. They want to run regressions, they want to find eigenvalues and eigenvectors, they want to do singular value decomposition, data clustering, predictive models, dot dot dot. All this stuff is defined on arrays; it's not defined on tables. So it'll be really interesting to see how the complex analytics market unfolds as we train some data scientists to be able to do that. You can simulate all of this stuff in SQL, but it is really, really slow. You can cast tables to arrays and use some table add-ons; that's a little bit
inconvenient; we'll see how well it performs. You can implement all of this stuff trivially in an array database system, something like SciDB; we'll see how that unfolds. You can certainly use what people do now mostly, which is run a stat package, but the trouble with stat packages is they have no data management in them. They're just plain statistics, and you need to do both statistics and data management. The jury's out on who will win, but the thing that's interesting is that it isn't all that likely that traditional row stores are going to get this market. It's likely to be either column stores, array stores, or maybe some sort of extension of statistical packages. The jury's out; stay tuned. It'll be an interesting market off into the future. Maybe I can write another paper in a decade sort of talking about this. Next slide.

OK, the streaming market that we talked about in 2005 is alive and well, and it isn't based on traditional row stores. Stream processing engines have some of the market. Storm is a popular one; StreamBase is a company we started. There's a bunch of stream processing engines, and what they do is a workflow of record processing with very SQL-like characteristics. However, the thing that I think is kind of interesting is that the main memory OLTP engines seem to have a greater market share of streaming real-time stuff than the stream processing engines do. You can view an OLTP engine this way: a message comes into an application, the application runs a stored procedure, and the stored procedure runs against the main memory database system. That gives blindingly fast performance, essentially the same performance you can get from any streaming engine. So this seems to be a programmer preference issue as to whether you want a messaging architecture or whether you want a function call architecture. Those two kinds of engines seem to be the dominant ones in the streaming market.

I just want to put in a little pitch for a thing called S-Store, which Nesime Tatbul here at MIT is working on. So we built an OLTP engine, H-Store, in the
mid-2000s. We also built Aurora, which turned into StreamBase. So we built two of these things, and if you want the capabilities of both of these systems you can either add streaming to an OLTP engine or you can add persistence to a streaming engine. It's a lot easier to add streaming to an OLTP engine, and that's exactly what S-Store does. There's a paper on S-Store; check it out for the details. In any case, the streaming market is not going to be anything that has anything to do with traditional row-store implementations.

OK, lots of people talk about graph analytics and graph databases. Facebook has a big graph, and if you want to find the average distance from me to you, that's a graph analytics problem. How are you gonna do graph analytics? Well, there's been a whole bunch of papers recently on how to do that. Some of them talk about doing this with a simulation on top of a column store and a standard data-warehouse-style architecture. You can certainly do graph analytics in an array engine: the edge matrix is simply a matrix, the node set is a one-dimensional array, so you can simulate this in an array engine. There are a bunch of special-purpose graph engines, things like GraphLab, Giraph and so forth. The jury is out on exactly who's going to win this market, but notice that traditional row stores are not even on this slide. It's simply not going to be them. Next slide.

So the summary is: we've gone through most of the major markets, and there's a huge diversity of engines. All of them are oriented toward a specific vertical or a specific application, and traditional row stores are good at none of these markets. So as these markets expand in size, and as people get a chance to change vendors, the traditional legacy row stores are going to have no market share in any of these markets, i.e. one size fits none.

So it's a great time to be a database researcher. Whenever the traditional vendors, you know, are the legacy code and there's a whole bunch of new ideas, it's a
great opportunity for yet other new ideas, and there's a whole bunch of them that I think are going to be really significant off into the future.

Non-volatile RAM is going to get here. Intel is predicting it within five years; it's going to be very cheap and very persistent, and it is going to get rid of flash completely. So how is that going to fit into DBMSs? Main memory databases are continuing to get bigger and bigger. There's going to be processor diversity, so we don't have to just run on Pentiums: GPUs, you know, Intel Xeon Phis, NVIDIA GPUs, NUMA architectures, dot dot dot, FPGAs. How do we accelerate DBMSs using the processor diversity that's coming at us right now? As near as I can tell, in any multi-node DBMS implementation networking tends to be the bottleneck; that is, CPUs and disk bandwidth are not the scarce commodity, it's networking. There's a bunch of ideas in higher-speed networks; we'll see how that all unfolds. LLVM is an intermediate-level language; compiling database systems into LLVM, and things that look like LLVM, higher-level versions of LLVM, may well change the way current database stacks end up being implemented. And as MonetDB showed us a decade ago, vectorization is an awfully good idea when you're reading more than a few records. So there's lots of things you can leverage, lots of markets, and we expect to see a lot of new implementations off into the future.

So how are the elephants going to react to all of this? There are way faster implementations at what their traditional market is, and lots of new ideas that could well be very disruptive. Well, there's a great book called The Innovator's Dilemma by Clayton Christensen, who's a Harvard Business School professor. Basically his point of view is that whenever there's a transition from the old stuff to the new stuff, the vendors who are selling the old stuff have a very hard time morphing from the old stuff to the new stuff without losing market share. So we'll see how the
elephants do. I expect that SQL Server and Hekaton lead the way. SQL Server 14 is in fact two completely separate engines united by a common parser: one of them is the traditional row-store SQL Server, the other is a main memory database system called Hekaton. So I expect the legacy vendors will start stuffing additional engines underneath a common parser as they try to morph from old implementations to newer ones while preserving their customer base. As the elephants try to adapt without losing market share, I expect them to try to maintain a common user interface and simply flip out old engines and replace them with new engines over time. We'll see how successful that will be.

However, I just want to draw attention to what I think is going to be the main conflict among the elephants. Most of you can probably surmise that SAP turns out to be Oracle's largest customer. Well, customers running SAP are Oracle's largest customer. So SAP databases are largely staged on top of Oracle storage. You can imagine SAP is not very happy with that, and so they have a system called HANA. I expect, off into the future, to get an announcement at some point that either SAP doesn't support Oracle anymore or that it runs wildly faster on HANA. So I expect SAP to be dramatically in the database business and to be a major thorn in the side of Oracle. We'll see how all of this unfolds. Meanwhile, Oracle is hard at work getting you to run their proprietary hardware and to get their Oracle suite of business products, to increase market share, so they will be competing head-to-head with SAP. It'll be fun to watch from the sidelines how this all unfolds.

So the ultimate summary, in my opinion, is that for most of the 80s and the 90s our field, DBMS research, was pretty much dead on its feet, and this resulted from our belief that one size fits all: we just had to polish up the relational model. But I think that philosophy is completely dead, and now we live in interesting times. There's all kinds of new ideas, all kinds of new implementations. It is a great time to be a DBMS researcher, and so I would
encourage all of you to go out and explore this new space of implementations. Thank you very much for your time, and I hope I've made somebody mad so that you'll ask some questions. Thank you very much.
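
The graph analytics remark in the talk (the edge matrix is simply a matrix, the node set is a one-dimensional array, and "average distance" is a query over them) can be illustrated with a minimal sketch. It assumes a tiny made-up undirected graph; the names `nodes`, `edges`, `bfs_distances` and `average_distance` are hypothetical and are not taken from any system mentioned in the talk. A real array engine would run this kind of computation inside the database rather than in application code.

```python
from collections import deque

# Node set as a one-dimensional array (hypothetical example data).
nodes = ["ann", "bob", "cat", "dan"]

# Edge matrix: edges[u][v] == 1 means nodes u and v are connected.
# This is the path graph ann - bob - cat - dan.
edges = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def bfs_distances(edges, source):
    """Return hop counts from source to every node (None if unreachable)."""
    n = len(edges)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        # Scan row u of the edge matrix for unvisited neighbours.
        for v in range(n):
            if edges[u][v] and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(edges):
    """Average shortest-path length over all ordered, reachable node pairs."""
    total = 0
    pairs = 0
    for s in range(len(edges)):
        for t, d in enumerate(bfs_distances(edges, s)):
            if t != s and d is not None:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    print(dict(zip(nodes, bfs_distances(edges, 0))))
    print(round(average_distance(edges), 3))
```

The sketch uses breadth-first search over matrix rows for clarity; it makes no claim about how any particular array or graph engine implements the same query.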