Radiant Advisors Publication
DISPELLING THE MYTHS BIG DATA HYSTERIA
BI GETS UNINTELLIGENT Q&A WITH BARRY DEVLIN
THE ANALYTIC CONTINUUM CIRCA 2013
TRANSFORMING BI WHY THE FUN IS CONTAGIOUS
10 OCT 2013 ISSUE 10
October 2013 Issue 10
Big Data Hysteria
How it's negatively impacting the business analytics industry. [By Glen Rabie] FEATURES
Business unIntelligence begins with what is wrong in business today -- then how to fix it. [By Lindy Ryan]
The Data Silo Headache How data abstraction can help analytics and BI make a bigger business impact. [By Robert Eve]
1 • rediscoveringBI Magazine • #rediscoveringBI
The Analytic Continuum Sophisticated analytical practices aren’t either/or or both/and: it’s a continuum. [By Stephen Swoyer]
The prescription for analytic success is a synthetic architecture that's adaptable and scalable enough to knit everything together: it's an architecture for analytics.
[P17] BI is Fun Again Three ways Radiant Advisors is shifting the mindset around BI. [By Julie Langenkamp-Muenkel ]
EXCERPT UTOPIAN VISTAS [P4] [From Business unIntelligence]
EVENTS RADIANT ON THE ROAD [P6]
FROM THE EDITOR Analytics is about solving a business problem first. Fundamentally this hasn’t changed: data just keeps getting bigger and bigger. Today, three trends are driving the future of analytics: the democratization of data, the consumerization of data, and the ability to analyze and interpret data. And, with a surfeit of emerging technologies ready to dive into data, analytics are becoming the quickest way to earn insights that translate into tangible business value.
However, unlocking the value in business analytics -- solving that business problem -- requires knowing the right solution. It requires finding the right key to use.
Each article in this month’s edition of RediscoveringBI hovers around one central theme: how we will work with data going forward will not be the same as how we’ve worked with it in the past. Glen Rabie discusses how big data hype is hindering businesses’ ability to understand and build strategies to compete on analytics; Stephen Swoyer describes what an analytics continuum, circa 2013, might look like; Barry Devlin and I explore business unintelligence; Julie Langenkamp shows us how BI is (finally) getting fun again; and Cisco’s Robert Eve lists six ways that data abstraction can help analytics and BI make a bigger business impact. In our final regular issue of 2013, we’re also introducing a new feature. Our first editorial report “All About Analytics” offers a prescription for analytic success: a synthetic architecture that's adaptable and scalable enough to knit everything together -- or, an architecture for analytics. Be sure to check out page 21 for a sneak peek at our first Special Edition -- BI Leadership reCharged!
Editor In Chief Lindy Ryan firstname.lastname@example.org
Contributor Glen Rabie email@example.com
Contributor Dr. Barry Devlin firstname.lastname@example.org
Contributor Julie Langenkamp-Muenkel Julie.Langenkamp@sourcemedia.com
Stephen Swoyer email@example.com
Art Director Brendan Ferguson firstname.lastname@example.org
Lindy Ryan Editor in Chief
For More Information: email@example.com
DISPELLING THE MYTHS
BIG DATA HYSTERIA
[How it's negatively impacting the business analytics industry.]
THE HYPE SURROUNDING the big data “phenomenon” is inhibiting businesses’ capacity to understand and build effective strategies to compete on analytics. After years of working towards pervasive business-oriented business intelligence (BI), technologists have turned the tables by making analytics a high-end technical problem with an associated high-end price tag -- with little to no proven business benefit. Instead of investigating and pinpointing its specific usefulness to industry-specific (or function-specific) decision-making, much rhetoric has descended into self-perpetuating hype and hyperbole, where big data is hailed as some sort of inexplicable cure-for-all-ailments elixir.
This climate of unabashed worship -- untied to precise outcomes -- has created considerable confusion and misunderstanding, which is actually damaging the BI and analytics software industry. Sadly, it’s also creating a really poor customer experience.
Defining Big Data: Dispelling Big Data Myths
Part of this confusion stems from the loose definition of big data in much marketing and media content. Every analyst, vendor, and commentator has his or her own definition. Further, big data is used in so many contexts
that the term itself is starting to lose all meaning. At the most basic level, big data comprises the three V’s: volume (total amount of data), variety (different data types: structured and unstructured), and velocity (the speed with which you are acquiring, processing, and querying data). However, big data is not all about the three V’s – it’s what you do with it that counts. Essentially, exploiting big data enables organizations to do three things:
• Collect more accurate and detailed information
• Unlock significant business value by making information transparent and usable at much higher frequency
• Understand customers at a more granular level in order to develop and deliver more precise (and ultimately successful) products or services
Big data bewilderment is also generated from two prevailing myths propagated by tech firms: big data is new and revolutionary -- it is neither. With regard to its newness, I liken big data to the Canadian tar fields: there’s always been a lot of data out there. What’s different about today is that we’re now able to efficiently and cost-effectively capture and mine significantly larger data volumes and varieties. Thanks to Moore’s Law -- and emerging technologies -- the relative costs of data storage, management, and analysis have fallen significantly. This technological development has been met with simultaneous growth in existing data sources (such as web traffic, email, digital images, and online videos), as well as new data sources and types (think social media sites, sensor data, or geospatial data stemming from cell phone GPS signals). To put this growth in context, in 2002 we recorded 23 exabytes of information. We now record and transfer that much information every seven days, according to IBM. Market research firm IDC estimated that the total volume of data created in 2009 was enough to fill a stack of DVDs reaching to the moon and back. They predict this volume to grow 44 times over by 2020. So, big data is becoming bigger, and more complex. But, it’s definitely not “new” -- the term “information explosion” was first used in 1941.
Another major big data falsehood is the idea that it’s revolutionary. Based on the industry scenarios produced thus far, harnessing big data enables organizations to improve existing processes, products, or services. However, there are very few examples of big data enabling truly new use cases for business data. If I look back on my 20 years of experience in the analytics market, it’s safe to say that many organizations have been capturing and using data for advanced analytics for decades. Some examples -- such as detecting fraud in the financial industry, predicting a customer’s propensity
to buy, or modeling flu epidemics -- are certainly not new. In the past, models were built and tested on small but statistically relevant samples. Now, thanks to big data technologies, you could build that flu model on an entire population. That’s a great outcome, if it leads to a vastly more accurate model. Unfortunately, that is not often the case. Instead, what has changed is the scale and scope of analysis that is now possible. With a higher degree of affordability, many more companies are also starting to ask themselves how they too can benefit from competing with analytics. The problem with technologists positioning big data as revolutionary is that it distracts business leaders from the critical choices they need to make. It leads businesses on dubious, IT-led expeditions to uncover revolutionary ways in which to apply big data to their strategies and operations. Organizations should, however, simply focus their attention on how big data projects can be applied to their current business challenges. You need to remove the technology from the discussion, focus on what you want to achieve, and then determine the best technology infrastructure to support your data-based goals. This means avoiding the “just-for-the-sake-of-it” lure of much-publicized technologies (such as Hadoop) and assessing your needs against the large range of data and analytics solutions that exist today.
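One way to see why modeling an entire population often buys little extra accuracy over a well-drawn sample is to look at how the standard error of an estimate shrinks with sample size. The sketch below is only an illustration under stated assumptions: the population is synthetic (a made-up 0/1 indicator with a 10 percent rate), sampling is simple random sampling, and the finite-population correction is ignored. It is not taken from the article or from any real flu or fraud model.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


random.seed(0)
# Synthetic "population" (an assumption for illustration only):
# 200,000 individuals, each flagged 1 with probability 0.10.
population = [1 if random.random() < 0.10 else 0 for _ in range(200_000)]

# Each tenfold increase in sample size shrinks the standard error
# by only about sqrt(10), so the gains flatten out quickly.
for n in (1_000, 10_000, 100_000):
    sample = random.sample(population, n)
    print(n, round(statistics.mean(sample), 4), round(standard_error(sample), 5))
```

Because the error falls with the square root of the sample size, moving from a modest sample to the whole population tends to tighten the estimate far less than the jump in data volume suggests. This covers sampling error only; a biased sample is a separate problem that more volume does not necessarily fix.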
Confusion is Disrupting the BI Market
So, how exactly is big data hype hurting BI and analytics? Simply put, it inhibits people from thinking rationally about the practical uses for big data. This isn’t just a customer issue -- it’s a vendor problem, too. Many vendors are fueling the confusion by over-marketing the virtues of big data technology. In many instances, the emergence of big data has made the customer conversation needlessly complex: rather than focusing on what businesses are attempting to understand and improve upon by leveraging their data, all we’re doing is selling the virtues of technology. We’re not focusing on solving a problem, and we’re not being very precise about when to use what technology. The vendor community has created a level of complexity that’s creating paralysis among customers. Organizations are being distracted by the hype around the available big data technology. They’re no longer focusing on the business problem that we (the vendor) should be helping them solve. As a result, the customer doesn’t know which technology to use, or when to use it. And, we as an industry aren’t helping: we’re perpetuating the issue and creating an environment of stagnation and inaction. We’re not saying, “this is when you use Hadoop, this is when you use a columnar database, this is when you just hit your relational database.” What we as an industry are saying, is that we want you to spend lots of money with us. We want you, dear customer, to buy all this technology -- whether you need it or not.
The BI and analytics software market managed to grow by a reasonable seven percent in 2012. By comparison, the market grew 17 percent in 2011, so the 2012 figure marks a significant slowdown. The reason is, as Gartner Principal Analyst Dan Sommer calls it, “Confusion related to emerging technology terms causing a hold on purse strings.” To be successful with big data analytics -- or any type of analytics -- you’ve got to have a strategy and a business problem to solve. People shouldn’t buy technology for the sake of it.
Taking Action from Analytics
The result of this confusion is a noticeable disparity between the much-hyped promises of big data and the tangible outcomes. To rectify this current state of big data disillusionment, the question needs to be asked: How can organizations derive true value from big data initiatives? The answer is the same as for any data-based business initiative: by enabling business users to discover and utilize insights gleaned from organizational data stores -- in order to make better, faster decisions -- via BI and analytics technologies. As Colin White, President and Founder of BI Research, said in a recent interview in relation to big data, “It’s time to move from big data management to big data analytics – it’s what you do with the data that matters.” I’m hopeful that the bulk of industry conversation will soon turn from big data technology glorification to business application. If data is to be the new oil, as suggested by renowned data journalist David McCandless, we need to stop talking about the pumps and refineries. We need to start focusing on how that oil is going to drive our businesses forward in an increasingly competitive landscape.
RADIANT ON THE ROAD
BUSINESS ANALYTICS CONFERENCE
2013 PARTNERS CONFERENCE & EXPO
DATA VIRTUALIZATION DAY 2013
EXECUTIVE LEADERSHIP CONFERENCE
CU GOLD LEADERSHIP CONFERENCE
MDM & DATA GOVERNANCE
BIG DATA SUMMIT
UNIVERSITY OF COLORADO 10.4.2013-10.5.2013 BOULDER, CO
COMPOSITE SOFTWARE/CISCO 10.9.2013 NEW YORK CITY, NY MODERN DATA PLATFORMS 10.15.2013 LOS ANGELES, CA NORTH AMERICAN SERIES 10.20.2013-10.22.2013 NEW YORK, NY
TERADATA 10.20.2013-10.24.2013 DALLAS, TX EEOC 10.29.2013-10.31.2013 CAMBRIDGE, MD UNIVERSITY OF COLORADO 11.9.2013 BOULDER, CO
NORTH AMERICAN CONFERENCE 12.8.2013-12.10.2013 SCOTTSDALE, AZ
For more information on upcoming events, visit http://radiantadvisors.com/events
[Business unIntelligence begins with what's wrong in business today - then how to fix it.]
LINDY RYAN WITH DR. BARRY DEVLIN (Read an excerpt from Business unIntelligence on p10)
Lindy: Barry, can you begin by explaining what “Business unIntelligence” is? In the preface to your new book, Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data, you reflect on your quest in coming up with this title. Ultimately, you came to the conclusion that the term Business unIntelligence is one of cognitive dissonance -- in your own words, Business unIntelligence must "begin from what is wrong in business today, then on how to fix it." Could you share more on this term’s unique distinction in today's BI-industry vernacular?
Barry: The term business intelligence carries with it a weight of baggage, especially the second word. Intelligence is often equated -- in the modern Western world, at least -- with reason, logic, rationality, and so on. Psychological testing of IQ, as commonly understood, focuses on these traits. Military intelligence -- from which the term seems to have emerged in the 1990s -- adds the implication that gathering vast quantities of information is a precursor to intelligence. Management schools have been promoting theories of rational decision making since at least the 1950s. All of this -- together with the technology focus of BI vendors -- has, in my view, led to a deep misunderstanding of what is required to support decision makers in today’s high-speed, deeply integrated, information-rich and technology-dependent business environment that I call the biz-tech ecosystem. So, the term Business unIntelligence is really an attempt to encourage both business and IT to think in a novel way about the problem of decision support and possible solutions. In particular, I believe we need to start with a new understanding of how people come to decisions in a psychological/neurobiological/social sense: how much they are influenced by information
they receive, what individual and collaborative processes they use, and so forth.
Lindy: Until recently, the relationship between business and IT could have been best described as a "master and servant" relationship, with IT listening to and obeying the commands of the business. This relationship -- like so much else in the BI industry -- has begun to undergo rapid transformation as IT has become increasingly essential to the business. In Business unIntelligence, you charge IT to become the advocate for what technology can do for the business, leveling up the business and IT as equal partners. This "symbiotic relationship" -- or as you brilliantly wordsmith it: "the Beauty and the Beast" -- is the basis of your biz-tech ecosystem, and the catalyst for change in the new BI-environment. What do you see as the most critical drivers of change for this new partnership of business and IT today?
Barry: In my opinion, the most critical drivers are 1) the level of uncertainty in business, making old planning approaches obsolete in many areas, 2) the speed and consistency of decision-making demanded by customers today, and 3) the opportunities for business reinvention offered by emerging technologies. Addressing
these three drivers requires a much deeper level of co-operation and integration between business and IT than we’ve previously seen. It also drives integration across silos within the business and with IT. Driver 1 demands that we start to construct highly adaptive processes in both the traditional operational and informational environments. The concept involved here is “sense and respond” as defined by Stephan Haeckel in the 1990s. This is largely a management methodology, which I extend in Business unIntelligence to an adaptive decision cycle that links individual internal processes to peer evaluation, and then to promotion to production via a managed process. Driver 2 requires a new approach to designing the entire information resource of the enterprise, essentially shifting the model from sequentially populated layers to parallel processed pillars. Yet, in some sense, driver 3 is the most obvious: technology today is far more advanced than that on which our old architectures were based and offers new possibilities that business could not have envisaged previously.
Lindy: To bridge the business-IT gap and “offer a common language for progress,” you knit the construct of Business unIntelligence and the new biz-tech ecosystem together in the IDEAL conceptual architecture. Within that architecture, you identify five characteristics: integrated, a unity of thought and purpose; distributed, and beyond central control; emergent, the realm of order from disorder; adaptive, via agility and adjustment; and, latent, hidden desire and possibility. You also mention four remaining characteristics -- complete, elegantly simple, enterprise-wide, and open-system -- that didn't quite make the acronym-cut, but offer additional and unique value in your conceptual architecture. Could you elaborate?
Barry: The acronyms of the IDEAL conceptual architecture and its REAL logical counterpart were chosen with care. In the first case, the idea is to create a model that speaks clearly enough to make sense to business -- perhaps even in the boardroom -- and conveys enough depth to drive innovative IT thinking. The three layers convey the thought that information -- all information -- is the foundation. Process provides the means and the filters through which information is both created and used; people are its ultimate creators and users. The three dimensions of each layer add substance to the characteristics of each. More simply, read the layers as a sentence: “People process information.” The REAL logical architecture, purely by its name, suggests that this is where implementation can and must begin. The IDEAL paints the vision, but the REAL sculpts the stone. My goal was to craft an enterprise architecture that bridges people to information, intention to action, and business to technology. There is certainly much greater detail both in business and technology that will need to be defined -- but that’s another day’s work. One final point: I want to be clear that the people layer of the IDEAL architecture encompasses the ethics, intention, empathy, and social awareness that are deeply and soulfully human.
The tools we have created -- and those we will create -- in Big Data and Analytics must support the common good, lest they destroy us in a frenzy of hyper-competition, unbridled expansion, and the ultimate destruction of the privacy and mutual respect on which true humanity rests.
december 8-10, big data summit Westin Kierland Resort & Spa, Scottsdale, AZ THE DEFINITIVE GATHERING FOR ENTERPRISE SENIOR EXECUTIVES
CURRENTLY STRATEGIZING THEIR NEXT BIG DATA PROJECTS.
Leading topics to be discussed include: • The Current State of Implementation Across the Enterprise • The Challenges of Operationalizing Analytics in the Enterprise • The Big Data Startup Kit: How Do Companies Implement Successful Big Data Projects and What Are the Key Obstacles to Navigate That Ensure it Can Deliver on its Promise? • Consumerization of IT and Big Data: What to Expect, Securing & Embracing Movement
DESIGNED AND EXECUTED
IN COLLABORATION WITH THE BIG DATA EXECUTIVE BOARD:
Strategic Partner: For further information please contact: Jason Cenamor Director-Technology Summit, North America + 1 312.374.0831 firstname.lastname@example.org
AN EXCERPT FROM BUSINESS UNINTELLIGENCE
DR. BARRY DEVLIN
THE BIG DATA AFFAIR is coming to an end. The romance is over. Business is looking distraught in its silver Porsche, IT disheveled in the red Ferrari. Of course, it wasn’t just the big data. It started long ago when IT couldn’t deliver the data and business looked elsewhere to PCs and spreadsheets. It’s time for business and IT to renew their vows and start working on renewing their marriage of convenience. When data warehousing was conceived in the 1980s, the goal was simple: understanding business results across multiple application systems. When BI was born in the 1990s, business needs were straightforward: report results speedily and accurately and allow business to explore possible alternatives. IT struggled to adapt. The 2000s brought demands for real-time freedom: the ability to embed BI in operations and vice versa. The current decade has opened the floodgates to other information, shared with partners and sourced on the Web. Divorce seemed imminent, IT outsourced. But, almost invisibly, beyond the walls of this troubled marriage, a new world has emerged. A biz-tech ecosystem has evolved where business and IT must learn to practice intimate, ongoing symbiosis. Business visions meet technology limitations. IT possibilities clash with business budgets. And still, new opportunities emerge, realized
only when business and IT cooperate in their creation—from conception to maturity. The possibilities seem boundless. But the new limits that do exist are beyond traditional capital and labor. The boundaries are imposed by the realities of life on this small blue planet afloat in an inky vacuum, with its limited and increasingly fragile resources and the tenuous ability of its people to survive and thrive in harmony with nature—within and without. For the corporate world, Business unIntelligence will succeed when it brings insight into business workings, innovation into business advances, and integration into business and IT organizations. But in the broader context, in the real world in which we all must live, our success in the social enterprise that is business can be measured first and foremost in the survival of the cultures and communities of alleged intelligent man, homo sapiens, as well as all the other creatures of this tiny planet, and finally in our willingness to limit our growth and greediness and embrace the good inherent in each of us. It becomes incumbent on each and every one of us to integrate the rational and the intuitive, the individual and the empathic. To take stock of our personal decision making and reimage it in the vision of the world we want to bequeath to our children. Beyond Business unIntelligence. Human unIntelligence.
Business unIntelligence is available from Technics Publications and on the Radiant Advisors ebookshelf: www.radiantadvisors.com/ebookshelf Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data is published by Technics Publications, ISBN-13: 978-1935504566.
[Sophisticated analytical practices aren’t either/or or both/and: it’s a continuum.]
A DECADE AGO, the term “analysis” in a business context typically meant one thing: online analytical processing, or OLAP. Properly speaking, OLAP wasn't – and isn't – an “analytic” technology, if by “analytic” is meant a machine-driven, lite-interactive, typically iterative kind of analysis. Traditionally, analytics was the demesne of statistical and predictive analytic tools from vendors such as SAS Institute and the former SPSS, which specialized in that kind of thing. Increasingly, analytics as a category encompasses everything from predictive analytics (also known as machine learning) to an emerging class of analytical technologies and methods that make use of tools other than SAS and SPSS; methods that augment or in some cases supplant those of statistical analysis; and – to some extent – ways of thinking that differ drastically from the age of scarce data. (With regard to this last claim, consider that in the age of scarce data, the practice of sampling was a pragmatic necessity; in the context of what we're calling big data, a growing number of experts altogether reject the need for sampling.) It is not the case that these sophisticated analytical practices are going to eliminate the bread-and-butter (OLAP-driven) analysis that has powered BI for the last 15 years. Nor is it the case that advanced analytical practices are going to supplement – e.g., as disciplines somehow adjunct to – traditional BI and analysis. The axis isn't either/or; it isn't necessarily both/and: it's that of an analytic continuum, wherein the work that's done in predictive or advanced analytics actually trickles down to less sophisticated user constituencies in the form of insights or assets that can be embedded into the applications that support everyday business processes. Let's consider what an analytic continuum, circa 2013, might look like.
Bread and Butter OLAP
Even if you don't hear much about OLAP these days, it nonetheless
remains the bedrock on which business intelligence (BI) and decision support are based. This is because OLAP has to do with analysis across multiple dimensions – and many (if not most) business questions cut across multiple dimensions. For example, a sales analyst might wish to know about (1) sales (2) of a specific product (3) in a specific region (4) over a specific period. This is an explicitly analytical query; it's a question that involves no less than four dimensions. This is the kind of analytical query for which OLAP was conceived. The great bulk of BI reporting derives from OLAP; so, too, do more recent innovations, including (believe it or not) BI discovery. For this reason, OLAP will continue to be a basic analytic building block well into the future. For one thing, it's commoditized: RDBMS platforms from Microsoft (SQL Server) and Oracle (Oracle 12c Enterprise
Edition) ship with integrated OLAP facilities: SQL Server Analysis Services (SSAS) and the Oracle OLAP Option, respectively. In addition, IBM and Oracle also market best-of-breed OLAP engines (namely, IBM Cognos TM1 and Oracle Essbase) of their own. What has changed – and changed radically – is the OLAP usage model. A decade ago, the de facto OLAP front-end tool was a spreadsheet, such as Microsoft Excel: the former Applix, for example, prescribed the use of Excel with its OLAP engine. In other words, the classic OLAP user experience was understood primarily in relation to a backend tool – viz., a server. Nowadays, the OLAP engine is effectively abstracted behind a dedicated front-end tool and (increasingly) layers of self-service amenities. This is what people mean when they talk about the “death” of OLAP: i.e., the abstraction – or disappearance – of the OLAP engine itself. As an enabling technology for user-driven analysis, OLAP isn't going anywhere. This abstraction is a function of a user-centric shift
that's helped to remake BI – for the better. At a fundamental level, even advanced BI offerings – QlikView and Tableau, among others – are powered by bread-and-butter OLAP functionality. These and other BI tools comprise a marriage of OLAP with embedded analytic functions, self-service capabilities, and vastly improved user interfaces; in the case of tools such as Tableau and Tibco Spotfire they actually marry a best-of-breed data visualization frontend with an in-memory OLAP engine in the backend. Even in the Hadoop world, the OLAP model looms large: when Platfora Inc. last year unveiled its “BI for Hadoop” offering, research analyst Mark Madsen, a principal with IT strategy consultancy Third Nature Inc., famously dubbed it “OLAP-by-another-name,” or – more succinctly – “PlatfOrLAP.” To the extent that SQL remains the lingua franca of BI and decision support – and this will be the case for some time to come – OLAP will remain a core analytic technology.
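The sales question used earlier in this section (sales of a specific product, in a specific region, over a specific period) can be sketched as a small multidimensional query. This is a minimal illustration using Python's built-in sqlite3 module rather than a real OLAP engine; the table name, column names, and figures are all hypothetical.

```python
import sqlite3

# Hypothetical fact table: one row per sale, keyed by three dimensions
# (product, region, period) with a single measure (amount).
rows = [
    ("widget", "west", "2013-Q1", 120.0),
    ("widget", "west", "2013-Q2", 150.0),
    ("widget", "east", "2013-Q1", 90.0),
    ("gadget", "west", "2013-Q1", 200.0),
    ("gadget", "east", "2013-Q2", 75.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, region TEXT, period TEXT, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def sales_total(product, region, period):
    """Slice the sales measure by one value on each dimension and sum it."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE product = ? AND region = ? AND period = ?",
        (product, region, period),
    )
    return cur.fetchone()[0]


def rollup_by_region(product):
    """Roll the same measure up along the region dimension."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales "
        "WHERE product = ? GROUP BY region ORDER BY region",
        (product,),
    )
    return cur.fetchall()


print(sales_total("widget", "west", "2013-Q1"))
print(rollup_by_region("widget"))
```

A dedicated OLAP engine would typically precompute or cache such aggregates across many dimension combinations; the sketch only shows the shape of the question, not how any particular product answers it.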
SQL – Taken to the Extreme
SQL isn't going to go away anytime soon, but it arguably has been pushed to its limit, thanks to what's called “Extreme SQL.” This is a category of complex SQL query that's intended to be processed on dedicated analytic platforms. In most cases, Extreme SQL must be offloaded from a conventional RDBMS to a dedicated analytic DBMS, usually one based on a massively parallel processing (MPP) engine. There's a reason for this: Extreme SQL entails a non-SQL payload – in the form of embedded user-defined functions (UDFs), calls to invoke database-specific functions or algorithms, snippets of procedural code, and other esoteric functions or operations – that isn't supported by and which simply could not execute on a conventional RDBMS platform. In other words, an Extreme SQL workload can bring
a conventional RDBMS – such as DB2, Oracle or SQL Server – to its knees. Extreme SQL databases spin out workloads and execute them in parallel – i.e., across other database nodes, which (in most implementations) are clustered in an MPP configuration. More often than not, Extreme SQL DBMS platforms use an optimized database storage architecture, an optimized memory architecture, or both. An example of the former would be a “columnar” database architecture, which is able to efficiently compress and store the kinds of data items typically used in DW queries. An example of the latter would be SAP HANA, which implements an in-memory columnar database engine. HANA uses a kind of memory-scaling technology (based on NUMA, or non-uniform memory access) to run in single-instance configurations of 1 TB or more of physical memory. Thanks to columnar compression, HANA can actually store 10x (or more) that amount of compressed data in memory. HANA isn't a completely in-memory database design, however; if it runs out of temp space – e.g., if it is tasked with handling too many large queries at the same time – HANA will spill to disk. The alternative in this scenario is to kill one or more queries. This illustrates the Achilles Heel of in-memory database designs: they're constrained by the amount of available physical memory. This is the reason vendors such as Kognitio and Teradata emphasize a memory-optimized take on in-memory: their platforms are architected to exploit system memory, processor cache (e.g., L1, L2, L3), and even flash storage cache. They claim that this gives them most of the benefits of a completely in-memory design – without the brick wall of a physical memory limitation. Extreme SQL analytics almost always involves structured data – i.e., data in tabular format. This data doesn't have to start out as structured, however: it can originate in a NoSQL platform, in which case it would be sourced from a much larger data set; reduced – for example, by means of a series of MapReduce jobs – to a working data set; transformed (e.g., by MapReduce ETL) into tabular format; and only then offloaded into the Extreme SQL DBMS.
Advanced Analytics
The foundation of analysis in traditional decision support is OLAP, which is predicated on a user-driven interaction paradigm and (almost) always consumes structured data. The “grist,” so to speak, for OLAP analysis is transactional data, sourced from OLTP systems. The “grist” for advanced analytics is...transactions and everything else: events, messages, artifacts, and entities. Advanced analytics is analysis in context; it isn't so much interested in concrete answers to specific questions as in events (or sequences of events) that can be probabilistically correlated with causes or used to predict outcomes. Its demesne includes traditional transactional data, like the INSERT or UPDATE transactions recorded in OLTP systems; data from human-readable sources, such as JSON, XML, and log files; or information gleaned from multi-structured content, such as text documents, images, video files, and the like.
The advanced analytic usage paradigm differs drastically from that of OLAP, in which a human being – viz., the business analyst – functions as the effective engine of analysis. In place of OLAP's prescriptive interactivity, advanced analytics emphasizes automation and repeatability. It doesn't propose to take human beings out of the loop – e.g., the human analyst plays a vital role in conceiving, bootstrapping, managing, and interpreting an advanced analysis – but, unlike OLAP, it doesn't make the human analyst the interactive focal point of analysis.
Users and vendors are still trying to figure out what to do with – and, more important, how to productize – advanced analytics. Because of its reliance on statistical analysis and its use of mathematical methods (such as numerical linear algebra), to say nothing of a toolset that – in most cases – centers on big data platforms and the MapReduce compute engine, it requires a significant degree of technical, statistical, mathematical, and (of course) business expertise. It requires, in other words, that rarest of commodities: data scientific expertise. This is one of the reasons industry expert Jill Dyche, vice president of best practices with SAS, says she finds data scientists at once sexy and infuriating: they're so in demand, the spectrum of skills they possess so hyper-specialized, that they're vanishingly rare. They're unicorns.
On paper, advanced analytics is sometimes touted as a proving ground, research lab, or petri dish for the production of analytic insights. The idea is that analytic insights – once discovered – will propagate (or, to switch metaphors, circulate) across the continuum. For example, a data scientist might focus on perfecting a model to reliably predict customer churn. Her analysis might focus on any of several indicators or anomalies that she believes to be consistent with churn. (These indicators or anomalies might themselves have been discovered by a business analyst. This is one of the use cases for which vendors such as Teradata, with its Aster Discovery platform, and Actian, with its VectorWise and ParAccel platforms, tout their Extreme SQL systems – i.e., as analytic discovery platforms.) The ultimate goal is to embed analytics in an operational context – i.e., in the business process itself. In this model, the data warehouse or the BI platform could become a repository for persisting analytic insights, there to be served up (at or close to real time) to the operational applications that support business processes.
A proposed platform for analytic discovery is the Extreme SQL appliance – but a platform such as Hadoop has a crucial role to play, too, chiefly as a platform in which to stage and prepare information for advanced analysis. This is because it's neither efficient nor cost-effective to store multi-structured content in a conventional DBMS platform. While semi-structured content such as logs, XML files, or even sensor events can ultimately be reduced to tabular format, it makes more sense to collect and persist it in a file-oriented repository, such as Hadoop – then to prepare it (e.g., by means of Hadoop MapReduce ETL) for offloading to an Extreme SQL platform. In the case of multi-structured sources like video or image files, “analysis” must effectively begin in Hadoop: it's a function of profiling, analyzing, and collecting statistics about data, applying advanced algorithms and mathematical methods, and generating structured information that can be prepared (again, using an ETL tool) for more conventional analysis.
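To make the "reduce semi-structured content to tabular format" step concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code: the log format, the field names, and the sample lines are all assumptions made up for illustration, and the map and reduce functions only mimic the shape of a MapReduce job on a handful of lines.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical log lines; the key=value layout is an assumption for illustration.
RAW_LOGS = [
    "2013-10-01T12:00:01 user=alice action=login status=200",
    "2013-10-01T12:00:07 user=bob action=view status=200",
    "malformed line that a real job would route to an error file",
    "2013-10-01T12:01:15 user=alice action=purchase status=500",
]

LINE_PATTERN = re.compile(
    r"^(?P<ts>\S+) user=(?P<user>\w+) action=(?P<action>\w+) status=(?P<status>\d{3})$"
)


def map_line(line):
    """Map step: parse one raw line into a structured row, or None if it does not parse."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    return row


def to_rows(lines):
    """Apply the map step to every line and drop the unparseable ones."""
    return [row for row in map(map_line, lines) if row is not None]


def reduce_by_user(rows):
    """Reduce step: aggregate the structured rows into an event count per user."""
    return Counter(row["user"] for row in rows)


def to_csv(rows):
    """Serialize rows as CSV text, the kind of tabular file a bulk loader could ingest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["ts", "user", "action", "status"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_rows(RAW_LOGS)
events_per_user = reduce_by_user(rows)
csv_text = to_csv(rows)
```

The point of the sketch is the division of labor: parsing and filtering happen where the raw files live, and only the much smaller tabular result would be offloaded to the analytic database.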
The NoSQL Shift
The category of NoSQL platforms includes the Apache Cassandra distributed database, MongoDB, CouchDB, and – of course – the Hadoop stack. In practice, however, Hadoop tends to get conflated with what's meant by both NoSQL and “big data analytics.” This last term has become a catch-all for a class of analytic workloads that use statistical functions, incorporate new practices and algorithms – such as the use of numerical methods – and consist of a mix of relational and non-relational data-types. In the popular imagination, big data analytics is said to begin where Extreme SQL analytics leaves off – or runs out of gas. This is the space in which highly structured (relational) data or semi-structured content (data from logs, sensors, or other quasi-tabular sources) meets the multi-structured world of text, graphs, GIS overlays, documents, files, and other types of information. Conveniently enough, this is also the world of NoSQL.
Even though it is negligibly cheap to acquire, provision, and deploy the Hadoop stack, it's considerably more expensive to start using it. Or it can be, at any rate. (Continued on p21)
THE DATA SILO HEADACHE [Is data abstraction the aspirin?]
ACCORDING TO Professors Andrew McAfee and Erik Brynjolfsson of MIT, “Companies that inject big data and analytics into their operations show productivity rates and profitability that are 5% to 6% higher than those of their peers.” CIOs understand this opportunity: according to a 2012 Gartner survey of 2,300 CIOs, analytics and business intelligence (BI) were their number one technology priority. However, you cannot do analytics and BI without data. So, data is a critical success factor. The more data the better, and this includes data sourced from the cloud, big data, and existing enterprise data warehouses.
Data Silos are a Fact of Life
Enterprise resource planning, or ERP, was supposed to be the answer to the data silo problem, but application integration continues to be a huge challenge -- even within SAP’s largest users, who typically use numerous specialized applications beyond the SAP suite. Cloud-based applications -- such as Workday, NetSuite, and Salesforce.com -- have accelerated the proliferation of transactional silos. With its myriad supporting data staging areas, data warehouses, data marts, and operational data stores, BI has generated additional data silos on top of the transactional systems. And recently, big data, with its new storage techniques, processing capabilities, and analytic opportunities, has accelerated the growth of analytic and BI data silos.
More Pain than Relief
Providing analytics and BI solutions with the data they require has always been difficult, with data integration (DI) long considered the biggest bottleneck in any analytics or BI project. Most organizations struggle with the variety of enterprise, cloud, and big data sources, along with all their associated access mechanisms, syntax, security, etc. Most data sources are structured for transactional efficiency and control, rather than query. Data is often incomplete and typically duplicated multiple times. For the past two decades, BI’s solution to the data silo headache has been to consolidate the data into a data warehouse, and then provide users with tools to analyze and report on this consolidated data. The good news about this approach is that it resulted in a common business understanding of the data, delivered via the data warehouse schema. This schema was consistent and easy to query.
However, data integration based on these traditional consolidation approaches has numerous moving parts that must be synchronized -- including the schema, the mappings, the ETL scripts, and more. Aligning these properly slows solution delivery. In fact, TDWI confirms this lack of agility: its recent study stated that the average time needed to add a new data source to an existing BI application was 8.4 weeks in 2009, 7.4 weeks in 2010, and 7.8 weeks in 2011. And 33% of the organizations needed more than 3 months to add a new data source.
Pain Relief at Last!
The schema is the key: this is where the various data silos are rationalized from the various source system taxonomies into common business ontologies. A common term for this activity is data abstraction -- the process of transforming data from its native structure and syntax into reusable objects that business applications and consumers can understand. Some data abstraction approaches used today work better than others. For example, some organizations build data abstraction by hand in Java or use business process management (BPM) tools. Unfortunately, these are often constrained by brittleness and inefficiencies. Further, these tools are not effective for large data sets since they lack the robust federation and query optimization functions required to meet data consumers’ rigorous performance demands.
Data warehouse schemas can also provide data abstraction. Data modeling strategies for dimensions, hierarchies, facts, and more are well documented. Also well understood is the high cost and lack of agility in the data warehousing approach. However, data warehouse-based schemas often don’t include the many new classes of data (big data, cloud data, external data services, and more) that reside outside the data warehouse.
Data virtualization (DV) is a third way to implement data abstraction. From an enterprise architecture point of view, data virtualization provides a semantic abstraction (or data services) layer supporting multiple consuming applications. This middle layer of reusable views (or data services) decouples the underlying source data and consuming solutions, providing the flexibility required to deal with each source and consumer in the most effective manner, as well as the agility to work quickly across sources and consumers as applications, schemas, or underlying data sources change.
Data Abstraction Addresses Data Diversity
Analytics and BI can make a bigger business impact when they can access more data. With additional data now required from cloud and big data silos, accessing and integrating these new sources can be a challenge for enterprises accustomed to a traditional enterprise data warehouse-centric data integration approach. Data abstraction, when implemented using data virtualization, simplifies and accelerates integration of cloud, big data, and enterprise sources by bridging the gap between diverse business needs and ontologies and even more diverse data sources and taxonomies. Done right, the benefits can be significant, including:
• Simplified information access – Bridge business and IT terminology and technology so both can succeed;
• Common business view of the data – Gain agility, efficiency, and reuse across applications via an enterprise information model or “Canonical” model;
• More accurate data – Consistently apply data quality and validation rules across all data sources;
• More secure data – Consistently apply data security rules across all data sources and consumers via a unified security framework;
• End-to-end control – Use a data virtualization platform to consistently manage data access and delivery across multiple sources and consumers; and
• Business and IT change insulation – Insulate consuming applications and users from changes in the source and vice versa. Business users and applications developers work with a more stable view of the data, and IT can make ongoing changes and relocation of physical data sources without impacting information users.
For more information on Composite Software’s Data Abstraction Reference Architecture, please read the Data Abstraction Best Practices White Paper.
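The core idea of data abstraction described in this article, mapping each source's native field names onto one canonical business object, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names ("erp", "crm"), the column names, and the sample records are hypothetical, and a real data virtualization platform adds federation, query optimization, and security that this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Canonical business object that every consumer sees, whatever the source."""
    customer_id: str
    name: str
    country: str


# Hypothetical mappings from canonical fields to each source's native column names.
FIELD_MAPPINGS = {
    "erp": {"customer_id": "CUST_NO", "name": "CUST_NAME", "country": "CTRY"},
    "crm": {"customer_id": "Id", "name": "AccountName", "country": "BillingCountry"},
}


def to_canonical(source, record):
    """Translate one native record into the canonical Customer shape."""
    mapping = FIELD_MAPPINGS[source]
    return Customer(
        **{field: str(record[column]).strip() for field, column in mapping.items()}
    )


def customer_view(sources):
    """A reusable 'view': yields canonical customers across all sources on demand."""
    for source, records in sources.items():
        for record in records:
            yield to_canonical(source, record)


# Made-up sample data standing in for two silos.
sources = {
    "erp": [{"CUST_NO": "1001", "CUST_NAME": " Acme Corp ", "CTRY": "US"}],
    "crm": [{"Id": "C-77", "AccountName": "Globex", "BillingCountry": "DE"}],
}
customers = list(customer_view(sources))
```

Because consumers depend only on `Customer`, a renamed column in one source means changing a single mapping entry rather than every report that reads the data, which is the change insulation the last bullet describes.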
BI IS FUN AGAIN
[Three ways Radiant Advisors is shifting the mindset around BI.]
BUSINESS INTELLIGENCE (BI) has gotten a bad rap over the years, but it doesn’t have to be that way. The common perception that BI is simply the delivery of static reports or dashboards for monitoring performance is, frankly, outdated. As the advances in technology, analytics, big data, and database technologies bring down the barriers to working with data, the result can be true democratization of data -- and that’s fun. But, in order to do that, BI needs to be rediscovered and traditional notions need to be challenged. That’s the premise on which John O’Brien launched Radiant Advisors. O’Brien’s bold mission is to help people:
• Rediscover what BI is all about for the business.
• Reimagine how BI can transform analytics.
• Rethink what data and BI architecture should be.
The company approaches this transformational work through three main components: research, advisory services, and developing and mentoring people through editorial, events, and e-learning platforms. The RediscoveringBI publication and Spark! event series were launched with this goal. At the core of all three areas, O’Brien says the foundational approach is to “forget about architectures, forget about everything we’ve been doing for 20 years, get back to the basic principles of things, and really think about why we do something.” It’s a lofty goal, requiring a shift in the pervasive mindset established around BI.
Modern Data Platform for BI
The good news is the BI landscape is reinventing itself. Demands for new architecture stem from business expectations, the need for a unifying architectural design, availability of more cost-effective price/performance computing resources, and introduction of new data technology. But, how do you recognize emerging technology? How do you integrate it with the data warehouse environment you’ve been building for the last decade? Speaking from more than 25 years of industry experience, O’Brien says it’s not necessary to start from scratch, but it is imperative to embrace new paradigms. “Your data warehouse is a reflection of the analytic culture of your company,” and the key to advancing that maturity is to optimize the architecture for analytics and BI. This is the intent of the Modern Data Platform for BI. “While it can start out very introductory – it doesn’t matter if somebody is new to the business or new to big data or anything like that – it very quickly moves them behind the scenes in understanding holistically how it all ties together. Holistically, understanding the role of big data or the role of the data warehouse or analytics is the key, because in any one of those areas you can drill into the details.”
The Spark! event series is designed to get people plugged into conversations and sharing their challenges and experiences for the benefit of the community. The goal is to facilitate small-group interaction to drill into the pressing issues facing organizations today. O’Brien challenges individuals to really think, rather than simply listen. In fact, as he introduces the Modern Data Platform, he says the job of Spark! attendees is to try and break it. “When they get that level of engagement, they take ownership and they start to own that methodology – they take it back and they share it with companies and their colleagues, which is what we also want.”
Confident in the methodology behind the Modern Data Platform, O’Brien enjoys the opportunity for attendees to learn from each other. He sees his position as a facilitator and hopes attendees will come away with an understanding to help them cut through the noise of the industry and have an organizational strategy for where the technologies fit together in the framework. “The Modern Data Platform is all about data architecture and technology, but whenever they hear a new product or talk to a new vendor, they will look at it differently to say ‘I understand its role inside of a platform now.’”
This kind of understanding can lead to more consistency for IT. However, the focus needs to shift from being a technology-based decision to a capabilities-based decision. The emphasis must be on what the business is trying to do. “It changes the conversation between the technologists, the data folks, and the business. So much of what we do is technology for technology’s sake or architecture led, solving problems that don’t exist. And I think it helps them shift back into a business-led kind of conversation.”
In that context, O’Brien discerningly recognizes that the business doesn’t distinguish between types of BI or data analysis capabilities; they just do their jobs. For this reason, he considers IT to be consultants and brokers of technology on behalf of the business. “I see what we do in the back end as enablers, but being enablers means bringing in the right technology tools but also helping the business, like a consulting model, to help them achieve things through analytics.”
With the recognition of analytics as a competitive differentiator, it is important to be extremely thoughtful about the approach, available skillsets, and data accessible to organizations, rather than rushing forward with analytics, potentially causing great risk. O’Brien warns that analytics is increasingly more complex than previous generations of BI.
“We’re in such a rush to deliver an analytic model that a lot of gaps or risks are going to be introduced. And this is compounded by [analytics] being probably one of the most complex forms of BI we’ve ever delivered. So, I’m worried about people’s rush to deliver value. If they’re not equipped to do it correctly and businesses are going to make decisions based on poor models, we’ll have a backlash.”
Starting from the business problem to be solved is the foundation of business analytics models, and it is a completely different paradigm than BI. “The data scientists don’t worry about data warehouses; they’ve never even heard of one most of the time. They just think about the [analytic] model. And those of us coming from the BI world trying to build these models think the old way.”
“It gets back to the real promise of business analytics, which starts with the business question,” says O’Brien. “Facebook started with ‘I want to understand something about people.’ They didn’t start with all the data and technology and say ‘What can we do with this? Think of a good question now that we have a columnar database.’” The goal is to identify business questions, collect the data, and find the answers.
As the world is changing because of analytics and data technologies, O’Brien believes we’re entering a new Information Age, and his passion about the ways technologies can be used to bring business value is contagious. Big data, NoSQL variations, in-memory, and cloud are just a few of the exciting advances that O’Brien states are “enabling anybody to do cool things.” And encouraging people to think about what they could do with all types of data -- structured and unstructured -- is a new way of thinking. “That whole shift in the mindset has people looking at data differently for the first time, and that’s really, really nice to see. Anything’s possible.”
“That’s how the whole world is shifting, which makes it completely exciting again. It’s a complete level playing field, which is the first we’ve seen in a very long time in the BI and data space,” he says. And that’s fun.
SPARK! Los Angeles, CA Modern Data Platforms BI and Analytics Symposium
October 15-16, 2013
At SPARK!, you can participate in discussions and learn how leading companies are implementing big data environments and integrating them with their existing data warehouses, analytic databases, and data virtualization to evolve into a Modern Data Platform for BI that enables self-service capabilities and an Enterprise Analytics Program within their organizations. SPARK! Los Angeles, the final stop on the 2013 SPARK! tour, will feature presentations by industry experts Dr. Barry Devlin and John O'Brien, as well as eight sessions that introduce the framework and methodologies for modernizing data warehouses into data platforms that combine the latest data integration and analytic databases. These sessions focus on learning why and how architectures are driven in order to confidently transform your own. Register now at: http://radiantadvisors.com
Featured Keynotes By:
Dr. Barry Devlin
Founder and Principal
Founder and Principal
(Continued from p14) Thanks in no small part to the rise of Hadoop, expertise in Java, Perl, Python, and Pig Latin – this last is the language used to code for “Pig,” a popular MapReduce programming tool – is priced at a premium. Pig and other tools are getting better – if by “better” one means they're easier to program for. There's also the Apache YARN project, which will in effect decouple Hadoop from its dependence on MapReduce as a parallel processing compute engine. At this point, programming for MapReduce is a requirement for performing most types of data manipulation or data transformation in Hadoop. (There are exceptions, including the use of proprietary libraries. But using proprietary libraries inevitably means paying for proprietary software licenses.) It is MapReduce, more than anything else, which accounts for Hadoop's rigidly synchronous, batch-oriented processing model: MapReduce must perform operations sequentially (i.e., synchronously); it does not permit operations to be pipelined (i.e., to be performed asynchronously), which is a requirement for BI, decision support, analytic, and (for that matter) most other kinds of interactive workloads. To the extent that YARN makes it easier to run other
The number one NoSQL use case isn't an analytic one, however; it's a pragmatic one. Storing non-relational data in a SQL database has always been kludgey, to say nothing of expensive. NoSQL repositories, on the other hand, can be used efficiently to store information of all kinds.
rediscoveringBI Special Edition
ANALYTICS
EDITORIAL REPORT
THESE DAYS, FEW TERMS SEEM MORE MEANINGLESS THAN “ANALYTICS.” As a predicate, “analytics” gets applied to a confusing diversity of assets or resources – from banal operational reports to a machine analysis involving terabytes of information and thousands of predictive
The confusion is regrettable, but understandable: the truth is that there's simply a surfeit of analytic technologies, starting with bread-and-butter multidimensional assets – i.e., reports, dashboards, scorecards, and the like. Even in an era of so-called “big data analytics,” these assets aren't going anywhere. Increasingly, they're being buttressed by analytic insights from a host of other sources. Advanced practices such as analytic discovery and “investigative computing” – this last describes the methodological application of machine learning (also known as predictive analytics) at massive scale – involve different tools, different methods, and (to some extent, anyway) very different kinds of thinking. This begs a question: how do you meaningfully distinguish between analytic categories and technologies? How do you grow – or establish – a richly varied analytic practice? What must you change in your existing data warehouse environment to support or enable more sophisticated analytic practices? What can – and probably should – stay the same?
BRINGING ARCHITECTURE BACK: A PRESCRIPTION FOR ANALYTIC SUCCESS
Lately, we've spent a lot of time talking about the data warehouse. Its uses, its problems, its limits. Not its limitations: its limits. Today’s business intelligence (BI) and data warehouse (DW) platforms are highly adapted to their environments. They make sense – and they deliver significant value – in the context of these environments. But they cannot be all things to all information consumers. From the success of BI discovery, which developed and evolved in response to the rigidity of the DW-driven BI model, to the acceptance of NoSQL as a legitimate alternative to SQL (to say nothing of the prominence of Hadoop, both as a NoSQL data store and as a putative platform for big data analytics), to the emergence of an “investigative” analytics, which mixes relational and non-relational data-types, advanced analytic functions, and statistical ensemble models: all of these trends militate against (and ultimately function to explode) the primacy and centrality of the data warehouse. The upshot is that the usefulness of the DW – i.e., the habitat in which it is an apex predator – has been strictly delimited. The data warehouse is a query platform par excellence; it excels at aggregating business “facts” out of queries. That's its domain. Beyond this domain, it's less functional: less adaptable, less usable, less valuable.
For two decades, the data warehouse and its enabling ecosystem of BI tools functioned as the organizing focus of information management in the enterprise. No longer. The DW isn't going to go away, but it's no longer at the center. The new model is widely distributed. It consists of systems or pockets of information resources strewn across the enterprise – and beyond. The data warehouse and its orbiting constellation of BI tools comprise one such system. A NoSQL platform such as Hadoop or Cassandra – functioning variously as a platform for NoSQL analytics; as an ingestion point or staging area for information of all kinds; as a platform for BI-like reporting and analytics – could comprise another. Analytic discovery and investigative computing practices might be seen as separate systems – or, alternately, as discrete pockets, fed by, and feeding back into, DW and NoSQL platforms. The new model isn't a single stack; it isn't monolithic; it's a heterogeneous collection of platforms and stacks. It is, in other words, a synthetic architecture that's adaptable and scalable enough to knit everything together: it's an architecture for analytics.
Architecture can be a dirty word. But the way to build, grow, and vary an analytic practice is by thinking architecturally. This was as true 25 years ago, when Dr. Barry Devlin and Paul Murphy published their landmark paper, “An architecture for a business information system,” as it is today. (The “business information system” outlined by Devlin and Murphy was, of course, the data warehouse itself.) Thinking architecturally means accommodating existing practices or resources when and where it makes sense to do so, along with moving away from (or de-emphasizing) practices that aren't tenable or sustainable, such as spreadsheet-driven analysis. (This isn't to reject spreadsheets entirely: it's to recognize that the standalone spreadsheet model isn't scalable or manageable.) It means emphasizing self-service, collaboration, and agility by embracing methodologies and technologies that promote or are consonant with agility and by developing practices that reduce latency and promote user engagement. (Engaged users are happier and more productive. If nothing else, they're more inclined to actually use the tools that are put before them – instead of finding out-of-band workarounds.) It means critically interrogating long-cherished assumptions about connectivity: i.e., about how data sources – particularly in the non-relational world of REST – can or should be accommodated. The old way of thinking about connectivity was to hard-code mappings between physical sources and a single, central target – the data warehouse. Connectivity in the context of an architecture for analytics is by definition widely distributed; it emphasizes abstraction, chiefly as a means to promote agility.
RUBBER, MEET ROAD: AN ANALYTIC ARCHITECTURE IN PRACTICE
So how does this work in practice? Let's consider the problem through the lens of Dell's information management portfolio, which is one of a handful of visions that can credibly claim to address the requirements of a synthetic architecture for analytics. Other vendors (chiefly, HP, IBM, Oracle, SAP, and Teradata) have articulated conceptually similar visions, but these tend to be tightly tied to their respective product portfolios: they all prescribe heavy doses of their own middleware, database, hardware, and services offerings. Talking about an analytic architecture from the perspective of an HP, IBM, Oracle, SAP, or Teradata means talking about a largely homogeneous architecture – i.e., about a single-stack solution: lock, stock, and smoking analytic DBMS. (Bear in mind that HP, IBM, and Oracle also market their own proprietary operating systems – along with proprietary microprocessor architectures, too.) Talking about an analytic architecture from Dell's perspective lets us address platform heterogeneity – in the database and middleware tiers – and likewise permits us to consider the benefits (if any) of a reasonably cohesive product stack. It's helpful, too, because Dell purports to be a vendor-neutral reseller of both traditional RDBMS data warehouse systems and analytic DBMSes. Dell markets off-the-shelf data warehouse configurations based on RDBMS platforms from Microsoft and Oracle, as well as a “QuickStart” data warehouse appliance based on SQL Server 2012 Data Warehouse Appliance Edition; it likewise markets analytic database configurations based on Microsoft's SQL Server PDW and SAP's HANA. Finally, Dell has staked out a vendor-neutral position with respect to Hadoop and NoSQL: unlike competitors such as EMC, IBM, and Oracle, it doesn't develop and maintain its own Hadoop distribution.
Dell's vision augments the traditional, DW-driven BI practice with a self-service BI discovery platform – in this case, its Toad BI Suite 2.0. Like other BI discovery platforms, Toad BI Suite focuses on delivering a collaborative, user-self-serviceable visual discovery experience. This isn't unusual: today, many BI tools claim to address the self-service and visual discovery use cases. (Fewer address collaboration, however.) Toad BI Suite is most compelling from an architectural perspective, however. Toad Intelligence Central, a component of the Toad BI Suite, implements a virtual data integration and collaboration layer, which functions to abstract data from physical systems. In effect, Toad Intelligence Central exposes a “data view” – for example, the query to access one or more data sources and tables. In an analytic architecture, Toad Intelligence Central can function as a data integration layer that unifies the far-flung information resources – distributed across DW systems, Hadoop and other NoSQL platforms, or BI discovery and investigative computing pockets; sitting in operational systems, accessible in files – strewn across the enterprise. In this way, data remains on the data sources and is not moved. It also makes it easier to quickly provision access to data sources and supports some degree of self-service. For example, Toad Data Point, another component of the Toad BI Suite 2.0, permits business analysts to mash up existing data sources and to provision them to analytic consumers. It's possible to combine views together to create previews or mash-ups – in effect, analytic applications unto themselves – and to expose them to consumers. An abstraction layer on its own is a pragmatic part of an analytic architecture; a direct data access and virtual data integration layer made self-serviceable is a critical – and potentially transformative – technology: for example, rather than waiting for IT to ticket and deliver a data source or a business view, a business analyst can use the self-service capabilities of Toad Intelligence Central and shift for herself – while at the same time servicing other, less sophisticated users. Toad Data Point also integrates a library of more than 100 statistical and mathematical functions for advanced analytics on structured data, such as clustering analysis, time series, and others, that enable analysts to extract information from the data.
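The general notion of a “data view” that spans sources without copying their data can be illustrated with Python's built-in sqlite3 module. To be clear about the assumptions: this is not Toad's interface or implementation, two attached in-memory SQLite databases merely stand in for a warehouse and a log store, and every table and column name here is hypothetical.

```python
import sqlite3

# One connection, two separately attached databases standing in for two physical sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
conn.execute("ATTACH DATABASE ':memory:' AS weblogs")

# Hypothetical tables: customers live in the "warehouse", page views in the "log store".
conn.execute("CREATE TABLE warehouse.customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE weblogs.page_views (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO warehouse.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO weblogs.page_views VALUES (?, ?)",
    [(1, "/pricing"), (1, "/docs"), (2, "/pricing")],
)

# The view stores only the query. SQLite requires a TEMP view here because
# the query references tables in more than one attached database.
conn.execute(
    """
    CREATE TEMP VIEW customer_activity AS
    SELECT c.name AS customer, COUNT(v.page) AS page_views
    FROM warehouse.customers AS c
    LEFT JOIN weblogs.page_views AS v ON v.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    """
)

# Consumers query the view by name and never see where the rows physically live.
result = conn.execute(
    "SELECT customer, page_views FROM customer_activity ORDER BY customer"
).fetchall()
```

The design choice worth noticing is that the view holds a definition rather than a copy: rows stay in their source tables, and the join runs only when a consumer asks for it.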
Conventional DI isn't going away, of course. ETL tools, for example, will still be used to populate – as well as to get information out of – relational platforms (the data warehouse, operational data stores, analytic DBMSes) and non-relational platforms (information that's been conformed or manipulated – i.e., given tabular structure – in Hadoop, Cassandra, or other NoSQL platforms). In this regard, Dell's approach is likewise illustrative. Instead of defaulting to the use of a full-fledged ETL tool, with a tier or staging area unto itself, it prescribes the use of its Boomi ETL technology, which supports both application integration (typically a function of message or event passing and routing) and data source integration. Boomi, too, is notable from an architectural perspective: unlike the full-fledged ETL platforms, which were first designed to move and transform data in an on-premises context, Boomi was designed and built with the cloud in mind. As a result, it addresses the problem of integrating information across different contexts. As application development, delivery, and management shift to the cloud – in either an internal (private, or on-premises) or external (public, or off-premises) context – and as cloud stacks themselves evolve, an analytic architecture must also be able to adapt.

Dell likewise markets an essential information integration service in the form of SharePlex, a data replication and synchronization product. Data replication is an important tool in today's information integration toolkit; in a far-flung analytic architecture, it's absolutely crucial. A replication tool like SharePlex uses change data capture (CDC) to move as little data as possible: it replicates only changed data. This is important both from the perspective of petabyte-scale internal volumes – SharePlex supports replication to and from Hadoop, which is being pitched as a landing and staging area for information of all kinds – and from that of synchronizing or moving data between or among private and public contexts.

The final vital pieces of any analytic architecture are a rich data visualization component and an analytic search technology – preferably one that exposes a natural language processing (NLP) interface. Data-viz was first popularized in the BI discovery context; two of the biggest discovery-oriented products – Tableau and Tibco Spotfire – are in fact best-of-breed data-viz tools.
In an analytic architecture, data-viz is about a lot more than BI discovery, however. This is thanks chiefly to what we're calling big data, which poses an unprecedented – and persistently misunderstood – problem of scale. We all know that big data entails massive data processing on the back-end – e.g., the ingestion, parsing, and management of a wide variety of data-types from a wide variety of sources – but the single biggest big data bottleneck isn't technological: it's organic. It's the human brain, which just isn't built to process information at big data scale. In this respect, data-viz functions as an essential interpretive tool for big data: it helps to make information at massive scale intelligible. The same is true for NLP and analytic search, which make it possible (first) to more effectively analyze and synthesize information during analysis and (second) for non-technical users to consume the results of a multi-structured analysis.

Dell's piece of the data-viz puzzle comes by way of its Kitenga Analytics Suite. By now it should be obvious that even the best or most revolutionary analytic front-end technology is all but useless if it isn't able to connect to and consume information sources. Dell positions Kitenga as a best-of-breed data-viz tool for investigative computing and analytic discovery. The former emphasizes a highly iterative, test-driven methodology; the latter involves the exploratory or programmatic analysis of multi-structured data; both require a massively parallel data processing platform of some kind – such as Hadoop. As we've seen, the challenge with using Hadoop as a platform for any kind of analytic workload is the difficulty of preparing data for analysis: this is nominally a MapReduce-centric task, but programming for MapReduce requires proficiency in Pig Latin, Java, Python, or Perl, to say nothing of the requisite analytic or data scientific expertise.

With Kitenga 2.0, Dell focused on helping to automate this activity: Kitenga can generate MapReduce code to select and prepare data for analysis. It likewise automates the scheduling of MapReduce jobs for certain kinds of statistical analysis. Finally, it exposes an NLP analytic search capability via a portal interface. This makes it possible to disseminate the results of an analysis, even if only as a summary view. Call it the “BI trickle-down effect,” in which advanced analytic insights get recirculated back into and embedded in the operational context, there to inform or drive decision-making.
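For readers who haven't written a MapReduce job, the sketch below suggests what the data preparation burden described above looks like in miniature. It is plain, single-process Python, not Hadoop's API and not the code Kitenga generates; the function names and the two-field log format are assumptions made for illustration. It shows the three steps any such job has to express: a mapper that selects and cleans records, a shuffle that groups values by key, and a reducer that aggregates each group.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterator[Pair]]) -> Iterator[Pair]:
    """Apply the mapper to every input record and stream out key/value pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Collapse each key's values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def parse_log_line(line: str) -> Iterator[Pair]:
    """Hypothetical mapper: emit (status, 1) for well-formed 'path status' lines.

    Malformed lines are silently dropped, which is the 'select and prepare' step.
    """
    parts = line.split()
    if len(parts) == 2 and parts[1].isdigit():
        yield parts[1], 1


def count(key: str, values: List[int]) -> int:
    """Reducer: total the occurrences recorded for one key."""
    return sum(values)


# Made-up sample input, including one malformed record.
log_lines = ["/home 200", "/cart 500", "garbage", "/home 200", "/pay 404"]

status_counts = reduce_phase(shuffle(map_phase(log_lines, parse_log_line)), count)
```

Even in this toy form, the analyst has to think in mappers, keys, and reducers before asking a single analytic question, which is the overhead that code generation and job scheduling aim to take off her hands.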
CONCLUSION

So far, we've said nothing about what's arguably the most important aspect of any analytic practice: the people doing the analysis. No application, system, program, or architecture is turnkey, self-regulating, or fully automated. Someone has to design, deploy, configure, manage, and use it on an ongoing basis. In the case of more traditional business analytic technologies such as OLAP-driven analysis or BI discovery, someone has to ask questions, interact with the data, and synthesize findings or insights. In the case of advanced analytic practices like analytic discovery or investigative analytics, someone has to formulate hypotheses or identify problems; identify potential sources of data; select, prepare, and stage the data; design the experiment or build the model; choose the algorithms; conduct the analysis; and interpret the results in a disciplined and rigorous – that is, a statistically and scientifically valid – fashion.

An architecture for analytics doesn't “solve” any of these problems. It doesn't pretend to be turnkey, self-regulating, or fully automated. It cannot propose to replace human analysts. Nor does the architecture thus described get in the way of human analysts. If it's far from frictionless, it isn't overly obtrusive. The terms “work flow” or “process flow” are suggestive in this regard: activity in an organization, like water coursing through a river bed, seeks a path of least resistance. To the degree that an organization introduces too many friction points into a process – e.g., by means of policies that aren't simply too restrictive, but which are at once restrictive and easily circumventable; by means of chokepoints or bottlenecks that are impracticable or inessential – a process will flow around, circumvent, undercut, and ultimately bypass them.

This was the problem with the traditional DW-driven BI model, which refused to revisit its organizing assumptions and which insisted on maintaining its rigid model – even in the face of compelling alternatives. This is why we have BI tools that are hard to use and data warehouses that are frustratingly hard to change. This is why BI uptake continues to lag. The architecture for analytics thus described aims to minimize friction points or impediments by emphasizing re-use, self-service, a rich (and above all intelligible) user experience, and some degree of automation.
It makes pragmatic use of existing technologies such as ETL, data replication, change data capture, and an abstraction layer to mitigate the data preparation and connectivity issues that are such formidable barriers to BI and analytics. It treats DI as a tool, not as a vouchsafed stage or tier – i.e., not as a set of a priori structures or requirements – unto itself. It uses DV, for example, as a tool to manage data sources and connectivity; to encourage reuse, chiefly by exposing a catalog of pre-built business views that can be mashed up or “stacked” to create new applications; and to support a limited self-service capability, such that analysts or business subject matter experts can be empowered to build new views or to stack views in order to create new applications. It uses automation and automated scripting to enable programmatic access to information residing in non-traditional sources, such as NoSQL data stores. Dell's Kitenga offering, for example, combines data visualization – a critical interpretive technology in an age of escalating data volumes – with self-service capabilities and automation: Kitenga can generate MapReduce code for certain kinds of analytic workloads; it likewise helps to abstract the complexity of programming for MapReduce by offering “wrappers” for non-Java or non-Pig Latin code.
An architecture for analytics doesn't do away with the critical need for top-flight human analytic talent. It focuses on making it easier for human actors to prepare, conduct, and interpret their analyses. It is explicitly collaborative, exposing collaborative capabilities in all dimensions – from data integration to BI discovery to the more arcane advanced analytic practices, such as investigative analytics. This permits an organization to do more with less, so to speak: i.e., it makes it easier for human analytic rockstars – be they data scientists, stat geeks, or especially savvy analytic discoverers – to package and productize their insights so that they can be recirculated and consumed by rank-and-file knowledge workers.
RESEARCH... ADVISE... DEVELOP... Radiant Advisors is a strategic advisory and research firm that networks with industry experts to deliver innovative thought-leadership, cutting-edge publications and events, and in-depth industry research.