Six-Step Approach to Identify a Big Data Problem and Choose the Right Solution
Author: Manju Devadas, VP Solutions and Technology, Bodhtree
firstname.lastname@example.org
www.linkedin.com/in/manjudevadas
Many have heard the dire predictions about the state of information technology, with 10x data growth projected over the coming years. While there is truth to the exploding growth rate of data and the accompanying complexity of analysis, we have faced similar exponential growth in data over each of the most recent decades; and every time, technology has risen to the challenge and delivered the needed capacity for businesses, governments and individual users. For parallels we need only look to distributed computing in the 1990s and websites in the 2000s. In the 2010s, Big Data is a phenomenon nearly everyone comes into contact with, whether they realize it or not. If you carry a smartphone for work or dump thousands of digital photos on your home computer, you're already swimming in the Big Data ocean. You just may not have the tools yet to capture, store and process that massive data flow for better decision-making.
The purpose of this paper is to demystify Big Data and provide a methodology to assess whether the problems you encounter in your enterprise are Big Data problems. Working with large companies and start-ups in Silicon Valley has allowed me to validate this methodology across diverse business verticals and company sizes. If nothing else, this paper will help you act from a position of knowledge surrounding Big Data, avoiding the hype and misinformation that commonly accompany the latest technologies.
First, Our Ecosystem
Each time you press a key, rate a product or navigate a GPS map, you generate data in some form. Of all this stored data, usually only a small portion is being analyzed to find new answers to challenging questions. If Samsung launches a new phone, the most unbiased and direct feedback today probably comes from Facebook rather than traditional customer surveys and support lines. Book a flight online today at Kayak.com or Expedia.com and switch to your inbox to find emails with Priceline hotel recommendations in a matter of seconds. How the hell did Priceline know that I wanted a 3.5 star hotel in Monterey for the holiday weekend? With or without my knowledge, I allowed them to capture my personal preferences and travel plans, analyze current web offerings using Big Data, then email me recommendations.
The Australian government is using Big Data to analyze seismology patterns and predict earthquakes precious minutes earlier. Big Data analysis has found a role in vehicle maintenance, predicting part failure; space exploration, such as the rover landings on Mars; and fraud prevention, often identifying unauthorized purchases before a customer even realizes their credit card is stolen. The total computing power provided by an optimized Big Data system is capable of analyzing all the data on every desktop in your neighborhood in less than 1 second. The digital image of a hundred-year-old document could be retrieved from a city government’s archive databases in less time than it takes to pull a book from the shelf. The evolving technologies in the Big Data world are not only making this kind of analytical power possible, they are democratizing it through approaches affordable even to the smallest businesses.
The analysis of large volumes of data at lightning speeds is great, but does it actually create real value for people and businesses? Let’s start with your next promotion, which depends on the successful launch of a new product line. Big Data becomes your cheat sheet for understanding customers, allowing you to proactively analyze buying trends and marketing strategies over the last ten years of similar launches. Or consider biometric monitoring that signals you to go to the ER before you actually feel any symptoms. Or maybe your goal is enterprise efficiency in a competitive industry, and you need to identify and eliminate bottlenecks in your supply chain. Each of these challenges can be addressed with Big Data solutions.
Solving most business problems in large companies involves some form of data analysis. With data now being captured in all forms, including human-, environment- and machine-generated, it is necessary to identify which problems are Big Data-related and which can be solved using traditional data analysis techniques. The last thing management wants is to purchase a new system only to realize existing tools were capable of achieving the same results. (Remember the popular, if apocryphal, story of NASA spending millions to invent a pen to write in space when a pencil would have been an adequate solution.)
Nothing is more wasteful in business than a great solution in search of a nonexistent problem.
What exactly is Big Data?
Big Data is simply complex data sets in massive volumes (petabytes) and multiple formats (table contents, text, audio, video). With the speed and amount of data being generated today, the corresponding technology demand is driving new ways to analyze the information faster, cheaper, and with better results. Three types of data may be present in your enterprise:
a. Large volumes, e.g. data stored in database tables, Excel spreadsheets, Access databases
b. Unstructured data, e.g. video, audio, Facebook, Twitter, blogs, customer reviews, log files
c. ‘Gray’ data, e.g. web traffic where the exact usage is yet to be determined based on business needs that may arise
Enterprises with ever increasing data volumes must take measures to better analyze these data sets to accelerate progress toward company goals and objectives. Even if business seems fine now, you may be ignoring this data at your peril since your competition could be using it to run more efficiently, respond faster, and make better business decisions. As is so frequently the case with technology, if you’re holding still, you’re falling behind. Enterprises also need to have people who can think about data in new ways – not just information stored in tables, rows and columns, but also data as blogs, videos, Facebook posts, GPS coordinates, and traffic sensors. As of today, these ‘Data Scientists’ are difficult to train internally; and the natural reaction is to look for outside hires. But hiring from the outside can present its own set of challenges as these transplants may not bring the same understanding of your business challenges and differentiators. I recommend that you begin with the employees you already have and apply the methodology outlined below toward creating an effective Big Data strategy and roadmap.
How do I know if my problem is a Big Data Problem?
Without delving into details about the nature of the business challenge and existing sources of data, it is difficult for anyone to determine for sure if the problem is a Big Data problem. A Fortune 100 high-tech company in San Jose, California, paid us to fix what they labeled a Big Data problem. Following the initial analysis, we concluded the problem was best solved with traditional data analysis techniques rather than a Big Data implementation. We educated the customer about the unique characteristics of a Big Data problem, and saved the team substantial money since their existing tools were adequate to solve the issues. Hence, even though no general model can substitute for a thorough hands-on analysis, the simple methodology we outline below has been highly effective at quickly determining whether a challenge is Big Data-related.
Quite a few companies see Big Data as a concern only for web product companies like Facebook and Google with petabytes of data to organize and process. However, a 2011 McKinsey Global Institute study argues otherwise. The McKinsey report found that investment firms averaging fewer than 1,000 employees have 3.8 petabytes of data stored, a data growth rate of 40 percent per year and a mix of structured, semi-structured and unstructured data types. Overall, McKinsey found that 15 of 17 U.S. industry sectors have more data stored per company than the U.S. Library of Congress (which currently has 235 terabytes of data), and that companies from all sectors have at least 100 terabytes stored, as shown in Figure 1:
Big Data Solution classification:
There are five data conditions called the “Vs” that assist in defining a Big Data problem:
1. Volume, e.g. multiple petabytes of data
2. Velocity, e.g. results need to be analyzed in seconds or less
3. Variety, e.g. structured and unstructured data like social media posts and video files
4. Variability, e.g. constantly changing like a stock market
5. Value, e.g. you’ve identified the clear business value you plan to derive from the data
What does all this mean?
The relentless growth of data, new data formats to deal with, and the competitive advantages achieved from managing large volumes of data all emphasize why Big Data should matter to you. If you are an IT professional, you already recognize how difficult it can be to find a solution capable of handling a task as monumental as Big Data management. Whether you are looking for growth, profitability or productivity in your organization, you are invariably dealing with data; and when that data shows the 5 V characteristics, you need to start thinking of it as a Big Data problem and approach it differently from a traditional data analysis problem.
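The 5 Vs screen can be sketched as a small checklist in code. The following is a minimal illustration only: the `DataProfile` fields, the one-petabyte and one-second thresholds, and the rule that all five Vs must be present are assumptions chosen for the example, not figures prescribed by this paper.

```python
from dataclasses import dataclass

# All thresholds here are illustrative assumptions, not values from the paper.
TERABYTES_PER_PETABYTE = 1000  # decimal convention


@dataclass
class DataProfile:
    """Hypothetical summary of a data set being screened."""
    volume_tb: float              # total data volume in terabytes
    max_latency_seconds: float    # how quickly results are needed
    has_unstructured: bool        # e.g. social media posts, video files
    changes_constantly: bool      # e.g. behaves like a stock market feed
    business_value_defined: bool  # a clear business value has been identified


def five_v_screen(profile: DataProfile) -> dict:
    """Return a True/False flag for each of the five Vs."""
    return {
        "volume": profile.volume_tb >= TERABYTES_PER_PETABYTE,
        "velocity": profile.max_latency_seconds <= 1.0,
        "variety": profile.has_unstructured,
        "variability": profile.changes_constantly,
        "value": profile.business_value_defined,
    }


def looks_like_big_data(profile: DataProfile) -> bool:
    """Assumed rule: flag a Big Data problem only when all five Vs are present."""
    return all(five_v_screen(profile).values())


# A hypothetical structured warehouse: large, but slow-moving and uniform.
warehouse = DataProfile(
    volume_tb=300,
    max_latency_seconds=3600,
    has_unstructured=False,
    changes_constantly=False,
    business_value_defined=True,
)
print(five_v_screen(warehouse))
print(looks_like_big_data(warehouse))
```

A screen like this is only a first filter; it does not replace the hands-on analysis of the business challenge and data sources.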
How do you get started?
Many enterprises fail to implement a Big Data solution because they have not identified clear business cases for the tools. The common trigger to initiate Big Data development is a data blast that existing systems can no longer manage. As these datasets continue to grow in size, enterprises face the problem of managing, storing and processing the data at the speed required for timely business response. Below is Bodhtree’s six-step process to take enterprises from Big Data problem definition to solution implementation, a methodology which has been applied with excellent results at a large Bay Area networking company and several other Bodhtree customer locations:
Bodhtree six-step process from Big Data problem definition to solution delivery:
Step 1: Understand the Use Case
Depending on where you reside in the organization, the chances are high that you will first feel a sense of data overload before you can articulate a clear business case to leverage that data. Often this prompts enterprises to reactively implement a Big Data solution without deciding in advance what problems it will be used to resolve.
It is critical that you deep dive and understand the business case first before even thinking along the lines of Big Data. Otherwise, there will be a lack of focus that feels a little like staring through a microscope at unintelligible detail without ever stepping back to see what specimen is sitting on the glass. In terms of IT, one Bodhtree client managing a large warehouse of customer, product and geography information with hundreds of terabytes of data said he had a Big Data use case, but everything he spoke about involved only structured data, failing the 5 Vs test. Even before worrying about Big Data, do a litmus test by asking the following questions:
Business Case – Do I understand the value of solving the problem at hand? Can I quantify the potential value of a Big Data solution or at least articulate the qualitative benefits?
Dependencies – Have I collected all the relevant information about the customer, install base etc.?
Complexity – Have I inventoried the data sources and characteristics to determine the complexity?
Lead Time – Have I created a reasonable plan with adequate time to acquire relevant hardware and data?
Initiative Alignment – Is the project aligned with corporate objectives and are project sponsors committed to the end-to-end process?
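The litmus test above can be tracked as a simple checklist. This is a minimal sketch, assuming each question is answered yes or no and that any unanswered question counts as a no; the short labels and function names are hypothetical.

```python
# The five Step 1 litmus questions, keyed by hypothetical short labels.
LITMUS_QUESTIONS = {
    "business_case": "Do I understand the value of solving the problem at hand?",
    "dependencies": "Have I collected all the relevant information?",
    "complexity": "Have I inventoried the data sources and characteristics?",
    "lead_time": "Do I have a reasonable plan with adequate time?",
    "initiative_alignment": "Is the project aligned with corporate objectives?",
}


def open_items(answers: dict) -> list:
    """Return labels of questions not yet answered 'yes' (missing counts as 'no')."""
    return [label for label in LITMUS_QUESTIONS if not answers.get(label, False)]


def ready_for_step_two(answers: dict) -> bool:
    """Assumed rule: move on only when every litmus question is answered 'yes'."""
    return not open_items(answers)


# Hypothetical answers from an early project review.
answers = {"business_case": True, "dependencies": True, "complexity": False}
for label in open_items(answers):
    print(f"Open: {label} - {LITMUS_QUESTIONS[label]}")
```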
Step 2: Understand the Current Landscape
Carefully analyzing the use cases defined in Step 1 enables you to identify all data entry and storage points. Often this review uncovers critical data entry points that were not recognized initially.
Map the end-to-end process and data flows for the business capabilities, e.g. How does the data flow to you from the customer and among internal teams?
Build a Reference Architecture to highlight the current systems and tools and their readiness for Big Data. Validate that you have access to the data you plan to analyze.
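One lightweight way to map data flows is to record them as producer/consumer pairs and derive the entry points from that list. The sketch below is an illustration under assumptions: the system names are hypothetical, and an "entry point" is taken to mean a system that sends data but never receives any.

```python
# Hypothetical data flows: each pair is (producer, consumer).
FLOWS = [
    ("web_orders", "crm"),
    ("support_calls", "crm"),
    ("crm", "warehouse"),
    ("product_catalog", "warehouse"),
    ("warehouse", "bi_reports"),
]


def entry_points(flows):
    """Systems that only send data: where data first enters the landscape."""
    producers = {src for src, _ in flows}
    consumers = {dst for _, dst in flows}
    return sorted(producers - consumers)


def terminal_points(flows):
    """Systems that only receive data: where data is finally consumed."""
    producers = {src for src, _ in flows}
    consumers = {dst for _, dst in flows}
    return sorted(consumers - producers)


print("Entry points:", entry_points(FLOWS))
print("Terminal points:", terminal_points(FLOWS))
```

Listing flows this way makes it easier to spot an entry point that was overlooked, since any system missing from the list stands out during review.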
Step 3: Build a Blueprint
Define your overarching architectural challenges in doing the Big Data analysis defined in the use cases, e.g. What architecture will I need to store the customer install base information along with product information?
Identify the right high-level Big Data solutions, leveraging technology-agnostic vendors and advisors.
Document a clear delta between As-Is and To-Be with the introduction of the Big Data solutions, while addressing the pain points at each transition phase.
Document the Risks & Dependencies that could impact business results, cost or schedule. Remember, rolling out sophisticated tools does not guarantee success. Watch out for hidden landmines, e.g. Data Quality.
Step 4: Identify the Big Data Technologies
Deep dive into the Big Data technology dependencies and the impacts they have on the systems, tools and organizations. For example, you might consider how Hadoop adoption overlaps with your Business Objects installation to analyze the customer and product data.
Determine which users will be consuming the information and analysis. What formats do these reports need to be in? Do they require mobile interfaces? Current BI reports and subscribers often provide relevant insight to these questions.
Step 5: Build a Big Data Roadmap
Avoid the traps of either over-investing or under-investing – have the business cases drive the solution.
Plan the roadmap for your Big Data rollout based on such factors as:
Business priority and management support. Remember, your execs may need to be educated in order to understand the relative business value offered by each phase.
Timeframe of expected results and ROI.
Big Data technology complexities, e.g. apply the right order to ensure a clean data foundation before conducting analytics.
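The interplay between business priority and technical ordering can be sketched with a dependency-aware sort. This is an illustration under assumptions: the phase names, priorities and prerequisites are hypothetical, and the example relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical phases: name -> (business priority, prerequisite phases).
# A lower priority number means more important to the business.
PHASES = {
    "exec_education": (1, []),
    "data_quality_cleanup": (2, []),
    "data_ingestion": (3, ["data_quality_cleanup"]),
    "customer_analytics": (1, ["data_ingestion"]),
    "mobile_reports": (4, ["customer_analytics"]),
}


def plan_roadmap(phases):
    """Order phases so prerequisites come first.

    Within each batch of phases whose prerequisites are all complete,
    phases are ordered by business priority, then by name.
    """
    sorter = TopologicalSorter({name: deps for name, (_, deps) in phases.items()})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda n: (phases[n][0], n))
        for name in ready:
            order.append(name)
            sorter.done(name)
    return order


for step, name in enumerate(plan_roadmap(PHASES), start=1):
    print(step, name)
```

In this example the analytics phase carries the highest business priority but still lands after cleanup and ingestion, which reflects the point about establishing a clean data foundation first.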
Step 6: Big Data Solution Rollout
Formalize the right team, experienced in conducting multiple implementations.
Divide scope items across multiple phases/releases to track progress and provide important quality checkpoints.
Document the Business Requirements, Functional Analysis and Solution Architecture.
Begin user training before the implementation is complete so analysts can immediately realize business value, building momentum for expanded uses.
Conclusion
Big Data, by its very nature, contains endless possibilities for business insight and improved operations. But much like venturing into space without a defined mission, the Big Data world demands that businesses clearly define what they intend to achieve in advance. Otherwise enterprises can spend substantially on fancy tools that may never happen upon real business benefit. Once those business goals are defined, and you have captured a clear picture of the current state of your data, apply the 5 Vs screening questions to determine if the problem truly warrants a Big Data solution. An objective vendor that specializes in a broad cross section of BI and Big Data solutions can assist in this process and recommend solutions that maximize your ROI. Upon identifying a Big Data problem, carefully proceed through the six steps of the Problem to Solution Methodology. Realize that the real value of Big Data solutions does not come simply with implementation but through applying creative and insightful approaches to harvesting business value from the data. Ensure all dependencies are considered so that your data foundation is clean, comprehensive and current. Finally, proceed with the implementation, highlighting “quick wins” to convey business value to execs and analysts, building momentum for the full implementation. If the above methodology is applied correctly, you will save time and energy and achieve better results by applying the right Big Data prescription for a REAL Big Data problem.
Contributors: Ryan Madsen, Sushanth Reddy
References:
• McKinsey Global Institute study
• Bodhtree Customer Case Studies