Data Lakes and Getting Business Users the Data They Need


Jose Hernandez · Director of Analytics · Dunn Solutions

2017


Today’s Agenda
• Introduction to Dunn Solutions Group
• What is a Data Lake?
• You Need a Data Lake
• Q&A


Dunn Solutions Delivers Velocity to Businesses

Dunn Solutions is a digital commerce and business transformation consultancy focused on delivering velocity to our clients. Velocity is the combination of speed and direction. Dunn Solutions helps our clients achieve speed by automating business processes, and direction by using advanced analytics. Our teams align with organizations to optimize their unique processes and help them discover the most profitable routes to business success.


Dunn Solutions is a full-service IT consulting firm founded in 1988

• Minneapolis: Delivery, Training
• Chicago: Delivery
• Raleigh, NC: Delivery, Training
• Bangalore, India: Delivery


Practice Areas

Solutions

Application Development

Analytics

Data Lakes

Training

Portals

IoT

Certified SAP, Liferay, Microsoft

e-Commerce & Content Managed Websites

Predictive Analytics

Accountable Care Orgs (ACOs)

Machine Learning

Corporate Legal

e-Commerce

Higher Education

Classroom, Onsite, Computer Based & Virtual

Optical Shop

Mobile App Development

Custom App Development

Search Engine Optimization

Analytics

Cloud - BI Platforms

DW & Data Integration

Mentoring & Custom Training

Frameworks


Selected Clients


Partnerships


Analytics Practice

Business Intelligence
• KPIs and Metrics Dashboards
• Exploration and Visualization
• Ad Hoc Analysis & Reporting

Big Data
• Hadoop, Hive, Sqoop
• Spark
• NoSQL
• MapReduce

Business Analytics
• Data Mining
• Predictive Analytics
• Prescriptive Analytics
• R, AzureML

Data Repositories
• Data Lakes
• Columnar
• In-memory
• EIM (Data Integration & Data Quality)
• Dimensional Modeling


Analytics Services in the Cloud

Analytics Services
• Develop Forecasting Models
• Productionizing Predictive Models
• Retail Analytics
• Machine Learning

Migration Services
• Migrate your Data Warehouse to the Cloud with Azure and AWS
• Migrate SAP BusinessObjects deployments

Big Data Services
• Data Lakes
• Big Data
• Integration with Data Warehouses

Data Warehousing Services
• Full Lifecycle Data Warehouse Development
• Extend Data Warehouse to the Cloud
• Massive Data Warehouses in the Cloud
• Snowflake


Microsoft Azure Consulting Services

Azure HDInsight

Azure Training Partner

Azure Machine Learning

Azure SQL Data Warehouse

Azure Stream Analytics

Azure Data Lake

Azure Event Hubs


Amazon Web Services Consulting

Amazon EMR

AWS IoT

Amazon DynamoDB

Amazon Kinesis Firehose

AWS Lambda

Amazon Redshift

Amazon Machine Learning


Dunn Solutions Global Delivery Model

People
• U.S.-based management of teams and client communications
• All resources interviewed and approved by DSG leadership
• Right Model/Right Project: U.S. only; U.S. and India; India only (EMEA clients)

Process
• Mature and proven
• Phased approach
• Project sensitive
• Software Engineering methodology
• Certified Quality Processes

Technology
• Current technology awareness
• Risk awareness




Jose Hernandez, Director of Analytics


Warning! Today’s data consumer is very demanding, and rightly so!

80% of consumers need KPIs and operational data. The data warehouse is ideal for them.

10-15% of consumers do more analysis: they use the data warehouse as a source, but dive back into source systems to get more data.

The rest of the consumers do very deep data analysis; this group includes data scientists. They are voracious data consumers and data creators! (IT can’t keep up with them.)


Savvy Data Consumers Need Access to Information…
• What information?

A: any, all, even data not yet thought of

• When?

A: anytime, now would be great

• How much?

A: all of it, as much as there is

…and to Analyze the Data
• What tools?

A: whatever tool is needed (lots of great tools are available, both commercial and open source)

• What kind of data?

A: all kinds


Traditional Data Storage and Management Challenges

The demand for data has never been greater!
• Business users rely heavily on IT
• IT controls access to the data
• Accessing data across sources is very challenging
• Schema on write*

What about the enterprise data warehouse?
• Does not provide just-in-time data
• Requires lots of lead time
• Limited to the “required” data


*Sorry, I could not avoid this terminology. Schema on write means the structure of the data must be defined before the data can be loaded; more in a bit…


The Data Lake Provides Relief


What is a Data Lake?

The Data Lake is about democratization of information.
• It provides your organization a cost-effective way to store information for later processing.
• It lets your information consumers and researchers focus on finding the next big thing, not on wasting time finding the data.

For the techies in the crowd: a Data Lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. It also provides the compute power to process that data.


Origin of the Data Lake

James Dixon of Pentaho is credited with coining the phrase “Data Lake.”

Dixon’s analogy:
• Think of the data mart as bottled water: cleansed, packaged, and delivered for your consumption.
• The data lake is a man-made reservoir of water in its natural state, with no processing.


Purpose of the Data Lake
• Feed the data-starved users
• Make data easy to consume and combine
• Deliver the data just-in-time

• Store all kinds of data (whether you have a specific need today or not), and lots of it
• Worry about how it’s going to be used later (schema on read)

Provide boundless playgrounds
• To store data
• To process data
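To make “schema on write” versus “schema on read” concrete, here is a minimal Python sketch. It assumes raw events arrive as JSON lines; the field names (`order_id`, `amount`, `coupon`, `channel`) and the helper functions are hypothetical, chosen only for illustration. The schema-on-write loader forces every record into a fixed table shape and rejects what doesn’t fit, while the lake keeps every raw line and lets each consumer apply its own schema at read time.

```python
import json

# Hypothetical raw events as they arrive from a source system (JSON lines).
RAW_EVENTS = [
    '{"order_id": 1, "amount": "19.99", "coupon": "SPRING"}',
    '{"order_id": 2, "amount": "5.00"}',
    '{"order_id": 3, "amount": "12.50", "channel": "mobile"}',
]

# Schema on write: the table structure is fixed before any data is loaded.
WAREHOUSE_COLUMNS = ("order_id", "amount", "coupon")


def load_schema_on_write(lines):
    """Shape every record to the fixed columns; non-conforming records are rejected."""
    table, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if set(record) != set(WAREHOUSE_COLUMNS):
            rejected.append(line)  # rejected at load time, unexpected fields and all
            continue
        table.append(tuple(record[c] for c in WAREHOUSE_COLUMNS))
    return table, rejected


def load_into_lake(lines):
    """Schema on read, step 1: store the raw lines exactly as they arrived."""
    return list(lines)


def read_with_schema(lake, schema):
    """Schema on read, step 2: apply a consumer-specific schema at query time.

    `schema` maps output field -> (source key, type converter, default if missing).
    """
    rows = []
    for line in lake:
        record = json.loads(line)
        rows.append({
            name: (cast(record[key]) if key in record else default)
            for name, (key, cast, default) in schema.items()
        })
    return rows


if __name__ == "__main__":
    table, rejected = load_schema_on_write(RAW_EVENTS)
    print(len(table), "loaded,", len(rejected), "rejected")
    lake = load_into_lake(RAW_EVENTS)
    # Two consumers read the same raw data through two different schemas.
    finance = read_with_schema(lake, {"order_id": ("order_id", int, None),
                                      "amount": ("amount", float, 0.0)})
    marketing = read_with_schema(lake, {"order_id": ("order_id", int, None),
                                        "channel": ("channel", str, "web")})
    print(finance)
    print(marketing)
```

In this toy run the warehouse-style load keeps only one of the three events, while both lake consumers see all three, each through the fields it cares about.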


Warning! The Data Lake does not replace the Enterprise Data Warehouse!


Comparing the Data Lake to the Data Warehouse

Data Lake
• Stores everything
• Unprocessed / raw
• Unstructured, semi-structured, and structured
• Democratization of data
• Shared data stewardship
• Provides compute power

Data Warehouse
• Data focuses on business processes
• Highly processed and massaged
• Tabular and structured
• Lots of effort on design and build
• Optimized for data retrieval
• Highly governed


It’s Not Just About Data Storage

Storing and accessing data is only part of the Data Lake’s purpose. The Data Lake must also provide the ability to:

• Massively process data (usually in place)
• Process and combine structured, semi-structured, and unstructured data
• Grow and shrink in both storage and compute power as needed
• Onboard data very fast
• Perform advanced analytics at scale


Supporting Top-Down and Bottom-Up

Data Warehouses use the top-down approach (descriptive): from generalized principles known to be true to a specific conclusion.

Data Lakes use the bottom-up approach (predictive): from specific instances to a generalized conclusion.


What Does a Data Lake Look Like?


Filling the Data Lake

Types of data
• Structured data
• Semi-structured data
• Unstructured data

No schema is applied at load time, so data loads very fast. For practical purposes the Data Lake is bottomless: its storage scales to hold all of your data.
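Loading without a schema can be as simple as copying files unchanged into the lake. The Python sketch below is one possible approach, not a prescribed design: the `raw/<source_system>/<YYYY>/<MM>/<DD>/` folder layout and the `land_raw_file` helper are hypothetical conventions for illustration, and a local directory stands in for cloud object storage.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def land_raw_file(src: Path, lake_root: Path, source_system: str, load_date: date) -> Path:
    """Copy a file into the lake's raw zone exactly as it is: no parsing, no schema.

    The raw/<source_system>/<YYYY>/<MM>/<DD>/<file name> layout is a hypothetical
    convention for this sketch, not a standard.
    """
    target_dir = (lake_root / "raw" / source_system
                  / f"{load_date:%Y}" / f"{load_date:%m}" / f"{load_date:%d}")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / src.name
    shutil.copy2(src, target)  # byte-for-byte copy; the file's format is untouched
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        samples = {
            "orders.csv": b"order_id,amount\n1,19.99\n",               # structured
            "clicks.json": b'{"page": "/pricing", "ms": 412}\n',       # semi-structured
            "call_notes.txt": b"Customer asked about renewal terms.",  # unstructured
        }
        for name, content in samples.items():
            src = tmp_path / name
            src.write_bytes(content)
            landed = land_raw_file(src, tmp_path / "lake", "demo_source", date(2017, 3, 5))
            print(landed.relative_to(tmp_path))
```

Because nothing is parsed or validated on the way in, all three kinds of data land the same way, which is why onboarding is fast; interpreting the contents is deferred to whoever reads them.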


Consuming from the Data Lake

The Data Lake supports many uses:
• Data exploration
• Staging for the data warehouse
• Data enrichment
• Predictive analytics
• Mixing disparate data
• Applying schema on demand (on read)
• Processing massive amounts of data
• Sandboxes for experimentation
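“Mixing disparate data” usually means combining sources that were never designed to be joined. The Python sketch below assumes two hypothetical raw files sitting side by side in the lake: a structured CSV extract of customers and semi-structured JSON-lines web events. The sample data and the `page_views_by_segment` function are illustrative only; the point is that the join happens at read time, without first loading either source into a common model.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical structured extract (e.g. from a CRM), stored as raw CSV.
CUSTOMERS_CSV = """customer_id,name,segment
c1,Acme Corp,Enterprise
c2,Globex,SMB
"""

# Hypothetical semi-structured web events, stored as raw JSON lines.
# Note the records do not all share the same fields.
WEB_EVENTS_JSONL = """{"customer_id": "c1", "page": "/pricing"}
{"customer_id": "c2", "page": "/docs", "referrer": "search"}
{"customer_id": "c1", "page": "/contact"}
"""


def page_views_by_segment(customers_csv: str, events_jsonl: str) -> dict:
    """Join the two raw sources on customer_id at read time; count page views per segment."""
    segment_of = {row["customer_id"]: row["segment"]
                  for row in csv.DictReader(io.StringIO(customers_csv))}
    counts = defaultdict(int)
    for line in events_jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the raw file
        event = json.loads(line)
        # Events from customers missing in the CRM extract are kept, not dropped.
        counts[segment_of.get(event.get("customer_id"), "Unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    print(page_views_by_segment(CUSTOMERS_CSV, WEB_EVENTS_JSONL))
```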


Warning! Don’t let your Data Lake turn into a data swamp! It’s not the Wild, Wild West; governance is still needed.

Data consumers must also be citizen data stewards.

Include metadata (data about your data)

Don’t contaminate the Data Lake with bad data (get it from trusted sources)

Data Lakes can hold all data; however, set and enforce boundaries.

Have a vision for your data lake; know what it will be used for.
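Several of these rules can be enforced at the moment data is registered in the lake. The Python sketch below shows one possible shape for this, under loudly stated assumptions: `TRUSTED_SOURCES`, `DatasetEntry`, and `Catalog` are hypothetical names, and a real deployment would typically use a dedicated metadata catalog service instead of an in-memory dictionary. It records metadata for each dataset, names a citizen data steward, and refuses data from sources outside an allow-list.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical allow-list of trusted source systems for this sketch.
TRUSTED_SOURCES = {"erp", "crm", "web_logs"}


@dataclass
class DatasetEntry:
    """Minimal catalog record: the 'data about your data' kept next to the raw files."""
    path: str
    source_system: str
    steward: str              # the citizen data steward accountable for this dataset
    description: str
    contains_pii: bool = False  # flags data that needs access restrictions
    loaded_on: date = field(default_factory=date.today)


class Catalog:
    """In-memory stand-in for a metadata catalog."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: DatasetEntry) -> None:
        # Boundaries: only trusted sources, and nothing without a description.
        if entry.source_system not in TRUSTED_SOURCES:
            raise ValueError(f"untrusted source: {entry.source_system!r}")
        if not entry.description.strip():
            raise ValueError("every dataset needs a description")
        self._entries[entry.path] = entry

    def search(self, keyword: str) -> list:
        """Find datasets by keyword so consumers don't waste time hunting for data."""
        kw = keyword.lower()
        return [e.path for e in self._entries.values() if kw in e.description.lower()]


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register(DatasetEntry("raw/erp/2017/03/05/orders.csv", "erp",
                                  "jane.doe", "Daily orders extract from ERP"))
    print(catalog.search("orders"))
```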


Security and Governance

Access and Security
• It’s a data playground, but even playgrounds have rules
• Not all the data should be available to all users (confidential information must be protected)
• Is the data sensitive in nature? Are there laws governing the data that require encryption?

Data Quality
• If the data is of poor quality, don’t put it in your data lake
• Trust the source




Voracious Data Consumers Must Be Served!

Getting back to the remaining 5-10% of users who need all the data, the data scientists: your organization’s success and survival depend on
• Innovation
• Efficiency
• Finding the next big thing
• Getting (and keeping) an edge

The data scientists and data analysts give you the ability to do this. The data lake supports:
• Predictive analytics
• Prescriptive analytics
• Machine learning
• Experimentation (A/B testing)
• Qualitative data analysis to help steer strategic decisions


How Does a Data Lake Complement the EDW?

Your enterprise data warehouse is home to the historical data and metrics that feed your KPIs and PIs (performance indicators) based on your business processes. It does this by extracting, transforming, and loading the data required to support your “known” KPIs and metrics. What if you determined that some data element was needed to provide a KPI you should have been tracking? You would add it to your data warehouse and start populating it from that point forward. Too bad; I wish I had thought of this sooner, because there are historical trends I would have been able to identify.


Give Super Powers to Your Data Warehouse!

Imagine you could go back in time! In the previous scenario you did not have the historical data because:
a. It was not being captured, because it wasn’t considered.
b. The EDW staging area is transient and typically only goes back a short period of time.

The Data Lake would have given your data warehouse the ability to go back in time!

The data lake can serve as a great staging area for your EDW. It can store transactional data from the beginning of time:
a. Letting you go back in time and reconstruct your EDW to incorporate the information you did not originally consider
b. Allowing you to rebuild your EDW from day one in the event of a catastrophic failure
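“Going back in time” boils down to replaying raw history that the lake kept. The Python sketch below assumes the lake has retained every raw transaction since day one and that the newly wanted KPI is a monthly return rate (returns divided by sales). The record layout, the dates, and the `monthly_return_rate` function are all hypothetical, chosen only to illustrate a backfill.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transactions the lake has retained since "day one".
RAW_TRANSACTIONS = [
    {"day": date(2016, 11, 3), "type": "sale"},
    {"day": date(2016, 11, 4), "type": "sale"},
    {"day": date(2016, 11, 9), "type": "return"},
    {"day": date(2016, 12, 1), "type": "sale"},
    {"day": date(2016, 12, 2), "type": "sale"},
    {"day": date(2017, 1, 5), "type": "sale"},
    {"day": date(2017, 1, 6), "type": "sale"},
    {"day": date(2017, 1, 7), "type": "sale"},
    {"day": date(2017, 1, 8), "type": "sale"},
    {"day": date(2017, 1, 9), "type": "return"},
]


def monthly_return_rate(transactions, before=None):
    """Replay raw history to compute a KPI nobody tracked at the time:
    returns as a fraction of sales, keyed by (year, month).

    If `before` is given, only transactions dated before that day are replayed,
    e.g. to backfill just the history the EDW is missing. Months with no sales
    are skipped to avoid dividing by zero.
    """
    sales, returns = defaultdict(int), defaultdict(int)
    for t in transactions:
        if before is not None and t["day"] >= before:
            continue
        month = (t["day"].year, t["day"].month)
        if t["type"] == "sale":
            sales[month] += 1
        elif t["type"] == "return":
            returns[month] += 1
    return {m: returns[m] / sales[m] for m in sorted(sales)}


if __name__ == "__main__":
    # Suppose the EDW only began tracking returns on 2017-01-01:
    # rebuild everything it missed from the lake's raw history.
    print(monthly_return_rate(RAW_TRANSACTIONS, before=date(2017, 1, 1)))
```

With only a transient staging area, the 2016 months in this toy data would simply be gone; with the lake, they are one replay away.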


Warning! Deploying a data lake on your own infrastructure is very expensive and challenging. So don’t!


Do It in the Cloud!

• Scales easily
• Pay for what you use
• Reliable and trusted
• Supporting tools


Delight Your Data Consumers!

You’re wondering whether the Data Lake can help you with your data-starved consumers. The simple answer is yes. You don’t have to start huge (that’s the beauty of cloud-based data lakes). We can get you started immediately. Your data consumers will be very happy.

Contact us: info@dunnsolutions.com


Questions & Answers

Watch for more webinars featuring how Data Scientists “do their thing” with Data Lakes in the cloud!

Jose Hernandez · Director of Analytics · Dunn Solutions

