Big Data Innovation, Issue 26


THE LEADING VOICE IN BIG DATA INNOVATION

BIG DATA INNOVATION JAN 2017 | #26

Hadoop Is Falling Hadoop has been linked to big data throughout its explosion in recent years, but has hype killed the elephant? | 12


Say Goodbye To Your Data Lake In 2017

What Does Music Streaming Success Mean For Data?

As a concept, the data lake has been exciting companies for a few years, but the realities are starting to show themselves | 18

Music streaming has been increasing record label profits for the first time, but what does it mean for data? | 26


Big data innovation summit april 19 & 20 2017 san francisco


jc@theiegroup.com

www.theinnovationenterprise.com

+1 415 614 4191


ISSUE 26

EDITOR’S LETTER Welcome to the 26th Edition of Big Data Innovation

Despite the huge amount of data we collect and the reliance we have on it for almost everything we do, when data is discussed in public it is often either misinterpreted, misrepresented, or outright ignored. Nothing has shown this more than climate change, where the data has shown irrefutably that it exists, yet several members of the Senate still deny that it exists or that it is caused by human influence. With the new administration here, there has been considerable, and understandable, fear about how the huge amount of data on the subject may be treated. This has even caused several leading US climate scientists to copy and move decades of climate data to other countries in order to protect it from manipulation in the US. What we are seeing is the same data being presented in different ways to prove polar opposite points, essentially undermining both the right and wrong answers.

The biggest impact this has is on institutional trust. The Edelman Trust Barometer has shown that in two thirds of all countries surveyed, less than 50% of people trusted businesses, media, government, and NGOs. Even more shocking were the 71% who said that government officials are not at all or only somewhat credible, and the 63% who said the same of CEOs. At the same time, 60% of those questioned trusted 'a person like yourself' as much as a technical expert or academic.

This essentially means that if a world-renowned economist tells an individual that a move by a government will damage their economy, and somebody with no qualifications who shares similar views to the individual says it won't, they put the same weight on both conclusions.

This is a dangerous place to be, and data is the only defense against it. It is our job, as those responsible for collecting and communicating the data, to make sure we are doing it clearly and with as little wiggle room for bias as possible, otherwise this trend is only likely to continue.

George Hill, Editor-in-Chief



data visualization summit april 19 & 20 2017 san francisco



rasterley@theiegroup.com

+1 415 692 5426

www.theinnovationenterprise.com


contents

6 | AI STARTUPS TO WATCH OUT FOR IN 2017
As artificial intelligence has grown, we look ahead to see which startups are going to be making waves in the next 12 months.

9 | INTERVIEW: BIG DATA TOOLS ARE ONLY USEFUL IN THE HANDS OF A CRAFTSMAN
We speak to Cetin Karakus, Global Head, Analytics Core Strategies & Quantitative Development at BP.

12 | HADOOP IS FALLING
Hadoop has been linked to big data throughout its explosion in recent years, but has hype killed the elephant?

14 | HOW IS BIG DATA SHAPING BANKING?
We speak to Sreeram Iyer, the Chief Operating Officer for ANZ Bank's Institutional Banking business.

18 | SAY GOODBYE TO YOUR DATA LAKE IN 2017
As a concept, the data lake has been exciting companies for a few years, but the realities are starting to show themselves.

20 | THE THREE BIG DATA INNOVATIONS THAT NEED TO HAPPEN IN PHARMA
Pharmaceutical companies have not had an easy time of it recently, but wider adoption of big data can help.

24 | THE DIFFERENCE BETWEEN BIG DATA AND DEEP DATA
Two terms that many incorrectly believe to mean the same thing, so what exactly is the difference?

26 | WHAT DOES MUSIC STREAMING SUCCESS MEAN FOR DATA?
Music streaming has been increasing record label profits for the first time, but what does it mean for data?

WRITE FOR US
Do you want to contribute to our next issue? Contact ghill@theiegroup.com for details.

ADVERTISING
For advertising opportunities contact achristofi@theiegroup.com for more information.

Managing Editor: George Hill | Assistant Editor: James Ovenden | Creative Director: Nathan Wood | Contributors: Alex Lane, Jessica Zhang, Olivia Timson, Gabrielle Morse, Kayla Matthews



AI Startups To Watch Out For In 2017

Alex Lane, Organizer, Predictive Analytics Innovation Summit

IN THE WIZARD OF OZ, THE TITULAR WIZARD responds to Dorothy's accusation that he's a humbug and a bad man: 'Oh no, my dear. I'm a very good man. I'm just a very bad wizard.' For many, this is AI in a nutshell. The industry is rampant with good intentions, but it has so far fallen short of its promise. Debacles such as Microsoft's racist Twitter bot in early 2016 demonstrated the degree to which machines still require human involvement, and we are still early in the technology's maturity cycle.

Research firm Gartner included AI and machine learning in its list of 10 strategic tech trends for 2017



While the technology may still lack maturity, however, progress is being made. In the past, AI has been stuck in a vicious cycle, going through periods of rapid progress only to hit a wall and see investment flatline - a phenomenon known as 'AI winters.' However, Andrew Ng, chief scientist at Baidu Research, believes that progress is now such that we have seen the back of such cycles, noting 'there's definitely hype, but I think there's such a strong underlying driver of real value that it won't crash like it did in previous years.' Evidence would appear to reinforce Ng's assertions. Research firm Gartner included AI and machine learning in its list of 10 strategic tech trends for 2017, universities have started to offer classes around AI's possibilities, thereby increasing the talent pool to help drive AI forward in 2017, and M&A activity remains high. The real drivers of the AI revolution, though, are the many innovative startups that have sprung up in recent years, heavily backed as they are by the deep pockets of VCs and tech giants. We've looked at some of the startups set to make waves in the next year and help drive us to a truly autonomous future.

Skymind There are a number of open-source frameworks for deep learning, with both large companies and startups realizing the benefits. Indeed, Andrew Ng cites the prevalence of open sourcing for AI as one of the reasons that AI winters have come to an end. Skymind's Deeplearning4j is one of the best. It is unique in being the only commercial-grade, open-source, distributed deep-learning library written for Java and Scala. It integrates with Hadoop and Spark, and it is specifically designed to run in business environments on distributed GPUs and CPUs.

The company was founded in 2013 by CEO Chris Nicholson and now has a staff of 15 spread across the globe. Earlier this year, they closed a $3 million funding round with financing coming from Tencent, SV Angel, GreatPoint Ventures, Mandra Capital, and Y Combinator. Its libraries were downloaded 22 thousand times in August - a number that is growing at 17% month-on-month - and its customer list already boasts such giants as $42 billion French telecom Orange SA.

6sense 6sense has now raised total equity funding of $36M in 3 rounds from a list of investors that includes Bain Capital Ventures, Battery Ventures, Venrock, and Salesforce. It helps companies like Cisco and IBM predict everything a sales department would need, using a private network of billions of time-sensitive intent interactions to reveal new prospects at every stage of the funnel and determine which prospects are ready to buy, how much they would be willing to spend, and when.

iCarbonX iCarbonX is attempting to apply advanced data mining and machine learning to an individual's biological, behavioral, and psychological data in order to construct a 'digital you'. The technology looks at everything from saliva and DNA through to diet and environmental factors like air quality. Using this data, it creates accurate, individualized health analysis from which it can suggest tailored wellness programs, food choices, and medicines. Despite only having opened its doors in October 2015, iCarbonX already has a valuation of $1 billion following a $155 million Series A round led by Tencent, an Asian internet behemoth with a market capitalization of around $200 billion.

Ozlo For many, the first real interaction they will have with AI is through the personal assistants on their phones. Apple's Siri and Microsoft's Cortana have already proved themselves in the marketplace, and while such virtual assistants have a clear advantage in that they come built into people's phones, they face competition from independents. Ozlo, for one, launched on iOS last October. Its focus thus far has been on finding people restaurants, bars, and recipes by analyzing data from sources like TripAdvisor, Foursquare, and Yelp, among others, and it is looking to expand. In 2017, Ozlo is being made available to third-party developers so that they can build their own versions of Ozlo. It differs from its rivals in that it pulls together different data sources in a single query, and while it is up against it in terms of matching its rivals' funding, it is well positioned moving forward should it get all the training data it needs.

Clarifai Clarifai was started in 2013 by Matt Zeiler, a former Google Brain intern with a big reputation. It specializes in visual recognition, working in a similar way to Google Photos but with a number of improvements. It also now allows customers to train neural nets of their own, with developers able to easily integrate its visual recognition services into their apps. The company already provides services to major brands including Unilever, Curalate, Trivago, and others. To date, it has raised total equity funding of $40M across two rounds from 11 investors, with its most recent funding round seeing it raise $30M in October 2016.



Premium online courses delivered by industry experts

academy.theinnovationenterprise.com



Big Data Tools Are Only Useful In The Hands Of A Craftsman An Interview with Cetin Karakus, Global Head, Analytics Core Strategies & Quantitative Development at BP

Jessica Zhang, Organizer, Big Data Innovation Summit

The first thing someone new to the field will experience is confusion



AHEAD OF HIS PRESENTATION AT THE Big Data Innovation Summit in Singapore on March 1 & 2, we spoke to Cetin Karakus, Global Head, Analytics Core Strategies & Quantitative Development at BP.

Cetin has almost two decades of experience in designing and building large-scale software systems. Over the last decade, he has worked on the design and development of complex derivatives pricing and risk management systems in leading global investment banks and commodity trading houses. Prior to that, he worked on various large-scale systems ranging from VOIP stacks to ERP systems. In his current role, he has had the opportunity to build an investment-bank-grade quantitative derivatives pricing and risk infrastructure from scratch. Most recently, he has shifted his focus to building a proprietary, state-of-the-art big data analytics platform based on existing open source tools and technologies.


Cetin has a degree in Electrical & Electronics Engineering and enjoys thinking and reading on various fields of the humanities in his free time.

How do you think big data has impacted the energy industry over the past decade?

Even before the term 'big data' was popular in the mainstream IT world, the energy industry, especially the upstream oil and gas exploration business, was making heavy use of data and analytics that would nowadays be considered big data analytics.

Getting oil and gas out of deep offshore basins is an extremely complex and expensive undertaking. You had better know where you are digging and be sufficiently certain that you will find oil and/or gas reserves there. The way energy companies deal with this is to collect huge amounts of geoseismic data and use it in sophisticated reservoir simulation and visualization models, and hence get a sense of what is out there before any expensive physical extraction operations are undertaken. Big data analytical techniques have also been used in plant operations and transportation networks. You could have thousands of miles of pipelines to transport oil and gas, with associated pumps, valves, storage tanks, interconnect hubs, and so on, in a pipeline network. There could be millions of sensors constantly monitoring the network and capturing vital operational data and statistics that have to be processed and acted upon by operations staff. Using big data analytics type systems you could resource your operations teams optimally, i.e. use the optimum number of resources that will maintain healthy and safe operations.
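To illustrate the kind of monitoring described here, the sketch below reduces raw pipeline sensor readings to a per-segment alert list that operations staffing could be planned around. It is a minimal, hypothetical example only; the segment names, sensor IDs, and pressure limits are invented and bear no relation to BP's actual systems.

```python
from collections import defaultdict

# Hypothetical pressure limits (bar) per pipeline segment; a real system would
# load these from an asset database rather than hard-coding them.
PRESSURE_LIMITS = {"segment_a": 85.0, "segment_b": 70.0}

def flag_alerts(readings):
    """Group raw sensor readings by segment and flag out-of-range pressures."""
    alerts = defaultdict(list)
    for r in readings:
        limit = PRESSURE_LIMITS.get(r["segment"])
        if limit is not None and r["pressure_bar"] > limit:
            alerts[r["segment"]].append(r["sensor_id"])
    return dict(alerts)

if __name__ == "__main__":
    sample = [
        {"segment": "segment_a", "sensor_id": "s-001", "pressure_bar": 83.2},
        {"segment": "segment_a", "sensor_id": "s-017", "pressure_bar": 91.5},
        {"segment": "segment_b", "sensor_id": "s-204", "pressure_bar": 74.8},
    ]
    # Operations teams could then be sized per segment based on alert volume.
    print(flag_alerts(sample))  # {'segment_a': ['s-017'], 'segment_b': ['s-204']}
```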


What do you think is the most important use of big data in BP today? I would say, and this is my personal guess, that it would be in upstream oil and gas exploration: reservoir simulation and visualization and reserve prediction. These models not only consume huge amounts of data, they are also very computationally expensive, and we utilize massively distributed computing infrastructure to run them. They play a key role in upstream investment project decisions.

What challenges are you currently facing in using big data? Being an integrated energy company, BP deals with the entire spectrum of the energy business: exploration, production, transportation, refining, distribution and trading. Moreover, it operates across the globe, employs tens of thousands of people and has a presence in pretty much every country. When you think about it, a huge amount of data passes through BP's systems: some of it BP's own data, some belonging to third-party data sources, and some publicly available. Combining all this data and getting a holistic picture of the energy world, mapping business and market dynamics, is, I would say, the main challenge. This is an enormous undertaking with a prize to match in size, and is more of a journey than a fixed project.

How do you see the use of big data in the energy industry changing in the next 5 years? I think we will see more usage of data analytical techniques in downstream (refining, marketing and distribution) businesses. This will improve demand forecasting and optimize the end of the supply chain. Ultimately, similar methods will be employed across the entire energy value chain, creating value by optimizing the whole of the supply chain, from production to distribution. As energy markets become more liquid and move from a long-term supply-based model to a trading-based model, the use of data analytical techniques to gauge market dynamics and supply/demand imbalances will also play a key role in any player's ability to remove inefficiencies through profitable arbitrage operations.

The last question is about the presentation you are going to give at the summit. What can the audience expect to take away from your presentation? There are a bewildering number of tools, applications and technologies in the big data space nowadays. The first thing someone new to the field will experience is confusion. While there is clearly no shortage of tools that each do a specific task (e.g. store data in compact form, execute jobs, build a machine learning model, etc.), none of those tools will solve your specific problems and create value out of the box by themselves. Just like tools in a toolbox, they are only useful in the hands of a craftsman who knows how to put them to good use to solve real problems. Continuing with the toolbox analogy, tools are also most useful when used together. It is very hard to find a single tool that does a lot of things, and even when you find such a tool, it more often than not does each of the things it is supposed to do rather poorly.

In my presentation, I will go through a modular framework for combining big data tools into data flow pipelines and using those pipelines to accomplish different tasks. It is an extremely scalable approach, both in terms of tools and software components, and in terms of the people and teams who develop them. In fact, it prescribes a role-based, distributed working model with component developers, pipeline designers, pipeline executors, and so on, which fits nicely into a diverse big data analytics team.
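As a rough sketch of that modular, role-based approach, and not code from BP's platform, the example below expresses a pipeline as a list of small single-purpose components that a pipeline designer composes and a pipeline executor runs; the component names and data are invented for illustration.

```python
from functools import reduce

# Components: small, single-purpose steps written by component developers.
def parse(records):
    return [r.strip().split(",") for r in records if r.strip()]

def to_floats(rows):
    return [[float(x) for x in row] for row in rows]

def daily_mean(rows):
    return [sum(row) / len(row) for row in rows]

# Pipeline designer: chooses and orders components for a specific task.
pipeline = [parse, to_floats, daily_mean]

# Pipeline executor: runs any pipeline against any input, knowing nothing
# about what the individual components do.
def run(steps, data):
    return reduce(lambda acc, step: step(acc), steps, data)

if __name__ == "__main__":
    raw = ["1.0,2.0,3.0", "4.0,5.0,6.0", ""]
    print(run(pipeline, raw))  # [2.0, 5.0]
```

Swapping daily_mean for a different aggregation changes the task without touching the other components, which is the scalability property described above.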


Hadoop Is Falling George Hill, Editor-in-Chief

THREE YEARS AGO, LOOKING BEYOND HADOOP was considered insanity; according to many in the media, there was little else that could come close. However, the reality has been a little different.


For a long period, Hadoop and big data were almost interchangeable when they were being discussed by those in the media, although this was not necessarily the case amongst data scientists. A study by Silicon Angle in 2012, analyzing Twitter conversations between data professionals talking about big data, found that they actually talked about NoSQL technologies like MongoDB as much as, or more than, Hadoop, which would indicate that it has not actually been the must-have that many assumed it was.

For some companies this has had real world financial impacts

Most would argue that Hadoop has been one of the single most important elements in the spread of big data, that it is very much the foundation on which data today is built. We are also still finding new ways to use it, in warehousing for instance. That being said, to the surprise of many, its adoption appears to have more or less stagnated, leading even James Kobielus, Big Data Evangelist at IBM Software, to claim that 'Hadoop declined more rapidly in 2016 from the big-data landscape than I expected.' The reasons for this are hard to ascertain, but could be down to a problem common in data circles. A 2015 study from Gartner found that 54% of companies had no plans to invest in Hadoop, while 44% of those asked had adopted Hadoop already or planned to at some point in the next two years. This could, depending on your point of view, be taken to mean either that it would see even further expansion or that the majority were ignoring it. However, the survey also revealed a number of other telling factors, with implications unlikely to have subsided since. Of those who were not investing, 49% were still trying to figure out how to use it for value, while 57% said that the skills gap was the major reason, a number that is not going to be corrected overnight. This coincides with findings from Indeed, which tracked job trends with 'Hadoop Testing' in the title: the term featured in a peak of 0.061% of ads in mid 2014, which then jumped to 0.087% in late 2016, an increase of around 43%. What this may signal is that adoption hasn't necessarily dropped to the extent that anecdotal evidence would suggest, but that companies are simply finding it difficult to extract value from Hadoop with their current teams and require greater expertise.

Another element that may be cause for concern is simply that one man's big data is another man's small data. Hadoop is designed for huge amounts of data, and as Kashif Saiyed wrote on KDnuggets, 'You don't need Hadoop if you don't really have a problem of huge data volumes in your enterprise, so hundreds of enterprises were hugely disappointed by their useless 2 to 10TB Hadoop clusters – Hadoop technology just doesn't shine at this scale.' Most companies do not currently have enough data to warrant a Hadoop rollout, but adopted it anyway because they felt they needed to keep up with the Joneses. After a few years of experimentation and working alongside genuine data scientists, they soon realize that their data works better in other technologies. This trend has had impacts beyond a slowdown in the adoption of an open source platform, though; for some companies it has had real world financial impacts. Cloudera and Hortonworks are two of the biggest companies that build their products out from a Hadoop framework. Both have lost significant value in part due to its decline, with Cloudera reported to have lost 40% whilst Hortonworks' shares have plummeted 68% since mid 2015.

Criticism within this article may seem harsh on Hadoop, but it is not the platform in itself that has caused the current issues. Instead it is perhaps the hype and association of big data that has done the real damage. Companies have adopted the platform without understanding it and then failed to get the right people or data to make it work properly, which has led to disillusionment and its apparent stagnation. There is still a huge amount of life in Hadoop, but people just need to understand it better.



HOW IS BIG DATA SHAPING BANKING? AN INTERVIEW WITH SREERAM IYER, THE CHIEF OPERATING OFFICER FOR ANZ BANK’S INSTITUTIONAL BANKING BUSINESS

Jessica Zhang, Organizer, Big Data Innovation Summit

AHEAD OF HIS PRESENTATION AT THE BIG DATA & ANALYTICS Innovation Summit in Singapore on March 1 & 2 2017, we spoke to Sreeram Iyer, the Chief Operating Officer for ANZ Bank's Institutional Banking business. As Chief Operating Officer for Institutional Banking, Sreeram is responsible for supporting ANZ's business strategies across several countries, generating value for customers through technology and driving operations in all the regions of ANZ's business. His portfolio includes Institutional Operations, Property, Transformation and Major Programs. Sreeram has more than 25 years of experience in the banking industry, including 18 years at Standard Chartered Bank, with leadership roles across diverse functions, disciplines, and geographies. He has set up and run the bank's offshore Shared Service Centres and is an active sponsor and supporter of Robotic Process Automation in Operations.


HOW DO YOU THINK BIG DATA HAS IMPACTED COMPANY LEADERSHIP IN THE PAST DECADE?

That's a big question in the first place, because it covers leadership and it covers a whole decade. But what comes to my mind is this phrase called Big Data which, on the one hand, is actually an aggregation of small data and, on the other hand, is characterised by high volumes of data.

I see Big Data in the shape of a phenomenon, which is taking place in banking, as well as other industries. I also see it as a capability, which is around how you obtain valuable outcomes. And ultimately, it is also an emerging industry, which is spawning different kinds of specialist skills on the supply side of people and on the data capture side. So it's quite a new thing, because of the ability of technology to handle data that is becoming cheaper and cheaper with every passing day.

The pace of technology has grown so exponentially that the impact on Company Leadership is on two fronts. One is that you feel overwhelmed with the volume of data and increasing customer expectations, while it's also mandatory to respond to the regulatory challenges. So one is a sense of being overwhelmed, but there is actually also a sense of big opportunity in how Company Management has been reacting in the recent past. That opportunity comes from many aspects of data, so it makes a whole world of difference between being unable to cope with Big Data and seeing the business opportunity.

WHAT IMPACT HAS BIG DATA HAD ON YOUR ROLE SPECIFICALLY?

I think Big Data, going back to my earlier point on volume, is a little bit too big to digest, if it is not properly addressed

I think Big Data, going back to my earlier point on volume, is a little bit too big to digest if it is not properly addressed. Everything I read says it takes shape in three forms generally. One is that Big Data is a phenomenon based on ever-changing, growing information, which affects the way we do business and depends on the habits of individuals with many different touch points. Second, it relates to capability: how you gather data, how you process it, and how you obtain sensible and valuable outcomes from it. Third, it has become an emerging industry in its own right, leading to a profession and a supply chain of both upstream and downstream providers of people who are data scientists, deep-rooted analysts and professionals in capturing data.

HOW IMPORTANT DO YOU THINK AUTOMATION THROUGH DATA IS GOING TO BE IN THE FUTURE? Big Data has now led to different skillsets, such as data science, robotics, new complex algorithms, new models of anticipating customer behavior, et cetera. So in that space, there are different areas where applications of Big Data within banking will be important. I'd like to mention 3 or 4 areas, as follows. One is Big Data being applied to improve transactional efficiency. If we can use technology and Big Data in banking, we are in a position to improve the way we understand the transactions that flow through the bank. This can be transactions relating to the transfer of funds to and from certain geographies, it can be alert triggers relating to anti-money laundering, it can be mining of text and chatbot conversations between retail customers and the bank. It can be a whole host of transactional matters for our Global Markets business. So there is an opportunity to improve the efficiency of transactional areas in a bank. This is not new, but we all see how consumer behavior is starting to play an important part in the way



Current efforts from banks to figure out how to deal with Big Data will get sorted soon

we run businesses - in areas such as credit cards, usage of credit cards and, within that, analytics on how the card has been used. As an example, there can be so many perspectives on weekend usage, weekday usage and holiday usage. One of the issues facing banks today is that jurisdictional regulations are quite asymmetric. There might be some common factors, but there are many asymmetric expectations depending on which geography you refer to. So being able to respond to regulatory expectations based on asymmetric jurisdictional requirements is another opportunity where Big Data plays a huge part. Within that area there are Sanctions, KYC, and Data Privacy matters, and each of them is a topic in itself. Lastly, there is the importance of Big Data in the capital allocation models that banks are expected to manage. These are under different requirements from various regulators and industry bodies. How you manage the quality of data going into the capital models (because it is quite a big machine) could be a significant differentiator as well. The better the quality of data, the better the outcome, allowing the capital's potential to be fully realized. So in summary of my points on data automation opportunities, there is one chapter on transactional efficiency, there is another chapter on consumer behavior, and yet another one on regulatory and capital matters.
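As a simple illustration of the card-usage analytics mentioned above, the sketch below splits spend into weekday, weekend, and holiday usage. It is a hypothetical example only, not ANZ's systems: the holiday calendar and transactions are invented.

```python
from datetime import date

HOLIDAYS = {date(2017, 1, 1), date(2017, 1, 26)}  # hypothetical holiday calendar

def usage_breakdown(transactions):
    """Aggregate card spend into weekday, weekend, and holiday buckets."""
    totals = {"weekday": 0.0, "weekend": 0.0, "holiday": 0.0}
    for day, amount in transactions:
        if day in HOLIDAYS:
            totals["holiday"] += amount
        elif day.weekday() >= 5:          # Saturday (5) or Sunday (6)
            totals["weekend"] += amount
        else:
            totals["weekday"] += amount
    return totals

if __name__ == "__main__":
    txns = [(date(2017, 1, 1), 120.0),    # holiday spend
            (date(2017, 1, 7), 45.5),     # Saturday spend
            (date(2017, 1, 9), 18.0)]     # Monday spend
    print(usage_breakdown(txns))          # {'weekday': 18.0, 'weekend': 45.5, 'holiday': 120.0}
```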

HOW DO YOU SEE THE USE OF BIG DATA CHANGING IN THE BANKING INDUSTRY IN THE NEXT 5 YEARS? I honestly don't know if I can see that far ahead because it depends on technology and, as they often say, technology changes every Monday morning. So five years is too long a time. But it feels to me that almost anything that goes on in Big Data today is arguably somewhat foundational in nature.

So today banks are building foundational platforms in areas, such as Data Warehouses based on strategic reviews of what banks want to do using the Big Data. Once the establishment of organizational models is done, which is what many companies are currently going through, I don’t see banks having to spend too much time on organizational models to deal with Big Data - it will become a customary ability to handle data because it will become routine. Toolkits will be well-established and it will become something that you can track. As I said, there are new things surfacing, like Machine Learning, so that is still at an early stage of maturity. That will be a certainty in the next few years. At the end of the day, my point is that current efforts from banks to figure out how to deal with Big Data will get sorted soon. I think technology toolkits will continue to improve. I think new areas of applications will come through, such as Robotics, which I am going to touch upon in my Summit presentation. Also, one can argue and debate whether Big Data will be a differentiator at all in the future, because I don’t believe it will be in a material way, since the abilities to handle it will become somewhat similar between any two institutions.

WHAT CAN THE AUDIENCE EXPECT TO TAKE AWAY FROM YOUR PRESENTATION? I think the only thing I can offer to share is my bank's genuine on-the-ground experience in handling the topic of Big Data, and what some of the challenges are in responding to both the opportunity it presents and the overwhelming challenge of Big Data as an overall capability. And I think we can share our own experience on how to use Robotics to increase transaction efficiency. So those are the takeaways that I would like to share with the audience.


internet of things summit april 19 & 20 2017 san francisco


rasterley@theiegroup.com

+1 415 692 5426

www.theinnovationenterprise.com


Say Goodbye To Your Data Lake In 2017

Olivia Timson, Data Commentator

THE HYPE AROUND DATA LAKES INCREASED dramatically in 2016, with Gartner finding that inquiries related to the term rose 21% year-on-year. However, while interest in data lakes may have mushroomed, so too has skepticism around whether or not they actually work, and many believe they are due a fall from grace in 2017. Data lakes are enterprise-wide data management platforms that store data from disparate sources in its native format until such time as you query it for analysis. So, rather than putting the data in a purpose-built data store, you move it into a data lake in its original format. Their popularity is down to a belief that, by consolidating data, you get rid of the information silos created by having independently managed collections of data, thereby increasing information use and sharing. Other benefits cited by supporters include



lower costs through server and license reduction, cheap scalability, flexibility for use with future systems, and the ability to keep the data until you have a use for it.

While these benefits persuaded many to look to data lakes as a solution, companies that took the data lake plunge are often not seeing the kind of results they wanted. Indeed, they are often creating more problems than they solve. In 2014, Andrew White, vice president and distinguished analyst at Gartner, said, 'The need for increased agility and accessibility for data analysis is the primary driver for data lakes. Nevertheless, while it is certainly true that data lakes can provide value to various parts of the organization, the proposition of enterprise-wide data management has yet to be realized.' That was in 2014, and the same is true today. Indeed, Adam Wray, CEO & President of NoSQL database company Basho, has gone so far as to call data lakes 'evil', explaining, 'They're evil because they're unruly, they're incredibly costly and the extraction of value is infinitesimal compared to the value promised.'

The flaws are many and the risks substantial. For a start, data lakes lack semantic consistency and governed metadata, increasing the degree of skill required of users looking to find the data they want for manipulation and analysis. According to Gartner research director Nick Heudecker, 'The fundamental issue with the data lake is that it makes certain assumptions about the users of information. It assumes that users recognize or understand the contextual bias of how data is captured, that they know how to merge and reconcile different data sources without 'a priori knowledge' and that they understand the incomplete nature of datasets, regardless of structure.' Simply put, companies still lack the expertise among their business users to actually use data lakes. The marketing for many data lake products often seems to suggest that all users can dip into a data lake and pull out insights as if it were an arcade game, which simply isn't true, and this is leading to a great deal of disillusionment.

Another question mark around data lakes concerns the quality of the data. The entire point of a data lake is that it pulls in any data with no governance. With no restrictions on the cleanliness of the data, there is real potential that it will eventually turn into a data swamp. This lack of governance also leads to security issues. Companies may not even know where the data they're collecting comes from, what kind of data it is, or the regulatory requirements around its privacy. Companies cannot just store all their data where and how they please; there are rules, and the security around data lakes is often lacking. The protection of data is vital to a company's reputation, and it requires strict governance, or companies are leaving themselves wide open to all types of privacy risks.

All of this is not to say that data lakes are going to disappear this year, only that companies entering into data lake investment should do so only after a good deal of consideration over whether it is really the best option, and should not assume it is somehow going to be the answer to their business intelligence dreams. This year is, however, likely to see more onus put on certified data sets created by IT. Certified data can be shared across departments, solving the problems of data lakes while retaining the benefits.
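To make the 'store it raw now, apply structure when you query' idea described above concrete, here is a minimal schema-on-read sketch. It is only an illustration: the folder, source name, and fields are hypothetical, and a real lake would sit on HDFS or object storage with proper cataloguing rather than a local directory.

```python
import json
from pathlib import Path

LAKE = Path("lake/raw")  # stand-in for HDFS or object storage

def land(source_name, payload):
    """Store incoming records exactly as received, with no schema applied."""
    LAKE.mkdir(parents=True, exist_ok=True)
    (LAKE / f"{source_name}.json").write_text(json.dumps(payload))

def read_with_schema(source_name, fields):
    """Schema-on-read: project only the fields a particular analysis needs."""
    records = json.loads((LAKE / f"{source_name}.json").read_text())
    return [{f: rec.get(f) for f in fields} for rec in records]

if __name__ == "__main__":
    # Land raw click events in their native form.
    land("web_clicks", [{"user": "u1", "page": "/pricing", "ms_on_page": 5120},
                        {"user": "u2", "page": "/home"}])
    # Only at query time do we decide which columns matter.
    print(read_with_schema("web_clicks", ["user", "page"]))
```

Even in a toy this small, the governance problem the article describes is visible: nothing stops a malformed or incomplete record from landing, and missing fields only surface when someone queries them.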



The vast majority of the cost involved in manufacturing drugs is in the discovery phase


THE 3 BIG DATA INNOVATIONS THAT NEED TO HAPPEN IN PHARMACEUTICALS Gabrielle Morse, Organizer, Big Data & Analytics for Pharma Summit

AS WE TURN A NEW LEAF FOR 2017, THE calendars may have changed for pharmaceutical companies, but the threats and challenges that existed in 2016 are still very much on the minds of pharmaceutical leaders across the world.


We have seen the UK government hit Pfizer with its largest ever fine after the company was found to have overcharged the NHS by upwards of 2,600% on the sale of an anti-epilepsy drug, whilst Actavis is currently waiting to hear if it will face similar punishment for allegedly increasing the price of hydrocortisone tablets by 12,000%. There is also the threat that Donald Trump's promise to repeal Obamacare could have a significant impact on pharmaceutical companies, given that a report from the Robert Wood Johnson Foundation and the Urban Institute claims its repeal would lead to 24 million people losing their health insurance. That would mean a significant decrease in the number of drugs sold, so profits are likely to decline as a result. Similarly with Brexit, the price of certain drugs is likely to increase within the UK, meaning that people are going to struggle to afford them and profits will decrease. This combination of close monitoring of prices by the biggest countries in the world and potentially challenging business environments means that action needs to be taken, but luckily the spread of big data throughout the industry has allowed for several innovations that could be the saviour of many pharmaceutical companies in the next 12 months.

Modeling

Predictive modeling as a concept has been around for a long time, but with increasing computing power and database size, the pharmaceutical industry has some significant opportunities to use it in the coming 12 months.

Molecular modeling has had a number of largely unsuccessful iterations in the past, but 2017 may be the time when it can really gain traction, given the developments in the area. The ability to identify which ingredients are going to work together and which are going to mix and kill people is essential, but it is something that has always left a significant margin of error given the huge variety of people and diseases that drugs could be used on. With the acceleration in both the amount of data available to pharmaceutical companies and the speed at which it can be analyzed, it is possible for pharma companies to theorize and either reject or move forward with drugs considerably faster. In fact, McKinsey has estimated that these better informed decisions could generate up to $100 billion in value for pharma companies. The vast majority of the cost involved in manufacturing drugs is in the discovery phase, where successful products often need to offset the costs of unsuccessful experiments elsewhere in the company. By modeling drugs and predicting their successes or flaws, companies are likely to significantly reduce costs, both in the discovery stage and for the consumer, who won't need to bear the cost of all the failed drugs that came before.
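As a purely illustrative sketch of this kind of predictive screening, and not any company's actual discovery pipeline, the snippet below trains a simple classifier on synthetic compound descriptors and uses it to rank candidates before any expensive lab work; every feature, label, and number here is invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: each row is a candidate compound described by a few
# hypothetical descriptors; the label marks whether it passed early screening.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))        # e.g. solubility, toxicity score, ...
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Score unseen candidates so the most promising are tested first, reducing the
# number of costly wet-lab experiments that go nowhere.
candidates = rng.normal(size=(5, 4))
print(model.predict_proba(candidates)[:, 1].round(2))  # probability of passing screening
print("hold-out accuracy:", round(model.score(X_test, y_test), 2))
```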

Variety Of Data Collected And Analysis

One of the elements that makes big data 'big' is that it gathers together huge varieties of data, not simply about the drugs being produced directly, but about anything that could impact the company.

In normal circumstances this could be weather conditions in sub-Saharan Africa which increase the number of mosquitoes, thereby prompting pharma companies to increase manufacture of malaria drugs and mosquito repellents. 2017 is not likely to be any different, especially as we have seen the mosquito-borne Zika virus making headlines around the world, but companies will also need to look at more societal data too.

Going back to the potential repeal of Obamacare, pharma companies will need to know the potential impact this may have. This requires taking a huge number of variables into a predictive algorithm. In order to accurately predict sales they will not only need to look at the numbers of people insured and basic rates of disease, but then factor in people who will lose insurance, how that will impact their health (fewer check-ups mean more dangerous illnesses could progress further before diagnosis) and how much people can afford to pay (which will be linked to average wages, which then need to be subdivided by geography and demographics). Factoring all of these constantly changing data points into a workable model will certainly be a struggle. Given the lack of clarity surrounding major political decisions (a Trump presidency and Brexit), it is an environment in which planning will be difficult. The use of data, potentially even in real time, will be essential for pharma companies looking to maximize their potential in the coming year.

The last two years have not been good for perceptions of pharma companies

Business Strategies

The last two years have not been good for perceptions of pharma companies, which essentially started because of Martin Shkreli, who GQ referred to as 'the worst person of 2015' and who Seth Meyers labelled 'a real slappable prick'. This came after his company, Turing Pharmaceuticals, bought the out-of-patent drug Daraprim and increased the price by 5,000%. It was picked up by every major media outlet, was even used as a campaigning point by several major politicians, including presidential nominee Hillary Clinton, and brought the term 'price gouging' into the national lexicon. Following this came the cases brought against Actavis and Pfizer by the NHS in the UK, which saw them accused of overcharging the public healthcare system by 12,000% and 2,600% respectively. Essentially, the price of drugs is now an international sore point and every price change is closely analyzed by millions of people. This makes the job of leaders within the pharma industry much more difficult, because in the face of a potential loss of customers they cannot be seen to significantly increase pricing, given how controversial it is to do so in the current atmosphere. With this option essentially off the table, pharma leaders need to look at other ways of saving money and making their businesses more efficient, which will require empirical approaches through data. Having the ability to accurately see trends, cost centres, and department performance will make this considerably easier. At present little is known about the internal data of many pharma companies, given that they are naturally, and understandably, secretive. However, when the challenges of the coming year begin to really manifest, we are likely to see which companies are doing it properly and which aren't.


Want to contribute?

channels.theinnovationenterprise.com/authors contact ghill@theiegroup.com


THE DIFFERENCE BETWEEN BIG DATA AND DEEP DATA

Kayla Matthews, Technology Writer

BIG DATA HAS BECOME AN IMPORTANT topic for nearly every industry. The ability to study and analyze large sections of information to find patterns and trends is an invaluable tool in medicine, business and everything in between. Employing analytics, the root of big data, in your business can lead to advances and discoveries that you might not see otherwise. When it comes down to it, though, big data isn't really a new concept. It's simply taking data already available and looking at it in a different way. Deep data, on the other hand, may be the real tool that you need to change the world, or at least your industry.


Of the two options, is big data or deep data the best option for your business?

What is Big Data? Big data is an amalgamation of all of the data collected by a business. The specifics will vary by industry, but generally it's information like customer or client names, contact information, and other data collected over a business day. Depending on the size of the business, this can be a mind-boggling amount of information, much more than it would be possible for a person to go through. Businesses can employ predictive analytics to help sift through the data to find patterns and trends, but much of the information is often useless or redundant.

What is Deep Data? Deep data is, in essence, taking the data gathered on a daily basis and pairing it with industry experts who have in-depth knowledge of the area. We’re talking exabytes or petabytes of data — much more than what could fit on a standard computer or external storage drive. Deep data pares down that massive amount of information into useful sections, excluding information that might be redundant or otherwise unusable.

What's the Difference? Big data and deep data are inherently similar, in that they both utilize the mass of information that's collected every single day by businesses around the world. Companies can pair this data with analytics and use it to help predict industry trends or changes, or to decide which departments need investment or reduction in the coming year. So how are the two types of data gathering different?

The key is in the data analyzed

Big data collects everything, down to the last insignificant zip code or middle initial. Trends can be found in this mass of data, but it's much harder to determine what is useful and what is just junk. Deep data, on the other hand, looks for specific information to help predict trends or make other calculations. If you want to predict which products are going to sell best during the next calendar year, for example, you wouldn't necessarily be looking at your customers' locations, especially if you sell online. Instead, you look at data like sales numbers and product information to make predictions. That's the essence of deep data. Deep data analysis applies to medicine and other similar fields as well. Focusing on one specific demographic, such as age, weight, gender or race, can help make trial participant searches much more streamlined and increase the accuracy and efficacy of drug or treatment trials.

Which one do you need?

Of the two options, is big data or deep data the best option for your business? That will depend on the kind of business that you run, the industry that you're in, and the type of data you're collecting. In general, though, when searching for specific trends or targeting individual pieces of information, deep data is going to be your best option. It allows you to eliminate useless or redundant pieces of data while retaining the important information that will benefit you and your company. Big data and deep data are still both very useful techniques for any type of business. A data consulting firm can help you determine the best techniques to gather and process your data. We are entering the age of big data, and it won't be long before big data or deep data becomes a necessity rather than an option.
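To make the distinction concrete, here is a minimal pandas sketch; the table and its columns are invented. The 'big data' view keeps every field the business happens to collect, while the 'deep data' view pares it down to the two fields that actually answer the sales question.

```python
import pandas as pd

# "Big data": everything collected, including fields irrelevant to the question.
orders = pd.DataFrame({
    "customer_name":  ["A. Smith", "B. Jones", "C. Lee", "D. Kim"],
    "zip_code":       ["15201", "60614", "94105", "73301"],
    "middle_initial": ["J", None, "R", "T"],
    "product":        ["widget", "gadget", "widget", "widget"],
    "units_sold":     [3, 1, 5, 2],
})

# "Deep data": keep only the fields that drive the prediction we care about
# (which products will sell best next year) and aggregate them.
deep_view = orders[["product", "units_sold"]]
trend = deep_view.groupby("product")["units_sold"].sum().sort_values(ascending=False)
print(trend)  # widget 10, gadget 1
```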


What Does Music Streaming Success Mean For Data? Jordan Charalampous, Big Data Industry Analyst

Music, like fashion, is generally cyclical which works well for data analysis



2016 HAS JUST TURNED TO 2017 and many people are reevaluating. One of the keys to this is throwing out 'clutter'. For many, this takes the form of throwing out hundreds of CDs that have done nothing except gather dust for years. They were replaced almost a decade ago by MP3 downloads, and since then streaming has become the preferred medium for a growing number of music listeners. The growth in those using streaming services over the past 12 months has been huge, with revenue increasing by 57% in the first half of 2016 alone, whilst revenue steeply declined for downloads (down 14%) and physical albums (down 17%). The internet, once viewed as an inevitable cause of stagnation and possibly the destruction of the music industry, is now being seen as a lifeline. Warner Music, the third largest music label in the world, recorded revenues of $3.25 billion, its highest in 8 years, with $1 billion coming from streaming alone. Today there are around 90 million people who use music subscription services across the world, and this number is steadily increasing as a huge market has developed. Apple Music added 4 million subscribers between April and September, and Spotify averaged 2 million new subscribers every month across the majority of 2016. The business side of this is huge too: streaming companies now have both paid subscribers, who don't hear adverts and make up a slightly smaller portion of total users, and free users, who have limited access to services but have to listen to adverts. According to a report from GroupM, this represents a $220 million opportunity for advertisers, due in part to 60% of streaming activity taking place on mobile devices, where musical choices are more often linked to moods and emotions. It is here that the real value in streaming lies, not only in terms of monetisation but in the huge potential it has to understand an audience. At present there is considerable risk in new acts for record labels; for every multi-million-selling Taylor Swift there are thousands of Courtney Stoddens who fell into musical oblivion. When you look at the popularity of specific genres or styles, it is possible to see what is popular or what is likely to be popular in 6-12 months.
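As a rough illustration of that kind of trend-spotting, and with artists and play counts that are entirely invented, comparing one month's streams with the next is enough to surface fast-growing acts regardless of their current size:

```python
# Hypothetical monthly stream counts per artist.
last_month = {"artist_a": 120_000, "artist_b": 40_000, "artist_c": 5_000}
this_month = {"artist_a": 125_000, "artist_b": 44_000, "artist_c": 15_000}

def growth(prev, curr):
    """Month-on-month growth rate for every artist present in both periods."""
    return {a: (curr[a] - prev[a]) / prev[a] for a in prev if a in curr}

# Surface the fastest-growing acts first, however small their current base.
rising = sorted(growth(last_month, this_month).items(),
                key=lambda kv: kv[1], reverse=True)
for artist, rate in rising:
    print(f"{artist}: {rate:+.0%}")  # artist_c: +200%, artist_b: +10%, artist_a: +4%
```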

Music, like fashion, is generally cyclical, which works well for data analysis, as there are predictors that can help show music labels what is going to be popular in the future. They can then concentrate resources on specific acts that fit within these trends, increasing the chances of success and decreasing risk. We have similarly seen that the move to digital has opened up new opportunities for artists, with global superstars like Justin Bieber and the Arctic Monkeys getting noticed after growing their fanbases online before being signed to a label. Through monitoring of traffic and taking a more data-driven approach to finding new acts, labels can identify these artists earlier, meaning that bands can grow much faster with the support of labels at an earlier stage.

However, music is not all about labels, and the digital revolution has meant that more artists now have the opportunity to go it alone. This isn't something that only small, obscure bands do either, with some of the biggest acts in the world representing themselves, such as Radiohead, Chance the Rapper and NOFX, who have sold tens of millions of records between them. The uptake in streaming and the data it creates allows these artists to draw insights about their performance directly, rather than relying on secondary reporting from record companies. It also allows them to see considerably more detail in a much shorter timeframe. Suddenly weekly sales and monthly chart positions don't matter as much, because they can see a minute-by-minute update of how their songs are performing.

There are around 90 million people who use music subscription services across the world

Music streaming is growing at an impressive rate and it is giving companies and artists opportunities with data that they've never had before. We have already seen how this has impacted industries across the world; now that the doom and gloom is lifting, it will be fascinating to see what happens in music.



FOR MORE GREAT CONTENT, GO TO IEONDEMAND

www.ieondemand.com

Over 4000 hours of interactive on-demand video content

There are probably dozens of definitions for the single job of Head of Innovation, and with them dozens of perspectives on how it should be done. Without any official credentials on the subject, I was asked to give my personal account of running an innovation team in the context of an innovation-hungry organisation that started on the high street and has grown to employ 16,000 people over 80 years. In the past year or so I have learned that when it comes to innovation, culture trumps everything and there really aren't any rules. In order to get by, I stick to some guiding principles and lots of gut feel. Join me for an honest and straightforward perspective on a modern job.

What would happen if a company funded every new product idea from any employee, no questions asked? As an experiment, Adobe did exactly that. In this session, Mark Randall will share the surprising discoveries Adobe made in creating Kickbox, the new innovation process that's already becoming an industry model for igniting innovation. Each employee receives a mysterious red box packed with imagination, money and a strange game with six levels. Learn why the principles behind Kickbox are so powerful, why Adobe is open-sourcing the entire process and how any organization can tap these principles to ignite innovation.

Mark Randall's serial entrepreneurial career conceiving, designing and marketing innovative technology spans nearly 20 years and three successful high-tech start-ups. As Chief Strategist, VP of Creativity at Adobe, Mark Randall is focused on infusing divergent thinking at the software giant. Mark has fielded over a dozen award-winning products which combined have sold over a million units, generated over $100 million in sales and won two Emmy awards. As an innovator, Mark has a dozen U.S. patents, he's been named to Digital Media Magazine's 'Digital Media 100' and he is one of Streaming Magazine's '50 Most Influential People.'


Stay on the cutting edge Innovative, convenient content updated regularly with the latest ideas from the sharpest minds in your industry.

