


FUTURE BRIGHT: A DATA DRIVEN REALITY


TABLE OF CONTENTS

Foreword by Bert Boers

Preface by Jeroen Dijkxhoorn

HR service provider Securex heads for 100% reliable CRM

Data integration evolves thanks to Big Data and open source

Infographic Data Management

Rijkswaterstaat gains comprehensive insight into its performance

Ronald Damhof enables enterprise-wide discussion on data with the Data Quadrant Model

Improving insight into Belgium’s economic situation

DSM gains control of Master Data

Who is your data governor?

Jill Dyché blogs about her book The New IT

Credit insurance company integrates quality ratios in risk assessment

Data dependent on quality improvements

Data-driven decisions make the difference

Master Data Management as a foundation of your business

About SAS


FOREWORD

The seventh edition of Future Bright explores the theme of ‘A Data Driven Reality’. The data driven society is evolving more rapidly than many organizations seem to realize. This is fuelled by developments such as the Internet of Things, which generates vast flows of new data, creating new business opportunities. At the same time, market research firm Forrester has declared that we are living in the ‘Age of the Customer’. Customers leave a digital trail and expect their suppliers to use that information to provide a better and more relevant customized offering.

Both developments have a significant impact, as many organizations are now beginning to realize. But as yet they are taking little structured action to genuinely prepare their organization for this Data Driven Reality. That is understandable, because this is uncharted territory. How can you go about it? Where do you start? Organizations are aware that they first need to get their foundation in order. At the same time they see a lot of low-hanging fruit that they can pick with rewarding projects in the field of big data analytics. How do those two worlds interrelate? Which investment generates the fastest return?

We aim to provide new insights with this book. Drawing on interviews with customers such as Securex, DSM, Rijkswaterstaat and Crédito y Caución, and with experts such as Jill Dyché and Ronald Damhof, we show the steps necessary to give data management a central role in your organization, so that you can get the basics in order and fully exploit your data to drive innovation, conversion and satisfaction. We hope you find it inspiring reading.

Bert Boers
Vice President South-West Europe region, SAS Institute



PREFACE

Jeroen Dijkxhoorn

DIRECTOR ANALYTICAL PLATFORM CENTER OF EXCELLENCE AT SAS


Data quality has been an issue ever since the first database was created. It was a subject that for a long time received little attention, for the simple reason that process efficiency was always more important than the completeness and accuracy of data. Data was a by-product of the process. That time is over. We are heading towards a data driven reality.

The fact that data was a by-product of the process resulted in databases with a considerable number of errors and omissions. To cope with this, the data was always validated before anyone used it. If a lot of data was found to be incorrect, all efforts were suddenly focused on supplying missing data, correcting incorrect data, and/or cleaning contaminated databases. Human intervention was always required.

Data automatically initiates processes This operating method is becoming problematic now that data sources are increasingly linked and processes are likely to start at any time. Whereas the start time of an e-mail marketing campaign used to be determined by a marketer, it now starts when triggers are received from the customer and you want to respond. The more you understand the customer journey, the easier it will be to respond to those triggers and the more relevant you will be as an organization for your customers. This forces you to set out policies stating how your organization will respond when your customer or prospect requests certain information or signs up for an e-mail newsletter. The process then continues entirely automatically, without human intervention and hence also without the validation that used to take place. The correctness of data therefore has to be checked automatically, by means of services which can be deployed throughout the process. Here we can draw a distinction between data validation (technical correction of data in a physical data stream) and data quality (verification of functional correctness).
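To make the distinction concrete, here is a minimal sketch, in plain Python, of what such an automated check might look like when a newsletter sign-up triggers a process without human intervention. It is an illustration only, not SAS functionality; the field names and rules are hypothetical.

```python
# Minimal sketch of automated checks in a data driven process (hypothetical fields).
# validate() covers data validation: technical correction of the record in the stream.
# quality_check() covers data quality: verifying functional correctness before the
# triggered process is allowed to continue without human intervention.
import re

def validate(record: dict) -> dict:
    """Technical correction: normalize formats, no human in the loop."""
    fixed = dict(record)
    fixed["email"] = fixed.get("email", "").strip().lower()
    fixed["phone"] = re.sub(r"[^\d+]", "", fixed.get("phone", ""))
    return fixed

def quality_check(record: dict) -> list:
    """Functional verification: is this record fit for the process it triggers?"""
    issues = []
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record["email"]):
        issues.append("e-mail address is not usable for the campaign")
    if not record.get("opt_in"):
        issues.append("no consent recorded, newsletter must not be sent")
    return issues

signup = {"email": " Jane.Doe@Example.COM ", "phone": "+31 (0)6-1234 5678", "opt_in": True}
cleaned = validate(signup)
problems = quality_check(cleaned)
print(cleaned, problems)  # the process only starts automatically when no issues remain
```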

Data Driven Reality Organizations used to be driven by processes, but now they are driven by data. This means any failure to identify an error can have an immediate, major impact. Manual correction is no longer possible, so the error will show up in multiple locations. That makes data quality monitoring much more important. It also explains why compliance rules and regulations are imposing new data quality requirements. Supervisors nowadays want data, not reports. That requires a data driven organization. We are now speeding towards a data driven reality. The problem is not technology, but the lack of central supervision of the consistency of data definitions – data governance. That is typically the task of a Chief Data Officer, a role many organizations still lack.


Age of the Customer and Internet of Things are drivers It is high time to take action, because in the Age of the Customer you need to respond flexibly to triggers from customers. This requires a 360 degree view of the customer. We have been talking about it for many years, but still don’t have it because customer data are spread across various systems. The lack of supervision of data definitions makes it impossible to pull the data together. Another driver is the Internet of Things, which will generate a new stream of data that you will want to use to optimize and largely automate your processes. This, too, requires a clear vision of data management.

Combination of different types of data Whichever of these two developments is your main driver, in both situations it is increasingly important to combine 100% reliable data with data containing a degree of uncertainty, such as weather forecasts or social media sentiment analyses. How can these unstructured data, often stored in Hadoop clusters, be combined in an appropriate way with structured data that is 100% accurate, such as route planning for truck drivers or purchasing behaviour data?

“Organizations used to be driven by processes, now they are driven by data” Jeroen Dijkxhoorn

From a cost point of view it is not feasible to store all that data in the same database. But that would also be highly undesirable from an organizational point of view, as Ronald Damhof explains later in this book. After all, there is a big difference between data with which you have to account to supervisors and data which you use to experiment, in pursuit of ideas for innovation. And yet those different ways of using data must be combined, without physically lumping all the data together. This complexity requires a clear logical data model and clear data definitions. Without these data definitions and good data stewardship, it is impossible to exploit the opportunities that are arising in the market and which your competitors will respond to in droves. The question is therefore no longer whether you will start, but when. Our advice is: act today. Data is your main asset. Act accordingly and do something with it, before a competitor or a new market player beats you to it. ■



CASE

HR SERVICE PROVIDER SECUREX HEADS FOR A 100% RELIABLE CRM

CLEANING UP THE CLIENT RELATIONS DATABASE AND THEN KEEPING IT CLEAN



Like many companies, HR service provider Securex was experiencing severe problems with its CRM database. Chief among them was that marketing data was poor and becoming increasingly unreliable. Securex cleaned up and updated the database using the SAS Data Management platform. On top of that, the platform is also being set up as a permanent watchdog to ensure the accuracy and consistency of both batch updates and manual data manipulations. The result has been an improved database with greatly enhanced contact information.


Securex is an HR company active in Belgium, France, and Luxembourg providing services for large businesses, SMEs, self-employed professionals, and private individuals. Services include Payroll, Staff and Insurance Management, HR Consulting, and Health and Safety Services. Securex has a staff of approximately 1,600 throughout their nearly 30 offices, serving more than a quarter of a million clients.

Data inconsistencies lead to frustrations “We want to make sure that whenever anyone within the organization enters or modifies data, the changes are automatically processed and rectified,” reports Securex Business Architect Jacky Decoster. Any data inconsistencies invariably result in considerable frustration for everyone involved. Employees are constantly updating client data, adding and changing contact information and contract data, all while marketing teams are uploading external data for new campaigns and other client communications. “Each of these manipulations can produce small data errors or inconsistencies,” observes Decoster. “Since the data is being manipulated by a multitude of persons and departments, problems can easily arise such as duplicate entries, client records with incomplete contract information, and missing contact information such as first name, gender, postal and e-mail address, or phone number. This is frustrating, especially for marketing teams running a campaign: many e-mails simply bounce, some mail is sent twice to the same person, and other mailings are based on wrong or missing information. This sometimes severely damaged our reputation.” Although a centralized SAP CRM database had been in place since 2004, the problems had been growing worse in recent years. Decoster noted that complaints about data quality were coming in from both staff and clients. “Obviously we had to do something about it and do it effectively and convincingly.”

SAS Data Management clean up successfully launched The data quality issue was put high on the agenda when Securex launched its comprehensive Client+ project. This change project included the migration of the highly customized SAP CRM database into the cloud-based, standardized, scalable Salesforce.com solution. Securex decided to deploy SAS Data Management to facilitate that migration. Decoster explains that their reasoning proved to be spot on. “SAS Data Management enabled us to meticulously clean the data before uploading it into our new database. The data were normalized, duplicate entries were merged, and missing information was automatically added wherever possible. SAS Data Management has built-in tools such as data dictionary definition, fuzzy matching, full name parsing, reliable gender determination, phone number standardization, and e-mail address analysis that comprehensively covered all of our concerns. We have already completed the migration of our enterprise accounts in record time and the marketing department tells us they have virtually zero complaints about data quality. It is a huge improvement that would have been unthinkable without SAS Data Management. We are now finalizing our self-employed and individual accounts.”
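The built-in capabilities named above belong to SAS Data Management; the sketch below only illustrates the kind of normalization and fuzzy duplicate detection involved, using the Python standard library. The contact records and the similarity threshold are invented for the example.

```python
# Illustration of normalization plus fuzzy duplicate detection (not the SAS implementation).
from difflib import SequenceMatcher

def normalize(contact: dict) -> dict:
    """Standardize spacing, capitalization and e-mail casing before matching."""
    return {
        "name": " ".join(contact["name"].split()).title(),
        "email": contact["email"].strip().lower(),
    }

def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Identical e-mail addresses always match; otherwise fuzzy-match on the name."""
    if a["email"] and a["email"] == b["email"]:
        return True
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold

contacts = [
    {"name": "jan  peeters", "email": "J.Peeters@example.be "},
    {"name": "Jan Peters", "email": "j.peeters@example.be"},
]
cleaned = [normalize(c) for c in contacts]
print(is_duplicate(cleaned[0], cleaned[1]))  # True: merge the records before migration
```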

A permanent watchdog for data quality Decoster insists, however, that improving data quality is not a one-shot affair; it must be a continuous concern within the organization. That is one reason why Securex opted for a comprehensive approach. Their Client+ project includes the redefinition and streamlining of marketing and sales processes.


“Marketing says that the data quality has improved dramatically, an achievement that we previously considered impossible” Jacky Decoster

Part of this effort is sensitizing staff about the impact of their data manipulations and insisting that they be both careful and precise. At the same time, SAS Data Management is being set up as a permanent watchdog for data quality. Decoster explains why: “One can never be 100% sure that every single bit of data will be entered correctly, even when people are fully trained and sensitized. That is why we have SAS Data Management run consistency checks and updates on a regular basis, in fact every week. Our next step will be to implement a near real-time check. Whenever someone in the organization enters or modifies data, the changed record is automatically processed and corrected by SAS Data Management. This is a process that takes just a couple of seconds.”

Robust architecture and great flexibility Decoster and the Securex staff have nothing but praise for the robust architecture and great flexibility of the SAS Data Management platform. The system can be integrated into any software environment. For example, SAS Data Management provides direct certified connectors to a variety of systems, including Salesforce.com. This avoids the development of customized interfaces. Furthermore, all functionality is offered in stored procedures, ensuring that every transaction is safe and reliable. SAS Data Management is also easy to deploy. “It has a powerful data profiler, which enables us to examine any available data and assess their reliability along with the risk involved in integrating them into new applications. We use this profiler in particular to analyze all data we purchase.” The software also provides a powerful tool to define batch jobs that clean and normalize the data, based on the profiling statistics and information. Decoster adds a final plus: “The learning curve for SAS Data Management is very short: after a two-day training we were able to define all of the jobs we needed.” ■



INTERVIEW

“For me, Big Data does not exist as a volume concept.” This is a remarkable statement for a data integration expert to make. “Size is relative. It reflects where you come from.” As such, you cannot define a lower threshold for the ‘big’ in Big Data, but the phenomenon does touch on the field of data integration, which itself has practically become a commoditized specialization.



How to deal with the explosion of data and the importance of analysis

DATA INTEGRATION EVOLVES THANKS TO BIG DATA AND OPEN SOURCE

These doubts about the existence of ‘big’ data are voiced by Arturo Salazar, Data Management Advisor Analytical Platform at SAS. Salazar explains that ‘big’ has a completely different meaning for a small business than for a large corporation such as a financial institution. So he argues that there can be no lower threshold for the ‘big’ in Big Data. The Big Data trend certainly has major implications for the field of data integration, as this field is now confronted with more data and more unknown data variables. Salazar explains that data integration has existed for some time now and is considered almost a commodity today. However, this is not to say that all organizations feel completely at home with data integration: the importance of using and efficiently deploying data is not understood by all. But as a specialization it has now reached maturity.

Outside the organization’s comfort zone The ‘big’ in Big Data is indeed a relative dimension, he agrees. “Big Data is all data that falls outside the comfort zone of an organization.” It also involves the comprehensiveness of the data sources and, even more importantly, the speed with which the information can be integrated in order to derive new and deployable insights from it. The question arises whether Big Data is related to the degree of maturity of an organization. If the data in question is only just outside the comfort zone, isn’t it a question of simply expanding, of growing up? “Yes, it is a question of reaching maturity,” says Salazar.

Monthly reporting is inadequate He continues: “Businesses are currently going through a technology transformation with regard to the way information used to be collected and used.” He mentions a striking example of how data integration was first introduced, already some years ago. “Take the web world, for example, and logs of web servers.” The machines recorded web page visits and click-throughs in their log files, including the originating IP addresses, cookie data, et cetera.



“All those clicks; that is a lot of data.” And it’s all data that could be useful. Salazar puts it into perspective: “Most log data can actually be thrown out, but you simply don’t know which data you should keep.” Moreover, the valuation of data on web surfing has shifted. Data that used to be erased may now actually prove to be valuable. While it used to be of little interest which other pages on the same website a visitor clicked to, today that data could be crucial. “Take the real-time recommendations provided by modern online retailers.” Another example is tracking surfing habits while visitors are logged on to a site. Profiling is now a spearhead in the customer loyalty campaigns of web companies.
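To make that raw material concrete: a single access log line can be parsed into structured fields that can later be related to ‘traditional’ sources such as a CRM system. The sketch below is a generic illustration of the common combined log format, not part of any SAS product; the sample line is fictitious.

```python
# Parse one web server access log line (combined log format) into structured fields.
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match the format."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = ('203.0.113.7 - - [10/Feb/2015:13:55:36 +0100] '
          '"GET /products/overview HTTP/1.1" 200 2326 '
          '"https://www.example.com/home" "Mozilla/5.0"')
print(parse_line(sample)["path"])  # which page the visitor clicked through to
```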

Growing data mountains versus storage costs The example of logs in the web world has brought a wider awareness of the usefulness of data integration. This has in turn fed a demand to deploy data integration for a wider range of applications. There is added value to be had from connecting website data to ‘traditional’ information sources such as a CRM system or a data warehouse. It has long been recognized that such a connection makes sound business sense. Continuously evolving insights have since ensured that this is not simply limited to the one-sided import of website logs into a CRM application, for example. Efficient use of the data requires two-way traffic and a wider scope. This means the amount of data gets bigger and bigger. At first, the rapid growth of the data that companies collect, store and correlate may not appear to be a major problem. The capacity of storage media continues to increase, while the price per gigabyte is being forced down, as if hard drives abide by their own version of Moore’s law, the exponential growth in the performance of processors. However, not only is the curve for storage capacity rising less steeply than that of processors, the increase is also insufficient to keep ahead of the explosive data growth.

Extracting unknown nuggets of data An additional problem for data integration in the information explosion is the software, and more specifically database software. A very significant proportion of the looming data mountain cannot simply be stored in a relatively expensive database or a costly data warehouse configuration. Although these enormous mountains of data might contain gold, it is as yet unknown how much there is and where it is. The SAS expert confirms that this is in essence a chicken-and-egg problem: the as yet unknown value of the data versus the cost of finding it. But there is hope for this form of data mining. New technology is relieving the pioneers of the manual task of sieving for nuggets in the streams that flow out of the data mountain. Nor do they have to laboriously dig mine shafts in the mountain any longer.

Going down the same road as Linux This is where the Hadoop open source software comes into play: a cheap software solution that runs on standard hardware and can store and process petabytes of data. How powerful is it? Hadoop is based on technology developed by search giant Google to index the internet. “Hadoop is going down the same road as Linux,” explains Salazar. The market is gradually adopting it for more serious applications. “No one wants to store their logs in an expensive database.” However, a problem for many businesses is that Hadoop is still at the beginning of the road that Linux travelled long ago. Both stem from very different worlds than those regular businesses are used to, and both require quite some technical knowledge, from users as well. “Initially people were afraid of Linux too,” says Salazar. Since then, companies like Red Hat have combined the system’s core software with business applications and offer the results as packages. Hadoop has just started this packaging process. He points to Cloudera and Hortonworks.


Arturo Salazar

DATA MANAGEMENT ADVISOR ANALYTICAL PLATFORM AT SAS

“Although these enormous mountains of data might contain gold, it is as yet unknown how much there is and where it is” Arturo Salazar

He thinks these companies will do for Hadoop what Red Hat did for the adoption of Linux. “Many businesses still consider Hadoop intimidating and too complicated,” says Salazar. They normally employ specialists for such open source software, for installation and configuration as well as maintenance and even everyday use. What skills are needed? Experienced programmers who have coding skills and administrative talent, alongside the knowledge and expertise normally associated with data analysts. This is a rare and therefore expensive combination of qualities.

Bringing Hadoop to the masses Despite its complexity, Hadoop is gaining popularity. “It offers so many advantages,” explains Salazar. Business Intelligence vendor SAS is also responding to this trend. He says that the company uses technology such as Hadoop “under the hood”. The complexity of this software is hidden within processes and programs that the customer is familiar with. Businesses are able to focus on actually using the tools for data integration, instead of first having to call on special experts with knowledge of the underlying software. In February 2015, SAS introduced a new product to its data management range to increase the user-friendliness of Hadoop under the hood. Salazar explains that the new web-based application, called SAS Data Loader for Hadoop, will make it possible to delve even deeper into the data mountain. This application can be used to prepare and then mine the data stored in Hadoop and can be used by data analysts and even ordinary users. Soon we will all be able to mine for gold! ■



INFOGRAPHIC: DATA MANAGEMENT



CASE

Jacorien Wouters

PROGRAMME MANAGER FOR THE NETWORK MANAGEMENT INFORMATION SYSTEM AT RIJKSWATERSTAAT



An integrated and clear view across highway and waterway networks

RIJKSWATERSTAAT GAINS COMPREHENSIVE INSIGHT INTO ITS PERFORMANCE

Rijkswaterstaat, the executive agency of the Dutch Ministry of Infrastructure and the Environment, is responsible for the principal highway and waterway networks and the main water system in the Netherlands. Besides managing its own operational processes, Rijkswaterstaat must account to the Ministry and to the lower house of the Dutch Parliament. To do so, it needs to have the right information at the right time and to be able to access it internally and externally. For this purpose it developed the Network Management Information System (NIS).

Rijkswaterstaat began developing the NIS a number of years ago. The system was designed to give an integrated insight into the performance delivered by Rijkswaterstaat, allowing tighter control and providing a broad view of the overall achievement. In the tendering process, Rijkswaterstaat selected SAS’ solutions because they were able to support the entire process from source to browser. Rijkswaterstaat consequently uses SAS Business Intelligence and SAS Data Management for the NIS. “The NIS is now one of the most important information sources for the management of Rijkswaterstaat,” says Jacorien Wouters, Programme Manager for the NIS. “It has brought together the two largest flows of information on our organization’s networks: the performances of the networks and data on assets such as roads and bridges, but also the Wadden Sea, for example. This was preceded by an intensive data integration process.”


Better decisions When the NIS was introduced in 2004, the data from various applications was spread across the information systems of Rijkswaterstaat’s ten regional departments. Now, the NIS periodically obtains data from over 40 source systems. The power of the system lies, among other things, in the ability to combine data and present it in charts and maps. This gives a fast and clear insight into the performance of the individual departments and of Rijkswaterstaat as a whole. The figures in the NIS have official status. That is very important internally, but also externally, since Rijkswaterstaat reports to the Ministry three times a year on the status of specific performance indicators, or PINs. As part of a service level agreement, arrangements have been made for a four-year budget period.

More complex analyses In addition to improved control, access to the information has been greatly simplified, as is clear from the increasing number of NIS users at Rijkswaterstaat. “Information which previously could only be obtained by a few employees from a specific data source is now available through the NIS portal to all employees at Rijkswaterstaat,” Wouters explains.


“The insight into the underlying data helps us to operate more efficiently and hence ultimately to save costs. The clear reporting method also saves time. Now there is a single version of the truth” Jacorien Wouters

A single version of the truth “The insight into the underlying data helps us to operate more efficiently and hence ultimately to save costs,” Wouters continues. “The clear reporting method also saves time. Now there is a single version of the truth, so discussions on definitions or figures are a thing of the past. The fact that information is more readily available in the NIS means we can also make faster, better adjustments. We used to report on performance three times a year and matters came to light which we would have preferred to tackle immediately. Now we can do just that.”

Developments In implementing SAS, Rijkswaterstaat took a step towards improving data quality. It also started to use SAS Visual Analytics. “As we simply have more insight into our data, our management can take more forward-looking decisions,” says Wouters. “We’re making constant progress in combining information, highlighting connections which would not previously have been visible.” ■



INTERVIEW

Ronald Damhof

INDEPENDENT CONSULTANT INFORMATION MANAGEMENT


Independent information management consultant Ronald Damhof developed the Data Quadrant Model

“Make data management a live issue for discussion throughout the organization”

The data management field is awash with jargon. Most business managers have no idea what all those terms mean, let alone how to use them in understanding the precise value of particular data and how to handle it. To allow an enterprise-wide discussion on data, Ronald Damhof developed the Data Quadrant Model.


Damhof works as an independent information management consultant for major organizations such as Ahold, De Nederlandsche Bank, the Dutch tax authorities, Alliander, and organizations in the financial and healthcare sectors. These are data-intensive organizations which share a growing realization that the quality of their work is increasingly determined by the quality of their data. But how do you move from that realization to a good data strategy? A strategy which everyone in the organization understands, from the director in the boardroom to the engineer in IT? Damhof developed a quadrant model to make data management a live issue for discussion.

To push or pull? Damhof starts by explaining a concept which everyone will have encountered in high school: the ‘Push Pull Point’. This concerns the extent to which demand impacts the production process. He takes as an example the building of a luxury yacht, a process that does not start until the customer’s order is known. The decoupling point is at the start of the production process. We can take matches as an opposite example. If a customer wants matches, he or she goes to the supermarket and buys them. Unless he wants black matches; then he is out of luck. The decoupling point is right at the end of the production process. The production of a car, however, comprises standard parts and customized parts. Customers can still state that they want a specific colour, leather upholstery or different wheel rims. The decoupling point lies somewhere in the middle of the production process. “Similarly, in the production of a report, dashboard, or analytical environment, the decoupling point lies somewhere in that middle area,” Damhof explains. The decoupling point divides the production process into two parts: a push and a pull side, also referred to as a supply-driven and a demand-driven part. Push systems are aimed at achieving economies of scale as volume and demand increase, while the quality of the product and the associated data remains guaranteed. On the other hand there are pull systems, which are demand-driven: different types of users want to work the data to produce ‘their’ product, their truth, on the basis of their own expertise and context.

The Data Push Pull Point

Push/Supply/Source driven:
• Mass deployment
• Control > Agility
• Repeatable & predictable processes
• Standardized processes
• High level of automation
• Relatively high IT/Data expertise
→ All facts, fully temporal

Pull/Demand/Product driven:
• Piece deployment
• Agility > Control
• User-friendliness
• Relatively low IT expertise
• Domain expertise essential
→ Truth, Interpretation, Context

Business Rules Downstream


“A quote I have stolen from Gartner analyst Frank Buytendijk: in an average organization the car park or art collection is better managed than data” Ronald Damhof

Opportunistic or systematic development? On the y-axis Damhof projects the development style dimension. “By that I mean: how do you develop an information product? You can do so systematically; the user and the developer are then two different people and you apply defensive governance, aimed at control and compliance. This puts into practice everything that engineers have learned in order to create software on a sound basis. You often see this in centralized, enterprise-wide data, such as financial data and data which is reported to regulators.” You can also use an opportunistic development style. “In that case the developer and the user are often one and the same person. Take for example the data scientist who wants to innovate with data, who wants to produce and test analytical models. Or situations in which speed of delivery is essential. The governance in these cases is offensive, which means the focus is on flexibility and adaptability.”

The Development Style

Systematic:
• User and developer are separated
• Defensive governance; focus on control and compliance
• Strong focus on non-functionals: auditability, robustness, traceability, …
• Centralised and organisation-wide information domain
• Configured and controlled deployment environment (dev/tst/acc/prod)

Opportunistic:
• User and developer are the same person or closely related
• Offensive governance; focus on adaptability & agility
• Decentralised, personal/workgroup/department/theme information domain
• All deployment is done in production


Data Quadrant Model The combination of these two dimensions produces the following picture.

A Data Deployment Quadrant (horizontal axis: the Data Push Pull Point, from push/supply/source driven to pull/demand/product driven; vertical axis: the Development Style, from systematic to opportunistic)

• Quadrant I – systematic, push/supply/source driven: Facts
• Quadrant II – systematic, pull/demand/product driven: Context
• Quadrant III – opportunistic, push/supply/source driven: “Shadow IT, Incubation, Ad-hoc, Once off”
• Quadrant IV – opportunistic, pull/demand/product driven: Research, Innovation & Prototyping/Design

“Quadrant I is where you find the hard facts,” Damhof explains. “This data can be supplied intelligibly to quadrants II and IV in its full, raw volume. Data in quadrant I is produced by highly standardized systems and processes, so it is entirely predictable and repeatable.” Diagonally opposite, in quadrant IV, is data that is characterized by innovation and prototyping. “This is the quadrant in which the data scientists work, who actually have only three demands: data, computer power, and cool software.” Increasingly, separate departments are set up as innovation labs, giving data scientists free rein to use the data for experimentation and analysis, with the aim of innovation. “You need this type of data management to discover and test good ideas. When a concept works, it needs to be raised from the fourth to the second quadrant, because you can only achieve economies of scale with data if you can generate and analyse it systematically. You can then use it enterprise-wide.” “I often talk to data scientists who obtain very sound insights in a kind of sandbox environment,” Damhof continues. “But they forget or are unable to monetize those insights in a production situation. They cannot bring their insights from quadrant IV to quadrant II. This is where governance comes into play.” And therein lies the major challenge for many organizations, as Damhof knows only too well. “If you explain this model to managers and ask where their priority lies, they will all say they first have to get their foundations in order: the first quadrant.”



“With organizations generating ever greater volumes of data, they can no longer be so slapdash in the way they handle it. Now is the time to make sure your data management and the associated governance are properly set up. The Data Quadrant Model helps you to achieve this” Ronald Damhof

“But if you ask what they are investing their money in right now, where they are innovating, it is often in the fourth quadrant. It is great that they are engaged in this more experimental and exploratory form of data management, but that is only possible if your foundations are right. Otherwise it is like having a hypermodern toilet that is not connected to the sewer system: it turns into a total mess.” Ask the average data scientist what takes up most of his or her time, and he or she will answer: getting the data to the right qualitative level, the aim of quadrant I. “Only a data scientist with powerful analytical software, a lot of computer power, and high-quality data will genuinely make a difference.”

Reliability versus flexibility “Managers insist that systems must be reliable and flexible, but these qualities are inversely related. A highly reliable and robust system is less flexible. And in an extremely flexible system it is necessary to lower the requirements with regard to reliability,” says Damhof. “The Data Quadrant Model makes this clear to managers. In quadrant I reliability takes precedence over flexibility and in quadrants II and IV flexibility takes precedence over reliability.” Quite a few different types of expertise and competence are therefore required in order to make optimum use of data.

Expertise and competences You often find that organizations require a single person to supply expertise and competences which cover the entire quadrant model. Such people do not exist. Employees in quadrant I have an engineering profile. They are information and data engineers, trained in data architecture and data modelling. “Note that this is not the classic IT profile. These are engineers who can carry out model-driven development and have a solid understanding of the need for conceptual and logical modelling.” This expertise is very scarce. Quadrants II and IV on the opposite side require people with expertise in the respective business domain, supplemented by Business Intelligence and/or analytical competences.

Facts and truth Damhof also calls quadrant I of the model ‘the single version of the facts’. Those facts are then made available to employees in quadrants II and IV. That enables them to create their own truths.



“People in the business world often talk about ‘the single version of the truth,’ but there is no such thing. There is a ‘single version of the facts’ and there are multiple ‘truths’. After all, how you interpret facts depends on the type of organization, your outlook, background knowledge, and experiences” Ronald Damhof

Since the same facts are used to create multiple truths in the right-hand half of the model – depending on the context and background of the data user – Damhof calls this half ‘the multiple version of the truth’. You should bear in mind that the ‘truth’ quite often changes over time. “You often hear companies talking about ‘the single version of the truth,’ but there is no such thing. After all, how you interpret particular facts depends on the context, your outlook, background knowledge, and experiences.”

Quadrant III So far, Quadrant III has received little mention, even though it is incredibly important. It is the quadrant of data sources which are not under governance, like an ad hoc download which you obtain from an open data provider, a list in Excel that you want to use, or a set of verification data which you have received on a CD. “You may even want to combine governed data from quadrant I with your own dataset in quadrant IV; that’s fine,” says Damhof.

The journey through the quadrants In order to get value from data, you can make various movements in the model. You can move from fact-based data management towards a model in which the context is also important (from quadrant I to II). “This actually is the classic journey of ‘unlock data and produce an information product,’” says Damhof. This is often inefficient, however, because this process is based on known requirements and wishes on the part of the user. “And the user does not really have that knowledge in advance.” Many organizations opt for a more agile-driven form, such as from quadrant I to quadrant IV to quadrant II. Have the employees in quadrant IV produce an information product in an iterative way using the data in quadrant I/III. You then promote the product to quadrant II only if it is important to bring this under management.


It is also possible to move from quadrant III to quadrant IV. “You have your own datasets and you want to try something? Great,” says Damhof. The one movement an organization must never make is from quadrant III to quadrant II. “Because in that case you use data that you are not entirely sure of, as it has not been subjected to good governance in the required way. An example is a compliance report for the regulator which you want to produce using data which is not under governance. You should not seek to do that.”
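As a summary of these movements, the sketch below encodes the journeys Damhof describes and the one movement he rules out. It is an interpretation of the interview, not part of the model itself; the quadrant labels follow the numbering used above.

```python
# Movements through the Data Quadrant Model as described in the interview (interpretation).
ALLOWED_MOVES = {
    ("I", "II"),    # classic: unlock facts and produce an information product
    ("I", "IV"),    # hand facts to data scientists for experimentation
    ("III", "IV"),  # try something with your own, ungoverned dataset
    ("IV", "II"),   # promote a proven prototype into governed, enterprise-wide use
}

def check_move(source: str, target: str) -> str:
    if (source, target) == ("III", "II"):
        return "forbidden: ungoverned data must not feed governed reporting"
    if (source, target) in ALLOWED_MOVES:
        return "allowed"
    return "not described in the model; review governance first"

print(check_move("IV", "II"))   # allowed
print(check_move("III", "II"))  # forbidden
```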

How we produce: process variants (the same Data Deployment Quadrant: I Facts, II Context, III “Shadow IT, Incubation, Ad-hoc, Once off”, IV Research, Innovation & Design)

Make data management a live issue for discussion In his day-to-day work Damhof finds that his Data Quadrant Model helps organizations to talk about data management. “From my current customer, De Nederlandsche Bank, I regularly hear statements such as, ‘I want to move this data product from quadrant IV to quadrant II;’ or, ‘We must put the data in quadrant I first, but the submitter is really responsible for the data in quadrant I;’ or, ‘I want some space to store data temporarily in quadrant III.’ Everyone understands what it means. That is new; the organization has never thought about data in that way before. And that actually applies to almost every data-intensive company. Organizations have long spoken of data as an ‘asset,’ but in practice they handle data in a very unstructured way. As a result they never monetize that asset. With organizations generating ever greater volumes of data, they can no longer be so slapdash in the way they handle it. Now is the time to make sure your data management is properly set up. The Data Quadrant Model will help you to achieve this.” ■



CASE

Caroline Denil

PROJECT MANAGER BELGIAN FEDERAL PUBLIC SERVICE

Immediate access to easily comprehensible data

IMPROVING INSIGHT INTO BELGIUM’S ECONOMIC SITUATION



Vincent Vanesse

BUSINESS ANALYST BELGIAN FEDERAL PUBLIC SERVICE

The Belgian Federal Public Service (FPS) Economy committed itself to creating a more powerful and transparent presentation of the Belgian economic situation for the general public, statisticians, and university students, among many others. Together with SAS, it created a single web portal that offers visitors direct access to the principal indicators of the Belgian economic situation.



All indicators are visualized in graphs for better comprehension and are fully customizable, so that users can immediately consult the indicators in which they are interested. The portal not only created a user-friendly statistical environment, it also opened up new business opportunities within other Directorates General of the Belgian federal government.

Scattered information creates time-consuming research One of the main missions of the FPS Economy is the generation and publication of statistics and figures characterizing the Belgian economic situation. Until recently, this information was accessible through various sources: Statbel, be.Stat, Belgostat, and the National Bank of Belgium. In such a situation, it is difficult for students, researchers, journalists, and the many other users to find the required information to answer their specific questions and draw accurate conclusions. Hence, the FPS Economy initiated a project to improve the user-friendliness of economic data.

Multi-departmental collaboration improves statistics The first goal of the project was to increase the value of information. This proved to be an intense, but truly indispensable process bringing together FPS Economy business analysts and statisticians. The process led to the development of graphs depicting economic information, as well as metadata that users can consult to better understand the information being presented. “As a result, some twenty graphs were selected and then subdivided into eight categories, including, among others, energy, gross domestic product, and consumer price index,” states Vincent Vanesse, Business Analyst at the FPS Economy.

“From now on, finding information on Belgium’s economic situation is easy. The Ecozoom tool on the FPS Economy website gives immediate access to the twenty main economic graphs” Caroline Denil

A single portal for all economic indicators Next, the FPS Economy teamed up with SAS in order to make the economic indicators accessible via a user-friendly tool. “We have been working with SAS for quite a long time now. As a result, we are thoroughly familiar with their competence. The exceptional statistical capabilities, robustness, and extensibility of their solutions made our choice of SAS for this particular project obvious,” notes Caroline Denil, Project Manager at the FPS Economy. The collaboration resulted in the launch of a single web portal (Ecozoom) where various users can find all of the economic indicators they need in just a few mouse clicks. “From now on, finding information on Belgium’s economic situation is easy,” observes Denil.


“Users can select the indicators they are most interested in and save this information. Each time they subsequently consult the tool, they will immediately start with their desired information” Vincent Vanesse

“The Ecozoom tool on the FPS Economy website gives immediate access to the twenty main economic graphs. Those who want more detailed information can still click through to the traditional Statbel, be.Stat, and Belgostat websites.”

Visualization facilitates comprehension The online portal presents the economic indicators as graphs that make the information much easier to interpret quickly and accurately. Denil points out that deducing trends from a graph is far easier than from a table or a long series of figures. In addition, the tool is able to visualize four graphs simultaneously. This facilitates comparisons between various types of data, for instance to verify the magnitude of the effect of the number of company failures on the unemployment rate. The old adage that “a picture is worth a thousand words” certainly holds true for the FPS Economy, confirms Denil. “Our graphs can often convey much more than a lengthy text or series of tables. In our specific situation, the graphs certainly help users to more easily and precisely evaluate the economic situation in Belgium.”

Customization enhances user-friendliness Vanesse is quick to point out that the four graphs that are depicted on the Ecozoom homepage are fully customizable. “Users can select the indicators they are most interested in and save this information. Each time they subsequently consult the tool, they will immediately start with their desired information.”

Opening up new opportunities Although the Ecozoom tool has considerably increased the user-friendliness of economic data, the FPS Economy is already looking into possibilities that will extend its user-friendliness even further. “We are currently testing geo-visualization in order to visualize data for specific Belgian regions,” illustrates Denil. “On top of that, we are also planning to make the tool accessible for mobile use on smartphones and tablets.” The Ecozoom tool might potentially even open up new business opportunities. “The tool has generated interest in other Directorates General, up to and including the top management level. This could intensify the collaboration between the various FPS, and even create a new type of service,” concludes Denil. ■



CASE


DSM introduces MDM, building on its successes with data quality

DSM GAINS CONTROL OF MASTER DATA

DSM is convinced of the value of good data quality. The global science-based company, which operates in the field of health, nutrition and materials, has already implemented data quality successfully and is building on that success with SAS Master Data Management (MDM).


MDM is a method of managing business-critical data centrally for decentralized use. Errors and discrepancies in the so-called Master Data are tackled: items such as customer names, material types, suppliers and other data used across divisions and IT systems. Consistency in that critical business data plays a vital part in supporting efficient operations. MDM gives DSM control of the ERP systems which it has absorbed in the wake of major acquisitions over the past few years.

From state-owned mines to chemicals and the environment “We have 25,000 employees, a substantially higher number than five years ago due to acquisitions,” says Bart Geurts, Manager Master Data Shared Services at DSM. Geurts cites the acquisition of Roche Vitamins in 2003 as one of the major purchases. DSM is now the world’s largest vitamin maker, and that involves different data requirements. “Good data quality is extremely important, for food safety and health as well as flavour. It is less critical for bulk chemical products.” Geurts alludes to DSM’s origins in the state-owned mines of the Netherlands. “Old businesses know that they have to reinvent themselves in order to survive.” DSM has reinvented itself several times, from mining to petrochemicals, and in recent years from fine chemicals to life and material sciences. In its current form DSM focuses on emerging markets and climate & energy. Geurts cites lighter materials such as a replacement for steel in cars that reduce their weight and make them more economical. The group also develops products that are manufactured using enzymes rather than oil. These are different activities and different markets, so the company has different requirements in terms of company data.

“Good data quality is extremely important, for both health and safety” Bart Geurts

More complete organization overview The many acquisitions involved in this transformation brought not only new activities and people, but also many new IT systems. Geurts explains that these included a large number of ERP systems. In the new organization, the many different IT systems were found to contain errors. Not serious errors, but discrepancies which only came to light as a result of the combined use of data and systems. Geurts mentions the example of a staff celebration marking the launch of the new company logo: when invitations to the company-wide event were sent out, 800 people were ‘forgotten’. This was due to an incomplete overview in the HR environment. And he says there were more inconsistencies. There was contamination of supplier data, for example. The same supplier may use different names in different countries, with the result that the various systems in use in a multinational may show it as different businesses.


Bart Geurts

MANAGER MASTER DATA SHARED SERVICES DSM


Linking MDM to business processes Building on previous experiences and results with data quality, DSM moved to a central MDM approach. Geurts says the business data is good enough for transactions taking place within a particular silo, such as a country or business division. “But as soon as operations become cross-divisional, problems are liable to emerge.” Leading ERP suppliers offer MDM solutions, Geurts says, but put too much focus on the individual silos. That is why DSM chose the MDM solution from SAS. Geurts stresses the importance of the link between MDM and the business processes. It highlights the benefit for the organization as a whole and for the individual divisions which operate efficiently within their respective silos. Key issues are who in the organization should be the owner of the MDM process, who plays what role, and which KPIs (key performance indicators) are used. A possible company-wide KPI for MDM is measuring how long it takes for one customer order to be processed, delivered and invoiced.
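As a simple illustration of such a KPI, the sketch below computes the order-to-invoice time from timestamps. The orders and field names are hypothetical; the article does not describe DSM's actual KPI definition in this detail.

```python
# Hypothetical order-to-invoice cycle time, a possible company-wide KPI for MDM.
from datetime import datetime

orders = [
    {"order_id": "A-1", "ordered": datetime(2015, 3, 1, 9, 0), "invoiced": datetime(2015, 3, 4, 16, 0)},
    {"order_id": "A-2", "ordered": datetime(2015, 3, 2, 11, 0), "invoiced": datetime(2015, 3, 9, 10, 0)},
]
cycle_times_days = [(o["invoiced"] - o["ordered"]).total_seconds() / 86400 for o in orders]
print(f"average order-to-invoice time: {sum(cycle_times_days) / len(cycle_times_days):.1f} days")
```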

Think big, act small Establishing the MDM process and addressing the issues involved was the easiest part, according to Geurts. He describes that as ‘devised on the sofa’. Then came the implementation phase, with the deliberate choice of a relatively small-scale start. “We conducted a pilot in the sourcing department based on the think big, act small precept.” The term ‘small’ needs to be put in context, however. Worldwide, DSM has six sourcing front offices and two back offices. In this small-scale pilot, the inconsistencies in supplier data were tackled first. The diverse vendor data, which included duplicates, was cleaned among other things by using different language algorithms in the SAS MDM product. “The complexity lies in the details,” says Geurts from experience.

What data is critical for your business? As well as tackling the contamination of supplier data, steps were taken to deal with other master data components. The offices concerned were asked to state which data was critical to their business. “Because we couldn’t analyse all the data in the business.” That would be too large an operation and place too heavy a burden on those systems. Answering the question of which data was critical bridged the gap between the MDM initiative and the business units involved. After all, they themselves specified the selection of data that is crucial for their own processes. Such a selection is necessary due to the sheer range of master data. DSM defines master data as everything that underlies processes and transactions. In principle, any error can cause inefficiency, but the extent to which it actually does so depends on the type of data. “If the telephone number of a supplier’s salesperson is incorrect, you may be able to send an email instead,” Geurts explains. But no such back-up is available if a bank account number or a supplier’s address is incorrect.

Preventing errors On the basis of the data which the units defined as critical, data rules were drawn up. That took around six months, after which the implementation was completed in around three weeks. A clear benefit which MDM has delivered for DSM is the avoidance of errors. Geurts cites the example of an order entered in the name of the wrong department. DSM is also introducing an improvement in the inputting of supplier data, as people sometimes make errors when searching for an existing supplier or entering a new one.


If the search is unsuccessful, a new entry is created that is essentially a duplicate. An algorithm is now linked to the data input, which checks and then asks the person entering the data: “Is this the supplier you are looking for?”
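The sketch below illustrates the idea of such an input-time check with the Python standard library: before a new supplier record is created, close matches among existing names are suggested so the user can confirm. It is not DSM's actual SAS MDM configuration; the supplier names and cutoff are invented.

```python
# Suggest likely existing suppliers before a new (possibly duplicate) record is created.
from difflib import get_close_matches

EXISTING_SUPPLIERS = ["Acme Chemicals GmbH", "ACME Chemie Deutschland", "Bolt Logistics BV"]

def suggest_existing(new_name: str, cutoff: float = 0.6) -> list:
    return get_close_matches(new_name, EXISTING_SUPPLIERS, n=3, cutoff=cutoff)

candidates = suggest_existing("Acme Chemicals Germany")
if candidates:
    print("Is this the supplier you are looking for?", candidates)
else:
    print("No match found; create a new supplier record.")
```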

Master Data Management is a continuous process The above advantages of MDM relate to internal matters such as the staffing overview, supplier data deduplication and error prevention. But MDM also offers external benefits for DSM. “Suppose there’s an error in a product or in material. We want to know immediately which products are affected.” Speed is of the essence in such cases. It is also important to continue the data checks after the initial MDM implementation. “Keep on checking! Otherwise you’ll have new problems two or three months down the line,” warns Geurts. After all, MDM is a continuous process that remains active to prevent new errors that would have to be fixed later. “You don’t want that, because it would disrupt your business process.” Making sure that all the relevant people in the organization understand this is instrumental in ensuring success. ■



INTERVIEW

How Data Governance can facilitate future data mining

WHO IS YOUR DATA GOVERNOR?



The world of data goes much further than simply collecting data and using it to make money. Only good quality data can make money, and good quality data often entails good management. Not traditional IT management, but data governance. Do you already have a data governance strategy in place? Who is your data governor?

An outsider could conclude that data quality and data governance amount to the same thing. This is not the case, however, even though there is a strong relationship between the two data disciplines. “Data quality involves the implementation of quality rules; data governance goes much further,” explains Bas Dudink, Data Management Expert at SAS Netherlands. “Who is responsible for the quality of data? Which responsibilities are involved? What agreements have been made?”

Where does your data come from? Data quality concerns the accuracy of postal addresses and databases, for example. In order to ensure lasting quality improvements in this area, agreements will need to be made and enforced. As such, data quality can be implemented as a component of data governance, but the two are not inextricably linked. Data governance can come from various directions. It may be based on a wider need felt by the organization, or it could be required by legislation and regulations. Dudink gives the Basel agreement and the ensuing regulations for financial institutions as an example. Banks are now required to answer the question: “Where does your data come from?” In practice, the same applies to factories, which apply or are required to apply certain standards for the materials they use.

Metadata, file formats and standards Data governance goes further than the origin of data. It also encompasses metadata and standards for the formats in which data are delivered, processed and passed on to others: external partners, internal departments, and future customers. It could apply to information applications that are presently unknown or unseen. As such, data governance transcends departments and business processes. Data management is now still often encapsulated in the silo of a particular business activity or division. The use of the data is based on the current state of affairs. “The primary, everyday processes usually work just fine,” says Dudink, describing the practical situation. However, modifications have to be made to enable new activities or company-wide functions such as risk management.

Management consultancy rather than technology Good data governance mainly concerns non-IT issues such as business processes, organizational procedures, working agreements and the enforcement thereof. There is a security component too: who has access to which data and how is it protected? “It’s more like management consultancy: drafting procedures,” the SAS expert explains. He estimates the technology component to be a modest 10 to 20 percent of the total work of a data governance project.


The key question is how an organization treats its data, for example in order to manage risks. This is a matter that needs to be arranged for the whole organization, not just a single activity or division. Good data governance takes account of the future, so that any future customers can easily be supplied using consistent data, for example for new partnerships or new business activities. It would seem logical to link this to data quality and this can indeed produce longer term advantages. However, solely improving data quality without involving data governance can reduce the project to an incidental, short-term improvement. It will then turn into an operation that either does not produce a lasting result, or one that has to be constantly repeated to get a result. A data governor can prevent this from happening.

Taking shortcuts to achieve KPIs
If an employee, supervisor or department is held accountable for certain targets, then they will obviously focus on these. If those targets involve responsibility for an activity but no accountability, then this activity will obviously not be given priority when things are busy or resources are limited. Such logical business and career decisions are not always good for the company. Take the example of a call centre whose KPI was the customers’ waiting time in the queue. In order to keep this score within target during a particularly busy Christmas period, the agents decided to skip a number of fields in the CRM system, ‘only until things have calmed down again’. Is this a smart shortcut to get results, or a lax attitude to the company’s business processes? If no consequences are attached to this behaviour, there is a risk that the measure will become permanent sooner or later. After all, it saves time and so boosts the performance of the relevant department and its manager. Even if this sombre scenario does not come to pass, the damage has already been done. Because who is going to check and possibly enter the missing data afterwards? This leaves a gap; nuggets are missing from the data treasure chest. Good data governance would have prevented the organization from making this error in its own call centre.

Address errors upstream
Another practical example is the invoicing and payment process of a healthcare institution. There was a long delay between the receipt of the invoice and payment, and this period was getting even longer. An investigation revealed the cause: the invoices contained erroneous transactions. The health insurer exposed these errors, and so the healthcare institution became embroiled in the time-consuming task of repairing them. Every invoice had to be double-checked, which substantially delayed payment. The institution decided to tackle the problem upstream rather than fixing the erroneous invoices downstream, after they had already been generated. Now every patient-related medical procedure is subject to various data quality tests, and the healthcare professional who enters the data is given direct feedback so that any mistakes can be fixed immediately. The result is that the payment term has been reduced considerably. An additional benefit is that management reports, analyses and the relationship with the health insurer have all improved.
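
To make the mechanics of such upstream checks concrete, here is a minimal sketch in Python of a point-of-entry validation routine that gives immediate feedback. The field names, rules and reference codes are invented for illustration; they are not taken from the institution described above.

```python
# Hypothetical point-of-entry checks: validate a record the moment a
# professional enters it, and return feedback immediately rather than
# letting errors surface downstream in the invoicing process.
import re
from datetime import date

PROCEDURE_CODES = {"A101", "B204", "C310"}  # assumed reference list

def validate_procedure(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    if record.get("procedure_code") not in PROCEDURE_CODES:
        problems.append("Unknown procedure code; check the reference list.")
    if not re.fullmatch(r"\d{9}", str(record.get("patient_id", ""))):
        problems.append("Patient ID must be exactly 9 digits.")
    if record.get("treatment_date", date.max) > date.today():
        problems.append("Treatment date lies in the future.")
    return problems

# Usage: give the data-entry screen direct feedback.
entry = {"procedure_code": "A101", "patient_id": "12345678", "treatment_date": date(2015, 3, 2)}
for problem in validate_procedure(entry):
    print("Please correct:", problem)
```

The point of the sketch is the placement of the check, not its sophistication: the rule runs at the moment of entry, so the person who caused the error is also the person who can fix it.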


“If you invest in improvements upstream, you will get more than your money back downstream. But if this is so obvious, why do we start by building complex data warehouses as purification plants, rather than purifying the water at the source?” Bas Dudink

Get more than your money back
It’s actually a childishly simple, universal principle: if you invest in improvements upstream, you will get more than your money back downstream. But if this is so obvious, why is it so rarely applied to data management? Why do we start by building complex data warehouses as purification plants, rather than purifying the water at the source? Data governance can prevent gaps occurring in the data treasure chest, along with the resulting time-consuming task of repairing the errors. So how should you implement data governance? The implementation of data governance is like a tap dance, says Dudink. Step one is generally data quality, because this is where the problem originates. But even before that first step, awareness needs to be raised within the organization. Once the awareness is there and data quality is improved, you still require insight into the broader perspective. “Data quality is a step-by-step process,” continues the SAS consultant. A holistic approach to data governance is recommended. It is, after all, about so much more than ‘simply’ IT and data management.

The timeliness of value
So data governance is not easy to implement. “It’s a complex process,” says Dudink. It’s theory versus practice. In theory, data is always important, both in terms of its quality and the control thereof. In practice, this importance is not always reflected throughout the organization. It has not always been identified who is the ‘governor’ of which data. This concerns not only the responsibility for the data, but also the value that is attached to it. Or rather: whether the value of the data is recognized in time. “Imagine your roof has a leak while the sun is shining,” says Dudink, explaining the timeliness aspect. The leak only becomes a problem when it starts raining, but by that time it’s too late. Data governance is a complex affair; a company-wide operation that transcends IT and affects matters such as business operations, the corporate culture and human resource management. But in the end, good data governance offers major company-wide advantages in today’s data-driven world. ■



COLUMN

THE NEW IT

Jill Dyché

VICE PRESIDENT SAS BEST PRACTICES

Someone once classified the world into two types of people: those who like categorizing people into two types and those who don’t. I used to be one of those people, the kind that saw executives as either business-focused or technology-focused.

I’ve noticed other people have this tendency too. It doesn’t matter whether I’m talking to clients about analytics, CRM, data, or digital, the question always comes up: “Who should own that, the business or IT?” The question of ownership pervades the day-to-day at companies worldwide. It seems everyone is focused on everyone else—who should own it? Who should manage it? Who should take credit for it? Who should fund it?

But in watching the companies that were effective at bridging the proverbial business-IT divide, I noticed three common traits:

»» The successful companies had leaders who realized that appointing people or changing organizational structures wasn’t enough. New ways of doing business were key, and new processes needed to be practiced in order to ensure change adoption.

»» These companies met their cultures where they were, working within the strictures of top-down or bottom-up and ensuring that these new processes and rules of engagement were new enough to be compelling but not so disruptive that they would encourage inertia or sabotage.

»» Leaders at these companies didn’t embrace these changes for their own sake. Rather, they were (and are) considering how trends like digital business are forcing fresh approaches to longstanding business functions.

Using the trend of the digital business and innovation as the key drivers for make-or-break changes to IT, I wrote about practices that successful leaders have embraced to not only transform IT, but to leverage technology in new ways for business benefit. ‘The New IT: How Business Leaders are Enabling Strategy in the Digital Age’ features change agents who have emerged from the trenches to tell their stories.

It’s no longer about business versus IT. Rather, it’s about business enabling IT.

What I’ve learned from these leaders is what I write about in the book, including:

»» If your IT only has two speeds, you’re in big trouble.
»» The question “What type of CIO are you?” misses the point. The real question is, “What type of organization are you leading, and what should it look like?”
»» Collaborating by getting everyone in a room isn’t good enough anymore. (In fact, it’s dangerous.)
»» Corporate strategy and IT strategy can be aligned on one page.
»» Hierarchy is being replaced with holacracy, homogeneity with diversity.
»» Innovation shouldn’t be run by an elite SWAT team in a separate building with sushi lunches and ergonomic desk chairs. Everyone should be invited to innovate!
»» More people are talking about digital than doing it. Except maybe for you, if you can circumscribe digital delivery.
»» You don’t have to be in Silicon Valley to join the revolution. In fact, you might not want to be!

The leaders profiled in ‘The New IT’ - including leaders from Medtronic, Union Bank, Men’s Wearhouse, Swedish Health, Principal Financial, and Brooks Brothers, to name a few - have shown that it’s no longer about business versus IT. Rather, it’s about business enabling IT. And vice versa. ■



CASE

Credit insurance company integrates quality ratios in risk assessment

CRÉDITO Y CAUCIÓN ADDS DATA QUALITY TO ITS MANAGEMENT MODEL

The ability to assess the risk that invoices will not be paid by the customer is of vital importance to credit insurance companies. But what if you cannot rely on accurate information? At Crédito y Caución, data quality spearheads the implementation of long-term strategies. A look behind the scenes.

Crédito y Caución is the leading domestic and export credit insurance company in Spain and has held this position since its founding in 1929. With a market share of nearly 60 percent in Spain, the company has for over 80 years contributed to the growth of businesses, protecting them from the payment risks associated with credit sales of goods and services. Since 2008, Crédito y Caución has been the operator of the Atradius Group in Spain, Portugal and Brazil. Crédito y Caución’s insurance policies guarantee that its clients will be paid for invoices issued during their trading operations. The corporate risk analysis system offered by Crédito y Caución processes more than 100 million company records, updated on an ongoing basis, and carries out continuous monitoring of the solvency performance of the insureds’ client portfolios. Its systems study more than 10,000 credit transactions every day, setting a solvency limit for the client which is binding on the company. To determine that risk, Crédito y Caución requires comprehensive and accurate information on its clients’ clients. The efficiency of the service that Crédito y Caución provides to its clients largely depends on the quality of the data contained in that information.


Adapting to new regulations
Like all European insurance companies, Crédito y Caución must comply with the risk-based supervisory framework for the insurance industry. The framework consists of the Solvency II Directive and its Implementing Technical Standards and Guidelines. Besides financial requirements, Solvency II includes requirements on the quality of the data handled by insurance companies. Under Solvency II, the accuracy of the information is no longer optional. Data quality is essential for decision-making and for certifying compliance with the requirements of the new regulatory framework. Crédito y Caución has approached its Solvency II compliance by pursuing a strategic vision that reaches far beyond the contents of the EU directive. “Information is our greatest asset,” says Miguel Angel Serantes, IT Development Manager at Crédito y Caución. “We are experts in locating it, storing it and analysing it, as well as obtaining business intelligence from this information for our activities. The challenge posed by Solvency II created the opportunity to incorporate quality ratios into our information management and to integrate these into our procedures. We do not simply meet the requirements; rather, we are committed to instilling the highest quality in all our data management operations. We have transformed Solvency II into a company process, integrating it into our business intelligence environment.”

The first step: assessing data quality
The first step in taking on this challenge was to assess the quality of the data handled by Crédito y Caución. “We took advantage of the assessment option provided by the SAS solution to perform an analysis of the quality of our own data,” adds Serantes. “The results showed us that there was still a way to go. Keep in mind that much of our data, such as company names, phone numbers and tax codes, comes from public, third-party sources with varying degrees of inaccuracy. From there we decided to design and implement a data quality management model and opted for SAS, which we integrated into our management system.” Serantes and his team developed the foundations for the company’s data management policy by establishing the essential criteria to be met by the data: it had to be accurate, complete and appropriate to the operations of the company. They determined the various data owner levels that would be responsible for the data’s content, definition, use and administration. They established compliance ratios for each category of data, so that a system of indicators could provide an immediate overview of the quality level of each piece of data.
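
To illustrate what compliance ratios and a system of indicators can look like in practice, the sketch below computes a quality percentage per data element. The categories, rules and sample records are assumptions made for this example, not Crédito y Caución’s actual criteria.

```python
# Illustrative quality indicators: for each data element, the share of
# records that satisfy its completeness/validity rule. Rules and records
# are invented for the sketch, not the insurer's real criteria.
import re

records = [
    {"company_name": "Acme SL", "tax_code": "B12345678", "phone": "+34911234567"},
    {"company_name": "", "tax_code": "B1234", "phone": None},
]

rules = {
    "company_name": lambda r: bool(r.get("company_name", "").strip()),
    "tax_code": lambda r: bool(re.fullmatch(r"[A-Z]\d{8}", r.get("tax_code") or "")),
    "phone": lambda r: bool(re.fullmatch(r"\+?\d{9,15}", r.get("phone") or "")),
}

def quality_ratios(rows, rules):
    """Percentage of rows passing each rule: an immediate overview per data element."""
    return {field: 100.0 * sum(rule(r) for r in rows) / len(rows)
            for field, rule in rules.items()}

print(quality_ratios(records, rules))
# e.g. {'company_name': 50.0, 'tax_code': 50.0, 'phone': 50.0}
```

Tracking such percentages over time is what turns a one-off clean-up into the kind of permanent monitoring described in the article.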

A constantly evolving process
“We decided on SAS for various reasons,” says Serantes. “SAS has been our information management solutions provider for many years. The relationship is very smooth. They have a resident team at Crédito y Caución that works closely with our IT department. All of this has aided us in the efficient integration of SAS Data Management into our information management system. It is a solution that fits our needs. It makes it possible to set criteria and attributes to define the quality of data; it has options for its assessment; it identifies problems with quality and helps to resolve inaccuracies. The solution aids the implementation of long-term strategies and enables permanent monitoring of the quality of our data.”

“Information is our most important asset” Miguel Angel Serantes

IT DEVELOPMENT MANAGER AT CRÉDITO Y CAUCIÓN

The deployment of the data quality control system at Crédito y Caución took around a year, although, as Serantes says, it is a never-ending process: data quality control is constantly evolving. The advantages and benefits of this strategy and the technology solution implemented have been obvious from the outset. “For starters, we have a data policy that is well defined and known throughout the company. We know what to do with the data and who is responsible for each area of data. We know where the weaknesses are in the data and how to correct them. In addition, SAS gives us information on the cause of inaccuracies in the data. We expect to obtain further qualitative benefits, such as the definition of quality objectives for each piece of data. This allows us to focus on those controls that are relevant to the business.” The aim of Crédito y Caución is to achieve 100 percent quality in the management of the data generated by the company itself, over which it is able to establish rigorous controls. For data from external sources, over which Crédito y Caución has no direct control, the aim is to establish standardization criteria in order to achieve maximum quality. ■



INTERVIEW

Data usefulness hinges on quality

DATA DEPENDENT ON QUALITY IMPROVEMENTS
Time is money, and nowadays data is money too. Data is worth its weight in gold, so it’s odd that the quality of data is so often overlooked. In light of the importance of data today, including metadata and Big Data, you would think that data quality should no longer be an issue. Surely this is self-explanatory? “You would think so,” replies Bas Dudink, Data Management Expert with SAS Netherlands. “But data quality is poor by definition.”

Dudink knows why the data that businesses collect and use is often of such poor quality: it is because everybody uses the same data. This may sound contradictory, but it’s not. On the one hand, it means that poor-quality data is continually used and reused, complete with errors, omissions and a lack of context. Dudink is adamant on the latter point: “Data without a context is useless.” On the other hand, using the same data for everybody means that the same data source is used for different uses and purposes. This may sound like a sensible approach for efficient data administration and entry, but different applications require different quality levels for different data fields. This concerns IT applications and systems, as well as the use of data in various business activities.

The devil is in the details
If you are sending an order to a customer you will need address details but no bank account number. If you are paying invoices, it’s the other way around. This is a simple example, with data fields that are relatively recognizable and carefully administrated – or, in any case, they should be. It becomes more complicated when smaller details are involved that can still have large consequences. Think of deliveries between companies with several divisions, in which the business units buy and sell services from each other. Subtle sales and/or purchasing discrepancies can result in major variations. Does one division or subsidiary offer a special discount, or does it have different delivery and payment conditions? The data on prices, conditions and terms can vary and, as such, represent poor quality.
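
The underlying idea, that the same record can be fit for one use and unfit for another, can be expressed in a few lines. The sketch below uses hypothetical field sets for ‘shipping’ and ‘invoicing’; any real set of requirements would be defined by the business, not by IT.

```python
# Sketch: the same customer record judged against different 'fit for use'
# profiles. Field sets are hypothetical; the point is that data quality is
# relative to the purpose, not absolute.
REQUIRED_FIELDS = {
    "shipping": {"name", "street", "postal_code", "city"},
    "invoicing": {"name", "iban", "vat_number"},
}

customer = {"name": "Supplier A", "street": "Main St 1",
            "postal_code": "1234 AB", "city": "Utrecht", "iban": None}

def missing_for_purpose(record: dict, purpose: str) -> set:
    """Return the fields that are missing or empty for this purpose."""
    return {f for f in REQUIRED_FIELDS[purpose] if not record.get(f)}

print(missing_for_purpose(customer, "shipping"))   # set() -> good enough to ship an order
print(missing_for_purpose(customer, "invoicing"))  # {'iban', 'vat_number'} -> not good enough to pay
```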

The danger of ignorance
Dudink puts his bold statement that data quality is poor by definition into perspective: “It’s all in the eye of the beholder. What’s good enough for one person may not be good enough for someone else. It’s about awareness.” The fact that the data are not in good order is not the worst part. The problem is being unaware of this. Dudink quotes a well-known saying: “Ignorance is bliss.” The biggest danger is when you think your data are accurate and up to date, and take action and make decisions on this basis. An additional problem is that many companies mistakenly think that they are not data companies. In fact, these days every company is a data company. This even applies to the seemingly simple case of a production company that processes physical goods and turns them into new products. Both its manufacturing process and its production line design are based on data!

More of a business issue than an IT matter
Design is usually followed by optimization on the basis of practical experience. The data accrued along the way will be important if the company wants to open a new factory or optimize an existing production facility. Repeatability and improvement, summarizes Dudink, are core actions that depend on good data. And good data leads to good business processes. Only then is automation at the business level possible. This requires awareness of data quality within the organization. “They have to know that there is something they don’t know.” IT suppliers that offer data solutions are thus more involved in the consultancy business than they are in software. “We don’t only provide software, we also provide advice.”

More than just rubbing out errors
Still, the early stages of data improvement have little to do with tools and systems. It is mainly a question of analysis: which data is concerned, how did the company acquire it, and who provided it and how? These are quite fundamental questions that could affect established processes and procedures, or even partners. For who dares to openly complain that a supplier or customer has provided poor data? Yet this is quite often the case. “If the data is erroneous, then it’s not sufficient to just fix the errors. You need to comb through the entire process: where was the error generated?” All too often, data quality is seen as a way to rub out errors. This is sometimes egocentric, stemming from an organization’s wish to present itself as a well-oiled machine. Dudink calls this “keeping up appearances”, but the right approach to data quality goes much further and much deeper.


“What’s good enough for one person may not be good enough for someone else. It’s about awareness” Bas Dudink

The need for a champion
Data quality is not very popular in practice. The shining examples of data quality improvement that Dudink can think of have been forced into it. “Thanks to regulations. If data quality is not compulsory, then most people would rather leave things as they are.” There is a concrete argument behind the rules and laws that make better data compulsory: risk management. It will come as no surprise that banks and financial institutions, including insurers, are among the frontrunners. Without the regulatory stick, an organization needs to have a data quality champion. “Someone who can see what information is going to waste and who sees the added value in doing something about it.” The difficulty is that such a person has to have a broad overview of the organization as well as the insight and power to implement the change. “Preferably a CIO,” Dudink summarizes succinctly.

Basic tip: start small
Despite the necessity of having someone in a senior management position to push through data quality improvements, it is still advisable to start with a small project. ‘Small’ is, of course, a relative term, depending on the size of the organization. Dudink recommends that the first project be used to germinate the idea of data quality improvement, as a breeding ground and a place to learn. The senior manager who initiates the project will need to combine a top-down vision with a bottom-up approach. This combination is required to achieve the desired improvement in business data quality. “I think CIOs will see this too.” ■



INTERVIEW

Fast, faster, fastest; that is the motto of these modern times. IT has made things faster, but now the flood of data is threatening to inundate us. Although thorough data analysis can be time-consuming, Event Stream Processing (ESP) and Decision Management provide a way to make it faster and more efficient.

Analysing data streams for operational decision-making

DATA-DRIVEN DECISIONS MAKE THE DIFFERENCE



We are generating, storing and combining more and more data, and we want to conduct ever more complex data analyses. “In certain environments, the sheer volumes of data can be overwhelming,” says Andrew Pease, Principal Business Solutions Manager with SAS Belgium’s Analytical Platform Center of Excellence. “In some cases, it may not be practical to store it all, despite the declining costs of data storage.” And there is more than just storage costs at stake. Analysis costs can be prohibitive as well, especially if the analysis takes place retrospectively; in some cases, the analysis will come too late to be of any use. The solution is to filter real-time data by means of Event Stream Processing and then apply an automated decision-making process using Decision Management functionality. This automation is carried out on the basis of business processes which have a built-in link to the tactical and strategic decision-making levels. Decisions that affect the work floor, or even direct interfaces with the customer, can now be taken automatically while staying in line with the organization’s strategic goals.

Trends and anomalies on drilling platforms
Pease provides an example where ESP can be of use: sensor data on drilling platforms. Some of these sensors detect vibrations, which can be used to identify anomalies. Anomalies can point to problems, so it is important to get the results of the data analysis fast. A key aspect here is focus and scope. “You need to focus on what you want to know beforehand.” The SAS expert explains that it would be pointless to analyse the vibrations every second or every minute; a fixed time range per hour may be sufficient. The main thing is that trends and anomalies are identified, so that prognoses can be made. ESP can process both scoring code generated by powerful predictive analytics and expert-written business rules, allowing these kinds of analyses to determine whether pre-emptive maintenance is warranted.
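
As a simplified illustration of the windowing idea described here (and not of SAS Event Stream Processing itself), the sketch below aggregates a vibration feed into fixed windows and flags windows that deviate from the baseline. The readings, window size and threshold are invented.

```python
# Simplified illustration of windowed stream analysis: aggregate a
# continuous vibration feed into fixed windows and flag anomalies, instead
# of storing and analysing every individual reading.
from statistics import mean, pstdev

def detect_anomalies(readings, window_size=60, threshold=2.0):
    """Yield (window_index, window_mean) for windows that deviate strongly
    from the overall baseline; readings is any iterable of floats."""
    readings = list(readings)
    baseline_mean = mean(readings)
    baseline_sd = pstdev(readings) or 1.0  # avoid division-by-zero on a flat signal
    for i in range(0, len(readings) - window_size + 1, window_size):
        window = readings[i:i + window_size]
        w_mean = mean(window)
        if abs(w_mean - baseline_mean) > threshold * baseline_sd:
            yield i // window_size, w_mean

# Usage with an invented feed: steady vibration with one disturbed window.
feed = [1.0] * 300 + [4.5] * 60 + [1.0] * 240
for window_idx, value in detect_anomalies(feed):
    print(f"Window {window_idx}: mean vibration {value:.1f} looks anomalous")
```

In a real deployment the scoring logic would of course run on the live stream rather than on a list in memory; the filtering principle is the same.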

Mobile plans and extra cell towers
Another example is the use of call data by telecom operators. “Telecom operators sit on huge stockpiles of data,” explains Pease. Each telephone conversation involves some 20 to 30 call detail records. “But they aren’t all important for analysis.” However, analysis of the right pieces of data can reveal a lot of important information. The obvious example is data on the timing of a customer’s calls, which can be used to offer a better plan. Calling behaviour need not be tracked extremely closely; it will usually be sufficient to identify the general patterns. “If the customer pays too much for too long, the risk that he or she switches to a new operator will increase.” Such analysis can also indicate gaps in network coverage. The operator may decide to install extra cell towers on the basis of the number of dropped calls. By analysing the dropped calls, the telecom operator can accurately identify the gaps in coverage or capacity, which means it can erect an extra cell tower in the exact location where the most calls are dropped.
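
The dropped-call analysis can be thought of as a simple aggregation over call detail records. The sketch below is a toy version with invented records and field names; a real operator would run this continuously over streaming data rather than over a small list.

```python
# Sketch: counting dropped calls per cell to locate coverage gaps.
# The call detail records and field names are invented for illustration.
from collections import Counter

call_detail_records = [
    {"cell_id": "CELL-017", "dropped": True},
    {"cell_id": "CELL-017", "dropped": True},
    {"cell_id": "CELL-042", "dropped": False},
    {"cell_id": "CELL-017", "dropped": False},
]

dropped_per_cell = Counter(r["cell_id"] for r in call_detail_records if r["dropped"])

# The cells with the most dropped calls are candidates for an extra tower.
for cell, drops in dropped_per_cell.most_common(3):
    print(cell, drops)
```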

Event Stream Processing
The added value of Event Stream Processing is the ability to monitor and analyse large quantities of, for example, transaction or sensor data in real time, enabling an immediate reaction to specific situations or the required intervention.


Catching stock exchange swindlers red-handed
The financial world works at a faster pace than most other sectors. Stock trading is automated and fast as lightning. In this fast-paced world, ESP can be used to detect stock market fraud early on, says Pease. The speed of the trading systems makes it impossible to store all the data that goes through them, let alone analyse all these data streams. “The trick is to filter out the anomalies; the suspicious transactions.” Pease mentions a book released this year called ‘Flash Boys’, about the so-called flash traders or high-frequency traders. Author Michael Lewis paints a very sombre picture of the international money markets, in which the speed of data is of the essence: the faster the data, the better the price of a purchase or sale. Although this non-fiction book has also been criticized, it contains many interesting lessons and truths. Pease relates the story of a Canadian trader’s sly trick. He split up his share trading activities and found a second server that was some two kilometres closer to the stock exchange. He then set up a perfectly timed delay so that his various orders arrived at different clearing houses at exactly the same time. This enabled him to trade in bulk while ensuring that his sales did not lead to a lower stock price and hence lower yields. Cunning and almost undetectable.

Keeping up with fast-flowing data streams
“These days, some data streams flow so fast that it is impossible to analyse all the data they contain,” says Pease. This is because it takes time to process the data. Just as falling storage costs do not completely compensate for the data explosion, advances in computing power cannot always keep up with the analysis requirements. The cunning share trading trick using server timing is almost impossible to detect unless ESP is deployed, because ESP turns fast streams of data into reasonably bite-sized chunks. There is a rising demand for fast, real-time analysis of data, driven by new applications, increasing numbers of sensors that collect data and a growing range of business models that all depend on reliable data analysis. Where the quality of the analysis used to be paramount, speed has become equally important. These developments affect the financial and telecoms sectors and the manufacturing industry the most, Pease explains.

Decision Management
Decision Management aims to improve organizations by enabling faster, smarter (operational) decisions. This can be done by utilizing all available data in combination with the automated application of analytical models and derived business rules, weighing the time required and the possible risks.


“Where the quality of the analysis used to be paramount, speed has become equally important” Andrew Pease

Beer brewing and the right cheese with your wine
Manufacturing companies are also installing more and more sensors in their production environments. This is obviously an advantage for high-grade manufacturing of technologically complex products, but other applications are being found as well. Beer brewers, for example, use sensors to detect wet hops so that the processing of the raw material can be timed on the basis of the sensor readings. Moreover, it also enables them to schedule maintenance of the kettles more efficiently. ESP is also implemented in other sectors, such as retail. The same applies to Decision Management. Pease points to the interesting opportunities offered by scanners used by supermarket customers. If a customer puts a bottle of wine in his trolley, then why not suggest a cheese to pair with this wine? The supermarket could even offer a discount to make it more attractive. The IT system knows that there is currently a surplus of cheese that will spoil in a certain number of days, so it’s worthwhile selling it at a lower price. This does require a reliable stock inventory system, including information on use-by dates. It is not the terabytes that count, but the speed: this all has to happen before the customer reaches the checkout. “You need to make the right offer at the right time.”
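
A toy version of such a decision rule is sketched below. The products, stock levels and discount threshold are all invented; in practice, rules like this would live in a central decision-management environment rather than in a script, so that they can be governed as described in the next section.

```python
# Toy decision rule: when wine is scanned, propose a matching cheese, but
# only if stock levels and use-by dates justify a discount. All data here
# are invented for illustration.
from datetime import date, timedelta

stock = {"brie": {"units": 120, "use_by": date.today() + timedelta(days=4)}}
PAIRINGS = {"merlot": "brie"}

def offer_for(scanned_item: str):
    """Return (product, discount) to push to the customer's scanner, or None."""
    cheese = PAIRINGS.get(scanned_item)
    if not cheese:
        return None
    item = stock.get(cheese)
    if not item:
        return None
    days_left = (item["use_by"] - date.today()).days
    # Assumed central business rule: discount surplus stock nearing its use-by date.
    discount = 0.25 if item["units"] > 100 and days_left <= 5 else 0.0
    return cheese, discount

print(offer_for("merlot"))  # ('brie', 0.25) -> offer made before the customer reaches the checkout
```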

The business strategy is paramount
The offer to the customer must also harmonize with the organization’s business strategy. The core of Decision Management is to centralize and standardize this strategy: every decision must concur with the business strategy. Is a discount offered by the marketing department in line with the policy of the risk management department? If a customer buys in bulk, then the discount could be increased. This also has to do with the longer-term value that such a customer can have for the company. Clearly defined business rules are required if this is to be managed adequately. “The business rules need to be entered and administered from a single central location.” The rules also need to be applied consistently across the organization’s systems and activities. This is the only way to guarantee that the organization’s various silos can work as a whole in order to facilitate effective Decision Management. ■



CASE

Data can seem deceptively simple. Take a basic data entity such as a supplier: the company name may vary per country, business process and/or activity. Master Data Management (MDM) is used to resolve these discrepancies and can prevent a lot of unnecessary work and follow-up.


Getting a grip on your data

MASTER DATA MANAGEMENT AS A FOUNDATION OF YOUR BUSINESS
It seems only logical that supplier data are accurate and up to date. After all, suppliers are important business partners; goods and services are received from them and payments are made to them. But still the data on such entities often leaves much to be desired. Alongside simple data entry errors and data corruption, there is another cause of data disparities: data entry differences, potentially across different systems.

Supplier A = Supplier 1 (= B2)
One business system may refer to ‘Supplier A’, while a back-end application may use the descriptor ‘Supplier 1’. Automatically finding and replacing ‘A’ with ‘1’, or vice versa, seems the obvious solution. The problem is that this can lead to unwanted side effects, such as references in the relevant systems or from other applications that then cease to work. Then there is the additional risk of historical discrepancies. For example, an invoice sent by Supplier A goes missing because the supplier has since been renamed. In theory, an interlayer could exist that has monitored the renaming process and is able to trace it back to its origin. That still leaves the question of the impact on system performance, because Supplier A versus Supplier 1 is really too simple an example.
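
One common way to avoid the find-and-replace trap is a cross-reference that maps each system’s local key to a single master record, so that old references keep working. The sketch below illustrates the idea with invented system names and keys; it is not a description of any specific MDM product.

```python
# Sketch of the cross-reference idea: instead of renaming 'Supplier A' to
# 'Supplier 1' in place (and breaking references), each local key is mapped
# to one golden master record. System names and keys are invented.
MASTER_SUPPLIERS = {
    "M-001": {"legal_name": "Acme Industries B.V.", "vat": "NL123456789B01"},
}

CROSS_REFERENCE = {
    ("ERP_EU", "Supplier A"): "M-001",
    ("ERP_US", "Supplier 1"): "M-001",
    ("CRM", "B2"): "M-001",
}

def master_record(system: str, local_key: str) -> dict:
    """Resolve any local identifier to the single golden record."""
    return MASTER_SUPPLIERS[CROSS_REFERENCE[(system, local_key)]]

# Both legacy keys resolve to the same supplier, so old invoices stay traceable.
print(master_record("ERP_EU", "Supplier A") is master_record("ERP_US", "Supplier 1"))  # True
```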


“Some companies have as many as 50 ERP systems. This means that the Master Data can be stored in 50 different ways and used in 50 different ways” Bas Dudink

Discrepancies due to uncontrolled growth
Discrepancies in these so-called Master Data are typically caused by the uncontrolled growth of IT systems, as Bas Dudink, Data Management Expert at SAS Netherlands, has learned from experience. The growth is often driven by ERP systems, the critical business environments for resource planning. These complex, extensive systems can fulfil their invaluable roles only after a lengthy implementation project. This means that they will only be expanded, customized or phased out if it is really necessary. The explosive growth of IT systems can be part of an organic process, but it can also be the consequence of takeovers. Company X buys company Y, and Suppliers A, 1, B2, etc. encounter each other in the combined IT environment of the newly formed division. It is often not possible to completely integrate all systems, or doing so may be a megaproject that is scheduled to happen ‘later on’, after the impact of the takeover has been absorbed and the business opportunities have been seized.

As many as 50 ERP systems
“Some companies have as many as 50 ERP systems,” explains Dudink. “For various countries, for various business processes, et cetera.” This means that the Master Data can be stored in 50 different ways and used in 50 different ways. These are not theoretical differences: Dudink sees this in practice all too often.


Initial steps: centralization and deduplication
Dudink recommends involving the people and departments who will reap the benefits of MDM at an early stage. This helps make the value of the project tangible. MDM can have an enormous impact, because those 50 different forms of data storage in 50 ERP systems cause a great deal of inefficiency. The first step is centralized implementation, which leads to considerable administrative cost savings. You would think the logical next step would be to make the data available to the wider organization, but it is not time for that yet. First the second, essential step in MDM must be taken: data deduplication. This entails, for example, finding a supplier who has two different labels in the IT environment and removing one of them. This is how to make sure that Master Data discrepancies cannot lead to an invoice getting lost for three months. Inconceivable? It actually happened to a major corporation. An invoice for a considerable amount of money had simply ended up in the wrong tray – except that no one knew which ‘tray’ in the immense digital environment the invoice was in.
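
The matching step behind deduplication can be sketched in a few lines using only Python’s standard library. Real MDM matching is far more sophisticated (it weighs addresses, tax numbers, phonetic similarity and more); the supplier names and the threshold below are invented.

```python
# Minimal sketch of the deduplication step: flag supplier names that are
# probably the same entity, so a data steward can review and merge them
# into one master record. Names and threshold are invented.
from difflib import SequenceMatcher
from itertools import combinations

suppliers = ["Acme Industries B.V.", "ACME Industries BV", "Globex Corp."]

def similarity(a: str, b: str) -> float:
    """Rough string similarity between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

duplicate_candidates = [
    (a, b, round(similarity(a, b), 2))
    for a, b in combinations(suppliers, 2)
    if similarity(a, b) > 0.85
]
print(duplicate_candidates)
# The first two names are flagged as likely duplicates, ready for review and merging.
```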

Who is allowed to change what?
Discrepancies are a fact of life, but they do need to be prevented as much as possible. This could be considered step three of the MDM implementation: prevention of new data discrepancies. The key question is: who controls the Master Data? Who has the authority to change a supplier’s bank account number? The ownership of the data in relation to its use is important. Who owns the data that were entered in the Netherlands and which are primarily used for business in the United States? The identification of data ownership and use should lead to a logical review of that ownership. Once cleaned up, the Master Data will be more deployable for the divisions and business processes that benefit the most. The data will initially be deployed from the (now clean) central storage location and can then be made available to the ‘lower’ levels of the organization.

MDM is more than detecting errors
Good Master Data Management should lead to more than just the detection and resolution of errors. It can also be used to detect and prevent fraud, such as so-called ‘ghost invoices’. These invoices are sent by swindlers and made to appear as if they have been sent by bona fide suppliers. MDM prevents these swindlers from slipping through the cracks in the system. Discrepancies, errors and fraud can occur because companies and their systems have become more and more complex. When a supplier or customer registers for the first time, the transaction will normally take place without any problems. This changes after several years and thousands of transactions, processed by hundreds of people across several departments, during which the organizational structure may also have changed. The registration of this one supplier or customer has since been subject to countless changes, actions and exceptions. “It is quite a challenge to ensure that every transaction is correctly processed,” as Dudink knows from experience. This challenge stands or falls with good Master Data Management. ■



ABOUT SAS

SAS understands that almost everything is data driven. We want to help you make sure that this takes place correctly. Is your data easily accessible, clean, integrated and correctly stored? Do you know which types of data are used in your organization and by whom? And do you have an automated method that validates incoming data before it is stored in your databases?

Take better decisions
Thousands or maybe even hundreds of thousands of decisions are taken daily in your organization. Everyday decisions taken as part of a process: Can we grant this loan to this customer? What offer should I make to a customer who calls our contact centre? Tactical decisions, such as: What is the optimum balance between preventive and responsive maintenance of machinery? If we want to scrap five of the fifteen flavours we offer, which should they be? But also strategic decisions on your organization’s direction, for example: in which product-market combinations do we want to be present? Information plays a role in all these decisions. The better the quality of the underlying data, the better the decisions you take.

Get control of your data
Making sure data is complete, accurate and timely can be a time-consuming job. Fortunately, the task can be largely automated. Spend less time gathering and maintaining information and more time running your business with SAS Data Management. This solution has been built on a unified platform and designed with both the business and IT in mind. It is the fastest, easiest and most comprehensive way of getting your data under control. SAS Data Management brings in-memory and in-database performance improvements which give you real-time access to reliable information.

Proper process set-up
To succeed in today’s data-driven world, you’ll need more than just a robust data management platform. Processes and behaviour also play an important role when it comes to master data management, data integration, data quality, data governance and data federation. We can help you to set these up properly too, because getting your data in order is not a one-off activity, but a continuous process. Whether you have a large or a not-so-large volume of data, you can transform it into great value and possibilities.

Want to know more? Visit our website www.sas.com/dm, or contact us at sasinfo@sas.com




COLOPHON

Realization: SAS Nederland
Editor: Ilanite Keijsers
Photography: Eric Fecken
Authors: Jasper Bakker, Mirjam Hulsebos, Chantal Schepers
Cover: Philip van Tol
Design: Alain Cohen
Project management: SAS Nederland

The book, Future Bright – A Data Driven Reality, was commissioned by SAS Netherlands. Content from this book may only be copied or reproduced in print, photocopy, film, on the Internet or in any other medium with the explicit permission of SAS Netherlands and its management, and with proper acknowledgement. SAS Netherlands is not responsible for the statements made by the interviewed parties in this book.


