The Need of Data Privacy Preservation in Offshore Operations Balaji Raghunatha
Abstract Outsourcing has proven to be an efficient way to cut costs. However, enterprises face challenges in leveraging this strategy because of the increasing threat to the confidentiality of customer data. Risks associated with noncompliance with government regulations on data privacy also hinder enterprises from making the best use of such opportunities. Data privacy techniques can be used in such contexts to mask personal information, thereby generating realistic but false data that can be safely used for offshore development, testing and support. In this paper we illustrate what to look for in a comprehensive data privacy solution and how to leverage it for enterprise-wide data protection needs.
Introduction In the new paradigm of doing more with smaller or flat IT budgets, outsourcing has emerged as the key strategy for IT application development and testing. However, the increasing incidence of data theft and the need to comply with various data privacy regulations on the movement of confidential data outside the premises are forcing enterprises to desist from outsourcing applications that deal with Personally Identifiable Information (PII) and other sensitive data. The terms data obfuscation, data privacy and data masking are used interchangeably to describe the ability to de-identify data for use in external environments.
Implementing a comprehensive data obfuscation solution can mitigate the risks associated with outsourcing applications that handle sensitive data.
Existing approach to address data privacy concerns Enterprises tend to protect sensitive data by means of physical separation coupled with network isolation. However, with most outsourcing vendors using cost-effective offshore locations for IT application development, testing and support, this approach is not appropriate. To overcome this issue, enterprises have attempted to use custom SQL scripts to mask sensitive data. Custom SQL scripts have their own limitations when used in an integrated environment: they are database specific, require a lot of maintenance effort, and make preserving referential integrity across data sources extremely difficult. Given the diversity of data sources, database technologies and platforms in an enterprise, custom SQL scripts are an inadequate, tactical solution for data obfuscation, suited only to simple stand-alone applications.
Need of the hour Given that sensitive data in an enterprise resides not only in databases but also in files and message queues, the need of the hour is a comprehensive and cost-effective data privacy solution. Apart from supporting multiple repeatable and non-repeatable data masking techniques, the solution must mask sensitive data in multiple data sources such as databases, files and message queues, and must support heterogeneous platforms and technologies. The solution must ensure that referential integrity of data is preserved across data sources in order to facilitate an integrated environment. The solution must also integrate with upstream and downstream applications to facilitate an end-to-end testing environment.
Technology diversity A typical enterprise has data distributed across legacy, modern and open source databases. Support for obfuscation of data residing in databases such as Oracle, DB2, SQL Server, Sybase and MySQL is therefore a minimum criterion for selecting a data privacy solution. In addition, these databases reside on Windows, Unix, Linux and mainframes. To add to the complexity, fixed format files, variable format files and XML files also store sensitive data. The data privacy solution must support all of these data sources, technologies and platforms.
Choice of Data Masking techniques Different fields call for different techniques. For example, the employee names in the Employee table may need to be shuffled, while credit card numbers must have only their last four digits visible, with the remaining twelve digits masked with asterisks (*). The patient code needs to be hashed. The data privacy solution must therefore provide a choice of masking techniques.
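The three techniques mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed seed, the salt value and the choice of SHA-256 are all hypothetical and are not taken from any particular data privacy product.

```python
import hashlib
import random


def shuffle_values(values, seed=None):
    """Shuffling: return the same values in a different (random) order."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def mask_card_number(card_number, visible=4, mask_char="*"):
    """Character masking: keep only the last `visible` digits readable."""
    digits = [c for c in card_number if c.isdigit()]
    hidden = max(len(digits) - visible, 0)
    return mask_char * hidden + "".join(digits[hidden:])


def hash_code(code, salt="demo-salt"):
    """Hashing: salted SHA-256 digest (salt and algorithm are assumptions)."""
    return hashlib.sha256((salt + code).encode("utf-8")).hexdigest()


employee_names = ["Ron", "Bob", "Asha", "Mei"]
shuffled_names = shuffle_values(employee_names, seed=7)
masked_card = mask_card_number("4111 1111 1111 1234")
hashed_patient = hash_code("P-1001")
```

Shuffling keeps the column realistic because every value is a real name, only detached from its original row. A production solution would manage the salt as a secret rather than hard-coding it as done here.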
Preservation of Data Integrity Substitution rules, such as "Ron" must be replaced with "Rob" and "Bob" must be replaced with "Sylvan", must be applied consistently across databases as well as files and message queues in order to preserve data integrity. Masking a primary key should result in the corresponding foreign keys also being masked with the same value in order to preserve referential integrity. The data privacy solution must enable preservation of referential integrity within and across data sources.
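One way to obtain this consistency is to derive each masked value deterministically from the original value and a secret key, so that the same input yields the same output wherever it appears. The sketch below assumes a keyed hash (HMAC-SHA-256); the key, the substitute name list, and the table and column names are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # assumption: a real key would be stored securely
SUBSTITUTE_NAMES = ["Rob", "Sylvan", "Maya", "Omar", "Lena", "Kofi"]


def mask_name(name, key=SECRET_KEY):
    """Deterministic substitution: the same name always maps to the same substitute."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(SUBSTITUTE_NAMES)
    return SUBSTITUTE_NAMES[index]


def mask_key(value, key=SECRET_KEY):
    """Deterministic key masking: primary and foreign keys receive the same token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


# Two hypothetical data sources that share the emp_id key.
employees = [{"emp_id": 101, "name": "Ron"}, {"emp_id": 102, "name": "Bob"}]
payroll = [{"emp_id": 101, "amount": 5000}, {"emp_id": 102, "amount": 6200}]

masked_employees = [
    {"emp_id": mask_key(row["emp_id"]), "name": mask_name(row["name"])}
    for row in employees
]
masked_payroll = [
    {"emp_id": mask_key(row["emp_id"]), "amount": row["amount"]}
    for row in payroll
]
```

Because both sources are masked with the same function and key, a join on the masked `emp_id` still matches the same rows. With a short substitute list, two different names can map to the same substitute; a larger list or an explicit lookup table would avoid that.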
2 | Infosys – White Paper
Support for End-to-end integration testing Upstream applications typically exchange information with downstream applications using files and messages in message queues, and this information may contain sensitive data. The data privacy solution must therefore integrate with the upstream and downstream applications so that it can mask sensitive data in incoming messages dynamically, before the downstream application processes them. This ability to integrate in a dynamic environment facilitates end-to-end testing while preventing sensitive data from being revealed to downstream applications.
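As a rough illustration of dynamic masking, the sketch below intercepts messages between an upstream producer and a downstream consumer and masks selected fields in flight. It assumes flat JSON messages; the field names, the masking rules and the `relay` function are hypothetical, and a real deployment would hook into the actual message queue rather than a Python list.

```python
import json


def mask_last_four(value):
    """Keep the last four characters and mask the rest with asterisks."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


# Hypothetical rules: which top-level fields are sensitive and how to mask them.
FIELD_RULES = {
    "name": lambda value: "REDACTED",
    "card_number": mask_last_four,
}


def mask_message(raw_message, rules=FIELD_RULES):
    """Parse one JSON message, mask its sensitive fields, and re-serialize it."""
    payload = json.loads(raw_message)
    for field, rule in rules.items():
        if field in payload:
            payload[field] = rule(payload[field])
    return json.dumps(payload)


def relay(incoming, deliver):
    """Mask each incoming message before handing it to the downstream consumer."""
    for raw_message in incoming:
        deliver(mask_message(raw_message))


upstream = [
    json.dumps({"order_id": 1, "name": "Ron", "card_number": "4111111111111234"})
]
downstream_inbox = []
relay(upstream, downstream_inbox.append)
```

The downstream application only ever sees the masked copy, while non-sensitive fields such as `order_id` pass through unchanged, so end-to-end test flows keep working.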
Characteristics of a Comprehensive Data Privacy Solution
• Should be customizable and configurable to enable use in multiple contexts.
• Should support both static masking (masking of data at source) and dynamic masking (run-time masking of data at interface and/or interaction points).
• Should support a host of masking algorithms, including Shuffling, Substitution, Masking, Encryption, Number Variance and Character Masking.
• Should provide deterministic masking, that is, the capability to ensure consistency and standardization in the masking of data across the enterprise.
• Should be capable of masking data across heterogeneous data sources and platforms.
Conclusion A comprehensive data privacy solution is a necessity today, as it makes data containing customers' personal information safe for use in development, testing and support environments. Organizations can then outsource such tasks without the fear of data being misused. Such a solution ensures that customer data is successfully masked in both static and dynamic contexts, and that the masking can be reversed if the context so demands. A comprehensive solution also adopts a consistent and standard masking procedure, which results in uniform masking across the enterprise. For a solution to be universally accepted, it must support files from heterogeneous sources, make use of multiple algorithms and work across platforms.