POINT OF VIEW
Don’t Rock the Boat: Managing Data Flow
By: Anand Raman, Commerce Technology Practice Manager, and Arvind Naik, Technical Architect, SapientNitro
THE BIG PICTURE

In any e-commerce solution, it is essential to manage every piece of data, whether it’s product, catalog, or merchandise. To drive successful e-commerce, a business must have complete, accurate, combined data available in a timely manner. Yet data flow is often not a top priority; it is a secondary thought at best. Although each e-commerce project brings its own unique challenges, there are common, basic data elements across the board. With the challenges and lessons in this paper, technical architects, developers, and project managers alike will be able to treat data flow design and data availability as a fundamental aspect of any e-commerce project and prepare a cohesive plan to address their unique business challenges.

WHY DOES DATA FLOW MATTER?

At the outset of an e-commerce project, businesses typically provide little to no specific requirements around data flow. They might plan to have some catalog systems, search features, and product data loaded through backend systems, but that’s about it. Typically the focus is on the customer experience and how to get the pertinent data to those customers and other business users. But without a comprehensive data flow, there will be setbacks later.

Attention must also be paid to the timing of data flow design. During the later part of development, questions often arise such as: When should I expect my price or promotion to show up? What should I do if I want to remove a product right away? Unless you’ve thought about those questions early on, it could be too little, too late. For instance, if you’re already in the testing cycle of a project, it is likely too late to design a solution for data flow issues. Likewise, it’s difficult to have timely, frequent site refreshes without a comprehensive data flow strategy.
It’s paramount to think about data flow as it pertains to every functional requirement—what kind of data with what kind of system within what time frame—in order to maintain efficiency and control throughout every process.
© Sapient Corporation 2012
COMMON DATA TYPES
Fig. 1. A typical data flow diagram

Data mapping is a critical part of logical data flow design, and this diagram represents a typical e-commerce data flow solution. Of course, the process can be much more complicated, but this representation offers a basic outline of what to expect when planning requirements around data flow.
Mapping data in an accessible way will facilitate the discussion on data flow. Laying out the best-, typical-, and worst-case Service-Level Agreement (SLA) is essential to arriving at an agreeable set of service levels. At times, new integration techniques and solutions may need to be identified if none of the existing ones is sufficient. Be prepared to change even the foundation of the solution architecture if certain service levels are critical to the existence of the business.

Though each e-commerce system is unique and has special business needs, most share common basic categories of data. All e-commerce systems need to handle many types of data, each with its own source, lifecycle, rules, and criticality. Add in the multiple systems, business logic, workflows, and processing that businesses must go through, and you’ve got a tremendously complex maze on your hands.

The first step in defining a data flow strategy is to identify the data types relevant to you. They include, but are not limited to:

Product Data
• Product information (e.g., specifications)
• Product lifecycle (e.g., launch date)
• Product images (e.g., various renditions)
• Product rich content (e.g., multimedia)
• Product merchant relationships (e.g., cross-sell, up-sell)
• Product social data (e.g., ratings and reviews)
• Product pricing (e.g., MSRP, sale price)
• Pricing promotions and messages (e.g., discounts, clearance)

Category Data
• Category information (e.g., taxonomies: master, product, sales)
• Category images
• Category attributes
Marketing Data
• Marketing promotions (e.g., order or shipping offers)
• Merchandizing relationships (e.g., personalized recommendations)
• Shipping rates calculations

Inventory Data
• Availability
• Stock-in-hand
• Release/street date
• Backorder/pre-order

Search Index Data
• Searchable attributes
• Facets, keywords, SEO

Once data types are identified, understand the expectations of the data by engaging in conversations with business stakeholders, analysts, and other experts. Many times, the requirements are unclear, even for key stakeholders. In such situations, starting with the necessities that are practical and feasible is often the right approach. It can also help to articulate relationships and dependencies using an entity-relationship diagram. A typical diagram may have hundreds of tables and a number of dependencies, which have a significant impact on the SLAs.
Fig. 2. An entity-relationship diagram
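One way to make the identified data types and their service levels concrete is a small catalog that records, for each entity, its category, its source system, and its best-, typical-, and worst-case time to availability. The Python sketch below is illustrative only: the class name `EntitySla`, the entity names, the source systems, and every duration are hypothetical and are not taken from any client engagement.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class EntitySla:
    """Agreed time-to-availability for one data entity (all values hypothetical)."""
    entity: str
    category: str
    source_system: str
    best: timedelta
    typical: timedelta
    worst: timedelta

    def __post_init__(self) -> None:
        # A service level only makes sense if best <= typical <= worst.
        if not (self.best <= self.typical <= self.worst):
            raise ValueError(f"{self.entity}: expected best <= typical <= worst")


# Hypothetical catalog covering a few of the data types listed in this paper.
CATALOG = [
    EntitySla("product_information", "Product", "PIM",
              timedelta(hours=4), timedelta(hours=24), timedelta(hours=48)),
    EntitySla("product_pricing", "Product", "Pricing",
              timedelta(minutes=5), timedelta(minutes=30), timedelta(hours=2)),
    EntitySla("availability", "Inventory", "Inventory",
              timedelta(minutes=1), timedelta(minutes=5), timedelta(minutes=15)),
    EntitySla("searchable_attributes", "Search Index", "PIM",
              timedelta(hours=1), timedelta(hours=6), timedelta(hours=24)),
]


def breaches(catalog, observed):
    """Return the names of entities whose observed delay exceeds the worst-case SLA."""
    return sorted(
        e.entity for e in catalog
        if e.entity in observed and observed[e.entity] > e.worst
    )


if __name__ == "__main__":
    observed = {
        "product_pricing": timedelta(hours=3),   # slower than the 2-hour worst case
        "availability": timedelta(minutes=1),    # within its service level
    }
    print(breaches(CATALOG, observed))  # ['product_pricing']
```

Keeping all three service levels per entity gives stakeholders something specific to negotiate, and the validation step catches entries where the levels were entered in the wrong order.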
DATA SOURCES

Major corporations gather data from multiple sources; e-commerce data does not always originate from a single source. For each piece of data, you have to consider where the best source for that data lies. It is important to recognize the benefits and limitations of all sources upfront to make the best possible decision.
Data can originate from a number of systems such as Product Information Management Systems (PIM), Content Management Systems (CMS), Marketing Categorization Systems, Pricing and Sales Management Systems, Marketing Promotion Management Systems, Social Network and Ratings Systems, and Analytics Data Systems. Each system comes with its own technology, integration options, throughput, data quality, and error handling methods. Articulating these system boundaries is critical, as there may be a need to invest time and money to reduce the limitations of certain systems in the ecosystem.

DATA PROCESSING AND INTEGRATION SYSTEMS

Once you’ve chosen the kinds of data and sources you require, you can then choose your data processing and integration systems. Below is a list of commonly used systems; a real-life project could add many more:
• Standard DataStage (e.g., ETL)
• IBM BODL (Business Object Data Loader)
• IBM WebSphere MQ Broker
• Custom Integration Layer
• WCS Stage Propagation Utility
• Secure FTP/MFT

Regardless of what system(s) you choose, you must then optimize them. Optimization, the process of improving performance without compromising quality and maintainability, is a critical activity. Optimization challenges differ based on technology and integration techniques, but these strategies can help:
1. Tune Structured Query Language (SQL) statements iteratively to ensure efficiency.
2. Cache frequently used attributes to avoid unnecessary trips to the database.
3. Use batches to commit and process as much as possible, and to avoid high overheads.
4. Use parallel threads of processing wherever possible.
5. Use persistent MQ queues to protect the messages.
6. Pass only the data that needs to be updated to avoid unnecessary back-and-forth data.
7. Use smart updates when it’s not feasible to minimize the message payloads.
8. Conduct performance tests to ensure that the end-to-end data flow is optimized.
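As a rough illustration of strategies 3 and 6 above (committing in batches and passing only the data that changed), here is a minimal Python sketch. It uses the standard library's `sqlite3` module as a stand-in for the commerce database. The `price` table, its columns, the function names, and the batch size of 500 are all assumptions made for illustration, not part of any product named in this paper.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def changed_fields(current, incoming):
    """Keep only the attributes whose value differs from the current record."""
    return {k: v for k, v in incoming.items() if current.get(k) != v}


def load_prices(conn, updates, batch_size=500):
    """Apply (sku, amount) updates in batches; return the number of commits."""
    commits = 0
    for chunk in batched(updates, batch_size):
        # One round trip and one commit per batch instead of one per row.
        conn.executemany(
            "INSERT OR REPLACE INTO price (sku, amount) VALUES (?, ?)",
            chunk,
        )
        conn.commit()
        commits += 1
    return commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE price (sku TEXT PRIMARY KEY, amount REAL)")
    updates = [(f"SKU-{i}", 9.99 + i) for i in range(1200)]
    print(load_prices(conn, updates))  # 3 (batches of 500, 500, and 200)
    print(changed_fields({"amount": 10.0, "name": "Lamp"},
                         {"amount": 12.0, "name": "Lamp"}))  # {'amount': 12.0}
```

The right batch size depends on the target system and should come out of the performance tests in strategy 8 rather than being fixed in advance.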
THE CHALLENGES

There is no shortage of challenges when it comes to data flow. A well-designed data solution requires that you recognize the following.

Time matters. Every content type is different in terms of its lifecycle and frequency of change. A lot of content is refreshed monthly or weekly, but some content types (e.g., promotions) tend to change much more quickly, forcing related messages (e.g., promotional merchandizing content) to change at the same rate. Building an e-commerce system doesn’t happen overnight either; it typically takes years rather than months. Each system may also be on a different development cycle or timeline, which adds to the complexity.

Data flow management requires constant attention. Providing a consistent customer experience in the face of ever-changing business and IT priorities is taxing. Businesses continue to adapt and change, as do their products and priorities. Combine that with ongoing maintenance, integration specifications, bug fixes, releases, and product upgrades, and it is clear that this is no simple endeavor.
Architecture and design choices have an impact. Caching, an integral part of any e-commerce implementation, can undermine the overall strategy if it is not attended to during the early stages of implementation and development. It is imperative that all architecture and design decisions take the entire strategy into account.

These challenges all affect business decision-making and the stability of integration. So how do we build systems to meet the ever-changing demands of business? And how do we build data flow around this fluid environment? The point is that when we think about data flow, all of these challenges (among others) must become considerations in order to guarantee timely, high-quality data availability, given that we’re standing on such shaky ground.

THE OBJECTIVES

When we think about data flow in an e-commerce system, we need to stress several goals for a successful business solution.

First, identify critical data entities early on and identify the SLA requirements for them. It’s also crucial to identify how soon a data entity can be made available across the systems, because changes may need to be reflected in multiple areas, not just at the front end. In addition, be sure to identify emergency scenarios. You must be prepared for any circumstance that may arise, since it could have a detrimental impact on legal standing, profitability, customer satisfaction, and overall business success, to name a few.

Second, understand your technology and system limitations in order to deliver all data in a timely manner. What seems sufficient in the beginning can later reveal gaping holes. You and your team must be thorough and understand every data system critical to the data flow design, not just in day-to-day operation but in extreme situations as well.

Third, set expectations for data availability. When you design a system, there are always limitations, and it’s important to set expectations upfront so the business can plan solutions well in advance. Along those lines, make sure to understand the impact on the business if an entity is not available as expected.

And last, proactively determine solutions to improve the data flow, and update the SLAs as needed. Doing this upfront gives you the padding necessary to counteract issues that may arise in the future, such as strains on budgets and timelines.

WHAT DOES YOUR BUSINESS NEED?
Fig. 3. The SLA consistency map (surviving quadrant label: Promotions, Image Assets)
This is an example of an SLA consistency map we created for one of our clients, which used four quadrants to help them visualize and prioritize critical data entities. On the x-axis in this particular example we have Availability, the things you need as quickly as possible: in this case, entities like up-to-date images, inventory, and price data. On the y-axis we have Consistency, which reflects the importance of accuracy and precision for attributes like lifecycle changes. It is essential to keep in mind that the refresh frequencies these requirements imply have a significant impact on the resources and cost required to architect a data flow management solution.

FINAL CONSIDERATIONS

The complexity and importance of a high-functioning data flow system should be clear at this point. With so many systems and options available, there are several questions you should be asking yourself:
1. Do I really need this system? Make sure you’re picking the systems that will allow you to optimize the workflow and make data available as quickly and consistently as possible.
2. Is it fit to handle the throughput? If the answer is yes, decide which entities it is suitable for.
3. Can I minimize the systems between the source and the destination? The more steps you take, the higher the risk of out-of-sync data, lost data, or increased time to availability.
4. Can a system be upgraded or replaced with a higher-performing system, and can I improve the systems for handling data?

Again, these decisions will serve you best if you make them upfront. Data flow management is critical to the success of an e-commerce site. It does not end once the data entities are identified and a reasonable data flow architecture and integration techniques are implemented. Constant communication to understand expectations, communicate changes, and ensure alignment on an ongoing basis is absolutely essential.
Data flow management should not be an afterthought but must be a priority that is addressed during the early phases—and every phase thereafter—of any e-commerce solution.
About the Authors

Anand Raman has been involved in the design, implementation, and support of high-volume transactional applications for the retail and travel industries. Over the past few years he has been involved in the build and rollout of the Target.com platform. Prior to Sapient, Anand worked with one of India’s largest media houses, where he worked on putting their popular properties online.

Arvind Naik, Technical Architect, has extensive experience implementing e-commerce sites at Borders, David’s Bridal, Agriliance, and Sprint in similar roles. He enjoys large-scale technical problem solving and working with data flows. He has been instrumental in end-to-end data flow management and in creating strategy roadmaps for several projects across clients. He is interested in adding Cloud, PIM, and MDM to his technical portfolio.
Published on Jul 18, 2012