
Big Data: What Is It and Why Is It Important for the Financial Sector? September 2012


Big Data: Massive Data Growth in the Last 5 Years
And 80% is typically not in traditional enterprise data warehouses

§  Digital is the primary driver of new data
§  80% of this new digital data is complex to analyze in its raw structure
§  Digital data is growing at 62% annually vs. structured data at 22%

[Chart labels: Complex, Unstructured; Relational]

Confidential and proprietary. Copyright © 2011 Teradata Corporation.

Source: An IDC White Paper. As the Economy Contracts, the Digital Universe Expands. May 2009.


New Forms of Data
Extending Big Data Beyond the EDW

Aster Data

Big Data elements
Raw formats: Lengthy text strings, binary, blobs, social graphs
Rapid updates, data refreshes: Online click stream, stock orders, social connections/friends
High volume: Embedded processing to eliminate data movement

Examples
•  Long strings of encoded page clicks, sessions, and actions
•  Entry points to a website tracked by cookie strings
•  Social connections
•  e.g. One stock order split into 100s of transactions over days/weeks
•  e.g. ACH transactions, service/customer support records, insurance claims
•  Wide tables with highly descriptive textual strings


New Analytics Are Needed to Gain Big Data Insights
Data Size with Multi-Structure Forms Requires New Analytic Approaches

Big Data Analytics
•  Deliver Path, Pattern Matching, Time Series & Graph Analysis
•  Iterative Discovery
•  Use of SQL & Non-SQL Techniques (MapReduce)

“CIOs face significant challenges in addressing the issues surrounding big data… New technologies and applications are emerging … and should be investigated.”

Source: CEO Advisory: ‘Big Data’ Equals Big Opportunity, Gartner, 31 March 2011.




What is Big Data?

•  Big Data = Large scale (data volume) analytics
   ✓  MPP SQL databases have delivered large scale analytics for over a decade. Teradata has been the leader in large scale SQL analytics with over 16 customers with a Petabyte or more of data.

•  Big Data = Emerging new data types
   ✓  New multi-structured data types with unknown relationships that require processing of data regardless of size to discover insights. Examples include web logs, sensor networks, social networks, text.

•  Big Data = New (non-SQL) analytics
   ✓  New analytic frameworks that provide parallel processing on semi-structured data, leveraging the power of MapReduce (programmatic languages: Java, Python, Perl, C, C++).


Big Data Challenges are More Than Data Size

The Four Axes of Big Data

“CIOs face significant challenges in addressing the issues surrounding big data… New technologies and applications are emerging (examples include Hadoop and MapReduce) and should be investigated to understand their potential value.”

Source: CEO Advisory: ‘Big Data’ Equals Big Opportunity, Gartner, 31 March 2011.



Ease of Development and Reuse
Analytic Foundation: 50+ out-of-the-box modules

Modules and business-ready SQL-MapReduce functions

Path Analysis: Discover patterns in rows of sequential data
•  nPath: complex sequential analysis for time series analysis and behavioral pattern analysis
•  Sessionization: identifies sessions from time series data in a single pass over the data
•  Attribution: operator to help ad networks and websites to distribute “credit”

Statistical Analysis: High-performance processing of common statistical calculations
•  Histogram: function to provide capability of generating
•  Decision Trees: native implementation of parallel random forests
•  Approximate percentiles and distinct counts: calculate percentiles and counts within specific variance
•  Regression: performs linear or logistic regression between an output variable and a set of input variables
•  Correlation: calculation that characterizes the strength of the relation between different columns
•  Averages: calculate moving, weighted, exponential or volume-weighted averages over a window of data

Relational Analysis: Discover important relationships among data
•  Graph analysis: finds shortest path from a distinct node to all other nodes in a graph
•  Tokenization: splits strings into individual words to assist text processing
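The Sessionization module is described above as identifying sessions from time series data. The following is a minimal Python sketch of that idea, not Aster's implementation. The function name `sessionize`, the 30-minute inactivity timeout, and the sample clicks are all assumptions made for illustration; the sketch sorts the events first and then assigns session numbers in one pass.

```python
from datetime import datetime, timedelta

def sessionize(events, timeout=timedelta(minutes=30)):
    """Assign a session number to each (user, timestamp) event.

    A new session starts for a user when the gap since that user's
    previous event exceeds `timeout` (an assumed threshold).
    """
    last_seen = {}    # user -> timestamp of the previous event
    session_no = {}   # user -> current session counter
    out = []
    # Order by user, then time, so each user's events are scanned in sequence.
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev > timeout:
            session_no[user] = session_no.get(user, -1) + 1
        last_seen[user] = ts
        out.append((user, ts, session_no[user]))
    return out

# Hypothetical click data, for illustration only.
clicks = [
    ("u1", datetime(2012, 9, 1, 9, 0)),
    ("u1", datetime(2012, 9, 1, 9, 10)),
    ("u1", datetime(2012, 9, 1, 11, 0)),
    ("u2", datetime(2012, 9, 1, 9, 5)),
]
sessions = sessionize(clicks)
```

Here the third click of `u1` falls outside the timeout and opens a second session, while `u2` gets its own first session.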


Ease of Development and Reuse
Analytic Foundation: 50+ out-of-the-box modules

Modules and SQL-MapReduce analytic functions

Text Analysis: Derive patterns in textual data
•  Text Processing: counts occurrences of words, identifies roots, & tracks relative positions of words & multi-word phrases
•  Text Partition: analyzes text data over multiple rows
•  Levenshtein Distance: computes the distance between two words

Cluster Analysis: Discover natural groupings of data points
•  k-Means: clusters data into a specified number of groupings
•  Canopy: partitions data into overlapping subsets within which k-means is performed
•  Minhash: buckets high-dimensional items for cluster analysis
•  Basket analysis: creates configurable groupings of related items from transaction records in a single pass
•  Collaborative Filter: predicts the interests of a user by collecting interest information from many users

Data Transformation: Transform data for more advanced analysis
•  Unpack: extracts nested data for further analysis
•  Pack: compresses multi-column data into a single column
•  Multicase: case statement that supports row match for multiple cases
•  Antiselect: returns all columns except for specified column
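The Levenshtein Distance module listed above computes the edit distance between two words. As an illustration of the underlying calculation (the standard dynamic-programming formulation, not Aster's code), a short Python sketch follows; the function name `levenshtein` is chosen here for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string `a` into string `b`."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]

distance = levenshtein("kitten", "sitting")
```

In a fraud setting this kind of distance could be used to spot near-duplicate names or addresses across profiles, although the deck does not say how the module is applied.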


Enterprise Discovery Architecture

[Architecture diagram; labels only]
Data Sources: Non-relational Data, Multi-Structured Data, Structured Data, OLTP DBMSs
ETL
Discovery: Aster Discovery Platform
Discovery Apps: Fraud Discovery, Customer Discovery, Business Insight Discovery
Teradata IDW; SAS In-DB Modeling; R In-DB; BI Tools
Users: Data Scientist, SAS Analyst, R Analyst, Business Analyst


Financial POC Data Sets Analyzed





Events Preceding Account Closure

SELECT *
FROM nPath (
    ON (…)
    PARTITION BY sba_id
    ORDER BY datestamp
    MODE (NONOVERLAPPING)
    PATTERN ('(OTHER_EVENT|FEE_EVENT)+')
    SYMBOLS (
        event LIKE '%REVERSE FEE%' AS FEE_EVENT,
        event NOT LIKE '%REVERSE FEE%' AS OTHER_EVENT
    )
    RESULT (…)
) n;

Closed accounts: fee reversal seems to be a “signal”
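To make the nPath query on this slide easier to follow, here is a rough Python analogue of its three steps: partition by account, order by date, then label each event with a symbol and match a pattern over the symbol sequence. It is only a sketch under stated assumptions. The single-letter symbols (`F` for FEE_EVENT, `O` for OTHER_EVENT), the function name `event_paths`, and the sample rows are invented for illustration, and the elided `ON (…)` and `RESULT (…)` clauses are not reconstructed.

```python
import re
from collections import defaultdict

def event_paths(rows):
    """rows: iterable of (sba_id, datestamp, event).
    Returns {sba_id: string of symbols in date order}."""
    by_account = defaultdict(list)            # PARTITION BY sba_id
    for sba_id, datestamp, event in rows:
        by_account[sba_id].append((datestamp, event))
    paths = {}
    for sba_id, evs in by_account.items():
        evs.sort(key=lambda e: e[0])          # ORDER BY datestamp
        # SYMBOLS: F = event contains 'REVERSE FEE', O = any other event
        paths[sba_id] = "".join(
            "F" if "REVERSE FEE" in event else "O" for _, event in evs
        )
    return paths

# Analogue of PATTERN ('(OTHER_EVENT|FEE_EVENT)+'): one or more events of either kind.
PATTERN = re.compile(r"[OF]+")

# Hypothetical rows (ISO dates sort correctly as strings).
rows = [
    ("A1", "2012-01-05", "REVERSE FEE REQUEST"),
    ("A1", "2012-01-20", "ACCOUNT CLOSED"),
    ("A1", "2012-01-10", "REVERSE FEE REQUEST"),
    ("A2", "2012-01-03", "DEPOSIT"),
]
paths = event_paths(rows)
matched = {acct: bool(PATTERN.fullmatch(p)) for acct, p in paths.items()}
fee_counts = {acct: p.count("F") for acct, p in paths.items()}
```

Counting the `F` symbols per closed account is one simple way to check whether fee reversals tend to precede closure, which is the "signal" the slide points to.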


Aster in Retail Banking: “Last Mile” Marketing
Cross-Channel Customer Interactions

Challenge
•  Know the “last mile” of a decision
•  Data mining tools predict probability but do not identify the “last mile”

With Aster
•  SQL-MapReduce listens and predicts the “last mile”
   -  Identifies all interaction patterns prior to acquisition or attrition

Business Impact
•  10-300x less effort to pinpoint a customer in the “last mile”

Data: 17,000 Customers, 1 Month
•  34,000 Branch Visits
•  25,000 ATM Sessions
•  5,000 Call Center Sessions
•  43,000 E-mails
•  92,000 Online Sessions


Aster MapReduce: Understanding the “Last Mile”

Jan 5: Reverse Fee Request
Jan 7: Request Made Again
Jan 10: Request Made Again
Jan 15: Request Made Again
Jan 20: Account Closed

What if I knew that this customer was likely to leave? I could…
•  Apologize
•  Offer an explanation
•  Reverse the $5 fee

“It takes 3x more to acquire a customer than to retain one”


Aster Removes the “Last Mile” Technical Challenge
Teradata Aster MapReduce Platform

Completes the customer profile with digital data
•  Adds web, social, call center data
•  Stitches rows together by customer in a time-ordered view

custID    channel1          channeln
10001     Online Banking    Account Close
20001     Call Center       Branch Visit

Scans all customer record patterns in a single pass
•  No need to define patterns in advance
•  Fully parallelized for SQL-MapReduce performance

Summarizes output for business exploration
•  Rank orders the most popular paths and yet represents the long tail too

Total # of Customers    channel1             channeln
35                      Online Banking       Account Close
26                      Bank Branch Visit    Account Close
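The slide above describes stitching each customer's interactions into a time-ordered path and then rank-ordering the paths by how many customers follow them. A minimal Python sketch of that summarization step follows. It is an illustration only, not the Aster platform's logic; the function name `rank_paths` and the sample interactions are assumptions.

```python
from collections import Counter, defaultdict

def rank_paths(interactions):
    """interactions: iterable of (cust_id, timestamp, channel).
    Returns [(path, customer_count), ...], most common path first."""
    per_customer = defaultdict(list)
    for cust_id, ts, channel in interactions:
        per_customer[cust_id].append((ts, channel))
    counts = Counter()
    for events in per_customer.values():
        events.sort(key=lambda e: e[0])               # time-ordered view
        counts[tuple(ch for _, ch in events)] += 1    # one path per customer
    return counts.most_common()                       # long tail stays in the list

# Hypothetical interactions, for illustration only.
interactions = [
    (10001, 1, "Online Banking"), (10001, 2, "Account Close"),
    (10002, 5, "Account Close"),  (10002, 3, "Online Banking"),
    (20001, 1, "Call Center"),    (20001, 4, "Branch Visit"),
]
ranked = rank_paths(interactions)
```

Because every observed path is counted, no pattern has to be defined in advance; the rare paths simply appear further down the ranking.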


Visualizing Aster nPath Analysis
[Figure panels: Bump Chart, Funnel Chart, Color Chart, Time Plot]


Big Data Business Impact: Example Use Cases
Some Examples

Use Case

Business Description

Digital Marketing Optimization

Analysis of user behavior, intent, and actions across search, ad media and web properties to increase the ROI for digital media marketing efforts.

Social Network and Relationship Analysis

Uncover deep social relationships and interactions hidden in raw transaction data, online behavior, and social networks in order to gain behavioral insights, target influencer marketing, and analyze virality within the social network.

Fraud Detection and Prevention

On-the-fly analysis of transactions, interactions, and systems to detect, block, and prevent malicious users, networks, and programs engaged in fraud.

Machine Data Analysis

Analysis of sensor, location, and machine to machine communications to optimize operational efficiencies.



New Kinds of Analysis

Graph Analysis
[Figure: graph of Person nodes joined by Social Link edges, showing a Direct Relationship and an Indirect Relationship]

Social and Relationship Analysis
Uncover deep social relationships and interactions hidden in raw transaction data, online behavior, and social networks that can be used for behavioral analysis, influencer marketing, virality analysis, crowd sourcing, and similar applications.

Pattern Matching Analysis
Discover patterns in rows of sequential data

Weblogs              {user, page, time}       Click 1, Click 2, Click 3, Click 4
Smart Meters         {device, value, time}    Reading 1, Reading 2, Reading 3, Reading 4
Sales Transactions   {user, product, time}    Purchase 1, Purchase 2, Purchase 3, Purchase 4
Stock Tick Data      {stock, price, time}     Tick 1, Tick 2, Tick 3, Tick 4
Call Data Records    {user, number, time}     Call 1, Call 2, Call 3, Call 4, Call 5

Analysis of user behavior, intent, and actions across search, ad media and web properties to create an interaction map of user behavior across digital media assets and drive increased ROI for digital media marketing efforts.


New: Aster MapReduce Appliance

•  High Performance Analytics
   -  Powerful solution for Big Data Analytics using patented SQL-MapReduce framework
   -  Massively parallel processing architecture optimizes performance

•  Appliance Solution
   -  Purpose-built integrated hardware / software solution
   -  Nodes, software, storage, and networking in a single rack
   -  8 nodes per cabinet, scalable to 6 cabinets and over 200 TB of customer data
   -  Delivered ready to run at a competitive price point
   -  Leading edge Intel processors for fast scans and performance

•  Enterprise Ready
   -  Integrated with Teradata Warehouse to expand analytical capabilities
   -  ODBC and JDBC support for major business intelligence, visualization, and ETL tools
   -  Native Hadoop connectivity
   -  Management tools for monitoring system health


The Broad Business Impact of Fraud

•  Direct revenue impact
   -  Cost of uncovering fraud
   -  Cost of correcting fraud
   -  Cost of regulatory penalties

•  Damage to customer satisfaction
   -  Customers’ decisions influenced by perceived risk of fraud
   -  Fraud creates significant indirect impact on revenue

•  Risk of regulatory penalties
   -  Consumer protection regulations
   -  Privacy regulations

Source: Ponemon Institute, “Consumers’ Reaction to Online Fraud”, April 2011


Challenges in Addressing Fraud

•  Timeliness of detection critical
   -  Time to detection determines cost of correction

•  Fraud continues to adapt and evolve
   -  Techniques rapidly change to adapt to new monitoring techniques

•  Technology enables new means of fraud
   -  Automated gaming bots
   -  Online identity threat

Consumer Concerns
•  42% of survey respondents believe they have been victims of online fraud
•  57% of respondents do not believe online companies are taking enough precautions to protect them against online fraud
Source: Ponemon Institute, “Consumers’ Reaction to Online Fraud”, April 2011


Implementing Solutions for Fraud Detection

[Architecture diagram; labels only]
•  Inputs: Transactions, Payments, Customer Records, Returns
•  Real-time processing engine
•  Rules for real-time monitoring, updated based on analysis
•  Aster Data Analytic Platform
•  Exploration and investigation of data to identify relationships indicative of likely fraud
•  Fraud Models
•  Teradata Integrated Data Warehouse
•  Models in data warehouse updated for ongoing scoring processes

Multi-structured Data
•  Web logs
•  Text fields
•  …

Relational Data
•  Transactions
•  Means of payment
•  Customer profile


Bringing Together Multi-Structured Data for Fraud Detection and Prevention

[Diagram: data sources brought together in Aster Data]
•  Social media data
•  Payment records
•  Location information
•  ACH transaction data
•  CRM records
•  Stock trade transactions
•  Audit records
•  Account activity
•  Purchase history
•  Web log data
•  Purchase records
•  User profile information
•  Claim forms
•  Adjuster notes


Performing Rich Analytics to Detect Fraud

•  Identify suspect data
   -  False identities, multiple profiles, invalid credit cards, …

•  Identify suspect relationships
   -  Collusion, transaction structure, money transfers, …

•  Identify suspect patterns
   -  Order velocity, purchasing behavior, claims submissions, …

Example Analytic Tools

Graph and network analysis

Pattern & time series analysis
Web Logs            {user, page, time}              Click 1, Click 2, Click 3, Click 4
Insurance Claims    {user, payee, date}             Claim 1, Claim 2, Claim 3, Claim 4
Transactions        {user, product, time}           Purchase 1, Purchase 2, Purchase 3, Purchase 4
Stock Tick Data     {stock, price, user, time}      Trade 1, Trade 2, Trade 3, Trade 4
Returns             {customer, SKU, price, date}    Return 1, Return 2, Return 3, Return 4

All at massive scale across multiple data sources and types


Using Pattern Detection to Identify Fraud

•  Manual sampling approaches insufficient
   -  Find only small sample of fraud
   -  Highly inefficient approach

•  Automation of fraud detection critical to improving detection
   -  Need to identify unusual patterns indicative of likely fraud
   -  Need to rapidly evolve algorithms to improve accuracy and catch new types of fraud

•  Requires unique capabilities
   -  SQL approach requires knowing pattern in advance
   -  SQL approach requires highly inefficient multiple data scans & self-joins to find patterns

[Figure: Event Pattern Detection, a sequence of Event nodes]

Example events:
•  Purchase
•  Return
•  Claim submission
•  Game play
•  Securities trade


Pattern Analysis with Aster Data
Uncovering suspicious patterns in sequences of events

Aster Data Capabilities
-  nPath pre-packaged SQL-MapReduce function for finding sequences of events
-  Identify patterns across diverse types of events and interactions
-  Find all patterns that connect specified events

Benefits
-  Pattern detection via a single pass over the data for rapid results
-  Allows you to understand any trend that needs to be analyzed over a continuous period of time
-  Easily modify analysis without the complex code and significant changes required by SQL approaches

nPath Pattern Analysis
Examples:
•  Identify significant changes in risk scores
•  Find unusual patterns in stock trading
•  Uncover suspicious sequences in online games


Graph Analysis for Fraud Detection & Prevention

•  Identify complex networks of relationships
   -  Users to transactions
   -  Transactions to means of payment
   -  Identities to individuals
   -  …

•  Fraud identified by graph relationships
   -  Clusters of connections
   -  Patterns of activity between connections

•  Understand impact of fraud
   -  Trace flow of money and goods
   -  Identify users impacted by fraud
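One simple way to picture the "clusters of connections" mentioned above is to treat users, cards, and devices as graph nodes and group everything that is linked directly or indirectly. The Python sketch below does this with a union-find structure. It is an illustrative assumption about how such clusters could be found, not a description of Aster's graph functions; the name `connected_groups` and the sample edges are hypothetical.

```python
from collections import defaultdict

def connected_groups(edges):
    """edges: iterable of (node_a, node_b) links, e.g. user-to-card.
    Returns groups of mutually reachable nodes, largest group first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b         # merge the two clusters

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)

# Hypothetical links: two users share a card, one of them also shares a device.
edges = [
    ("user:ana", "card:1111"),
    ("user:ben", "card:1111"),
    ("user:ben", "device:x9"),
    ("user:cy", "card:2222"),
]
groups = connected_groups(edges)
```

A cluster in which several identities share one means of payment is the kind of relationship an investigator might then review by hand.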


Examples


Pattern Match Analysis: SQL-MapReduce for Fraud Detection & Prevention
Analyzing Play Patterns for Fraudulent Activity

Business Goal
•  Detect and disrupt various types of fraud, like collusion

Analytic Application in Aster Data
•  Identify collusion targeted at money laundering schemes
•  Monitor play between any two players who are “known” to one another; data is captured in binary format to keep up with play
•  Tag as unusual a pattern of play where player 1 loses an unusual number of times to player 2 across game sessions; nPath performs advanced path and pattern matching

Business Impact
•  Revenue protection: 115x faster fraud & pattern detection
•  Site integrity: increase player count & market share
•  Brand trust by reducing fraud: increases revenue/player

Other Aster Data Applications at Full Tilt Poker
•  Follow-the-Money fraud tracing
•  Player dashboard

•  From 1 week to trace fraud to 15 minutes using SQL-MapReduce
•  Found 5 new fraud patterns that were previously not detectable using Java & SQL-based analytics
•  Full granular detail; from 1,200 hands/second analyzed to 140,000 hands/second
•  60X faster queries; 90 minutes to 90 seconds
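The collusion signal described on this slide, one player losing an unusual number of times to the same opponent, can be illustrated with a small Python sketch. This is a simplified stand-in and not the nPath logic used in the case study. The function name `suspicious_pairs`, the thresholds `min_games` and `min_loss_rate`, and the sample results are all assumptions.

```python
from collections import Counter

def suspicious_pairs(results, min_games=20, min_loss_rate=0.9):
    """results: iterable of (loser, winner), one entry per head-to-head game.
    Flags pairs where `loser` lost at least `min_loss_rate` of at least
    `min_games` games against the same `winner` (assumed thresholds)."""
    losses = Counter(results)
    flagged = []
    for (loser, winner), n_lost in losses.items():
        # Total games between the two players, in either direction.
        total = n_lost + losses.get((winner, loser), 0)
        if total >= min_games and n_lost / total >= min_loss_rate:
            flagged.append((loser, winner, n_lost, total))
    return flagged

# Hypothetical game outcomes, for illustration only.
results = (
    [("p1", "p2")] * 19 + [("p2", "p1")]          # p1 loses 19 of 20 to p2
    + [("p3", "p4")] * 5 + [("p4", "p3")] * 5     # balanced, too few games
)
flagged = suspicious_pairs(results)
```

A real system would also account for stakes, time windows, and whether the players are "known" to one another, which this sketch leaves out.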


Fraud Analytics: Detecting and Preventing Fraud in Online Retail Transactions
Multi-Dimensional Analysis for Fraud Detection

•  Provider of cloud-based fraud detection solutions for e-commerce
•  Utilize wide variety of data to build a “contextual score” that helps identify fraudulent users
•  Adaptable rules and scoring enable rapid, agile evolution as fraud evolves

Business Goal:
•  Offer uniquely accurate and adaptable fraud detection solutions for e-commerce

Aster Data Unique Value
•  Support interactive exploration of data to discover and characterize fraud patterns

Business Impact:
•  Detect fraud: rapidly identify transactions and interactions that are likely to be fraudulent
•  Prevent fraud: block fraudulent activity before it occurs based on advanced analysis
•  Focus on real users: understand which users and interactions represent genuine customers and prospects worth focusing on for conversion


Example: e-Commerce Transaction Fraud
Uncover fraudulent users and transactions

Challenges
-  Massive volumes of data about users, devices, and activity to monitor
-  Need to find complex patterns and relationships in data
-  Need for graph and path analysis to understand data
-  Need frequent analysis to rapidly evolve detection rules and algorithms

Aster Data Value
-  Ability to store and process massive volumes of diverse multi-structured data
-  Massive scalability accelerates exploring and testing patterns on large data sets
-  Power and flexibility of capabilities for pattern analysis and graph processing simplifies detection of complex patterns

Examples
•  Card-not-present fraud: identify transaction patterns indicative of credit card fraud
•  Returns fraud: uncover patterns that are highly correlated with fraudulent returns
•  Collusive bidding: detect collusive bidding behavior on auction sites by detecting suspicious patterns of activity


Example: Claims Fraud
Identify patterns in claims and payments that indicate fraud

Challenges
•  Massive volumes of claims information
•  Claims records include both relational and multi-structured data
•  Unable to effectively combat fraud by manual sampling

Aster Data Value
•  Ability to process and analyze multi-structured and relational data together
•  Massively scalable processing of data and analytics
•  Flexible, powerful tools to enable pattern analysis, text processing, and graph analysis

Examples
•  Insurance claims fraud: identify patterns and relationships indicative of claims likely to be fraudulent
•  Medical claims fraud: identify networks of interactions and patterns indicative of fraud
•  Payment fraud: find patterns that uncover fraudulent payment schemes


Example: Gaming Fraud
Uncover collusion, gaming bots, money laundering

Challenges
•  Massive volumes of game play data from massive numbers of users and gaming sessions
•  Diverse multi-structured data formats (encoded game play data, text data, relational data, …)
•  Constantly evolving fraud techniques
•  Legal requirements for traceability and monitoring

Aster Data Value
•  Ability to load, transform, and process diverse data
•  Rich tools for pattern analysis on massive volumes of data
•  Capabilities to enable identification and monitoring of complex networks of relationships

Examples
•  Collusion detection: uncover networks of conspirators colluding to defraud other players
•  Botnet detection: detect machine players in games based on playing patterns
•  Anti-money-laundering: track flow of money in money laundering through gaming


Consumer Financial Services Example: Online Fraud Analysis

Monitor online consumer behavior
•  Monitor log-in and navigation behavior online
•  Evaluating click stream transactions often includes analysis of non-relational log files

Rule out false negatives and positives
•  Over 40 known click stream fraud patterns that can be detected, e.g. frequent normal paths to creating a wire transfer (creation of transfer without checking balance)
•  Identifying fraudulent activity often requires looking for patterns in behavior, e.g. good user logged in but in same time window a stalker also logs in and tries 3 different wire transfers of differing amounts before succeeding

Replay user sessions
•  Once fraudulent activity is detected, the support team often requires the replaying of the activity to discuss the issue with the customer

Characteristics
•  Raw web log data
•  Complex pattern analysis
•  Clustering analysis
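One of the click stream patterns mentioned on this slide is a wire transfer created without the balance being checked first. The Python sketch below shows how such a rule could be expressed over ordered page views. It reads the slide's example as a suspicious pattern, which is an interpretation; the page names `view_balance` and `create_wire_transfer`, the function name, and the sample sessions are all hypothetical.

```python
def transfers_without_balance_check(sessions):
    """sessions: {session_id: [page, ...]} with pages in click order.
    Returns ids of sessions that reach 'create_wire_transfer'
    before any 'view_balance' page (assumed page names)."""
    flagged = []
    for session_id, pages in sessions.items():
        seen_balance = False
        for page in pages:
            if page == "view_balance":
                seen_balance = True
            elif page == "create_wire_transfer" and not seen_balance:
                flagged.append(session_id)
                break   # one hit is enough to flag the session
    return flagged

# Hypothetical sessions, for illustration only.
sessions = {
    "s1": ["login", "view_balance", "create_wire_transfer"],
    "s2": ["login", "create_wire_transfer"],
    "s3": ["login", "view_balance", "logout"],
}
flagged = transfers_without_balance_check(sessions)
```

A flagged session is only a candidate for review; the slide's point about ruling out false positives still applies.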


Capital Markets Example: Near Real-Time Fraud Pattern Matching for Trader Surveillance

Identify suspicious trading activity
•  Use time-series based pattern matching to identify unusual patterns of trade activity intra-day
•  Potential patterns include front-running, market manipulation, non-compliant positions that expose the firm to undue risk

Combine trading patterns with diverse data
•  Introspect communications over e-mail and chat channels for corroborating evidence of trade misbehavior

Streamline investigations with iterative, hypothesis-driven query interface
•  Save valuable time by conducting ad-hoc analysis of trades, cases, alerts, account data

Characteristics
•  High rate of new data generation
•  Granular data
•  Simultaneous load & query


Healthcare Payor Example: Enhanced Fraud Detection
Analyzing claim, demographic and web data

•  Business Goal
   -  Proactively identify fraud at first notice of loss; fraud rings, collusion or falsified claims
   -  Fraud identification continues to evolve for improved effectiveness

•  Solution
   -  Pattern analysis
      •  Understand relationships among parties (physicians, consumers, organizations), locations, time of filing, frequency and circumstances
      •  Detect potential for computer generated claims
   -  Graph analysis of cohort networks
   -  Use MapReduce to structure social media for additional insight

•  Business impact
   -  Identify national fraud rings
   -  Limit or reduce loss caused by fraudulent claims

Joint Differentiators
•  Reduce complexity of analysis with MapReduce
•  Pattern detection and nPath analysis against physician, consumer, claim, channel and location data
•  Geospatial Analysis

Key Characteristics: Granular Channel Data
[Figure labels: Raw web logs, Sessionization, nPath Pattern Matching]


Example: Healthcare Fraud, Waste & Abuse
Identify fraud patterns to minimize false positives

Challenges
•  Interesting data is highly granular
•  Patterns cannot be identified unless they are part of the software program; human analysis can minimize the false positives

With MapReduce
•  Timely and unique identification of unusual billing patterns
•  Enables more efficient analysis by putting the tools in the hands of the Special Investigative Unit (SIU)

Example: Claims Data

Billing Prov    Service Date    MembID    Serv Loc
123             1/1/2010        10001     Orlando, FL
135             1/1/2010        10001     Miami, FL
123             1/1/2010        10001     Orlando, FL
135             1/15/2010       10001     Miami, FL
123             1/1/2010        10001     New York, NY
234             12/24/2010      10002     Orlando, FL
345             12/24/2010      10003     Miami, FL

Impact
•  Early detection minimizes payments of fraudulent claims (estimates range from $125 - $800 billion in losses)
•  Statistics show that $11 is saved on every $1 spent on fighting fraud*
* Source: healthcare-informatics.com; Identifying Fraud, Barry Johnson, DDS


MapReduce simply identifies “Fraud Patterns”
ALL interaction patterns evaluated in a single pass

MapReduce Platform

Prepares multi-structured data
•  Stitches rows together by customer in a time-ordered view

Scans all records to produce a complete set of paths
•  No need to define patterns in advance
•  Fully parallelized for top performance using MapReduce where SQL falls down

Summarize output for business exploration
•  Rank order the most popular paths and yet represent the long tail too

[Input table columns: Billing Prov, Service Date, Member ID, Service Loc]

Step 1: Pivot data via nPath SQL-MapReduce

Billing Prov    Service Loc1    Service Locn     date1       daten
123             Orlando, FL     New York, NY     1/1/2010    1/1/2010
135             Miami, FL       Miami, FL        1/1/2010    1/15/2010

Step 2: Run nPath SQL-MapReduce Java Logic

Billing Prov    Service Location
123             More than one service location
135             Single service location
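The two steps on this slide, pivoting claims by billing provider and then labelling each provider by its service locations, can be mimicked in a few lines of Python. This is a sketch of the idea rather than the nPath Java logic itself. The function name `classify_providers` is an assumption, and the sample rows are the member 10001 claims from the claims data slide.

```python
from collections import defaultdict
from datetime import datetime

# Member 10001 rows from the claims example:
# (billing provider, service date, member id, service location)
claims = [
    (123, "1/1/2010", 10001, "Orlando, FL"),
    (135, "1/1/2010", 10001, "Miami, FL"),
    (123, "1/1/2010", 10001, "Orlando, FL"),
    (135, "1/15/2010", 10001, "Miami, FL"),
    (123, "1/1/2010", 10001, "New York, NY"),
]

def classify_providers(rows):
    """Step 1: gather each provider's visits in date order.
    Step 2: label the provider by how many distinct locations appear."""
    by_provider = defaultdict(list)
    for provider, date, _member, location in rows:
        by_provider[provider].append(
            (datetime.strptime(date, "%m/%d/%Y"), location)
        )
    labels = {}
    for provider, visits in by_provider.items():
        visits.sort(key=lambda v: v[0])                 # time-ordered view
        locations = {loc for _, loc in visits}
        labels[provider] = (
            "More than one service location"
            if len(locations) > 1
            else "Single service location"
        )
    return labels

labels = classify_providers(claims)
```

Provider 123 billing from both Orlando and New York on the same date is the kind of pattern an investigator would want surfaced.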


Teradata Aster MapReduce Platform Advantages

1  Faster
   Combining MapReduce with RDBMS for the best of both worlds
   §  Faster exploration of both multi-structured and structured data

2  Easier to Use - Investigative Analytics at Scale
   SQL and beyond, SQL-MapReduce framework + pre-built analytics, visual IDE
   §  Useable by any SQL-savvy analyst or BI toolset

3  Easier manageability and ecosystem fit
   Enterprise-class manageability, extensive ecosystem integration
   §  Plugs into existing IT investments without specialized skill sets

4  Lower total cost of ownership
   Better performance and ecosystem support = less hardware & expensive staff
   §  Leverage what you have and don’t over-engineer the problem


BIG DATA: WHAT IS IT AND WHY IS IT IMPORTANT IN THE FINANCIAL SECTOR?

Fernando Lezama, General Director, Teradata
