
What is scoring and why you need it

Scoring

Scoring is used to rate customers according to the probability of a business event or customer action, such as timely credit repayment, default, retention, or cross-sale. With scored customers you can automatically make profitable decisions, such as accepting or rejecting a credit application, increasing or decreasing a credit limit, or sending or holding a cross-sale offer. Scoring technologies can be used as an objective risk management tool that helps ensure centralized, uniform, consistent, and reliable decision management across your organization. The quality and profitability of scoring-based operational decisions can be statistically monitored and gradually improved or adjusted to new market conditions. Scoring is most widely used in lending, at all stages of the credit life cycle, from borrower acquisition to customer management to debt collection and recovery. Credit scoring examples are used in these materials to illustrate scoring techniques.

Scorecard

To score customers you need a scorecard. A scorecard is a mathematical model represented as a set of weights assigned to the customer characteristics that affect creditworthiness or any other target behavior modeled by the scorecard.
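As a rough illustration of a scorecard as a set of weights, the sketch below adds up points for three characteristics. The attribute names follow the example scorecard in these materials; the point values and the helper names (`age_points`, `score`) are invented for illustration and are not taken from any real scorecard.

```python
# Hypothetical scorecard: each attribute of a characteristic carries a weight (points).
# All point values below are invented for illustration only.
CARD_POINTS = {"none": 0, "one": 15, "two and more": 30}
HISTORY_POINTS = {"no information": 5, "none": 20, "with negative factors": -20}


def age_points(age):
    """Points for the Age characteristic, using one illustrative bin (18...37)."""
    if 18 <= age <= 37:
        return 10
    return 25


def score(customer):
    """Total score is the sum of the points of the customer's attributes."""
    return (
        age_points(customer["age"])
        + CARD_POINTS[customer["cards"]]
        + HISTORY_POINTS[customer["credit_history"]]
    )


applicant = {"age": 30, "cards": "one", "credit_history": "none"}
total = score(applicant)  # 10 + 15 + 20
print(total)
```

In a statistical scorecard the points would be derived from historical data rather than chosen by hand.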



[Figure: example scorecard with the characteristics Age (attribute 18...37) and Cards (attributes None, credit card)]



To create a statistically based scorecard, you need to apply a statistical (predictive analytics) method to your historical data (data about your past and current customers). The most widely used statistical method for scorecard development is logistic regression.

[Figure, continued: scorecard attributes for Cards (One, Two and more) and Credit History (None, no information, with negative factors)]


Cut-off, odds and risk groups

To make automatic decisions you need to define a score cut-off (or threshold) that separates “Good” customers, who display positive behavior (such as good profit and timely repayments), from “Bad” customers, who will most probably display negative behavior (such as default). Customers whose score is lower than the cut-off point are rejected: the automatic decision for them is “No”. Customers whose score is higher than the cut-off point are accepted: the automatic decision for them is “Yes”. Based on the customer score you can also segment your customers into risk groups.

A risk group is defined by the odds of being “Good” relative to being “Bad” (e.g. delinquent). For example, a borrower who belongs to a group with odds of 300:1 has a very low risk of being delinquent, while a borrower who belongs to a group with odds of 5:1 has an unacceptable credit risk. Even the 300:1 risk group contains 1 “Bad” customer for every 300 “Good” customers. The cut-off approach therefore assumes that you will always have a small number of “Bad” customers in the accepted segment and that you will reject a certain number of “Good” customers.
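To make the odds concrete, the following sketch converts “Good : Bad” odds into the share of “Bad” customers in a group and applies a cut-off. The function names are hypothetical, and treating a score exactly equal to the cut-off as accepted is an assumption, since the text only covers scores below and above it.

```python
def bad_probability(good_to_bad_odds):
    """Share of "Bad" customers in a group with the given Good:Bad odds.

    Odds of 300:1 mean 1 "Bad" for every 300 "Good", i.e. 1 out of 301.
    """
    return 1.0 / (good_to_bad_odds + 1.0)


def decide(score, cut_off):
    """Automatic decision; a score equal to the cut-off is accepted (assumption)."""
    return "Yes" if score >= cut_off else "No"


print(round(bad_probability(300) * 100, 2))  # percent of "Bad" in the 300:1 group
print(round(bad_probability(5) * 100, 2))    # percent of "Bad" in the 5:1 group
print(decide(640, cut_off=600), decide(580, cut_off=600))
```

With these definitions the 300:1 group contains about 0.33% “Bad” customers, while the 5:1 group contains about 16.67%.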

What scoring can bring you

Increase the profitability of everyday operational decisions, such as customer acquisition or customer relationship management decisions.

[Figure: scored customers routed to accept/reject, increase/decrease, and sell/collect decisions]

Decrease decision time, streamline credit operations

[Figure: decision time of 1 - 10 days compared with 1 - 10 minutes]


Ensure both automated and individual approach to every customer

[Figure: credit limits set individually for each customer]

Automate mass operational decisions and reduce labor costs

Ensure agility to changing market conditions

[Figure: the cut-off point separates rejected from accepted customers, shown for two cut-off positions]

Reduce human decision errors and subjective bias

Types of Scoring

Application Scoring

Application scoring quantifies the risks associated with loan applications by evaluating the social, demographic, financial, and other data collected at the time of the application. It facilitates customer acquisition decisions. Used in application processing systems, application scoring helps to automate the whole process of loan origination.

Behavioral Scoring

Behavioral scoring quantifies customer behavior to improve your credit portfolio management and customer management. It helps you to understand your customers better, and the better you understand them, the more effectively you can respond to their individual needs and increase your bottom line. It facilitates customer management decisions. Used in credit management or portfolio processing systems, behavioral scoring helps to automatically segment and rate accounts, customers, and portfolios, so that you can efficiently manage both a particular borrower’s credit account and the entire credit portfolio.

Collection Scoring

Collection scoring quantifies the probability of recovering the outstanding balance for accounts in collections. A collection scorecard statistically estimates the debtor’s willingness and ability to pay and thus helps to define which actions should be taken to increase collections. It facilitates debt management decisions. Used in debt collection systems, collection scoring helps you improve your collection and recovery efficiency, reduce write-offs, and decrease staff costs.

Fraud Scoring

Fraud scoring rank-orders applicants according to the probability that an application is fraudulent and thus alerts you to potentially fraudulent applications before you book an account. It facilitates fraud detection and prevention, helping you decide instantly which applications should be rejected or set aside for more in-depth evaluation because of high fraud probability. Used as an addition to loan origination systems, fraud scoring helps lenders to increase their profits and enhance their customer service by identifying potential fraud at the earliest possible point.

Judgmental vs. Statistical Scorecard

If you have no historical data, you can create a judgmental scorecard by selecting customer characteristics and corresponding weights based on expert opinion.

After several months, once you have historical data about delinquencies, you can gradually switch from a judgmental to a statistical scorecard.

Comparison of judgmental and statistical approaches for loan applicant evaluation

[Table: characteristics compared under the judgmental approach and Credit Scoring: Payment to Income Ratio, Marital Status, Last Work Record, Credit History, Home Ownership, Time At Branch. Final row: Decision, Odds of repayment.]

What you need to start using scoring

1. Prepare Customer Portfolio (Historical Data Preparation)

2. Select Predictive Customer Characteristics (Variables Selection and Binning)

3. Create Scorecard (Probability Model Development)

4. Analyze Scorecard Quality (Scorecard Validation)

5. Integrate Scorecard into Your Business Process (Scorecard Deployment)

Historical Data Preparation

1. Historical data should include a set of characteristics and a target variable. All scorecard development methods quantify the relationship between the characteristics (input columns) and “Good/Bad” performance (target column).

2. Example of borrower characteristics. Scorecard characteristics are similar to those used in subjective expert judgment:

• Credit Product: Amount, Term, Document Goal
• Credit History: In Current Bank, In Other Banks, Credit Bureau Data
• Assets, Debts, Monthly Income, Monthly Expenses
• Work experience, Time of residence at current address
• Marital status

3. Characteristics whose usage is not reasonable are excluded. For example, in the picture you can see that the “Good/Bad” distribution does not depend on the Home Ownership characteristic.

4. All borrowers should be marked in the target column as “Good” or “Bad” by a certain rule. For example, all borrowers who pay within 30 days are marked as “Good”, while borrowers with a delay of more than 90 days are marked as “Bad”.

[Figure: borrowers marked as Good, Intermediate, or Bad]
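A marking rule of this kind can be sketched as a small function. The use of the maximum number of days past due as the input, and the “Intermediate” label for delays between the two thresholds, are assumptions made for this illustration; `label_borrower` is a hypothetical name.

```python
def label_borrower(max_days_past_due):
    """Mark a borrower by the longest payment delay observed (assumed input).

    Up to 30 days -> "Good"; more than 90 days -> "Bad";
    anything in between is treated as "Intermediate" (assumption).
    """
    if max_days_past_due <= 30:
        return "Good"
    if max_days_past_due > 90:
        return "Bad"
    return "Intermediate"


portfolio = {"A-001": 0, "A-002": 45, "A-003": 120}
targets = {account: label_borrower(days) for account, days in portfolio.items()}
print(targets)
```

“Intermediate” records are often left out of model development so that the target column stays strictly “Good/Bad”; whether to do so is a modeling choice.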

Exclusions

Certain types of accounts need to be excluded from the dataset. For example, records of bank employees or VIP clients could be excluded.


Data Cleansing

Borrower portfolio data can contain the following anomalies, which should be replaced or deleted:
• Outliers: values that lie far outside the main range of the data
• Data entry errors
• Missing values
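One simple way to replace such anomalies is sketched below: missing values and values outside a plausible range are replaced with the median of the remaining values. The plausible range and the choice of the median are assumptions for this example; deleting the affected records is an equally valid option.

```python
from statistics import median


def clean_values(values, low, high):
    """Replace missing (None) or out-of-range entries with the median of valid ones."""
    valid = [v for v in values if v is not None and low <= v <= high]
    fill = median(valid)
    return [v if (v is not None and low <= v <= high) else fill for v in values]


# Monthly income with a missing value, a data entry error (-50) and an outlier.
raw = [1200, None, 1500, -50, 1300, 950000]
cleaned = clean_values(raw, low=0, high=100000)
print(cleaned)
```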







Binning

What is binning

Binning is the process of transforming a numeric characteristic into a categorical one, as well as regrouping and consolidating categorical characteristics.

Why binning is required
• Increases scorecard stability: some characteristic values occur rarely and will lead to instability if not grouped together.
• Improves quality: grouping attributes with similar predictive strength increases scorecard accuracy.
• Makes it possible to understand the logical trends of “Good/Bad” deviations for each characteristic.
• Prevents scorecard impairment that could otherwise result from rare reversal patterns and extreme values.
• Prevents the overfitting (overtraining) that is possible with numerical variables.

Automatic binning

The most widely used automatic binning algorithm is Chi-merge. Chi-merge divides the range of a characteristic into intervals (bins) in such a way that neighboring bins differ from each other as much as possible in the ratio of “Good” to “Bad” records in them.
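The idea can be sketched as follows: start from many fine bins and repeatedly merge the pair of neighboring bins whose “Good/Bad” ratios are most alike (lowest chi-square statistic), until the desired number of bins remains. This is a minimal sketch of the general technique, not the implementation of any particular tool; stopping at a fixed number of bins is an assumption, as a chi-square threshold is also commonly used.

```python
def chi_square(a, b):
    """Pearson chi-square for two adjacent bins given as (good, bad) counts."""
    total = a[0] + a[1] + b[0] + b[1]
    if total == 0:
        return 0.0
    stat = 0.0
    for row in (a, b):
        row_total = row[0] + row[1]
        for j in (0, 1):
            expected = row_total * (a[j] + b[j]) / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat


def chi_merge(bins, max_bins):
    """Merge the most similar neighboring bins until max_bins remain."""
    bins = list(bins)
    while len(bins) > max_bins:
        stats = [chi_square(bins[i], bins[i + 1]) for i in range(len(bins) - 1)]
        i = stats.index(min(stats))  # most similar neighbors
        merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
        bins[i:i + 2] = [merged]
    return bins


# (good, bad) counts for four initial bins, ordered by the characteristic value.
initial = [(90, 10), (88, 12), (70, 30), (50, 50)]
result = chi_merge(initial, max_bins=3)
print(result)
```

In this example the first two bins have nearly the same “Good/Bad” ratio, so they are the ones merged.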

Analysis and manual correction of automatic binning

Sometimes, because of particularities in the data distribution, automatic binning needs to be corrected manually. WOE values can be used for visual cross-verification of automatic binning results (Fig. 1). Weight of evidence (WOE) is a quantitative measure of the probability that a borrower segment will pay off predictably.

In the example, automatic binning divides the range into 5 bins (Fig. 1), and one of the intervals differs from the adjacent WOE levels, so only that boundary needs to be adjusted manually. Moving the second boundary of the range several values to the left, from 5.02 to 4.94 (Fig. 2), and recalculating the WOE values gives a smooth, decreasing WOE curve, which indicates a correct distribution of values within the ranges.

Sometimes automatic binning ranges should also be adjusted to logical boundaries for easier analysis. For example, boundaries for Age or Job Time can be adjusted to integers.

Fig. 1 - Sharply varying and illogical WOE graph after automatic binning

Fig. 2 - Smooth and logical WOE decline after manual correction
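WOE is commonly computed per bin as the natural logarithm of the bin's share of all “Good” records divided by its share of all “Bad” records. That formula is a widely used convention assumed here, not one stated in these materials. The sketch recalculates WOE for a set of bins and checks that the values decline smoothly.

```python
import math


def woe_per_bin(bins):
    """WOE for each bin given as (good, bad) counts.

    Assumes every bin contains at least one "Good" and one "Bad" record.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    return [
        math.log((good / total_good) / (bad / total_bad))
        for good, bad in bins
    ]


bins = [(400, 20), (300, 30), (200, 50)]
woe = woe_per_bin(bins)
is_decreasing = all(a > b for a, b in zip(woe, woe[1:]))
print([round(w, 3) for w in woe], is_decreasing)
```

After a boundary is moved manually, the counts per bin change, and the same function can be rerun to see whether the WOE curve has become monotonic.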

Gini & ROC Curve

The most widely used measures of scorecard quality are the Gini coefficient and the ROC curve. A ROC curve located higher and further to the left indicates better scorecard quality. The classification quality indicated by a Gini coefficient can be checked against the following tables:

Application Scoring
Gini value        Classification quality
from 0.25
0.25 - 0.45
0.45 - 0.6
more than 0.6     Very good

Behavioral Scoring
Gini value        Classification quality
from 0.45
0.45 - 0.65
0.65 - 0.8
more than 0.8     Very good

Collection Scoring
Gini value        Classification quality
from 0.35
0.35 - 0.55
0.55 - 0.7
more than 0.7     Very good

Fraud Scoring

The Gini approach is not relevant for fraud scoring, because the number of fraudsters in a typical dataset is too small; scorecard quality should be analyzed with other methods.

ROC curve values are usually calculated not only for the dataset that was used to create the scorecard (training set) but also for a separate out-of-sample validation dataset. ROC curve values for the training and validation datasets should be close to each other. When several scorecards are compared, preference is given to the one with the highest Gini value.

[Figure: three ROC curves. Unacceptable ROC curve performance: the scorecard needs to be improved. Reality: acceptable ROC curve performance. Perfect ROC curve performance.]
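The Gini coefficient is related to the area under the ROC curve (AUC) by Gini = 2 × AUC - 1. The sketch below computes AUC directly as the probability that a randomly chosen “Good” customer has a higher score than a randomly chosen “Bad” one. The scores are made-up sample data, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(good_scores, bad_scores):
    """Probability that a random "Good" outscores a random "Bad" (ties count half)."""
    wins = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good_scores) * len(bad_scores))


def gini(good_scores, bad_scores):
    """Gini coefficient derived from AUC."""
    return 2.0 * auc(good_scores, bad_scores) - 1.0


good = [620, 580, 700, 540]
bad = [500, 560, 610]
print(auc(good, bad), gini(good, bad))
```

Running the same functions on the training set and on the validation set gives the two values that should be close to each other.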

Kolmogorov-Smirnov Curve

The Kolmogorov-Smirnov curve shows how strongly the distributions of “Good” and “Bad” borrowers differ. The maximum difference between the “Good” and “Bad” distributions is known as the Kolmogorov-Smirnov value, which is often used together with the Gini value to assess scorecard quality.

[Figure: Kolmogorov-Smirnov curve; the marked point is the maximal difference between the distributions of “Good” and “Bad” customers]

Kolmogorov-Smirnov values are usually calculated not only for the dataset that was used to create the scorecard (training set) but also for a separate out-of-sample validation dataset. Kolmogorov-Smirnov values for the training and validation datasets should be close to each other. When several scorecards are compared, preference is given to the one with the highest Kolmogorov-Smirnov value.

Unacceptable Kolmogorov-Smirnov curves
• Kolmogorov-Smirnov values for the training and validation subsets are not similar
• Kolmogorov-Smirnov value is very small (not high enough)
• Curve shape is not smooth
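The Kolmogorov-Smirnov value can be sketched as the largest gap between the cumulative share of “Bad” and the cumulative share of “Good” customers as the score increases. The scores are made-up sample data and `ks_statistic` is a hypothetical helper name.

```python
def ks_statistic(good_scores, bad_scores):
    """Maximum gap between the cumulative "Bad" and "Good" score distributions."""
    thresholds = sorted(set(good_scores) | set(bad_scores))
    best = 0.0
    for t in thresholds:
        cum_good = sum(s <= t for s in good_scores) / len(good_scores)
        cum_bad = sum(s <= t for s in bad_scores) / len(bad_scores)
        best = max(best, abs(cum_bad - cum_good))
    return best


good = [620, 580, 700, 540]
bad = [500, 560, 610]
ks = ks_statistic(good, bad)
print(ks)
```

Here the largest gap occurs at a score of 610, where all “Bad” customers but only half of the “Good” customers have been counted.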

Risk Segments

Allows evaluating how logical the distribution of risks is, how large the risks are, and what odds are expected for each risk segment.

Additionally, the Risk Segments Graph helps to select the cut-off point based on “Good : Bad” odds.

Acceptable Risk Segments Graph

[Figure: score bands grouped into Critical Risk, High Risk, Medium Risk, and Low Risk segments. Medium Risk: “Good : Bad” odds are 11:1, 15:1, 36:1]

A monotonic decrease of the “Bad” share from left to right and an increase of the “Good” share as the score increases (V-shape curve) means that the Scorecard’s performance is correct.

Not V-shaped distribution:
• Not all of the biggest values are located on the edges; some of them are in the center.
• “Good : Bad” odds are not located sequentially one after another; they are mixed.
• It is impossible to define an acceptable cut-off point.
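A quick numerical check of this property is to compute the “Good : Bad” odds for each score band and verify that they rise from one band to the next. The counts below are invented so that the odds reproduce the 11:1, 15:1, 36:1 example; the function names are hypothetical, and each band is assumed to contain at least one “Bad” record.

```python
def segment_odds(segments):
    """Good:Bad odds per score band, ordered from the lowest to the highest score."""
    return [good / bad for good, bad in segments]


def odds_increase_with_score(segments):
    """True when every band has higher odds than the band before it."""
    odds = segment_odds(segments)
    return all(earlier < later for earlier, later in zip(odds, odds[1:]))


# (good, bad) counts per band; odds are 11:1, 15:1, 36:1.
ordered = [(110, 10), (150, 10), (360, 10)]
mixed = [(150, 10), (110, 10), (360, 10)]
print(segment_odds(ordered), odds_increase_with_score(ordered))
print(segment_odds(mixed), odds_increase_with_score(mixed))
```

When the odds are mixed, as in the second case, no single cut-off cleanly separates acceptable bands from unacceptable ones.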

Strategy Curve

Allows evaluating the degree of discrepancy between the actual odds and the odds predicted by the scorecard, and determining the score ranges where the scorecard makes the majority of its mistakes. Additionally, the Strategy Curve helps in deciding whether to use the developed Scorecard as is or to restructure it completely or partially, depending on the degree of discrepancy found.

Acceptable Strategy Curve
• Low quality zone: high disparity. Can be left as is, since it is located below the cut-off point.
• Acceptable quality zone: allowable disparity.
• High quality zone: minimal disparity.

Unacceptable Strategy Curve
• The Actual curve is much worse than the Predicted curve for high scores.
• Great mismatch between the Predicted and Actual curves.
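The comparison behind a Strategy Curve can be approximated numerically by measuring, for each score range, the relative gap between predicted and actual odds and flagging the ranges where the gap exceeds a tolerance. The score ranges, odds, and the 25% tolerance are all invented for this illustration.

```python
def flag_discrepancies(bands, tolerance):
    """Return labels of score ranges whose relative odds gap exceeds tolerance.

    bands: list of (label, predicted_odds, actual_odds).
    """
    flagged = []
    for label, predicted, actual in bands:
        gap = abs(actual - predicted) / predicted
        if gap > tolerance:
            flagged.append(label)
    return flagged


bands = [
    ("300-399", 2.0, 0.9),    # large gap, low scores
    ("400-499", 8.0, 7.5),    # small gap
    ("500-599", 30.0, 29.0),  # small gap
]
flagged = flag_discrepancies(bands, tolerance=0.25)
print(flagged)
```

Whether a flagged range matters depends on where it lies relative to the cut-off point: a large gap below the cut-off affects only customers who would be rejected anyway.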

Approval vs “Bad” Rate

Shows the dependence between the share of approved borrowers and the corresponding share of “Bad” borrowers for each score. The Approval vs “Bad” Rate chart allows setting the initial cut-off point value that ensures the minimum share of “Bad” borrowers at a permissible level of approved borrowers.

Unacceptable Approval vs “Bad” Rate chart: the “Bad” rate is greater than the Approval rate, and the Approval rate decreases faster than the “Bad” rate.

Acceptable Approval vs “Bad” Rate chart: the Approval rate is greater than the “Bad” rate, and the “Bad” rate decreases faster than the Approval rate.

[Figure: close to ideal Approval vs “Bad” Rate chart]
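The two rates can be tabulated for candidate cut-off points as in the sketch below. Here the “Bad” rate is taken as the share of “Bad” borrowers among those approved, which is one possible reading of the chart and an assumption of this example; the scored sample is made up.

```python
def approval_and_bad_rate(scored, cut_off):
    """Return (approval rate, share of "Bad" among approved) for a cut-off.

    scored: list of (score, is_bad) pairs; scores at or above cut_off are approved.
    """
    approved = [is_bad for score, is_bad in scored if score >= cut_off]
    approval_rate = len(approved) / len(scored)
    bad_rate = sum(approved) / len(approved) if approved else 0.0
    return approval_rate, bad_rate


scored = [
    (500, True), (540, False), (560, True), (580, False),
    (610, True), (620, False), (700, False),
]
for cut_off in (520, 570, 615):
    approval, bad = approval_and_bad_rate(scored, cut_off)
    print(cut_off, round(approval, 2), round(bad, 2))
```

Raising the cut-off lowers both rates; the initial cut-off is the point where the “Bad” rate is low enough while the approval rate is still acceptable.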

“Good” and “Bad” distribution

Allows visually assessing the distribution of “Goods” and “Bads” that results from using the Scorecard.

Acceptable distribution graphs: the typical “hill-like” shape of the peaks and an easily seen difference between the “Good” and “Bad” distributions indicate proper Scorecard performance and its ability to differentiate between “Goods” and “Bads”.

Unacceptable distribution graphs: the Scorecard does not help to differentiate “Goods” from “Bads”.

[Figure labels: Rejected “Goods”, Approved “Bads”]

Close to ideal distribution graphs: while the number of “Bads” goes down, the number of “Goods” rises.

[Figure labels: “Bads” peak, “Goods” peak]

Cumulative event rates

Shows the dependence between changes in the rates of “Good” and “Bad” and changes in the score.

Acceptable Cumulative Event Rates graph

An increase in the share of “Good” outcomes, accompanied by a decrease in the number of “Bad” accounts, confirms that the Scorecard’s performance is logical. A monotonic decrease in the share of “Bad” borrowers in the upper score range indicates that the Scorecard performs correctly and is able to concentrate “Bad” borrowers in the lower part of the working range.

• “Bads” are situated in the part of the population with low scores
• “Goods” appear in the part of the population with high scores

Unacceptable Cumulative Event Rates graph

• The intersection of the curves should be closer to the left of the graph
• The curve should be smooth, but it has drops in the chart area with the highest risk
