Data Masking a Canadian Social Insurance Number

© 2009 Informatica Corporation

Abstract A Social Insurance number (SIN) is a number that the Canadian government issues to administer social programs. This article describes a Data Masking mapplet that you can configure to create a realistic SIN with a valid checksum.

Overview The Canadian Social Insurance number is an account number for the Canada Pension Plan, unemployment insurance, and other government programs. The SIN is similar to the United States Social Security number. The following example shows how to configure a Data Masking mapplet to mask a Canadian SIN. The example includes a Data Masking transformation that changes the SIN, an Expression transformation that formats the SIN, and another Expression transformation that calculates a valid checksum number.

Source Data The source data is either nine characters without dashes or an eleven-character string in the following format: 123-456-789

The last character of the number is a checksum number.

Mapplet The mapplet includes an Expression transformation that converts a 9 character SIN to an 11 character SIN. A Data Masking transformation masks the 11 character SIN. Another Expression transformation calculates a checksum number and replaces the last character of the SIN. The following figure shows the mapplet:

The mapplet has the following transformations:


Input. Input transformation that receives the Canadian SIN from the PowerCenter mapping. Passes the number to the Expression transformation.

Exp_SIN_Formatting. Expression transformation that converts a 9 character SIN to an 11 character SIN that contains dashes.

DM_Mask_SIN. Data Masking transformation that creates a key mask for the SIN.

Exp_Validate_SIN. Expression transformation that creates a checksum number.

Output. Output transformation that passes the masked SIN back to the PowerCenter mapping.

Input Transformation The Input transformation receives the SIN. Connect the Input transformation to the Source Qualifier in the PowerCenter mapping.

Exp_SIN_Formatting Expression Transformation The Exp_SIN_Formatting transformation receives the Canadian SIN. If the number is less than 9 characters, the transformation pads the SIN with zeros. If the SIN does not have dashes, the Expression transformation adds dashes to it. The Expression transformation contains the following ports:

Port: SIN
Type: Input
Expression: SIN
Description: Social Insurance number.

Port: SIN_VAR
Type: Variable
Expression: LPAD(SIN, 9, '0')
Description: If SIN is less than 9 characters, the expression pads the number with zeros on the left side.

Port: SIN_OUT
Type: Output
Expression: iif(length(SIN) <= 9, substr(SIN_VAR, 1, 3) || '-' || substr(SIN_VAR, 4, 3) || '-' || substr(SIN_VAR, 7, 3), SIN)
Description: Adds dashes to the padded 9 character SIN. Returns an 11 character number.
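The formatting step described in this section can be sketched in Python. This is only an illustration of the intended behavior, not the mapplet itself. The function name format_sin is hypothetical, and the sketch assumes that any value longer than 9 characters already contains dashes.

```python
def format_sin(sin: str) -> str:
    """Return the SIN in 123-456-789 form, mirroring SIN_VAR and SIN_OUT."""
    if len(sin) > 9:
        # Assumption: a longer value already has dashes, so pass it through.
        return sin
    # SIN_VAR: left-pad short values with zeros to nine digits.
    padded = sin.rjust(9, "0")
    # SIN_OUT: insert a dash after the third and the sixth digit.
    return f"{padded[0:3]}-{padded[3:6]}-{padded[6:9]}"


print(format_sin("123456789"))  # 123-456-789
print(format_sin("46454286"))   # 046-454-286
```

The result always has dashes in positions 4 and 8, which is the layout that the DDD+DDD+DDD mask format expects.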

DM_Mask_SIN Transformation The DM_Mask_SIN Data Masking transformation applies a key mask to the SIN. The Data Masking transformation has an input port and an associated output port for the SIN. When you add a port to the Data Masking transformation, the Designer adds an output port by default. Each output port name is out_<port name>. The following figure shows the Masking Properties tab:


The Data Masking transformation applies key masking to the SIN. Key masking produces repeatable results for the same source SIN. The Data Masking transformation requires a seed value for the port when you configure it for key masking. For this example, the seed value is a default number. The following mask format limits each masked character in the output column to a numeric character and leaves the dashes unmasked: DDD+DDD+DDD
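To illustrate what repeatable, seed-based masking with a mask format means, the following Python sketch derives each masked digit from a keyed hash of the source value. It is a stand-in for illustration only and is not the algorithm that the Data Masking transformation uses. The function name key_mask, the use of HMAC-SHA256, and the seed value 42 are all assumptions.

```python
import hashlib
import hmac


def key_mask(value: str, seed: int, mask_format: str = "DDD+DDD+DDD") -> str:
    """Replace each 'D' position with a digit derived from the seed and value.

    Hypothetical stand-in for key masking, not Informatica's algorithm.
    """
    if len(value) != len(mask_format):
        raise ValueError("value and mask format must have the same length")
    # Keyed digest: the same seed and source value always give the same bytes.
    digest = hmac.new(str(seed).encode(), value.encode(), hashlib.sha256).digest()
    out = []
    for i, (char, rule) in enumerate(zip(value, mask_format)):
        if rule == "D":
            out.append(str(digest[i] % 10))  # masked character is a digit 0-9
        else:
            out.append(char)  # '+' leaves the character (the dash) unmasked
    return "".join(out)


first = key_mask("046-454-286", seed=42)
second = key_mask("046-454-286", seed=42)
print(first == second)  # True: same seed and source give the same mask
```

Because the masked digits are arbitrary, the last digit is generally no longer a valid checksum, which is why the mapplet recalculates it in a later step.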

Exp_Validate_SIN Expression Transformation The Exp_Validate_SIN transformation receives the SIN from the Data Masking transformation. It calculates a checksum number for the SIN using the Luhn algorithm. The Luhn algorithm is a checksum formula that businesses use to validate a variety of identification numbers, such as credit card numbers. The following SIN has a valid checksum number according to the Luhn algorithm: 046 454 286

To test the checksum using the algorithm, multiply each digit in the SIN by the digit in the same position of the following number: 121 212 121

For example, the first digit of the SIN is zero. The first digit in the other number is one. Zero multiplied by one is zero. Zero is the first digit of the result. The second-to-last SIN digit, 8, multiplied by 2 is equal to 16. When the result is a two-digit number, add the digits together (1 + 6) and use the result, which is 7 in this example. The result is another 9 digit number: 086 858 276

Add the digits together: 0+8+6+8+5+8+2+7+6 = 50

The SIN is valid if the sum is divisible by 10. The Expression transformation applies the same algorithm to calculate a checksum number. The following figure shows the ports in the Expression transformation:
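The validation steps above can be sketched in Python as follows. This is a minimal illustration of the Luhn check as the article describes it. The function names luhn_total and is_valid_sin are hypothetical.

```python
def luhn_total(sin: str) -> int:
    """Return the weighted digit sum of a SIN, ignoring spaces and dashes."""
    digits = [int(c) for c in sin if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("a SIN must contain exactly nine digits")
    total = 0
    for position, digit in enumerate(digits, start=1):
        # Weights follow the pattern 121 212 121: even positions are doubled.
        product = digit * (2 if position % 2 == 0 else 1)
        if product > 9:
            # Two-digit product: add its digits together (16 -> 1 + 6 = 7).
            product = product // 10 + product % 10
        total += product
    return total


def is_valid_sin(sin: str) -> bool:
    """A SIN is valid when the weighted sum is divisible by 10."""
    return luhn_total(sin) % 10 == 0


print(luhn_total("046 454 286"))    # 50
print(is_valid_sin("046 454 286"))  # True
```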


The Expression transformation contains the following ports:

Port: OUT_SIN_KEY
Expression: OUT_SIN_KEY
Description: Receives the SIN from the Data Masking transformation.
Example Value: 046-454-284

Port: Sec
Expression: TO_INTEGER(substr(out_SIN_KEY,2,1)) * 2
Description: Multiplies the second digit in the SIN by two.
Example Value: 4 * 2 = 8

Port: Four
Expression: TO_INTEGER(substr(out_SIN_KEY,5,1)) * 2
Description: Multiplies the fourth digit in the SIN (character 5) by two.
Example Value: 4 * 2 = 8

Port: Six
Expression: TO_INTEGER(substr(out_SIN_KEY,7,1)) * 2
Description: Multiplies the sixth digit in the SIN (character 7) by two.
Example Value: 4 * 2 = 8

Port: Eight
Expression: TO_INTEGER(substr(out_SIN_KEY,10,1)) * 2
Description: Multiplies the eighth digit in the SIN (character 10) by two.
Example Value: 8 * 2 = 16

Port: Odd_Values
Expression: TO_INTEGER(substr(out_SIN_KEY,1,1)) + TO_INTEGER(substr(out_SIN_KEY,3,1)) + TO_INTEGER(substr(out_SIN_KEY,6,1)) + TO_INTEGER(substr(out_SIN_KEY,9,1)) + TO_INTEGER(substr(out_SIN_KEY,11,1))
Description: Sums the characters in positions 1, 3, 6, 9, and 11 of the SIN, which hold the first, third, fifth, seventh, and ninth digits.
Example Value: 0 + 6 + 5 + 2 + 4 = 17

Port: Hld_sec
Expression: IIF(LENGTH(Sec) = 2, TO_INTEGER(substr(to_char(Sec),1,1)) + TO_INTEGER(substr(to_char(Sec),2,1)), Sec)
Description: Adds the digits in Sec together if the length of Sec is two. Otherwise Hld_sec = Sec.
Example Value: 8

Port: Hld_four
Expression: IIF(LENGTH(four) = 2, TO_INTEGER(substr(to_char(four),1,1)) + TO_INTEGER(substr(to_char(four),2,1)), four)
Description: Adds the digits in four together if the length of four is two. Otherwise Hld_four = four.
Example Value: 8

Port: Hld_six
Expression: IIF(LENGTH(six) = 2, TO_INTEGER(substr(to_char(six),1,1)) + TO_INTEGER(substr(to_char(six),2,1)), six)
Description: Adds the digits in six together if the length of six is two. Otherwise Hld_six = six.
Example Value: 8

Port: Hld_eight
Expression: IIF(LENGTH(eight) = 2, TO_INTEGER(substr(to_char(eight),1,1)) + TO_INTEGER(substr(to_char(eight),2,1)), eight)
Description: Adds the digits in eight together if the length of eight is two. Otherwise Hld_eight = eight.
Example Value: 1 + 6 = 7

Port: Hold_value
Expression: hld_sec + hld_four + hld_six + hld_eight + odd_values
Description: Adds all digits together.
Example Value: 8 + 8 + 8 + 7 + 17 = 48

Port: High_number
Expression: (10 - TO_INTEGER(substr(to_char(hold_value),2,1)))
Description: Subtracts the last digit of Hold_value from 10.
Example Value: 10 - 8 = 2

Port: Check_digit
Expression: substr(to_char(high_number + TO_INTEGER(substr(out_SIN_KEY,-1))),-1)
Description: Adds High_number to the last character of out_SIN_KEY. The last character of the result is the checksum number.
Example Value: 4 + 2 = 6. The checksum number is 6.

Port: New_value
Expression: iif(substr(out_SIN_KEY,11,1) = to_char(Check_digit), out_SIN_KEY, substr(out_SIN_KEY,1,10) || to_char(check_digit))
Description: Returns the SIN if the last digit is equal to the check_digit. Otherwise appends the check_digit to the first 10 characters of the SIN.
Example Value: 046-454-286

Port: OUT_SIN
Expression: New_value
Description: Returns the SIN with a new check digit.
Example Value: 046-454-286
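The port logic in this section can be sketched compactly in Python. This is an illustration under stated assumptions rather than the mapplet's implementation. The function names fold and replace_check_digit are hypothetical, and the comments map each step to the port it mirrors.

```python
def fold(product: int) -> int:
    """Add the two digits of a product above 9 (the Hld_* ports)."""
    return product // 10 + product % 10 if product > 9 else product


def replace_check_digit(masked_sin: str) -> str:
    """Return the 11 character SIN with a valid final check digit."""
    digits = [int(c) for c in masked_sin if c.isdigit()]
    if len(masked_sin) != 11 or len(digits) != 9:
        raise ValueError("expected a SIN in the form 123-456-789")
    # Sec, Four, Six, Eight and their Hld_* ports: doubled even-position digits.
    doubled = sum(fold(d * 2) for d in digits[1::2])
    # Odd_Values: digits 1, 3, 5, 7, and the current check digit (digit 9).
    odd_values = sum(digits[0::2])
    # Hold_value: total of all contributions.
    hold_value = doubled + odd_values
    # High_number: 10 minus the last digit of Hold_value.
    high_number = 10 - hold_value % 10
    # Check_digit: last digit of High_number plus the current check digit.
    check_digit = (high_number + digits[8]) % 10
    # New_value: first 10 characters plus the recalculated check digit.
    return masked_sin[:10] + str(check_digit)


print(replace_check_digit("046-454-284"))  # 046-454-286
```

For the masked value 046-454-284, the sketch returns 046-454-286, which is the SIN that the Luhn example earlier in the article shows to be valid.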

Canadian SIN Target Table The Output transformation receives the masked Canadian SIN from the Expression transformation.


Author Ellen Chandler Principal Technical Writer

Acknowledgements Kevin Ware designed the mapplet.
