
TPump

After completing this module, you will be able to:
• State the capabilities and limitations of TPump.
• Describe TPump commands and parameters.
• Prepare a TPump script.


TPump
• Allows near real-time updates from transactional systems into the warehouse.
• Performs INSERT, UPDATE, and DELETE operations, or a combination, from the same source. Up to 63 DML statements can be included for one IMPORT task.
• Alternative to MultiLoad for low-volume batch maintenance of large databases; replacement for BulkLoad.
• Allows target tables to:
  – Have secondary indexes and Referential Integrity constraints.
  – Be MULTISET or SET.
  – Be populated or empty.
  – Have triggers, which are invoked as necessary.
• Allows conditional processing.
• Supports automatic restarts; uses Support Environment.
• No session limit; use as many sessions as necessary.
• No limit to the number of concurrent instances.
• Uses row-hash locks, allowing concurrent updates on the same table.
• Can always be stopped and locks dropped with no ill effect.
• Designed for highest possible throughput.
• User can specify how many updates occur minute by minute; can be changed as the job runs.


TPump Limitations
• Use of SELECT is not allowed.
• Concatenation of data files is not supported.
• Exponential operators are not allowed.
• Aggregate operators are not allowed.
• Arithmetic functions are not supported.
• There is a limit of four IMPORT commands within a single TPump "load" task.
• In using TPump with dates before 1900 or after 1999, the year portion of the date must be represented by four numerals (yyyy).
  – The default of two numerals (yy) to represent the year is interpreted to be the 20th century.
  – The correct date format must be specified at the time of table creation.
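The two-digit year rule above can be modeled in a few lines of Python. This is only a sketch of the rule as stated on the slide, not TPump's own date handling; the function name `expand_year` is hypothetical.

```python
def expand_year(year_text: str) -> int:
    """Model the slide's rule: a two-digit year (yy) is read as 19yy,
    while a four-digit year (yyyy) is taken as written."""
    if not year_text.isdigit():
        raise ValueError(f"year must be numeric: {year_text!r}")
    if len(year_text) == 4:
        return int(year_text)
    if len(year_text) == 2:
        # Two numerals default to the 20th century (1900-1999).
        return 1900 + int(year_text)
    raise ValueError(f"expected yy or yyyy, got {year_text!r}")


# A record dated 2003 must carry the full year; "03" would be read as 1903.
print(expand_year("03"), expand_year("2003"))
```

The point of the sketch is that any date outside 1900 to 1999 is silently misread unless the input carries all four digits.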


.BEGIN LOAD Statement

Many of the .BEGIN parameters are comparable to those for MultiLoad.

.BEGIN LOAD
    SESSIONS      max [min]               (required)
    ERRORTABLE    tablename               (defaults to jobname_ET)
    ERRLIMIT      errcount [errpercent]
    CHECKPOINT    frequency               (default is 15 minutes)
    TENACITY      hours                   (default is 4)
    SLEEP         minutes                 (default is 6)

However, TPump has numerous parameters on the .BEGIN LOAD statement that are unique to TPump.

    SERIALIZE     ON | OFF                (default ON if UPSERT)
    PACK          number                  (default is 20, max is 600)
    PACKMAXIMUM                           (use maximum pack factor)
    RATE          number                  (default is unlimited)
    LATENCY       number                  (range is 10 – 600 seconds)
    NOMONITOR                             (default is monitoring on)
    ROBUST        ON | OFF                (default is ON)
    MACRODB       dbname                  (default is logtable dbase)
    ;


TPump Specific Parameters

Specific TPump .BEGIN LOAD parameters are:

SERIALIZE ON | OFF
    ON guarantees that operations on a given key combination (row) occur serially. Used only when a primary index is specified. The KEY option must be specified if SERIALIZE is ON.

PACK statements
    Number of statements to pack into a multiple-statement request.

RATE rate
    If the statement rate is zero or unspecified, the rate is unlimited.

LATENCY seconds
    Number of seconds before a partial buffer is sent to the database.

NOMONITOR
    Prevents TPump from checking for statement rate changes from, or updating status information for, the TPump Monitor.

ROBUST ON | OFF
    OFF signals TPump to use "simple" restart logic; TPump will begin where the last checkpoint occurred.

MACRODB dbname
    Indicates a database to contain any macros used by TPump.


.BEGIN LOAD – PACK
• PACK specifies the number of statements to pack into a multi-statement request.
• Improves network/channel efficiency by reducing the number of sends and receives between the application and Teradata.
• Increasing the PACK factor improves throughput, up to a certain level.
• Restrictions to consider:
  – 64K message size limit
  – TPump limit of 600 statements
  – Teradata USING clause limit of 2560 columns (from 507)
  – Teradata Plastic Steps limit
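The effect of PACK on the number of sends can be illustrated with simple arithmetic. The sketch below is not TPump code; `requests_needed` is a hypothetical helper, and it assumes every request is filled to the PACK factor, ignoring the 64K message size and USING clause limits listed above.

```python
import math


def requests_needed(statement_count: int, pack: int) -> int:
    """Multi-statement requests needed to send `statement_count` statements
    when up to `pack` statements travel in each request."""
    if not 1 <= pack <= 600:  # 600 is the TPump limit quoted on the slide
        raise ValueError("PACK must be between 1 and 600")
    return math.ceil(statement_count / pack)


# Hypothetical workload of 12,000 statements at several PACK factors.
for pack in (1, 20, 40, 600):
    print(f"PACK {pack:>3}: {requests_needed(12_000, pack)} requests")
```

Fewer requests means fewer round trips between the client and Teradata, which is the efficiency gain the slide describes.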


.BEGIN LOAD – SERIALIZE OFF
• With SERIALIZE OFF, transactions are processed in the order they are encountered and placed in the first available buffer. Buffers are sent to PE sessions and different PEs process the data independently of other PEs.
• SERIALIZE OFF does not guarantee the order in which transactions are processed.

[Figure: Transaction File → TPump Buffers → Session 1 / Session 2 → AMP 0, AMP 1, AMP 2, AMP 3, ... AMP N (Teradata). Callout on one of the buffers: "This set of transactions may be processed first."]

Transaction File (PI, Time):
    01 8:00    03 8:01    02 8:02    01 8:03
    04 8:04    05 8:05    03 8:06    01 8:07
    08 8:08    06 8:09    07 8:10    01 8:11
    03 8:12    02 8:13

TPump Buffers (PI, Time):
    Buffer 1:  01 8:00    03 8:01    02 8:02    01 8:03
    Buffer 2:  04 8:04    05 8:05    03 8:06    01 8:07
    Buffer 3:  08 8:08    06 8:09    07 8:10    01 8:11
    Buffer 4:  03 8:12    02 8:13


.BEGIN LOAD – SERIALIZE ON
• SERIALIZE ON can eliminate lock delays or potential deadlocks caused by primary index collisions, improving performance.
• SERIALIZE guarantees both input record order and that all records with the same PI value will be handled in the same session. It is recommended to specify the PI column(s) as KEY.
• KEY fields determine the PE session to which TPump sends the transaction.

[Figure: Transaction File → TPump Buffers → Session 1 / Session 2 → AMP 0, AMP 1, AMP 2, AMP 3, ... AMP N (Teradata).]

Transaction File (PI, Time):
    01 8:00    03 8:01    02 8:02    01 8:03
    04 8:04    05 8:05    03 8:06    01 8:07
    08 8:08    06 8:09    07 8:10    01 8:11
    03 8:12    02 8:13

TPump Buffers (PI, Time):
    Buffer 1:  01 8:00    02 8:02    01 8:03    01 8:07
    Buffer 2:  03 8:01    04 8:04    05 8:05    03 8:06
    Buffer 3:  08 8:08    06 8:09    01 8:11    02 8:13
    Buffer 4:  07 8:10    03 8:12
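The buffer contents in the SERIALIZE OFF and SERIALIZE ON figures can be reproduced with a short simulation. This is a sketch, not TPump's algorithm: the buffer size of four rows is taken from the figures, and the key-to-session assignment in `SESSION_FOR_KEY` is an assumption inferred from the SERIALIZE ON figure (TPump derives the session from the KEY fields itself). All function and variable names are hypothetical.

```python
from collections import defaultdict

# (PI value, arrival time) pairs copied from the transaction file in the figures.
TRANSACTIONS = [
    ("01", "8:00"), ("03", "8:01"), ("02", "8:02"), ("01", "8:03"),
    ("04", "8:04"), ("05", "8:05"), ("03", "8:06"), ("01", "8:07"),
    ("08", "8:08"), ("06", "8:09"), ("07", "8:10"), ("01", "8:11"),
    ("03", "8:12"), ("02", "8:13"),
]
BUFFER_SIZE = 4  # rows per buffer, as drawn in the figures

# ASSUMPTION: this mapping is inferred from the SERIALIZE ON figure only.
SESSION_FOR_KEY = {"01": 1, "02": 1, "06": 1, "08": 1,
                   "03": 2, "04": 2, "05": 2, "07": 2}


def chunk(rows, size):
    """Split a list into consecutive buffers of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def fill_serialize_off(transactions):
    """SERIALIZE OFF: rows go into the first available buffer in arrival order."""
    return chunk([pi for pi, _ in transactions], BUFFER_SIZE)


def fill_serialize_on(transactions):
    """SERIALIZE ON: every row with the same key goes to the same session,
    and rows keep their arrival order within that session."""
    per_session = defaultdict(list)
    for pi, _ in transactions:
        per_session[SESSION_FOR_KEY[pi]].append(pi)
    return {s: chunk(rows, BUFFER_SIZE) for s, rows in sorted(per_session.items())}


print(fill_serialize_off(TRANSACTIONS))
print(fill_serialize_on(TRANSACTIONS))
```

With SERIALIZE OFF, PI 01 appears in three different buffers that may be processed in any order. With SERIALIZE ON, every PI 01 row travels through the same session in arrival order.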


.BEGIN LOAD – ROBUST ON
• ROBUST ON is the default for all TPump jobs.
• This option avoids re-applying rows that have already been processed in the event of a restart.
• Causes a row to be written to the log table each time a buffer has successfully completed its updates.
  – The larger the TPump PACK factor, the less overhead involved in this activity.
• These rows are deleted from the log when a checkpoint is taken.
• ROBUST ON is recommended for these specific conditions:
  – INSERTs into MULTISET tables, as such tables will allow re-insertion of the same rows multiple times.
  – When UPDATEs are based on calculations or percentage increases.
  – If PACK factors are large, and applying and rejecting duplicates after a restart would be time-consuming.
  – If data is time-stamped at the time it is inserted into the database.
• ROBUST ON is always a good idea for TPump jobs that read from queues. It keeps duplicates from being re-inserted into the table in the event of a restart.
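The difference between the two restart modes can be seen in a toy model. The sketch below assumes only what the slides state: ROBUST ON logs each completed buffer and clears those log rows at a checkpoint, while ROBUST OFF restarts from the last checkpoint. It is not TPump's implementation; `run_with_restart` and its parameters are hypothetical, and the target is modeled as a MULTISET table that accepts duplicate rows.

```python
def run_with_restart(buffers, checkpoint_every, crash_after, robust):
    """Apply `buffers` to a MULTISET-like table, stop before buffer index
    `crash_after`, then restart. Returns every row the table ends up holding."""
    table = []           # target table; duplicates are allowed
    completed = set()    # buffers logged since the last checkpoint (ROBUST ON)
    last_checkpoint = 0  # index of the first buffer not covered by a checkpoint

    # First run, interrupted before buffer `crash_after` is applied.
    for i, buf in enumerate(buffers):
        if i == crash_after:
            break
        table.extend(buf)
        if robust:
            completed.add(i)      # one log row per completed buffer
        if (i + 1) % checkpoint_every == 0:
            last_checkpoint = i + 1
            completed.clear()     # log rows are deleted at a checkpoint

    # Restart: resume at the last checkpoint.
    for i in range(last_checkpoint, len(buffers)):
        if robust and i in completed:
            continue              # already applied, so skip it
        table.extend(buffers[i])
    return table


buffers = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(run_with_restart(buffers, checkpoint_every=2, crash_after=3, robust=True))
print(run_with_restart(buffers, checkpoint_every=2, crash_after=3, robust=False))
```

In this model the ROBUST OFF run re-inserts the buffer applied after the last checkpoint, which is exactly the duplicate-row risk the slide describes for MULTISET tables.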


Sample TPump Script (1 of 2)

.LOGTABLE restart_log_tpp;
.LOGON tdpid/username,password;
.BEGIN LOAD
    SESSIONS 4
    PACK 40
    ERRORTABLE Errors_tpp
    SERIALIZE OFF
    RATE 4800
    ERRLIMIT 50 ;
.LAYOUT layout12;
.FIELD table_code           1  CHAR(1);
.FIELD A_Account_Number     2  INTEGER;
.FIELD A_Number             *  INTEGER;
.FIELD A_Street             *  CHAR(25);
.FIELD A_City               *  CHAR(20);
.FIELD A_State              *  CHAR(2);
.FIELD A_Zip_Code           *  INTEGER;
.FIELD A_Balance_Forward    *  DECIMAL(10,2);
.FIELD A_Balance_Current    *  DECIMAL(10,2);
.FIELD C_Customer_Number    2  INTEGER;
.FIELD C_Last_Name          *  CHAR(30);
.FIELD C_First_Name         *  CHAR(20);
.FIELD C_Social_Security    *  INTEGER;
.FIELD T_Trans_Number       2  INTEGER;
.FIELD T_Trans_Date         *  CHAR(10);
.FIELD T_Account_Number     *  INTEGER;
.FIELD T_Trans_ID           *  CHAR(4);
.FIELD T_Trans_Amount       *  DECIMAL(10,2);


Sample TPump Script (2 of 2)

.DML LABEL lns_Account;
INSERT INTO Accounts
    (account_number, number, street, city, state, zip_code,
     balance_forward, balance_current)
VALUES
    (:A_Account_Number, :A_Number, :A_Street, :A_City, :A_State, :A_Zip_Code,
     :A_Balance_Forward, :A_Balance_Current);

.DML LABEL lns_Trans;
INSERT INTO Trans
    (trans_number, trans_date, account_number, trans_id, trans_amount)
VALUES
    (:T_Trans_Number, :T_Trans_Date, :T_Account_Number, :T_Trans_Id,
     :T_Trans_Amount);

.DML LABEL lns_Customer;
INSERT INTO Customer
    (customer_number, last_name, first_name, social_security)
VALUES
    (:C_Customer_Number, :C_Last_Name, :C_First_Name, :C_Social_Security);

.IMPORT INFILE datafile1
    LAYOUT layout12
    APPLY lns_Account  WHERE table_code = 'A'
    APPLY lns_Trans    WHERE table_code = 'T'
    APPLY lns_Customer WHERE table_code = 'C';

.IMPORT INFILE datafile2
    LAYOUT layout12
    APPLY lns_Account  WHERE table_code = 'A'
    APPLY lns_Trans    WHERE table_code = 'T'
    APPLY lns_Customer WHERE table_code = 'C';

.END LOAD;
.LOGOFF;
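The sample script sets PACK 40 and RATE 4800, and a little arithmetic shows how the two interact. The sketch below assumes RATE counts statements per minute (consistent with the module's "minute by minute" wording), that every request is filled to the PACK factor, and that requests are evenly spaced; the helper names are hypothetical.

```python
def requests_per_minute(rate: int, pack: int) -> float:
    """Requests sent per minute when `rate` statements per minute are
    packed `pack` statements to a request."""
    return rate / pack


def seconds_between_requests(rate: int, pack: int) -> float:
    """Average spacing between requests, across all sessions combined."""
    return 60 / requests_per_minute(rate, pack)


# Values from the sample script: PACK 40, RATE 4800.
print(requests_per_minute(4800, 40))       # 120.0
print(seconds_between_requests(4800, 40))  # 0.5
```

Under these assumptions the job sends roughly two full requests per second in total, shared among its four sessions.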


TPump Compared with MultiLoad
• MultiLoad performance improves as the volume of changes increases.
• TPump does better on relatively low volumes of changes.
• TPump improves performance via a multiple statement request.
• TPump uses macros to modify tables rather than the actual DML commands. Example of a macro name: M2000216_105642_01_0001
• MultiLoad uses the DML statements.
• TPump uses row hash locking to allow for concurrent read and write access to target tables. It can be stopped with target tables fully accessible.
• In Phase 4, MultiLoad locks tables for write access until it completes.


Additional TPump Statements

DATABASE
    Changes the default database qualification for all DML statements.

        DATABASE database ;

EXEC(UTE)
    Specifies a user-created macro for execution. The macro named is resident in the Teradata database.

        EXECUTE [database.]macro_name
            UPDATE/UPD | INSERT/INS | DELETE/DEL | UPSERT/UPS ;

Commands and statements in common with MultiLoad:
    ACCEPT DELETE DISPLAY DML FIELD FILLER IF / ELSE / ENDIF
    IMPORT INSERT LAYOUT LOGON LOGOFF LOG ROUTE
    RUN SET SYSTEM TABLE UPDATE


TPump Statistics

    .                                     IMPORT 1    Total thus far
    .                                     =========   ==============
    Candidate records considered:........    200           200
    Apply conditions satisfied:..........    200           200
    Candidate records not applied:.......      0             0
    Candidate records rejected:..........      0             0

    ** Statistics for Apply Label : UPS_ACCOUNT
    Type   Database   Table or Macro Name   Activity
    U      TLJC25     Accounts              100
    I      TLJC25     Accounts              100

    **** 17:33:50 UTY0821 Error table TLJC25.errtable_tpp is EMPTY, dropping table.
    0018 .LOGOFF;
    =====================================================================
    =                                                                   =
    =          Logoff/Disconnect                                        =
    =                                                                   =
    =====================================================================
    **** 17:34:08 UTY6216 The restart log table has been dropped.
    **** 17:34:08 UTY6212 A successful disconnect was made from the RDBMS.
    **** 17:34:08 UTY2410 Total processor time used = '2.43 Seconds'
    .    Start : 17:33:13 - TUE MAY 06, 2003
    .    End   : 17:34:08 - TUE MAY 06, 2003
    .    Highest return code encountered = '0'.

Note: These statistics are not for the example TPump job shown earlier in this module.


TPump Monitor

Tool to control and track TPump imports.
• The table SysAdmin.TPumpStatusTbl is updated once a minute.
• Alter the statement rate on an import by updating this table using macros.
• Use macros and views to access this table.

DBA Tools
  View
  • SysAdmin.TPumpStatus - allows DBAs to view all of the TPump jobs.
  Macro
  • SysAdmin.TPumpUpdateSelect - allows DBAs to manage individual TPump jobs.

User Tools
  View
  • SysAdmin.TPumpStatusX - allows users to view their own TPump jobs.
  Macro
  • TPumpMacro.UserUpdateSelect - allows users to manage their own jobs.


Application Utility Checklist

Feature                   BTEQ   FastLoad       FastExport   MultiLoad     TPump
DDL Functions             ALL    LIMITED        No           ALL           ALL
DML Functions             ALL    INSERT         SELECT       INS/UPD/DEL   INS/UPD/DEL
Multiple DML              Yes    No             Yes          Yes           Yes
Multiple Tables           Yes    No             Yes          Yes           Yes
Multiple Sessions         Yes    Yes            Yes          Yes           Yes
Protocol Used             SQL    FASTLOAD       EXPORT       MULTILOAD     SQL
Conditional Expressions   Yes    No             Yes          Yes           Yes
Arithmetic Calculations   Yes    No             Yes          Yes           No
Data Conversion           Yes    1 per column   Yes          Yes           Yes
Error Files               No     Yes            No           Yes           Yes
Error Limits              No     Yes            No           Yes           Yes
User-written Routines     No     Yes            Yes          Yes           Yes


Summary
• Allows near real-time updates from transactional systems into the warehouse.
• Performs INSERTs, UPDATEs, and DELETEs to more than 60 tables at a time.
• Alternative to MultiLoad for low-volume batch maintenance of large databases; replacement for BulkLoad.
• Uses row-hash locks, allowing concurrent updates on the same table.
• No arithmetic functions or file concatenations.
• Can always be stopped and locks dropped with no ill effect.
• User can specify how many updates occur minute by minute; can be changed as the job runs.


Review Questions

Match the item in the first column to its corresponding statement in the second column.

_____ 1. TPump purpose             A. Query against TPump status table
_____ 2. MultiLoad purpose         B. Concurrent updates on same table
_____ 3. Row hash locking          C. Low-volume changes
_____ 4. PACK                      D. Use to specify how many statements to put in a multi-statement request
_____ 5. MACRO                     E. Large volume changes
_____ 6. Statement rate change     F. Used instead of DML


Review Question Answers

Match the item in the first column to its corresponding statement in the second column.

__C__ 1. TPump purpose             A. Query against TPump status table
__E__ 2. MultiLoad purpose         B. Concurrent updates on same table
__B__ 3. Row hash locking          C. Low-volume changes
__D__ 4. PACK                      D. Use to specify how many statements to put in a multi-statement request
__F__ 5. MACRO                     E. Large volume changes
__A__ 6. Statement rate change     F. Used instead of DML


Lab Exercises

Lab Exercise

Purpose
In this lab, you will perform an operation similar to lab 7-2, using TPump instead of MultiLoad. For this exercise, use a PACK of 20 and a RATE of 2400.

What you need
Data file (data8_1) created from macro AU.Lab8_1.

Tasks
1. Delete all rows from the Accounts table and use the following INSERT/SELECT to create 100 rows of test data:

       INSERT INTO Accounts
       SELECT * FROM AU.Accounts
       WHERE Account_Number LT 20024101 ;

2. Export data to the file data8_1 using the macro AU.lab8_1.
3. Prepare a TPump script which performs an UPSERT operation (INSERT MISSING UPDATE) on your Accounts table as a single operation. Use the data from data8_1 as input to the UPSERT script. If the row exists, UPDATE the Balance_Current with the appropriate incoming value. If not, INSERT a row into the Accounts table. In your script, be sure to set a statement rate.
4. Run the script.
5. Validate your results. TPump should have performed 100 UPDATEs and 100 INSERTs with a final return code of zero.


Lab Solutions

cat lab813.tpp

    .LOGTABLE Restartlog813_tpp ;
    .LOGON u4455/tljc30,tljc30 ;
    .BEGIN LOAD SESSIONS 4 PACK 40 RATE 4800;
    .LAYOUT Record_Layout_813;
    .FIELD in_accountno    1  INTEGER KEY;
    .FIELD in_number       *  INTEGER;
    .FIELD in_street       *  CHAR(25);
    .FIELD in_city         *  CHAR(20);
    .FIELD in_state        *  CHAR(2);
    .FIELD in_zip_code     *  INTEGER;
    .FIELD in_balancefor   *  DECIMAL (10,2);
    .FIELD in_balancecur   *  DECIMAL (10,2);
    .DML LABEL Fix_Account
        DO INSERT FOR MISSING UPDATE ROWS ;
    UPDATE Accounts
        SET   Balance_Current = :in_balancecur
        WHERE Account_Number  = :in_accountno ;
    INSERT INTO Accounts VALUES
        (:in_accountno, :in_number, :in_street, :in_city, :in_state,
         :in_zip_code, :in_balancefor, :in_balancecur);
    .IMPORT INFILE data8_1
        LAYOUT Record_Layout_813
        APPLY Fix_Account;
    .END LOAD;
    .LOGOFF;

tpump < lab813.tpp > lab813.out
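The UPSERT behavior requested by DO INSERT FOR MISSING UPDATE ROWS can be modeled in Python to check the expected activity counts. This is a sketch of the semantics only: the account numbers and balances are synthetic stand-ins for the lab data, the table is a plain dictionary keyed on account number, and `upsert` is a hypothetical helper.

```python
from collections import Counter


def upsert(table: dict, record: dict) -> str:
    """Try the UPDATE first; if no row has this key, INSERT the record.
    Returns 'U' or 'I', the activity types shown in the TPump statistics."""
    key = record["account_number"]
    if key in table:
        # Row exists: only Balance_Current is updated, as in the lab script.
        table[key]["balance_current"] = record["balance_current"]
        return "U"
    table[key] = dict(record)  # row missing: insert the whole record
    return "I"


# Synthetic data: 100 existing accounts, 200 incoming records (100 of them new).
accounts = {n: {"account_number": n, "balance_current": 0.0} for n in range(1, 101)}
incoming = [{"account_number": n, "balance_current": 10.0} for n in range(1, 201)]

activity = Counter(upsert(accounts, rec) for rec in incoming)
print(activity["U"], activity["I"], len(accounts))
```

The counts mirror the validation step in the lab: 100 updates and 100 inserts, leaving 200 rows in the table.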

