Showing posts with label DB2. Show all posts
Showing posts with label DB2. Show all posts

Friday, June 24, 2016

COALESCE function in DB2 . Alternative to NULL indicator variable in DB2 ; SQLCODE - 305.

If a table column is not defined with NOT NULL, then it implies that the column value can be NULL sometimes. For such columns,if we do not use null handling technique, then the application program will give sqlcode -305.
To avoid that we use indicator variables.However , we can also use the DB2 COALESCE  function.

COALESCE is a function that allows you to substitute any numeric or character value for a null, based on the column data type.
Suppose in DEPT table, i have one column for optional subject(optsubj). It can have a value of 'Y'or 'N'. However since the column is not defined with NOT NULL keyword,it can fetch null values.
If we simply use the query below, our application program will return sqlcode -305.
SELECT fname
              ,lname
              ,optsubj
into       :hv-fname
             :hv-lname
             :hv-optsubj
from DEPT

However, we can tweak the same query in the below way using COALESCE function to get rid of -305 without using indicator variable

SELECT fname
       ,lname
       ,coalesce(optsubj,space(1))
into   :hv-fname
       :hv-lname
       :hv-optsubj
from DEPT

What it will do is, it will replace NULL value with space of 1 byte whenever the column optsubj fetches a null value for any record;
This function can be used in many ways to get desired results. 

Sunday, March 27, 2016

FETCH FIRST N ROWS Vs OPTIMIZE FOR N ROWS in DB2

Very Often we use these two statements in our SQL query,ie, OPTIMIZE FOR N ROWS and FETCH FIRST N ROWS ONLY.
Although it may sound similar,but both of them have specific usage when it comes to the optimizer. This post we will try to see difference between the two.

As the name suggests FETCH FIRST 5 ROWS ONLY, it actually puts a limitation on the number of rows the query is returning.I am assuming here n=5. So, here user gets only 5 rows even though there can be 100 qualifying rows for that query.

OPTIMIZE FOR 5 ROWS will straight way influence the optimizer.It does not limit the number of rows SQL will return. So using this , we will get all the qualifying rows,may be incrementally.Intent here is to give priority to retrieve first few rows.Once optimizer understands this ,it will give preference to access plan that will minimize the response time.So,DB2 will send the rows in a single block. Consider the below scenario where this will come handy.

You have an online application where the screen can hold details of 5 customers at a time. Suppose you have the below query to fetch the customer details

Select Fname, Lname, Dept, Sal,Rollno
  From Dept table 
Order by Sal Desc;

We are assuming the index to be defined on Rollno. If a descending index also exists on SAL column, its likely to have low cluster ratio.
Without using the OPTIMIZE clause, DB2 will do a full table space scan and sort on sal column to fetch you the details.

Now on Adding  'ÓPTIMIZE FOR 5 ROWS' to the query, DB2 will use SAL index directly because it knows you need  the details of 5 highest paid employees.
So, it displays 5 rows to the screen and depending on user input will process the  next 5 or do something else.Here DB2 encourage matching index scan and would not prefer list or sequential prefetch. Access plan selected for 5 records may not be good for 100 rows.

Thus, this OPTIMIZE clause comes handy for queries where you can process incrementally.We can use it to get rows directly from table without going through any buffering operations like sorting. 

Saturday, March 19, 2016

DB2 unload utility using the program DSNTIAUL

IBM provides two methods of unloading data from a DB2 table. One is DSNTIAUL program that is run under IKJEFT01 or IKJEFT1A, or 1B and the other being  DSNUTILB program. DSNUTILB is a fairly new and is considered IBM’s Fast Unload Utility. Each has unique functionality that you may want to explore prior to choosing one. 

Outlined below are functions of DSNTIAUL process. 

1. DSNTIAUL Program: The DSNTIAUL program allows two type of unloads to be invoked against a DB2 table. When invoking DSNTIAUL you may specify an entire table unload or a selective SQL unload. The key DD statements that are being used are the SYSIN and the SYSTSIN. Both types of unloads are being run under IKJEFT1B.

Basic JCL example:
//UNLOAD EXEC PGM=IKJEFT1B, DYNAMNBR=20
//STEPLIB DD DSN=SYSDB2.DQA0.SDSNLOAD, DISP=SHR
// DD DSN=SUBSYS.DB2.DSNEXIT, DISP=SHR
//*************************************************************
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSREC00 DD DSN=userid.tablerecords.dataset,
//          DISP=(,CATLG,DELETE),
//       UNIT=SYSDA,SPACE=(CYL,(300,300),RLSE)
//*
//SYSPUNCH DD DSN=user-id.tablelayout.dataset,
//     UNIT=SYSDA,SPACE=(CYL,(1,1),RLSE),DISP=(,CATLG,DELETE)
//SYSTSIN DD *
SYSTEM (SUBSYS) RUN PROGRAM (DSNTIAUL) PLAN (DSNTIAUL) -
LIBRARY (‘SUBSYS.DB2.RUNLIB.LOAD’)
END
//SYSIN DD *
DBCREATOR.TABLENAME 
/*
//*


For selective Unload, like when you are using a where clause or using select statements (like SELECT * from user_table;)  we need to use PARM('SQL') in SYSTSIN statement; else you may endup with SQLCODE -104.
Your SYSTSIN would look like

//SYSTSIN DD *
SYSTEM (SUBSYS) RUN PROGRAM (DSNTIAUL)
         PLAN (DSNTIAUL) -
LIBRARY (‘SUBSYS.DB2.RUNLIB.LOAD’) PARM('SQL')
END 
//SYSIN DD *
SELECT * FROM schema.Tablename ;

Tuesday, January 12, 2016

How to find the plan and package name in DB2 from program name

Suppose we have a cobol-db2 module named 'PGM-ABC'
We want to bind it and run the program. But we are not sure about the package and plan details.So where do we find it from? SYSIBM tables will give us the details we need.Lets check it out.
With the program name we can query SYSIBM.SYSPACKAGE table giving the name of the program(PGM-ABC) against the column 'NAME'.  We can use DB2 file-aid tool or use the below  query in SPUFI.

Select location, collid,name ,contoken,timestamp,bindtime from SYSIBM.SYSPACKAGE where NAME = 'PGM-NAME' ;

Once we get the COLLID,we need to query the SYSIBM.SYSPACKLIST table to get the plan name

SELECT PLANNAME,LOCATION,COLLID FROM SYSIBM.SYSPACKLIST WHERE COLLID='colld-id-name-we-got-above';

From experience, we should try to select the latest rows from the tables checking the TIMESTAMP fields; or check what other ppl have used in recent times.

Wednesday, November 25, 2015

COBOL DB2 Program from scratch - step by step guide

This post is just to help/guide to a new programmer in writing a cobol db2 program from scratch.
Let us write  a cobol program  to read the EMPLOYEE table and get  the details of the employee with name 'RYAN'.
I am assuming the table is already created in the user database like below:.

EMPLOYEE
EMPID EMPNAME     DEPARTMENT          SALARY              DESIGNATION
1000     XXXXXXX       XX                     10000                  SE
1001     YYYYYYY          YY                      9000                    SE
1002     ZZZZZZZ       ZZ                     20000                   TL


STEP1:  We need to declare the  Table structure in the Working Storage section. Ideally we should DCLGEN Tool to generate the structure for our DB2 table. The option to go to our DCLGEN tool depends on the ISPF settings. Generally it can be invoked using 'D' option on the ispf menu to display DB2I Default pannel.
DCLGEN Screen 
On pressing ENTER, the DCLGEN will be generated and will look like below.

*****************************************************************
EXEC SQL DECLARE DEV.EMPLOYEE TABLE
 ( EMPID CHAR(10) NOT NULL,
   EMPNAME CHAR(30) NOT NULL,
   DEPARTMENT CHAR(2) NOT NULL,
  SALARY DECIMAL(10,2) NOT NULL,
  DESIGNATION CHAR(4) NOT NULL )
)
END-EXEC.
*************** COBOL DECLARATION FOR TABLE DEV.EMPLOYEE *********
01 EMPOYEE-RECORD.
    05 HV-EMPID PIC X(10).
    05 HV-EMPNAME PIC X(30).
    05 HV-DEPARTMENT PIC X(2).
    05 HV-SALARY PIC S9(8)V99 COMP-3.
    05 HV-DESIGNATION PIC CHAR(4).
*********** THE NUMBER OF COLUMNS IN THIS DECLARATION IS 5 *****
This DCLGEN  needs to be included into the Working Storage Section of our cobol program in the following way:
EXEC SQL
INCLUDE  EMPLOYEE
END-EXEC.
Also, the most important copybook SQLCA needs to be included . Apart from this we wont be able to capture the SQL Return codes

EXEC SQL
INCLUDE  SQLCA
END-EXEC.
Also since our query in the program might return more than single row, we need cursors in our program. Read About cursor programming here
I am not going into the details of cursor programming here, since those are there in other posts.
Once the program is ready and compiled , we need to bind it to a plan in the test region. Once the bind is successful, we can run the program using  the IKJEFT01 utility as below.

cobol db2 run jcl

Wednesday, June 24, 2015

Identity Column in DB2

In order to maintain uniqueness in the values entered in db2, we use the UNIQUE constraint for one of the column in db 2. Creation of  unique index, primary key are also there in the list by which we can maintain data uniqueness. However there are methods to generate unique values in the columns.

1. Creating IDENTITY column while defining the table (introduced in DB2  version 7.1 and later)
2. SEQUENCE (from DB2 version  7.2 onward)

Lets check about IDENTITY column in this post.
Identity columns offer us the possibility to guarantee uniqueness of a column and to be able to automatically generate unique value. An identity column is a Numeric column, either SMALLINT, INTEGER, or DECIMAL with a scale of zero, or a user defined distinct type based on any of these data types, which is UNIQUE and NOT NULL by definition.
As such there are no rules when to use identity columns. Still we can consider the below scenario when we can use IDENTITY column.

When can we use IDENTITY COLUMN
Suppose we have  a multi-column primary key(composite key) and the table has several dependent tables, you have to ‘copy’ many columns to the dependent tables to build the foreign keys. This makes the keys very long. Having many columns in the index also makes the index grow quite large. Instead of using the long key as the primary and foreign key, you can use an artificial unique identifier for the parent table and use that generated value as a primary key and foreign key for the dependent table.
Another use for identity columns is when you just need a generated unique column for a table. If we do not need to reference the rows by this column, then there is no need to even create an index for it, and uniqueness is guaranteed by the system for generated values

The Identity Columns can be used in table in two ways :
    1. The value for identity column is always generated by the DB 2.(GENERATED ALWAYS keyword in the column declaration ; see below for syntax)
    2. The value is inserted explicitly by user. And if used don't specify any value then the value is generated by the DB2.(GENERATED BY DEFAULT)

Syntax :

CREATE TABLE <table_name>
(
    <column1> datatype
    GENERATED ALWAYS/GENERATED BY DEFAULT
    AS IDENTITY
    (
    START WITH <numeric constant>,
    INCREMENT BY <numeric constant>,
    NOMINVALUE / MINVALUE <integer constant>,
    NOMAXVALUE / MAXVALUE <integer constant> 
    NOCYCLE / CYCLE,
    NOCACHE / CACHE <integer constant>,
    NOORDER / ORDER
    ),

    <column2> datatype,
    <column3> datatype,
    ...............................
    ...............................
)

Running a query to create Identity column:

CREATE TABLE PROD_DEAL
( ROW_ID INT NOT NULL GENERATED ALWAYS AS IDENTITY 
                         (START WITH 1, INCREMENT BY 1, NO CACHE), 
 ORDER_NO CHAR(6), 
 INV_COUNT INT WITH DEFAULT 0 
);

Suppose we have run the INSERT QUERY 5 times for five different ORDER_NO.

INSERT INTO PROD_DEAL(ORDER_NO,INV_COUNT) VALUES('A00001',11);
INSERT INTO PROD_DEAL(ORDER_NO,INV_COUNT) VALUES('A00002',22);
INSERT INTO PROD_DEAL(ORDER_NO,INV_COUNT) VALUES('A00003',33);
INSERT INTO PROD_DEAL(ORDER_NO,INV_COUNT) VALUES('A00004',44);
INSERT INTO PROD_DEAL(ORDER_NO,INV_COUNT) VALUES('A00005',55);


Output:
ROW_ID   ORDER_NO  INV_COUNT
------------------------------------------------
11000001  A00001         11
11000002  A00002         22
11000003  A00003         33
11000004  A00004         44
11000005  A00005         55

We can see the ROW_Id generating unique values each time the insert query ran.

Wednesday, January 7, 2015

DB2 Explain and PLAN table column names in DB2

Whenever we run a query,DB2 creates an access plan that specifies how it will access the requested data. This is created whenevea the sql is compiled at bind time for static sql and before execution for dynamic sql.
DB2 bases the access paths on the SQL statements and also on the statistics and configuration parameters of the system.
Even when an sql is made efficient, it can become inefficient as data grows. So, we need to run DB2 runstats  so as to keep updated statistics . DB2 config and storage can change and plans can be rebound. Db2 explain gives us the info for the plan, package, or SQL statement when it is bound. The output of the EXPLAIN  is stored in user-created table called Plan table. Whenever we want to tune a query, we need to go and check the plan table so as to get an idea of the access path DB2 optimizer is using.

How can we populate the PLAN Table ?
EXPLAIN(YES) option on the BIND/REBIND plan/package command
EXPLAIN ALL keyword in SPUFI or while running the query in batch.

Step 1. The SQL statement in blue is the main Query for which we want to know the access path.
So we wrap the statement with the 'EXPLAIN ALL SET QUERYNO = 1 FOR'  like below

EXPLAIN ALL SET QUERYNO = 1 FOR
SELECT CUSTNO, CUSTLNAME                       
FROM CUST                                                        
WHERE CUSTNO LIKE '%0A';

Once we execute the above query, optimizer first writes the access path onto the Plan Table and then gives the output. Step below depicts how we can get the information from plan table


Now, Let us check few of the columns in the PLAN table and its significance: Given in Blue are the names of the table columns.
QUERYNO: Query number assigned by the user
QBLOCKNO:A number that identifies each query block within a query.
APPLNAME:The name of the application plan for the row.
PROGNAME:The name of the program or package containing the statement being explained.Applies for the explain as a result of SQL queries embeded in application program.
TSLOCKMODE: Identifes the Tablespace lock mode.

These columns relate to the index usage:
ACCESSTYPE:Type of table INDEX usage as as follows:
R -Full table scan (uses no index) when the query is executed
I -Use an index. Data will be retrieved from index and not from table,
I1 -one-fetch scan (MIN or MAX) functions
N -Index scan (predicate uses an IN )
M -Multi-index scan followed
   MX By an index scan on the index named in ACCESSNAME
   MI By an intersection of multiple indexes
   MU By a union of multiple indexes
MATCHCOLS: For ACCESSTYPE I, I1, N or MX, the number of index keys used in an index scan; otherwise, 0.
ACCESSCREATOR:For ACCESSTYPE I, I1, N, or MX, the creator of the index; otherwise, blank.
ACCESSNAME: For ACCESSTYPE I, I1, N, or MX, the name of the index; otherwise, blank.
INDEXONLY: Whether access to an index alone is enough to carry out the step, or whether data too must be
accessed. Y=Yes; N=No

The plan table columns that relate to SORT usgae are as follows:
METHOD:
A number (0, 1, 2, 3, or 4) that indicates the join method used for the step:
0 First table accessed, continuation of previous table accessed, or not used.
1 Nested loop join. For each row of the present composite table, matching rows of a new table are
found and joined.
2 Merge scan join. The present composite table and the new table are scanned in the order of the
join columns, and matching rows are joined.
3 Sorts needed by ORDER BY, GROUP BY, SELECT DISTINCT, UNION, a quantified predicate, or an
IN predicate. This step does not access a new table.
4 Hybrid join. The current composite table is scanned in the order of the join-column rows of the
new table. The new table is accessed using list prefetch.

SORTN_UNIQ: Whether the new table is sorted to remove duplicate rows. Y=Yes; N=No.
SORTN_JOIN: Whether the new table is sorted for join method 2 or 4. Y=Yes; N=No.
SORTN_ORDERBY: Whether the new table is sorted for ORDER BY. Y=Yes; N=No.
SORTN_GROUPBY: Whether the new table is sorted for GROUP BY. Y=Yes; N=No.
SORTC_UNIQ: Whether the composite table is sorted to remove duplicate rows. Y=Yes; N=No.
SORTC_JOIN: Whether the composite table is sorted for join method 1, 2 or 4. Y=Yes; N=No.
SORTC_ORDERBY: Whether the composite table is sorted for an ORDER BY clause or a quantified predicate. Y=Yes;
N=No.
SORTC_GROUPBY: Whether the composite table is sorted for a GROUP BY clause. Y=Yes; N=No.
PREFETCH : Whether data pages are to be read in advance by prefetch.  If we dont want to use the sequentail prefetch for a particualr query,we need to add the clause
OPTIMIZE FOR 1 ROW to it.

 Read about basic DB2 Prefetch  

What we should be looking at:

1. Indexes enhance performance and and reduce costs. We need to look the ACCESSTYPE to see if
an index is being used.. An ACCESSTYPE of "R" means all the data must be scanned. and no
indexes are being used.
2. Look for MATCHCOLS to see how many  index keys are being ueed. The more the better.
3. Check for the column INDEXONLY . Value of 'Y' means data being retrieved from index and no table is involved. This is no doubt good in terms of performance. Booster will be to have the columns used in 'Where predicate' as indexes.
4. Avoid unnecessary sorts as auch as possible.
5.PREFETCH is good and will be in action when mostly the table space scan is used. Very effective when the table data is in clustered sequence.

Some Learnings
1. Don't misuse select statements
2. Use IN instead of multiple ORs
3. Join with as any of the index columns as possible.
4. Avoid Arithmetic expressions in where clause
5. Use NOT EXISTS instead of NOT IN for a suvbquery

Tuesday, August 5, 2014

DB2 Plan +Package+ and Collection ID ? Frequently Asked questions in interview

Having Covered the Basic steps in a cobol-db2 compilation process, lets have a bit of more clarification & brain storming to check our understanding :)

Q 1: Syntax Checking of the SQL statements are done in Precompilation time or Bind time?

Both. Let me explain it in details a bit. We all are aware of DCLGENS and we use INCLUDE statements in our cobol module to include the DCLGENS. Precompilation process only uses DCLGEN as the reference for table description (provided we use DCLGEN).It does not go and check the DB2 catalog for the table structure. If no DCLGEN is used, no synatx checking is done in precompilation time.
But,  During BIND time, the check is more rigorous and thorough because the DB2 catalog is accessed to check the table structure along with other information like access permission and others

Q2: Can 1 package have more than 1 DBRM ?
From 1 program, we get one DBRM and then we bind the DBRM into a package; in the sense, 1 package is the DB2-optimized version of the that single  DBRM. Each Package is nothing but a single bound DBRM. There is direct one to one relationship with application program and package.
So, 1 DBRM=1 PACKAGE

Q3: Are packages executable?
No. In DB2, only plan is the executable module.

Will keep on adding as and when i come across!!..  suggestions most welcome!

Wednesday, July 2, 2014

Basics of Indexing-Use of index with Example

In the performance tuning posts (db2 performance issues and tuning part 1part 2) , we have discussed how indexing can help tuning database performance to a great level but, today we are going to discuss about the proper design of indexes.

As indexes provide easy access to the data rows in a table just like a book’s index, it speeds up the query execution process and thus helps to tune the database performance.
But, what it does actually and how it does?
An example will help us to understand it in a better way.
Say we have a tblOrder table with 100 rows and three simple columns: OrderId, ItemName and ItemPrice.
We will try to find out the orders within a specific price range:
The query would look like:
SELECT OrderId, ItemName, ItemPrice
FROM tblOrder
WHERE ItemPrice>10 AND ItemPrice<20

Currently, we assume that this table does not have any index. So, to execute the above query, the processor starts searching each and every row one by one to get the required data.
The below diagram depicts the actual processing done to do so. It definitely becomes an overhead to search all the 100 rows only to get a few set of records.

Now, let us add an index on the ItemPrice data column of this table.
The query will look like:
CREATE INDEX IX_tblOrder_ItemPrice
ON tblOrder (ItemPrice ASC)
It acts just like the index in a book. Adding an index to this column means, it stores the copy of the column’s data and a reference to the row where the actual data is stored. These index entries are also sorted in ascending order by SQL.
The pictorial representation will be somewhat similar to below:
 
Now, when we execute the above query again with an index added to the table, it will first find the value in the index and then it will take the reference stored in that index to point the actual data row.  
Thus index makes the query processing fast and wipes out the overhead of scanning each and every row in a table.
 
Index Structure
An index is a set of pages or index nodes which are organized in a structure called B-tree structure. It is a hierarchical structure with the root node placed at the top and the leaf nodes at the bottom of the hierarchy as demonstrated in the below figure.


The indexing works in the following way:
When a query is fired against an indexed column, the query processor starts scanning from the top which is the root node. Slowly it navigates down the intermediate levels or the next levels. The more we go down through the intermediate level, the more granular division comes into picture. The processor continues scanning the index nodes until the leaf node is arrived.
Let us take an example of searching the value of 87 in an indexed column. As it is said earlier, the processor starts scanning from the root node and then gradually it goes down through the intermediate levels till the leaf nodes are reached. Here, we are trying to find the value of 87 which leads to the 1st page of the 1st intermediate level, i.e. the page with 1-100 values. Now it further goes down to get the exact value and it understands that the value can be there in the 2nd page of the 2nd intermediate level which is 51-100. As 51-100 is also not the leaf node, it searches down further and finally reaches to the leaf node and points to the page 76-100 which is the 4th page of the leaf node level.
The leaf node may contain the full data row or the reference to that row. It solely depends on the type of index that we declare such as, clustered index or non-clustered index.

We will discuss the types of indexes in our next post…Stay tuned…