Friday, January 30, 2015

SAS in Mainframes(z/Os) Tutorial with examples - Use of CARD and DATALINES in SAS

The CARD or the DATALINES  statement is valid in the data step and is used to Read the real data values into the program. Generally while running the sas programs, we use an input file(a physical DASD file)  and give a reference to the Data step using INFILE keyword.
DATALINE is nothing but Reading the Data inline. We can also use the keyword CARD. Both are equivalent in terms of function. CARDS were used long back when punch cards were used in mainframe system.
Have a look into the sas program below.
//SAS       EXEC SAS                     
//WORK      DD SPACE=(CYL,(50,10),RLSE)  
//SYSIN     DD *                         
                                         
OPTION NOCENTER;                         
OPTION SORTLIB='';                       
                                         
  DATA COV;                              
  INPUT @23  ACCTTYPE $CHAR01.           
    @10  NMBER  $CHAR02.                 
    @24  ACCTNO   $CHAR10.               
     ;                                   
 DATALINES;                              
  111111111111111111112222222222233333333
  111111111111111111112222222222233333333
  111111111111111111112222222222233333333
  111111111111111111112222222222233333333
    ;                                    
 PROC PRINT DATA=COV;                             
   
 A very small program, but there is no input file in the JCL. We are reading the input data using the DATALINES keyword. IF we print the data we will get the same expected output.

Wednesday, January 7, 2015

DB2 Explain and PLAN table column names in DB2

Whenever we run a query,DB2 creates an access plan that specifies how it will access the requested data. This is created whenevea the sql is compiled at bind time for static sql and before execution for dynamic sql.
DB2 bases the access paths on the SQL statements and also on the statistics and configuration parameters of the system.
Even when an sql is made efficient, it can become inefficient as data grows. So, we need to run DB2 runstats  so as to keep updated statistics . DB2 config and storage can change and plans can be rebound. Db2 explain gives us the info for the plan, package, or SQL statement when it is bound. The output of the EXPLAIN  is stored in user-created table called Plan table. Whenever we want to tune a query, we need to go and check the plan table so as to get an idea of the access path DB2 optimizer is using.

How can we populate the PLAN Table ?
EXPLAIN(YES) option on the BIND/REBIND plan/package command
EXPLAIN ALL keyword in SPUFI or while running the query in batch.

Step 1. The SQL statement in blue is the main Query for which we want to know the access path.
So we wrap the statement with the 'EXPLAIN ALL SET QUERYNO = 1 FOR'  like below

EXPLAIN ALL SET QUERYNO = 1 FOR
SELECT CUSTNO, CUSTLNAME                       
FROM CUST                                                        
WHERE CUSTNO LIKE '%0A';

Once we execute the above query, optimizer first writes the access path onto the Plan Table and then gives the output. Step below depicts how we can get the information from plan table


Now, Let us check few of the columns in the PLAN table and its significance: Given in Blue are the names of the table columns.
QUERYNO: Query number assigned by the user
QBLOCKNO:A number that identifies each query block within a query.
APPLNAME:The name of the application plan for the row.
PROGNAME:The name of the program or package containing the statement being explained.Applies for the explain as a result of SQL queries embeded in application program.
TSLOCKMODE: Identifes the Tablespace lock mode.

These columns relate to the index usage:
ACCESSTYPE:Type of table INDEX usage as as follows:
R -Full table scan (uses no index) when the query is executed
I -Use an index. Data will be retrieved from index and not from table,
I1 -one-fetch scan (MIN or MAX) functions
N -Index scan (predicate uses an IN )
M -Multi-index scan followed
   MX By an index scan on the index named in ACCESSNAME
   MI By an intersection of multiple indexes
   MU By a union of multiple indexes
MATCHCOLS: For ACCESSTYPE I, I1, N or MX, the number of index keys used in an index scan; otherwise, 0.
ACCESSCREATOR:For ACCESSTYPE I, I1, N, or MX, the creator of the index; otherwise, blank.
ACCESSNAME: For ACCESSTYPE I, I1, N, or MX, the name of the index; otherwise, blank.
INDEXONLY: Whether access to an index alone is enough to carry out the step, or whether data too must be
accessed. Y=Yes; N=No

The plan table columns that relate to SORT usgae are as follows:
METHOD:
A number (0, 1, 2, 3, or 4) that indicates the join method used for the step:
0 First table accessed, continuation of previous table accessed, or not used.
1 Nested loop join. For each row of the present composite table, matching rows of a new table are
found and joined.
2 Merge scan join. The present composite table and the new table are scanned in the order of the
join columns, and matching rows are joined.
3 Sorts needed by ORDER BY, GROUP BY, SELECT DISTINCT, UNION, a quantified predicate, or an
IN predicate. This step does not access a new table.
4 Hybrid join. The current composite table is scanned in the order of the join-column rows of the
new table. The new table is accessed using list prefetch.

SORTN_UNIQ: Whether the new table is sorted to remove duplicate rows. Y=Yes; N=No.
SORTN_JOIN: Whether the new table is sorted for join method 2 or 4. Y=Yes; N=No.
SORTN_ORDERBY: Whether the new table is sorted for ORDER BY. Y=Yes; N=No.
SORTN_GROUPBY: Whether the new table is sorted for GROUP BY. Y=Yes; N=No.
SORTC_UNIQ: Whether the composite table is sorted to remove duplicate rows. Y=Yes; N=No.
SORTC_JOIN: Whether the composite table is sorted for join method 1, 2 or 4. Y=Yes; N=No.
SORTC_ORDERBY: Whether the composite table is sorted for an ORDER BY clause or a quantified predicate. Y=Yes;
N=No.
SORTC_GROUPBY: Whether the composite table is sorted for a GROUP BY clause. Y=Yes; N=No.
PREFETCH : Whether data pages are to be read in advance by prefetch.  If we dont want to use the sequentail prefetch for a particualr query,we need to add the clause
OPTIMIZE FOR 1 ROW to it.

 Read about basic DB2 Prefetch  

What we should be looking at:

1. Indexes enhance performance and and reduce costs. We need to look the ACCESSTYPE to see if
an index is being used.. An ACCESSTYPE of "R" means all the data must be scanned. and no
indexes are being used.
2. Look for MATCHCOLS to see how many  index keys are being ueed. The more the better.
3. Check for the column INDEXONLY . Value of 'Y' means data being retrieved from index and no table is involved. This is no doubt good in terms of performance. Booster will be to have the columns used in 'Where predicate' as indexes.
4. Avoid unnecessary sorts as auch as possible.
5.PREFETCH is good and will be in action when mostly the table space scan is used. Very effective when the table data is in clustered sequence.

Some Learnings
1. Don't misuse select statements
2. Use IN instead of multiple ORs
3. Join with as any of the index columns as possible.
4. Avoid Arithmetic expressions in where clause
5. Use NOT EXISTS instead of NOT IN for a suvbquery