To start with, let's look at some basic SAS concepts that come into play whenever you run a simple SAS program.
The two primary steps in a SAS program are:
1. SAS DATA step
2. SAS PROC step
DATA steps typically create or modify SAS data sets. They can also be used to produce custom-designed reports. For example, you can use DATA steps to
- put your data into a SAS data set
- compute values
- check for and correct errors in your data
- produce new SAS data sets by subsetting, merging, and updating existing data sets.
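As a minimal sketch of these uses, the two DATA steps below first put a few inline records into a data set, then compute a new value and keep only a subset of the observations. The data set names (POLICY, ACTIVE) and the variables (POLNO, STAT, PREMIUM, ANNUAL) are made up for illustration.

```sas
/* Hypothetical sample data, read from inline records */
DATA POLICY;
  INPUT POLNO $ STAT $ PREMIUM;
  DATALINES;
P001 A 100
P002 C 85
P003 A 43
;
RUN;

/* Compute a value and subset the existing data set */
DATA ACTIVE;
  SET POLICY;               /* read observations from POLICY        */
  IF STAT = 'A';            /* subsetting IF: keep status A only    */
  ANNUAL = PREMIUM * 12;    /* computed variable                    */
RUN;
```

With this sample data, ACTIVE should end up with the two status A policies and one extra variable, ANNUAL.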
PROC (procedure) steps are pre-written routines that enable you to analyze and process the data in a SAS data set and to present the data in the form of a report.
For example, you can use PROC steps to
- create a report that lists the data
- produce descriptive statistics
- create a summary report.
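The three uses above map roughly onto PROC PRINT, PROC MEANS and PROC FREQ. The sketch below assumes a small, hypothetical data set CLAIMS (with made-up variables CLAIMNO, REGION and AMOUNT) created inline so that the example is self-contained.

```sas
/* Hypothetical sample data */
DATA CLAIMS;
  INPUT CLAIMNO $ REGION $ AMOUNT;
  DATALINES;
C01 EAST 1200
C02 WEST 850
C03 EAST 430
;
RUN;

/* Report that lists the data */
PROC PRINT DATA=CLAIMS;
  TITLE 'Listing of CLAIMS';
RUN;

/* Descriptive statistics for a numeric variable */
PROC MEANS DATA=CLAIMS N MEAN MIN MAX;
  VAR AMOUNT;
  TITLE 'Descriptive statistics for AMOUNT';
RUN;

/* Summary report: number of claims per region */
PROC FREQ DATA=CLAIMS;
  TABLES REGION;
  TITLE 'Number of claims per region';
RUN;
```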
[Figure: SAS program flow]
Features of a SAS statement:
- It usually begins with a SAS keyword.
- It always ends with a semicolon.
In the DATA step, we introduce the input file, i.e. the external file (identified by the DD name in the JCL on a mainframe), to SAS. The DATA step begins with the DATA keyword. We also declare the layout of the fields. As an example,
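DATA CUST;                        /* name of the output SAS data set   */
  INFILE CUSTPOL;                 /* DD name of the external file      */
  INPUT @23  ACCTNO   $CHAR01.    /* character field, column 23        */
        @60  POLNO    $CHAR10.    /* character field, columns 60-69    */
        @76  STAT     $CHAR01.
        @224 POLEFFDT 8.          /* numeric field, 8 bytes wide       */
        @240 APPRCDT  8.
        ;
RUN;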
Here, DATA, INFILE and INPUT are SAS keywords. CUST is the name of the SAS data set that SAS creates once this step is executed.
INFILE CUSTPOL: CUSTPOL is the fileref (the DD name in the JCL) that points to the physical (external) file from which the data is read. INPUT reads only the fields at the specified positions, and only those fields will be present in the SAS data set CUST.
The above SAS DATA step is processed in two phases.
A) Compilation phase: Each statement is checked for syntax errors. Once compilation completes, execution begins.
B) Execution phase: The input records are read and the statements are executed once for each record, unless coded otherwise.
Some of the terms that come up in SAS data processing (a bit of background knowledge is good) are:
Input Buffer:
During the compilation phase, an input buffer (an area of memory) is created to hold a record from the external file. It is created only when the DATA step reads raw data. It is just a logical concept.
Program Data Vector:
The program data vector (PDV) is a logical area of memory, internal to SAS, in which SAS builds the data set one observation at a time.
The program data vector also contains two automatic variables that can be used to track the number of observations and that come in handy in many ways.
1. _N_ counts the number of times that the DATA step begins to execute.
2. _ERROR_ signals the occurrence of an error that is caused by the data during execution.
The default value is 0, which means there is no error. When one or more errors occur, the value is set to 1.
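To see the two automatic variables in action, a minimal sketch is to write them to the log with a PUT statement on every iteration. The data set name CHECK and the variables ID and AMOUNT are hypothetical; the second inline record deliberately carries a non-numeric AMOUNT.

```sas
DATA CHECK;
  INPUT ID AMOUNT;                  /* both fields are read as numeric  */
  PUT _N_= _ERROR_= ID= AMOUNT=;    /* PUT writes to the SAS log        */
  DATALINES;
1 100
2 ABC
3 300
;
RUN;
```

_N_ should go from 1 to 3 across the three iterations. On the second record SAS cannot convert ABC to a number, so it sets AMOUNT to missing and _ERROR_ to 1 for that iteration and writes an invalid-data note to the log. _ERROR_ is reset to 0 at the start of the next iteration.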
At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0.
When we define the DATA step, we should try to use the minimum number of variables. Declaring unnecessary variables makes the SAS data set bigger, which can lead to longer execution time.
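One way to keep the data set small, sketched below under assumed names, is the KEEP= data set option, which limits the variables written to the output data set. The data set SLIM and the variables PREMIUM and REGION are hypothetical. An even simpler approach is not to read unneeded fields in the INPUT statement at all, as the CUST step does.

```sas
/* Only POLNO and STAT are written to SLIM;        */
/* PREMIUM and REGION are read but not stored.     */
DATA SLIM (KEEP=POLNO STAT);
  INPUT POLNO $ STAT $ PREMIUM REGION $;
  DATALINES;
P001 A 100 EAST
P002 C 85 WEST
;
RUN;
```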
During execution, each record in the input raw data file is read, stored in the program data vector, and then written to the new data set as an observation.
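At the end of the DATA step, several actions occur. First, the values in the program data vector are written to the output data set as the first observation. Next, control returns to the top of the DATA step and _N_ is incremented by 1. Finally, the variables read with INPUT are reset to missing before the next record is read.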
Log Messages
Each time SAS executes a step, it writes messages to the log. In a z/OS environment, the log is written to SASLOG, and it looks like the sample below. It shows the number of records read from the input file and the number of observations that met the selection criteria and were finally written to the SAS data set.
NOTE: 17430 records were read from the infile CUSTPOL.
The minimum record length was 600.
The maximum record length was 636.
NOTE: The data set WORK.CUSTPOL has 5818 observations and 12 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.07
Elapsed time - 00:00:02.99
EXCP count - 5998
Task memory - 4904K (148K data, 4756K program)
Total memory - 17710K (3488K data, 14222K program)
Timestamp - 12/19/2014 2:43:29 AM
NOTE: The address space has used a maximum of 876K below the line and 1