Tuesday, November 30, 2010

Indexing in SAS.. In a baby's language

A question at the KBC (Indianized version of Who wants to be a Millionire)

If your answer happens to be anything other than C then this might not be a relevant post for you :-)

There have been innumerable number of posts on indexing in SAS. One such brilliant post is found in the below link..

http://blogs.sas.com/sastraining/index.php?/archives/55-To-INDEX-or-not-to-INDEX....html

To me, the working on a simple index is understood this way...

Note: I've considered a B-Tree index. I'm sure SAS uses other better ways to maintain the index table.

Lets consider an example where we have a dataset with 2 variables ID and Age, and having 100 observations with Id values ranging from 1 to 100.


   Now assume that i would want to execute a query as given below:
     
        Proc Sql;
           Select Id, Age
           From Tab
           Where ID = 77;
       Quit;

Now we will understand what happens to the query in two scenarios: without indexing and with indexing

Case 1 - Without indexing:

Nothing unusual happens :-) Query runs to completion! Just that the SAS Complier starts searching the number 77 sequentially from the beginning and would need to make 77 passes to make the right hit!




Case 2 - With Indexing:

If the Table is indexed, (on ID in this example) then the complier first compares if the key variable (77) is less than 50 or greater than 50. Then in compares if the key variable is greater than 75 or less than 75, and so on.. Until a match is found.


 
Thus the number of iterations here is reduced drastically in comparison to the non indexed table.
 
Note: The example considered above has just 100 observations. This was just for the ease of explaination. In reality, indexing for 100 observations would not change the performance much. However, indexing are found to be pretty efficient when the dataset size is huge.

Wednesday, November 24, 2010

Booooom!!! proc explode!!!

I came across an interesting procedure today.. Its called as proc explode!

This procedure enables the user to blow up the text in nice formatting and display it on the output screen (I think only in listing).

This reminds me of the 'banner' command in the ol' UNIX boxes.

You can try the explode procedure yourself by copy pasting the below code..


proc explode;
parmcards;
HELLO WORLD

;

And the output is as shown below:



Just a word of caution! Please be mindful about the space before HELLO WORLD in the proc explode step. It throws up an error if we forget that space because SAS expects a numeric or some specific characters in that place (some options for changing the formats of the display).

Also if you are using versions <= SAS 9.1, then you may have to execute this filename statement before the proc explode (Some bug i found!!! )
FILENAME FT15F001 '~/file1.txt';

More information about the proc explode can be got at:

http://www.sfu.ca/sasdoc/sashtml/proc/z0146882.htm

Password encryption

Many a times we come across a situation where we may need to encrypt the password which we use in the SAS programs (Example: Using a password to access a database like db2). This can be acheived by the pwencode procedure. See the example below:

filename fileref "C:\MyFolder\Pwd.txt";
proc pwencode in="My Passwd" out=fileref;
run;

The contents of the Pwd.tx is as follows:
{sas001}TXkgUGFzc3dk

Here the {sas001} denotes the method of encryption. More information on the encryption methods available/used in SAS can be found at:
http://support.sas.com/documentation/cdl/en/secref/62092/HTML/default/viewer.htm#a002595992.htm

Now the encoded password can be used everywhere by reading the contents of the file Pwd.txt.

Tuesday, November 23, 2010

Error Codes for libname

Whenever we assign a libname (especially a libname to a database), we would want to be sure that the library path should be valid/existing. When it comes to assigning a libname to a database, example oracle, we may also want to confirm if the user-id and the password are valid.

Below is the syntax for doing a error check immediately after assigning a libname and before proceeding to access the lib reference:


libname dbase oracle user="USER_ID" pass="PASSWORD" path='@PATH' schema=MYSCHEMA;

%macro code_area;
filename sendmail email to=("Pramod.R@xyz.com");
%if &syslibrc = 0 %then %do;
/* RUN THE ACTUAL CODE HERE*/
data _null_;
file sendmail subject="Success";
put / "The code ran into completion.";
run;
%end;
%else %do;
/* THROW THE ERROR MAIL */
data _null_;
file sendmail subject="Failed";
put / "The code ran into ran into problems due to Oracle connection problems.";
run;
%end;
%mend code_area;

%code_area;



Similarly we can also do a check for the filename by using the automatic macro variable - %sysfilrc, which returns a 0 value for successful filename statement and a non zero value if the filename statement failed.

.netrc in UNIX

I came across this interesting feature when i was trying to FTP a file from a FTP server to my Unix box. I used this convetional method to FTP a file earlier:

ftp -n $SERVER
quote user $LOGIN
quote pass $PASSWORD
cd ..
ls BKP files.txt




The problem with this method is that I would need to store my user id and password in the variable LOGIN and PASSWORD and read it everytime. This can be done by saving these variables in another read only file and then reading them everytime we FTP a file. Also we may need to keep updating our passwords as and when we change it.

.netrc file helps us overcome all these shortcomings. All you need to do is create a file named .netrc in your home directory and paste the following text in it:

machine login password


Then give the following permissions: -rw-------

This would help us FTP the files without having us to provide any login credentials anytime.