Tuesday, November 30, 2010

Indexing in SAS.. In a baby's language

A question at the KBC (Indianized version of Who wants to be a Millionire)

If your answer happens to be anything other than C then this might not be a relevant post for you :-)

There have been innumerable number of posts on indexing in SAS. One such brilliant post is found in the below link..

http://blogs.sas.com/sastraining/index.php?/archives/55-To-INDEX-or-not-to-INDEX....html

To me, the working on a simple index is understood this way...

Note: I've considered a B-Tree index. I'm sure SAS uses other better ways to maintain the index table.

Lets consider an example where we have a dataset with 2 variables ID and Age, and having 100 observations with Id values ranging from 1 to 100.


   Now assume that i would want to execute a query as given below:
     
        Proc Sql;
           Select Id, Age
           From Tab
           Where ID = 77;
       Quit;

Now we will understand what happens to the query in two scenarios: without indexing and with indexing

Case 1 - Without indexing:

Nothing unusual happens :-) Query runs to completion! Just that the SAS Complier starts searching the number 77 sequentially from the beginning and would need to make 77 passes to make the right hit!




Case 2 - With Indexing:

If the Table is indexed, (on ID in this example) then the complier first compares if the key variable (77) is less than 50 or greater than 50. Then in compares if the key variable is greater than 75 or less than 75, and so on.. Until a match is found.


 
Thus the number of iterations here is reduced drastically in comparison to the non indexed table.
 
Note: The example considered above has just 100 observations. This was just for the ease of explaination. In reality, indexing for 100 observations would not change the performance much. However, indexing are found to be pretty efficient when the dataset size is huge.

1 comment:

  1. This comment has been removed by a blog administrator.

    ReplyDelete