Saturday, April 9, 2011

Creating a search engine in SAS

It's the age of the search engine! I remember people "Yahoo"ing in the late '90s, "Google"ing through the late 2000s, and now "Bing"ing.

I just wondered: why not SAS? So I started off by reading up on the Yahoo search engine APIs. They have released a new API called BOSS. Its documentation is provided here: http://developer.yahoo.com/search/boss/boss_guide/

The next step was to get an API key, which was generated after I filled out their form. Using this, I could start accessing their BOSS API...

I used PROC HTTP to access the BOSS API using the program below:

filename in "C:\test\curr_in";
filename out "C:\test\curr_out.txt";


data _null_;
title;
if (_N_ eq 1) then do;
 file stdout;
 infile stdin;
 put @1 "Enter the search text:";
 input n $;
 var='appid=xxxxxxx&query='||compress(n)||'&results=1';
 file in;
 put var $;
end;
run;



proc http in=in out=out url="http://search.yahooapis.com/WebSearchService/V1/webSearch" method="post" ct="application/x-www-form-urlencoded";
run;

The above program reads the search text from the user, writes the request parameters for the BOSS API to the file curr_in, and posts them to the API using PROC HTTP.

Note that the API key has been typed as xxxxxxx; replace it with the API key that you generate from the developer.yahoo.com site.
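One limitation worth noting: input n $ uses list input, so it reads a single word of at most 8 characters, and compress() then removes any blanks. As a rough sketch of how a multi-word query could be prepared instead, the step below builds the same parameter string with the URLENCODE function (available from SAS 9.2, as far as I know). The query text is hard-coded here purely for illustration, and the result is only written to the log.

```sas
data _null_;
   length query $200 var $400;
   query = 'sas search engine';   /* hypothetical search text, hard-coded for the sketch */
   /* urlencode() escapes blanks and other reserved characters;       */
   /* the single quotes stop the macro processor from touching the &  */
   var = cats('appid=xxxxxxx&query=', urlencode(strip(query)), '&results=1');
   put var;                       /* written to the log for inspection */
run;
```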

Once the program is executed, the output of the API is written to the file curr_out.txt, which contains the search result in the form of XML. This XML is then parsed to fetch the needed fields, which are written to stdout. This is accomplished by the code below:

data new;
infile out lrecl=10000 truncover;
input @1 rec $1000.;
if(index(rec,'<Summary>')>0) then do;
 title= substr(rec,index(rec,'<Title>')+7,index(rec,'</Title>')-(index(rec,'<Title>')+7));
 summary=substr(rec,index(rec,'<Summary>')+9,index(rec,'</Summary>')-(index(rec,'<Summary>')+9));
 url = substr(rec,index(rec,'<Url>')+5,index(rec,'</Url>')-(index(rec,'<Url>')+5));
 output;
end;
run;

data _null_;
set new;
file stdout;
put "Title: " title;
put "Summary: " summary;
put "Url: " url;
run;


This produces the output as shown below:


Let me know your feedback/comments!

Sunday, March 20, 2011

Using SAS as my makeshift alarm: Using sound function

I have trouble waking up early every day. And on the days when I forget to set the alarm on my mobile, I might wake up two days later!!

I overcame this problem after I discovered the SOUND routine in SAS. It produces a beep at the specified frequency (in hertz) for the specified duration (in milliseconds), both of which are supplied as arguments.

Now, let's see how to make a conventional alarm sound that rings 20 times:

data _null_;
do j=1 to 20;
   do i=1 to 4;
      call sound(550,500);  /* SOUND is a CALL routine: 550 Hz for 500 ms */
   end;
   call sleep(1,1);         /* pause for 1 second between rings */
end;
run;

Now, if we schedule this code in the Windows Task Scheduler to run every day at some designated time, SAS takes up the task of waking you up every day.

I use this sound routine more often at the end of my long-running programs. It beeps after the program completes (yes.. pretty similar to the oven) so that I can take the necessary follow-up action.
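As a minimal sketch of that "oven timer" idea, a short data step like the one below could be placed as the last step of a long-running program. The frequency, duration and number of beeps are arbitrary values picked for illustration.

```sas
/* ... long-running steps go here ... */

data _null_;
   do i = 1 to 3;
      call sound(880, 300);     /* 880 Hz beep for 300 ms */
      call sleep(200, 0.001);   /* 200 ms gap (unit of 0.001 second) */
   end;
run;
```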

Note that this is possible only with SAS on Windows; the sound routine does not work on UNIX or mainframes.

I would like to end this post by sharing with you a brilliant application of sound and sleep: the composition of the "Old MacDonald" song:

http://www2.sas.com/proceedings/sugi29/048-29.pdf

So.. Let the music begin!!!

Saturday, March 19, 2011

SASopedia gets a face lift

I know it was high time I did it... Hope this is more viewable...

Watch out for more posts in this space...

Sunday, March 6, 2011

Making SAS Interactive (Part 2): Using %window and %display

It's the world of the GUI. And you are left nowhere if you don't give users the luxury (or rather, fulfill the basic need) of a Graphical User Interface. This holds good even for SAS.

SAS came up with SAS/AF and SAS Enterprise Guide (EG), which give users a brilliant GUI and make their lives so easy. However, not many of us will have bumped into the macro statements %WINDOW and %DISPLAY, which are available in Base SAS 9 and cover the basic needs of a GUI. Here I take a small dive into the SAS ocean again, exploring what these two statements can do.

%WINDOW: This defines the window that pops up upon execution. Here we can specify window attributes such as color, width and height, as well as the position of the text to be displayed and the input fields to be read.

%DISPLAY: This actually displays the window that has been defined with %WINDOW.

Below is a simple illustration of the %window and %display invocation:

%window test_win color=blue

/*dimension for the window*/
icolumn=15 irow=10
columns=90 rows=45


/*Content of the window*/
#3 @25 "My Window"
attr = rev_video
#5 @10 "Hello Pramod"
attr = underline
;


%display test_win;

The above code displays the output as shown below:



Below, I've demonstrated how we can use this to make SAS interactive. Most of the code is self explanatory, of course with a bit of help from the support.sas.com documentation available at: http://support.sas.com/documentation/cdl/en/mcrolref/61885/HTML/default/viewer.htm#a000206734.htm

/* Main window */

/* Initialise error_msg so that the reference in row 1 resolves */
%let error_msg=;

%window final color=gray

/*dimension for the window*/
icolumn=15 irow=10
columns=90 rows=75


/*Content of the window*/
#1 @25 "&error_msg"
#3 @15 "Hi &sysuserid." attr = rev_video color=blue @60 "Date: &sysdate9." attr = rev_video color=blue
#5 @25 "Welcome to the Reporting World" attr = rev_video

#8 @35 "Report List"
#11 @15 "1. Class listing" @50 "2. Class Report by Gender"
#13 @15 "3. Class Report by Age" @50 "4. Class freq"
#16 @15 "Enter your Choice:" @35 choice attr=underline
;


/* End of Window Final */

%macro execute;
%let choice=;
%display final;

%if &choice=1 %then %do;
 proc print data=sashelp.class noobs;
 run;
%end;
%else %if &choice=2 %then %do;
 proc report data=sashelp.class nowd;
 columns sex height weight;
 define sex / group;
 define height /analysis;
 define weight / analysis;
 run;
%end;
%else %if &choice=3 %then %do;
 proc report data=sashelp.class nowd;
 columns age height weight;
 define age / group;
 define height /analysis;
 define weight / analysis;
 run;
%end;
%else %if &choice=4 %then %do;
 proc freq data=sashelp.class;
 tables sex*age /nocum nopercent;
 run;
%end;
%mend execute;

%execute

In the above code, I first define in %window the list of reports the user might be interested in, and then %display it inside the macro execute. The user's input is read into the macro variable choice and, based on the selection, the required procedure is called inside the macro.

The output of the above code is as shown below:


Upon entering the value 2 and hitting the enter button, we get the output as shown:




You could experiment more on this and let me know your suggestions/thoughts/ideas...

Monday, February 28, 2011

Making SAS Interactive (Part 1): Using stdin and stdout

Many a time, we may come across a need for dynamic programs. That is, we may need the user to key in some input and have the code run accordingly. This can be achieved in SAS by using the automatic file descriptors stdin and stdout. They are most widely used in the UNIX environment, especially when we batch submit the code from the command line.

In the code below, I illustrate the use of stdin and stdout by implementing a simple calculator, which takes two numbers and an operator as input and outputs the result.

data test;
if (_N_ eq 1) then do;
 file stdout;
 infile stdin;
 put @1 "Enter the first variable:";
 input X @;
 put @1 "Enter the second variable:";
 input Y @;
 put @1 "Choose the operator: + - * / **:";
 input op $;
end;
retain X Y op;
select (op);
 when ('+') result=X+Y;
 when ('-') result=X-Y;
 when ('*') result=X*Y;
 when ('/') result=X/Y;
 when ('**') result=X**Y;
 otherwise ;
end;
put "The result is:" result;
run;


In the above code, I've pointed the infile and file statements at stdin and stdout respectively. So the input is always read from the terminal (keyboard) and the output is always written to the terminal.

When we run the above code in the batch mode, we get the following output:


We can also route the output of a procedure to the terminal using proc printto as shown below:

proc printto print=stdout;
run;


The code below prints all the details of the student whose name is keyed in at the terminal, from the sashelp.class dataset:

data name;
title;
if (_N_ eq 1) then do;
 file stdout;
 infile stdin;
 put @1 "Enter the student name:";
 input n $;
end;
retain n;
call symput('name',n);
run;

proc printto print=stdout;
run;
options nodate nonumber;
proc print data=sashelp.class noobs;
where name="&name";
run;
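The where clause above matches only when the name is typed exactly as stored (for example "Alfred", not "alfred"). A possible variation, sketched below, makes the comparison case-insensitive and uses call symputx to strip blanks from the macro variable. The name is hard-coded here as a stand-in for the value read from stdin.

```sas
data _null_;
   n = 'alfred';                 /* stands in for the value keyed in at the terminal */
   call symputx('name', n);      /* symputx removes leading and trailing blanks */
run;

proc print data=sashelp.class noobs;
   where upcase(name) = upcase("&name");   /* compare without regard to case */
run;
```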


This would give us the below output:



Let me know if you guys have any thoughts or other approaches.

More to come: Making SAS Interactive (Part 2): Using window prompts

Saturday, February 19, 2011

Reading a table from a website into a SAS dataset

Many a time we may want to read a table from a web page into a dataset. This may be a requirement, for example, when I want to analyse stock market shares and their trends over the past. It can be done in many ways, depending on the web application under consideration.

Here I discuss the filename url method for accessing a static web page as we see it in the browser. There are other approaches too, such as downloading the page by FTP or some other mechanism and then parsing the html/aspx source tags to get the required data.

Now that the Cricket World Cup is here, I've decided to use http://www.espncricinfo.com/ to show how we can read a scorecard into a dataset. I've used a match scorecard which appears like this on the website:



To begin with, we need to assign a filename to the url where the table resides, by using the filename url syntax:

filename fn url "http://www.espncricinfo.com/icc_cricket_worldcup2011/engine/match/473333.html";

Now that the fileref has been assigned, we read the file using the infile/input statements in a data step:

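data _null_;
/* truncover: a line shorter than 10000 characters is read as-is */
/* instead of the input statement flowing over to the next line  */
infile fn lrecl=30000 truncover;
input col1 $10000.;
/* lrecl on the output file so that the long records are not split */
file "~/test.txt" lrecl=30000;
put col1 $10000.;
run;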

Note: The default value for lrecl (the maximum record length of the file) is 256 characters. However, one can specify a value up to 32767. I've specified 30000 on the assumption that no line of the html file is longer than that.

The above code reads each line of the html file at the specified url into a single character variable of length 10000 and writes it to a file named test.txt.

When we open test.txt, we see a lot of html tags, which need to be parsed to get the table of our choice.

In test.txt, the following tags appear:

<td width="192"><a class="playerName" href="http://www.blogger.com/icc_cricket_worldcup2011/content/player/35263.html" target="" title="view the player profile for Virender Sehwag">V Sehwag </a>&nbsp; </td>

<td class="battingRuns">23</td>

<td class="battingDetails">30</td>

<td class="battingDetails">0</td>

Now, all we need to do is look for occurrences of the text 'class="playerName"' to fetch the player name; 'class="battingRuns">' to fetch the player score; 'class="battingDetails">' to fetch the player matches; and so on.

This can easily be done by using the following data step code:

data inp;
infile "~/test.txt" lrecl=30000;
input @'class="playerName"' name1 $300. @'class="battingRuns">' runs1 : $20. @'class="battingDetails">' matches1 : $20.;
run;


The above code searches for the text 'class="playerName"' and reads the 300 characters following it into the variable name1. Similarly, it searches for the text 'class="battingRuns">' and 'class="battingDetails">' and reads up to 20 characters after each, stopping at the first space (the default delimiter).

Now the first observation of the dataset inp contains the following values:

Name1: href="http://www.blogger.com/icc_cricket_worldcup2011/content/player/35263.html" target="" title="view the player profile for Virender Sehwag">V Sehwag </a>&nbsp; </td>
Runs1: 23</td>
Matches1: 30</td>


Now, we need to extract the name (V Sehwag) from the name1 variable. This can be done by locating the string '">' (which ends the opening <a> tag) and the string '</a>' in the value of name1. The code is as follows:

y=index(t,'">');
x=index(t,'</a>');
name=strip(substr(t,y+2,x-(y+2)));


The above statements calculate y and x, from which we get the starting position (y+2) and the ending position (x-1) of the string V Sehwag. Then I take a substr of the string, knowing the starting position and the length, and strip the trailing blank.

To extract the numeric value from the character value, we use the following function:

runs=input(compress(lowcase(runs1),'abcdefghijklmnopqrstuvwxyz<>/'),8.);
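An alternative sketch, assuming SAS 9 or later where COMPRESS accepts a third (modifier) argument: the 'kd' modifier keeps only the digits, so the list of letters and symbols to remove does not have to be spelled out. The value of runs1 is hard-coded here to mirror the sample shown earlier.

```sas
data _null_;
   runs1 = '23</td>';                            /* sample value from the scorecard */
   runs  = input(compress(runs1, , 'kd'), 8.);   /* k = keep, d = digits */
   put runs=;                                    /* log shows: runs=23 */
run;
```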

Now, combining all the above functions, a data step can be built which gives us the final set of data as follows:

data final;
set inp;
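y=index(name1,'">');                      /* end of the opening <a ...> tag */
x=index(name1,'</a>');                    /* start of the closing tag */
name=strip(substr(name1,y+2,x-(y+2)));    /* text between the two tags */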

runs=input(compress(lowcase(runs1),'abcdefghijklmnopqrstuvwxyz<>/'),8.);
matches=input(compress(lowcase(matches1),'abcdefghijklmnopqrstuvwxyz<>/'),8.);
run;
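To sanity-check the name extraction without fetching the page, the index/substr logic can be run on a hard-coded copy of the sample value. This is only a sketch: the href in the string below is shortened for readability, and the result is written to the log.

```sas
data _null_;
   length name1 $300 name $50;
   name1 = 'href="/content/player/35263.html" target="" title="view the player profile for Virender Sehwag">V Sehwag </a>&nbsp; </td>';
   y = index(name1, '">');      /* position of the quote that closes the title attribute */
   x = index(name1, '</a>');    /* position of the closing anchor tag */
   name = strip(substr(name1, y + 2, x - (y + 2)));
   put name=;                   /* log shows: name=V Sehwag */
run;
```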

That's it for now.. Let me know your experiences or suggestions on doing this in a better way...

Monday, January 31, 2011

Dark Secrets of %sysfunc

I've been using %sysfunc for quite some time now. I must say that it's a pretty handy tool, especially when you want to do some heavy weight lifting in macros. However, one must be mindful of the following... (can I say shortcomings?? Naa.. not quite...)


Quoting fails in %sysfunc:

It is starkly visible when you try using the intnx function in a macro. For example, say you want to build a month incrementer in a loop. You would do something like this:

%macro doit;
%let dt='01JAN2011'd;
%do i=1 %to 12;
%put mon=%sysfunc(month(%sysfunc(intnx('month',&dt,&i))));
%end;
%mend doit;

%doit

(Note: We usually do not need a semicolon (;) after invoking a macro.. But habits die hard. I'm trying my best to avoid it..)

The above code fails to run as expected. %sysfunc throws up tantrums saying:

WARNING: An argument to the function INTNX referenced by the %SYSFUNC or %QSYSFUNC macro function is out of range.

This is purely because %sysfunc does not like the quotes around month inside the intnx function: macro-language text needs no quotes, so the quote marks are passed to INTNX as part of the interval name. Just remove them and SAS becomes your ever-loyal man Friday!
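For reference, here is the same macro with the quotes around month removed, which is the only change. The date literal can stay as it is.

```sas
%macro doit;
   %let dt='01JAN2011'd;
   %do i=1 %to 12;
      /* month is passed without quotes; &dt is still a date literal */
      %put mon=%sysfunc(month(%sysfunc(intnx(month,&dt,&i))));
   %end;
%mend doit;

%doit
```

With a start date of 01JAN2011, the log should show mon=2 through mon=12, followed by mon=1 for the twelfth increment (January 2012).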

Some basic functions like put and input don't work!!

I was taken aback when I discovered this. (Ya.. I know... I know that I didn't do my documentation reading properly when I was asked to..)

When I submit this program inside a macro,

%put %sysfunc(put('31Jan2011'd,worddate.));

SAS slaps me with this error message:

ERROR: The PUT function referenced in the %SYSFUNC or %QSYSFUNC macro function is not found.

But this is unexpected... Now, how on earth am I supposed to do the type casting which is an integral part of my life (of course apart from my wife!!)?

SAS says that you should use putn and putc (for numeric and character formats respectively) and inputn and inputc (for numeric and character informats) instead of the regular put/input functions. Wow.. Problem solved!!! The above code can now be written as:

%put %sysfunc(putn('31Jan2011'd,worddate.));
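The same idea works in the other direction with inputn. The lines below are a small sketch: the first converts the text 31JAN2011 into a SAS date value using the date9. informat, and the second feeds that value back through putn to format it again.

```sas
/* text -> numeric SAS date value; the log shows 18658 */
%put %sysfunc(inputn(31JAN2011, date9.));

/* numeric SAS date value -> formatted text */
%put %sysfunc(putn(%sysfunc(inputn(31JAN2011, date9.)), worddate.));
```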

%sysfunc can be used with a format specifier (similar to the put function)

This was one of those 'learnings' I had from my SAS Advanced Certification preparation. %sysfunc can take a format as an optional second argument (thus avoiding the need for a put function).

The formatting step above can be further reduced as given below, this time formatting today's date:

%put %sysfunc(today(),worddate.);

So much for now.. Please let me know your thoughts on this or if you have found anything more interesting to do with the %sysfunc

Saturday, January 29, 2011

Am back!!

Hey Guys... I know this blog was asleep for a month or so without any activity.. I owe it to my Advanced SAS Certification preparation...

And I got Certified!!!!

Now that I'm back.. You can expect more posts on the new things that I found out while preparing for the certification!

Cheers!!