Saturday, April 9, 2011

Creating a search engine in SAS

It's the age of the search engine! I remember people "Yahoo"ing during the late '90s, "Google"ing through the late 2000s, and now "Bing"ing.

I just wondered: why not SAS? So I started off by doing some reading on the Yahoo search engine APIs. They have released a new API called BOSS. Its documentation is provided here:

The next step was to get an API key, which was generated after I filled out their form. Using this key, I could start accessing the BOSS API.

I used PROC HTTP to access the BOSS API with the program below:

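/* Request file sent to the API and the file that receives the response */
filename in "C:\test\curr_in";
filename out "C:\test\curr_out.txt";

/* Prompt for the search text and write it to the request file */
data _null_;
if (_N_ eq 1) then do;
 file stdout;
 infile stdin truncover;
 put @1 "Enter the search text:";
 input n $200.; /* read the whole line, not just the first word */
 file in;
 put n;
end;
stop;
run;

/* Post the request file to the API; the response is written to curr_out.txt */
proc http in=in out=out url="" method="post" ct="application/x-www-form-urlencoded";
run;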

The above program reads the search text from the user, writes it to the file curr_in as the parameter to send to the BOSS API, and posts that file to the API using PROC HTTP.

Note that the API key has been typed as xxxx; replace it with the API key that you generate from the site.
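
Since the request is posted as application/x-www-form-urlencoded, search text containing spaces or special characters generally needs to be URL-encoded before it goes into curr_in. Here is a minimal sketch of that step using the URLENCODE function. The parameter name query and the sample search text are assumptions for illustration only; they are not taken from the BOSS documentation.

```sas
/* Sketch only: "query" is a hypothetical parameter name */
data _null_;
 length n $200 body $700;
 n = "sas proc http";          /* sample search text */
 /* Encode spaces and reserved characters for a form-urlencoded body */
 body = cats("query=", urlencode(strip(n)));
 put body=;                    /* written to the log for inspection */
run;
```

In the real program, the encoded value would be written to the in fileref instead of the log.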

Once the program has run, the API response is saved in the curr_out file, which contains the search results as XML. This XML is then parsed with substr and index to fetch the needed fields (title, summary, and URL), which are written to stdout. This is accomplished by the code below:

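/* Read the response and keep only the records that hold a search result */
data new;
infile out lrecl=10000 truncover;
input @1 rec $1000.;
if(index(rec,'<Summary>')>0) then do;
 /* Take the text between each opening and closing tag */
 title= substr(rec,index(rec,'<Title>')+7,index(rec,'</Title>')-(index(rec,'<Title>')+7));
 summary= substr(rec,index(rec,'<Summary>')+9,index(rec,'</Summary>')-(index(rec,'<Summary>')+9));
 url = substr(rec,index(rec,'<Url>')+5,index(rec,'</Url>')-(index(rec,'<Url>')+5));
 output;
end;
run;

/* Print each result to stdout */
data _null_;
set new;
file stdout;
put "Title: " title;
put "Summary: " summary;
put "Url: " url;
run;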

This produces the output as shown below:

Let me know your feedback/comments!