Wednesday, March 21, 2007

Find the number of observations in the dataset

I work with huge datasets - more than 100,000 observations followed up for 20 years. In the process of merging datasets read in through macros, I have lost track of the number of observations.

The following code gives the number of observations in the dataset named dataname:
%let dsid=%sysfunc(open(dataname));
%let num=%sysfunc(attrn(&dsid,nobs));
%let rc=%sysfunc(close(&dsid));
%put There are &num observations in dataset dataname.;


This is from the SAS samples.
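
A possible variation, offered as a sketch rather than as part of the SAS samples: the same calls can be wrapped in a macro that checks whether open() succeeded before reading the attribute. The macro name nobs_check is made up for illustration, and sashelp.class only stands in for a real dataset. It reads nlobs instead of nobs, which leaves out observations marked for deletion.

```sas
/* Hypothetical helper macro; the name nobs_check is not from the SAS samples */
%macro nobs_check(ds);
  %local dsid num rc;
  %let dsid=%sysfunc(open(&ds));            /* open() returns 0 if the dataset cannot be opened */
  %if &dsid %then %do;
    %let num=%sysfunc(attrn(&dsid,nlobs));  /* logical observations: deleted rows are not counted */
    %let rc=%sysfunc(close(&dsid));
    %put There are &num observations in dataset &ds..;
  %end;
  %else %put Could not open &ds: %sysfunc(sysmsg());
%mend nobs_check;

/* Example call on a dataset that ships with SAS */
%nobs_check(sashelp.class)
```

Keeping dsid, num and rc local to the macro avoids overwriting macro variables of the same name elsewhere in a long program.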

Monday, March 19, 2007

Centering around mean or calculating standard deviation

data original;
set original;
var=1; /*creates a constant variable*/

/*creates means, standard deviations and number of obs and puts them in a dataset called starwars, which has only one observation*/

proc means data=original;
var ahbmi ah98 ah99 ah9900 ;

OUTPUT OUT=starwars MEAN=avbmi av98 av99 STD=stbmi stah98 stah99 N=nbmi n98 n99 ;
run;


data starwars;

set starwars;
var=1; /*creates constant variable for merging with original dataset*/
drop _freq_ _type_;

data original ;
merge original starwars;
by var;

centerbmi=ahbmi-avbmi; /*centers bmi*/

bmisd=ahbmi/stbmi; /*creates variable to do regression with each unit increment of standard deviation*/
run;



/**************Alternate way***************************/
data original;
set original;
proc means data=original;
var ahbmi ah98 ah99 ah9900 hipcr;
OUTPUT OUT=starwars MEAN=avbmi av98 av99 STD=stbmi stah98 stah99 N=nbmi n98 n99 ;
run;
/*creates means, standard deviations and number of obs and puts them in a dataset called starwars, which has only one observation*/


data _null_;
set starwars;
call symput("bmibar",avbmi); /*creates macro var bmibar that has the value of avbmi*/
call symput("a98bar",av98);
call symput("a99bar",av99);
call symput("s98",stah98);
call symput("s99",stah99);
call symput("sbmi",stbmi);
run;

%put mean of bmi is &bmibar;
%put mean of ah98 is &a98bar;
%put mean of ah99 is &a99bar;

data original;
set original;
centerbmi=ahbmi-&bmibar;
/*centers bmi*/
bmisd=ahbmi/&sbmi;
/*creates variable to do regression with each unit increment of standard deviation */
run;
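
Another option, given here only as a sketch and not as the method used above: PROC SQL can compute the mean and standard deviation and merge them back onto every row in a single step. The output name original_c is hypothetical; the variable names come from the code above. SAS writes a note to the log saying that the query requires remerging summary statistics, which is expected here.

```sas
/* Sketch: center and scale bmi in one step; original_c is a made-up output name */
proc sql;
  create table original_c as
  select *,
         ahbmi - mean(ahbmi) as centerbmi,  /* centers bmi around its mean */
         ahbmi / std(ahbmi)  as bmisd       /* bmi expressed in standard deviation units */
  from original;
quit;
```

This avoids the constant merge variable and the macro variables, at the cost of not keeping the summary statistics in a separate dataset such as starwars.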

/**************Alternate way***************************/
/**************Standardized Coefficients***************************/

proc reg;
model dependent= independent1 independent2 independent3/stb;
run;

This gives standardized estimates, i.e. the coefficients obtained when all
variables in the model (including the dependent variable) are standardized
to zero mean and unit variance. Each coefficient indicates the number
of SDs by which the dependent variable changes with a one-SD change in the
independent variable, holding all other variables constant.
This is useful for comparing the relative importance of independent
variables regardless of their scales.
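
To see what the stb option is doing, one can standardize the variables by hand and fit an ordinary regression. The sketch below is an illustration under assumptions: it uses sashelp.class and its weight, height and age variables only as stand-ins, and class_std is a made-up output name. The slopes from the first PROC REG should match the standardized estimates reported by the second.

```sas
/* Sketch: standardize every model variable to mean 0 and SD 1 */
proc standard data=sashelp.class mean=0 std=1 out=class_std;
  var weight height age;
run;

/* Ordinary regression on the standardized variables */
proc reg data=class_std;
  model weight = height age;
run;
quit;

/* Same model on the raw variables, asking for standardized estimates */
proc reg data=sashelp.class;
  model weight = height age / stb;
run;
quit;
```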