Sunday, January 7, 2007

Using PROC GENMOD for logistic, Poisson, and log-binomial regression

PROC GENMOD is a SAS procedure for fitting generalized linear models. It is flexible and offers several advantages:

Indicator variables do not have to be constructed in advance, because the CLASS statement declares categorical (classification) variables.
Interactions are specified by joining variables with an asterisk, for example, batch*gender.
Some procedures require all variables to be numeric. In PROC GENMOD, classification variables listed in the CLASS statement can be character, and so can a categorical outcome, such as emmig1 in the logistic regression below.
PROC GENMOD can report a likelihood ratio statistic for each effect in the model (the TYPE1 and TYPE3 options of the MODEL statement).
Because the procedure is general, different models (logistic, log-binomial, Poisson) can be fitted with one procedure by changing the DIST= and LINK= options.
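
The points above can be sketched in a few lines. This is only an illustration on made-up data: the dataset name demo and the variables batch, gender, and y are hypothetical and are not part of the example that follows. It assumes a SAS release that supports CALL STREAMINIT and the RAND function.

```sas
/* Hypothetical toy data: demo, batch, gender, and y are illustrative names only */
data demo;
  call streaminit(2007);            /* fixed seed so the toy data are reproducible */
  length batch gender $1;
  do i = 1 to 200;
    if rand('uniform') > 0.5 then batch = 'A';
    else batch = 'B';
    if rand('uniform') > 0.5 then gender = 'F';
    else gender = 'M';
    y = (rand('uniform') > 0.7);    /* binary outcome, roughly 30 percent ones */
    output;
  end;
  drop i;
run;

proc genmod data=demo descending;   /* descending: model the probability that y=1 */
  class batch gender;               /* character variables, no indicator coding needed */
  /* main effects plus interaction; type3 requests likelihood ratio statistics per effect */
  model y = batch gender batch*gender / dist=binomial link=logit type3;
run;
```

Changing only dist= and link= in the MODEL statement switches this sketch to a different generalized linear model, which is the pattern the rest of this post relies on.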

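data file11;
/* generate data using random numbers. details here*/
/* A seed of 0 is taken from the system clock, so each run produces different data. */
/* The loop runs from 0 to 1000 inclusive, giving 1001 observations. */
DO MINUTE=0 TO 1000 BY 1;
  X=UNIFORM(0);
  X1=UNIFORM(15452);
  X2=UNIFORM(29561);
  OUTPUT;
END;
run;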
data filea;
set file11;
/* dichotomize the uniform variates into binary variables */
if x>0.5 then gender=1;
else gender=0;
if x1>0.8 then emmig=1;
else emmig=0;
if x1>0.8 then emmig1='yes'; /* character copy of emmig */
else emmig1='no';
if x2>0.4 then cat=1;
else cat=0;
id=_n_;
/* index labels each of the eight gender/emmig/cat cells; PROC MEANS below
uses it to create a compressed/collapsed dataset (fam8)
with the same information as the original dataset */
if cat=0 then do;
  if gender=1 and emmig=1 then index=1;
  if gender=0 and emmig=1 then index=2;
  if gender=1 and emmig=0 then index=3;
  if gender=0 and emmig=0 then index=4;
end;
else if cat=1 then do;
  if gender=1 and emmig=1 then index=5;
  if gender=0 and emmig=1 then index=6;
  if gender=1 and emmig=0 then index=7;
  if gender=0 and emmig=0 then index=8;
end;
run;
PROC MEANS DATA=filea NWAY NOPRINT ;
CLASS index gender emmig cat emmig1;
VAR index ;
OUTPUT OUT=fam8 SUM=number;
RUN;

data file;
set fam8;
drop _type_ number;
numb=_freq_; /* number of observations in each cell, used below in the FREQ statement */
id=_n_;
run;

/* Calculate odds ratio using logistic regression */
proc genmod data=file descending ;
class cat ;
freq numb; /* method to analyze aggregate data */
model emmig1 = cat/ dist=binomial link=logit ;
estimate 'Beta' cat 1 -1/ exp; /* exp gives the odds ratio for cat=0 versus cat=1 */
run;

/* Calculate risk ratio using log binomial regression */
proc genmod data=filea descending ;
class cat ;
model emmig = cat/ dist=binomial link=log ;
estimate 'Beta' cat 1 -1/ exp; /* exp gives the risk ratio for cat=0 versus cat=1 */
run;

/* Calculate risk ratio using Poisson Regression with Robust Error Variance*/
proc genmod data=filea ;
class cat id;
model emmig = cat/ dist=poisson link=log ;
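/* REPEATED requests robust (empirical) standard errors; each subject has a single record */
repeated subject = id/ type = unstr;
estimate 'Beta' cat 1 -1/ exp; /* exp gives the risk ratio for cat=0 versus cat=1 */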
run;

/*logic of using risk ratio vs odds ratio and details of log-binomial and
poisson regression is here*/
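
As a rough cross-check on the three models above, the same ratios can be read off the 2x2 table of cat by emmig. The sketch below is an addition, not part of the original analysis; it assumes the datasets filea and file created earlier in this post. Because the rows are ordered cat=0, cat=1 and the columns emmig=0, emmig=1, the column 2 relative risk is the risk of emmig=1 for cat=0 divided by the risk for cat=1, which should agree with the exponentiated estimates from the log-binomial and Poisson models. The odds ratio that PROC FREQ prints is for the opposite comparison, so it should be the reciprocal of the exponentiated estimate from the logistic model.

```sas
/* Cross-check on the individual-level data (filea from the code above) */
proc freq data=filea;
  tables cat*emmig / relrisk;   /* rows: cat = 0, 1; columns: emmig = 0, 1 */
run;

/* Same table from the collapsed data, weighting each cell by its count */
proc freq data=file;
  weight numb;
  tables cat*emmig / relrisk;
run;
```

Both calls should print the same table, since the collapsed dataset carries the same information as the original one. Because x1 and x2 are generated independently, all of these ratios are expected to be close to 1.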