Saturday, December 30, 2006

Finding the number of times a threshold has exceeded cutoff

I was approached by a student who had a time series data (120 individual animals observed at 250 time points). She want to find the following:

1)How man times an animal's outcome has been above a threshold for 5, 6,7 ... j consecutive times?

2) An animal can be above a threshold for 'j' consecutive times and then go below it for 'n' consecutive time points and go above it for 'k' consecutive time points . I would prefer this pattern to be counted distinctly and also as only once.


/*this solution was offered through SAS-L */
/*Generate test data */
data test;
do animal = 1 to 120;
do time = 1 to 250;

outcome = floor(10*ranuni(123) );

output;

end;

end;

run;


/*First step is to create a variable indicating whether the threshold (3, for example) is exceeded*/


data step1 /*/ view=step1/*;

set test;

over = (outcome > 3);

run;

/* Next, reduce to one observation for each series of consecutives */


data step2(drop = outcome) /* /view=step2 */;

do consecutive = -1 by -1 until (last.over);

set threshold;

by animal over notsorted;

end;

run;


proc freq data=step2;
tables consecutive / nopercent;

where over;

run;

Finding the largest observation

There might be many ways to do this. A simple way would be to as follows:

data new;
Infile '/udd/n2man/epimarks/classmarks.txt';
input roll 4. ExamDate MMDDYY10. @17 MidFinal $2. Marks 4.;
obsno=_n_;
/* creates a new variable obsno whose value is the same as observation number in the dataset */


proc univariate;
var marks;
run;

/* see the observation number for the largest variable(s) in the output. Run the program again with following added*/

proc print;
where obsno=15; /* 15 is the obsno from the output of proc univariate */

A sophisticated way

data new2;
Infile '/udd/n2man/epimarks/classmarks.txt';
input roll 4. ExamDate MMDDYY10. @17 MidFinal $2. Marks 4.;

proc sort data =new2;
by descending marks;

data _null_ ; /* null datasets only exist for the particular datastep where they are called */
set new2 ;
If _n_=1 then call symput("IDNumber",Roll);
/* this creates a macro variable IDNumber whose value is the roll of the first observation which is the one with largest marks because of sorting*/
else stop;

proc print data=new2;
where Roll="&IDnumber";
format ExamDate WORDDATE18.;
title "Student &IDNumber Had the highest marks ";
run;

Redirecting sas log and output files

When sas program mal or mal.sas is run on unix, it would produce output files sql.log and sql.lst. It is possible to redirect these files as follows:

sas mal.sas -log mallog1 -print mresults &

where mallog1 and mresults could be any valid Unix file names. The ampersand (&) ensures that the sas program is run in the background (see earlier posts).

Monday, December 25, 2006

Running SAS jobs in background

If you want to continue editing programs while SAS jobs run in the background, you can do that by placing an ampersand ("&") after program name. For example:

   sas myprog &

Running SAS on UNIX in Interactive mode

If you have got used to running SAS on Windows and now you are stuck with using SAS on UNIX, accept my sympathies ;)

Anyways, you can run SAS on UNIX with X11 tunelling. The X-Windows interface is similar in appearance and functionality to SAS for Windows.

To do this you need two programs:

  • Putty or Secure CRT to connect to the UNIX server from Windows PC.
  • An X-Windows server running on your computer.
Installing X-Windows server is easy.
Download the program from
here. Unzip this folder and run setup. You will get a window titled MicroImages X Server.

To run SAS, you will need to configure it to allow XWindows "tunneling."

In SecureCRT, after starting click on Properties.

In the Properties window click on X11 tab and check Forward X11 Packets.

Connect to the server as usual and once connected type
sas

After a pause of 4 seconds or so you would be able to see windows similar to what you see in Windows in the MicroImages X Server window.

The response of SAS in this environment can be a little slow.

Convert character format to numeric format

In the previous post, if you use the substr function to tease out a number e.g. roll, the number is in character format. Mathematical functions can not be performed on character variables.

To convert a variable from character to number format, there are following ways:

rollno=roll*1;

rollno=input(roll,4.0)

Saturday, December 23, 2006

Parsing names using scan and substr functions

Most of the times the raw data is received which requires processing before analysis. Recently, I assisted a researcher in analyzing this school data which was in the following format. The names in (brackets) is the maiden name. We have to find the everyone's name before marriage among other things.

DATA dataset ;

INPUT obs 1 name $ 2-45 idno $ 46-58 state $ 59 ;

LENGTH obs 3 ;

cards;

6RED MURAR: DID NOT COMPLETE 09/2251/0018Â 2

7MARLA (WOHL) RANDH 08/3559/0074Â 3

8RED (KRAKAS) MOHLK 07/1452/0025Â 6

9MNIT K. RAJHU 02/1758/0069Â 1

; RUN ;

/*notice the patterns which will be used to identify different variables*/
data grit;
set dataset;
roll1= SCAN(idno,-1,'/');

roll= substr(roll1,1,4);

name1= SCAN(name,1,':');

first1= SCAN(name1,1,' ');
len_temp=indexc(name,'(');
if len_temp gt 0 then do;

last1=scan(name,-1,'(');

last2=scan(last1,1,')');
last3=compress(last2," ");
maiden=trim(first1)||" "||last3;
end;
if maiden=" " then maiden=trim(name1);
proc print;
var roll name1 first1 last3 maiden;

run;

/*Pitfalls detailed here */

/* A very useful program is available here */

Thursday, December 21, 2006

Elegant plotting without ODS

One file for all the graphs

If you want to plot all the graphs in one file without using ODS, you should add the following line at the beginning of your program.

filename mygraph 'usrroom.ps';
goptions reset=global DEVICE=pscolor gsfmode=append gsfname=mygraph;

**this would generate 3.ext - a postscript file that contains the boxplots ****;
**note that ‘device=PSLEPSFC’ and filename mygraph 'usrroom.eps' ; would result in a eps file with colors**;
**note that ‘device=pscolor’ would result in a graph with colors**;

**if you do not use gsfname option, then sas would name the file by default sasgraph.ps. **;

This would generate a single file postscript(ps) file ‘usrroom.ps’ with all your plots .

inserting ps files in word

Convert ps files to enhanced post script files for inserting into word documents in unix/hsph by typing

convert file.ps file.eps

making a pdf file from ps
ps files can be made into a pdf file in unix/hsph by typing
ps2pdf usrroom.ps graphs.pdf

If you run the proc goptions or other advanced graphics would this line you would get the following prompt
"No device name has been given--please enter device name:"

If you enter ‘ps’ or ‘pscolor’, you would get a postscript file.

Separate file for each graph

If you want to make separate files for each graph it can be done by using goptions before each plot as shown in the following code

gif files

If you want to plot separate file for each graph, you should add the following line at the beginning of your program.

filename mygraph '/usr2/users/student/username/graphs';
goptions reset=global DEVICE=html gsfmode=append gsfname=mygraph;

This would generate gif files in files in folder /graphs. gif files are easy to insert in word .

pdf files

If you want to plot separate pdf file for each graph, you can add the following line at the beginning of each proc gplot command.

filename grafout '1.pdf';
goptions device=pdf gsfname=grafout gaccess=sasgastd gsfmode=replace;
symbol value=dot color=purple;
proc gplot data=weights;
plot bwt*age;
run;
**this would generate 1.pdf file that contains the plots of birth weight with age****;
**However, SAS would automatically map colors to greyscale in this file**

A good guide on plotting (pdf) is available here.

Using WinSCP to run SAS programs


In the screenshot of WinSCP, Novell files are on the left, Unix files are on the right. You can drag and drop to transfer files between systems.

  • Create a new SAS program file using WinSCP.
    • From the top menu bar, choose Files/Edit new file.
    • Name it your_pgm.sas. (NB: No spaces allowed in Unix file names.)
  • Edit an existing file using WinSCP.
    • Right click on a file. You will see some preset Unix commands (Edit, Copy, etc.)
    • Choose Edit to edit the file
  • 3 ways to execute SAS as a Unix command through WinSCP
    • From the top menu bar, choose Commands/Terminal. Type sas your_pgm.sas
    • From the top menu bar, choose Options/Command line. In the command line box that appears at the bottom of the window, type sas your_pgm.sas
    • Set up a custom command, as described below
  • Add a SAS command to the custom menu.
    • Right click anywhere on the Unix side of the window.
    • Choose Custom Commands/Customize/Add
    • Enter SAS in the Description box.
    • Enter sas “!” in the Custom Command box
    • The command SAS will be added to the Custom Commands
  • Some useful commands at HSPH. (If you set them up as Custom Commands, you won’t have to remember them.)

Description (becomes the menu choice)

Command (this format for custom commands uses “!” to refer to the file you clicked)

SAS [Version 9]

sas “!”

SAS8 [In case you need V8]

sas8 “!”

mpage [multi-page print in the Microlab to printer L17a (2 pages on one side)]*

mpage –Pmlab58 -2 “!”

lp [stream print to printer LL17a in Microlab]*

lp “!”

SAS Sudaan

sas8 -path /usr/local2/sudaan90/sas-callable "!"

*For best results, set “options linesize=78;” in your SAS program.

  • Run the SAS program using the custom SAS command
    • Click the SAS program file
    • Choose SAS from Custom Commands
    • Select SAS your_pgm.sas and execute
  • Managing your SAS session
    • Use Edit to examine .log and .lst files
    • Use mpage to print .sas, .log, and .lst files
    • If SAS is installed on your PC, consider using it to write your programs using the enhanced SAS editor. Drag and drop your program from the PC side of the window to the Unix side. (Be sure to save the editor contents before you do so). Execute the program using one of the methods, above. Drag and drop your .log file back to PC SAS for viewing in color.

This page was created by Amy Cohen.

Reading log files

Sometimes SAS error log would show the line number. Simple text editors like pico and text readers such as less or more do not show the line number.
We can use nl (number line) command in such situation. If the file is big, to make the file the readable through scrolling, pipe can be used.
This would be the output.

The default behavior of nl is to skip blank lines, as shown above. The treatment of blank lines can be modified with the -b switch. Some -b options are -ba (number all lines), -bt (number only text lines -- the default behavior)

Running SAS on Unix

The format of the command to run SAS is:

hsph% sas filename

SAS would print the log file in filename.log and the results in filename.lst.

Following is a sample sas program in unix is

options pagesize=59 ls=64;
libname new '.'; ** sets the library to the directory in which the program resides**
data ex;
infile '/usr2/users/student/mkaushik/la/tumor' TRUNCOVER;
/** reads the .txt file. Note the way file is referenced using backslash.**/

input id treat y;
proc print;
run;
proc genmod ;
class treat;
model y=treat/dist=binomial link=logit;

ESTIMATE 'thiopeta vs. Placebo' treat 1 -1 / e exp;
run;

This program can be written using text editors such as pico, emacs or vi. Alternatively, it can be written in notepad in windows and then transferred to UNIX account using WinSCP.

You can download tumor from here and transfer the file to your UNIX account.

Aggregate analysis in SAS

If there are aggregate data rather than individual data, analysis in SAS is less intuitive compared to STATA.

/* the aim of this analysis is to assess if the proportion of infection is changing across different years */

data xtrct;
input yr inf mech upgr other;/*infection has number of infection patients */

cards;
2000 12 7 0 40
2001 10 7 1 52
2002 34 13 4 24
2003 30 27 6 21
;

/* We need to convert this wide data set into long dataset*/
data trite;
set xtrct;
ninf=mech+upgr+other; /*total number of non infection patients */
ARRAY a(1:2) inf ninf;
DO t = 1 to 2 ;
num= a(t) ;
indicinf=t; /*indicator if num represents infections or non infections */
OUTPUT ;
END ;

proc print;
run;

/* The data table should be as follows. Note num corresponds to number of cases with infection and without infection.*/

Obs yr inf mech upgr other ninf inf infi num
1 2000 12 7 0 40 47 12 1 12
2 2000 12 7 0 40 47 12 2 47
3 2001 10 7 1 52 60 10 1 10
4 2001 10 7 1 52 60 10 2 60
5 2002 34 13 4 24 41 34 1 34
6 2002 34 13 4 24 41 34 2 41
7 2003 30 27 6 21 54 30 1 30
8 2003 30 27 6 21 54 30 2 54

/* To assess the proportion of patients with infections is same over the 3 year period */

proc freq order=data;
title3 '(Two one-way tables followed by a 2x2 table)';
table yr*indicinf / chisq;
weight num;
run;

******************************************************************************
The output is here.

Create a format library from SAS dataset

To create SAS format library, SAS requires a datset in following format and with some of these variables.


To create a format or informat from a dataset using the CNTLIN option of PROC FORMAT, you need atleast the following variables in the

FMTNAME – Contains name of format or informat to be created
START – The value of the variable that we want to be formatted
LABEL - The variable containing the format value we want displayed..

There are many optional variables (see online documentation for complete list), the most useful of which is

TYPE - character variable that indicates the type of format. Possible values are
C character format
I numeric informat
J character informat
N numeric format (excluding pictures)
P picture format

The CNTLIN dataset must have these characteristics:

* For both numeric and character formats, the data set must contain the variables FMTNAME, START, and LABEL. The remaining variables are not required.
* If you are creating a character format, a character informat, or a PICTURE statement format, you must specify a TYPE variable with the value that indicates the type of informat or format you are creating.
* If range values are to be noninclusive, the variables SEXCL and EEXCL must each have a value of Y. Inclusion is the default.

You can create more than one format from an input control data set if the observations for each format are grouped together.


Following is the way to create a format from a dataset:



/*************************************SAS CODE *************************************/
options nodate pageno=1 linesize=80 pagesize=60;
libname manas '/udd/n2man/tkk';
libname library '/udd/n2man/'; /* creates a format library in the location /udd/n2man/ */
/* READ THE ORIGINAL DATASET */
/* VARIABLE DIAGCO CONTAINS THE VALUES TO WHICH A FORMAT (IN VARIABLE RISK) SHOULD BE ASSIGNED */
data dummy;
infile '/udd/n2man/tkk/surgpro.txt' missover;
input diagco $ risk desc $;
KEEP diagco risk;
length diagco $8;
run;
proc sort data=sample; by risk diagco;
run;

data new;
set dummy;
start =diagco; /* CREATE NEW VARIABLES NEEDED TO MAKE THE FORMAT */
label=risk; /* CREATE NEW VARIABLES NEEDED TO MAKE THE FORMAT */
/* to make a format from a range of values include the following */
/* end= endrange; *here values from start-endrange will be assigned */
/* a value */
fmtname='$random';
/* CREATES THE FORMAT NAME. IN THIS CASE ALL VARIABLES NEED TO HAVE THE SAME FORMAT */
/* format names can also be in the dataset but in this dataset they had to be created */

PROC SORT DATA = new OUT = FMT NODUPKEY;
BY START;
/* removes duplicate values of start that should only one label. Having two labels for the same value can be dangerous */

proc sql;
create table fmt as
select distinct start, label, fmtname
from new;
quit;
data finalfmt;
set fmt;
/**** This step converts the dataset fmt to format library */
PROC FORMAT LIB=library fmtlib CNTLIN=finalFMT;
run;

/*********************************************************************************/


I have used/copied some segments from http://www.uchsc.edu/coho/SAS_Uers/The%20Many%20Ways%20to%20use%20Formats.doc.

Tuesday, December 19, 2006

Reading CSV files in SAS on UNIX

Each individual Excell work sheet can be converted into CSV or tab delimited files for being read in SAS.

CSV files are in following format (note that commas separate different data points). This format is especially suitable if you have missing data.

12,10,34
6,9,0
78,89,46
2,3,2


To read csv files, use one of the following statements

data new;
infile '/usr2/users/student/mkaushik/shh/data1.csv' dsd;/* change the location of the file */
INPUT rake note pgg ; /* these are variable names of your choice*/
run;

OR

data new;
infile '/usr2/users/student/mkaushik/shh/data1.csv' delimiter=','; /* change the location of the file */
INPUT rake note pgg ; /* these are variable names of your choice*/
run;

Table 1 Analysis Macro

I found this great macro for Table 1 analysis. This macro could be polished further but it works as is and is quite helpful.

Channing Labs have their own macro for Table 1 analysis but it adjusts for age which is not required in all analysis.

Monday, December 18, 2006

Making Indicator(dummy) variables

Following code can be used to make indicator variables.

Method 1

if state1 ne 20 & status ne 4 & sex ne .;
else if state1 in (1,3,4,19) then reg=1; /* Bread basket */
else if state1 in (8,16) then reg=2; /* maharashtraAP */
else if state1 in (9,10,12,13,17) then reg=3; /* bimaru states */
else if state1 in (5,11,21) then reg=5; /* north eastWB */
else if state1 in (2) then reg=6; /* Delhi */
else if state1 =. then reg=7; /* Missing */

ARRAY dummys {*} 3. reg_1 - reg_7;

DO i=1 TO 7;
dummys(i) = 0;
END;
dummys( reg ) = 1;

This is similar to the method given here.

Method 2

/* if variable values start from some arbitrary level */
/* following makes indicator for years */
/* Here year values can be from 1988 to 2000 */

ARRAY dummyb {1988:2000} 3. bat_1 - bat_13;

DO i=1988 TO 2000;
dummyb(i) = 0;
dummyb(i) =(year=i);
END;

Macros for creating indicator (dummy) variables are available from 1.
and SAS-L archives
Thanks for the comments Amy!

Exporting a SAS dataset to Microsoft Excel

There are atleast two ways to export a SAS dataset to Microsoft Excel.

Proc Export



PROC EXPORT DATA=datasetname
OUTFILE="/usr2/users/student/mkaushik/ names9900.csv"
DBMS=csv
REPLACE;
delimiter=','; */
RUN;

The program (say atlantis) that includes this, should be submitted with the -noterminal option to i.e.:

sas atlantis -noterminal


ODS
ODS html FILE='/usr2/users/student/mkaushik/AIIMS/becker/try2.xls';
PROC PRINT DATA =&dataset NOOBS;
RUN;
ODS html CLOSE;


Use it when a few observations need to be extracted and sent it for review, additional data entry, data confirmation.

Calculating percentiles in SAS

Following code can be used to find the percentiles.

proc univariate data=food noprint;
var omega sat;
output out=pctl pctlpts=20 40 60 80 pctlpre=omega sat;
run;
proc print data=pctl;
run;


Output would be in the following format
omega20 omega40 omega60 omega80 sat20 sat40 sat60 sat80

(from http://www.ats.ucla.edu/STAT/sas/faq/percentiles.htm)

Directing SAS output to a pdf file

ods pdf file= "/usr2/users/student/mkaushik/bio213/testd.pdf";
*** would direct all the output to testd.pdf file*

*ods pdf file ="%sysfunc
(reverse(%sysfunc(substr(%sysfunc(reverse(%sysfunc(reverse(%scan(%sysfunc(reverse(%sysfunc(getoption(sysin)))),1,/))))),5)))).pdf";
goptions reset=global device=gif ftitle=swissb ftext=swissb htitle=2 htext=2;


data weights;
infile '/usr2/users/student/mkaushik/ne/lbw1';
****change the above line to reflect where you have saved lbw.dat**;

input id low age lwt race smoke ptl ht ui ftv bwt;
run;

proc means;
var age lwt bwt;
run;

proc univariate plot;
var age lwt bwt;
run;

proc gplot;
plot bwt*lwt;
proc corr pearson spearman;
var bwt lwt;

proc gplot data =results;
plot bwt*lwt= '.' pred1*lwt='*' lowerm*lwt='L' upperm*lwt='U'/ overlay;
run;

quit;
ods pdf close;

Making nice plots & name ods html files after SAS files

Output Delivery System (ODS) is a preferred way of making nice tables and graphs.
I recommend using following statement which automatically directs the ODS output to a file with the same name and in the same directory as the program. The following code makes a html file. HTML files can be viewed in UNIX (using lynx) as well as Windows , open quickly and cut pasted easily in Document and Spreadsheet programs.

(remove spaces from following to make this work)
ods html file ="%sysfunc
(reverse(%sysfunc(substr(%sysfunc(reverse(%sysfunc(reverse
(%scan(%sysfunc(reverse(%sysfunc(getoption(sysin)))),1,/))))),5)))).html"
STYLE=MINIMAL;

gpath="/usr2/users/student/mkaushik/web/";

data=
.
.
proc anova;
.
run;

ods html close;



For an html file with frames, you can enter the following code spanning the proc statements

ods html body="/usr2/users/student/mkaushik/web/mixed_output.html"
contents="/usr2/users/student/mkaushik/web/mixed_contents.html"
frame="/usr2/users/student/mkaushik/web/mixed_frame.html";

proc
.
.
run;

ods html close;



In this ODS statement there are three files:

1. BODY which specifies the file to contain the output from the PROC.
2. CONTENTS which specifies the file to contain the table of contents.
3. FRAME which specifies the file to contain reference to the table of contents and the output from the PROC.

Reading compressed data

Sometimes large datasets are available as ASCII files. However, they are too large to be put on HSPH unix system because of limited disk space. These files can be compressed and put on Unix system and uncompressed ‘on the go’. While 7Z and gzip compressed files can be read as well, the simple compressed files are most easily read. There are two ways to compress a file,

compress filename &

Replace "filename" with the name of the data file you wish to compress. This creates a new file with the extension ".Z". For example, if you compress a file called "unit.dat," a compressed file called "unit.dat.Z" would be created, replacing the original file (unit.dat).

gzip filename &

This creates a new file with the extension ".gz". For example, if you compress a file called "unit.dat," a compressed file called "unit.dat.gz" would be created, replacing the original file (unit.dat).

To read the .Z compressed file into SAS without having to uncompress it beforehand, you should add the following to your program:

FILENAME pipedata PIPE 'zcat /usr2/users/student/mkaushik/ne/unit.z' ; /* pipedata is user choosen word */
DATA weights;
INFILE pipedata ; /* This pipedata is user choosen word */
input cogscore race hosmokin htn hochf age female charlson hrtdis pf36 mh36 pn36 enr36 sf36 nyha;
run;

/* Zcat decompresses the data of the input file, and writes the result on the standard output. This data is piped into SAS for being read in. */

Gzipped files can also be read in using following commands.

FILENAME pipedata1 PIPE 'gzcat /usr2/users/student/mkaushik/ne/page.gz' ; /* pipedata1 is user choosen word */
DATA weights;
INFILE pipedata1;
input cogscore race hosmokin htn hochf age female charlson hrtdis pf36 mh36 pn36 enr36 sf36 nyha;
run;
/* This would only work on HSPH system and not on Channing system */

 

FILENAME pipedata1 PIPE 'gzip –dc /usr2/users/student/mkaushik/ne/page.gz' ; /* pipedata1 is user choosen word */
DATA weights;
INFILE pipedata1;
input cogscore race hosmokin htn hochf age female charlson hrtdis pf36 mh36 pn36 enr36 sf36 nyha;
run;
/* This would work on both HSPH and Channing system */

Reading Tab delimited files into SAS

Tab delimited files are in following format (note that the tabs separate different data points). Missing data in this format can cause error.

12 10 34

6 6 9

78 89 2

45 84 84


To read tab delimited files, use one of the following statements

INFILE "'/usr2/users/student/mkaushik/shh/cars6.dat" DELIMITER='09'x; /* change the location of the file */
INPUT rake note pgg ; /* these are variable names of your choice*/
run;

Reading CSV file

Each individual Excell work sheet can be converted into CSV or tab delimited files for being read in SAS. CSV files are in following format (note that commas separate different data points). This format is especially suitable if you have missing data.

12,10,34
6,9,0
78,89,46
2,3,2

To read csv files, use one of the following statements

infile '/usr2/users/student/mkaushik/shh/data1.csv' dsd;/* change the location of the file */
INPUT rake note pgg ; /* these are variable names of your choice*/

run;


or


infile '/usr2/users/student/mkaushik/shh/data1.csv' delimiter=','; /* change the location of the file */
INPUT rake note pgg ; /* these are variable names of your choice*/
run;