Saturday, December 30, 2006
Finding the number of times a threshold has exceeded cutoff
1)How man times an animal's outcome has been above a threshold for 5, 6,7 ... j consecutive times?
2) An animal can be above a threshold for 'j' consecutive times and then go below it for 'n' consecutive time points and go above it for 'k' consecutive time points . I would prefer this pattern to be counted distinctly and also as only once.
/*this solution was offered through SAS-L */
/*Generate test data */
data test;
do animal = 1 to 120;
do time = 1 to 250;
outcome = floor(10*ranuni(123) );
output;
end;
end;
run;
/*First step is to create a variable indicating whether the threshold (3, for example) is exceeded*/
data step1 /*/ view=step1/*;
set test;
over = (outcome > 3);
run;
/* Next, reduce to one observation for each series of consecutives */
data step2(drop = outcome) /* /view=step2 */;
do consecutive = -1 by -1 until (last.over);
set threshold;
by animal over notsorted;
end;
run;
proc freq data=step2;
tables consecutive / nopercent;
where over;
run;
Finding the largest observation
data new;
Infile '/udd/n2man/epimarks/classmarks.txt';
input roll 4. ExamDate MMDDYY10. @17 MidFinal $2. Marks 4.;
obsno=_n_;
/* creates a new variable obsno whose value is the same as observation number in the dataset */
proc univariate;
var marks;
run;
/* see the observation number for the largest variable(s) in the output. Run the program again with following added*/
proc print;
where obsno=15; /* 15 is the obsno from the output of proc univariate */
A sophisticated way
data new2;
Infile '/udd/n2man/epimarks/classmarks.txt';
input roll 4. ExamDate MMDDYY10. @17 MidFinal $2. Marks 4.;
proc sort data =new2;
by descending marks;
data _null_ ; /* null datasets only exist for the particular datastep where they are called */
set new2 ;
If _n_=1 then call symput("IDNumber",Roll);
/* this creates a macro variable IDNumber whose value is the roll of the first observation which is the one with largest marks because of sorting*/
else stop;
proc print data=new2;
where Roll="&IDnumber";
format ExamDate WORDDATE18.;
title "Student &IDNumber Had the highest marks ";
run;
Redirecting sas log and output files
sas mal.sas -log mallog1 -print mresults &
where mallog1 and mresults could be any valid Unix file names. The ampersand (&) ensures that the sas program is run in the background (see earlier posts).
Monday, December 25, 2006
Running SAS jobs in background
If you want to continue editing programs while SAS jobs run in the background, you can do that by placing an ampersand ("&") after program name. For example:
sas myprog &
Running SAS on UNIX in Interactive mode
Anyways, you can run SAS on UNIX with X11 tunelling. The X-Windows interface is similar in appearance and functionality to SAS for Windows.
To do this you need two programs:
- Putty or Secure CRT to connect to the UNIX server from Windows PC.
- An X-Windows server running on your computer.
Download the program from here. Unzip this folder and run setup. You will get a window titled MicroImages X Server.
To run SAS, you will need to configure it to allow XWindows "tunneling."
In SecureCRT, after starting click on Properties.
In the Properties window click on X11 tab and check Forward X11 Packets.
sas
After a pause of 4 seconds or so you would be able to see windows similar to what you see in Windows in the MicroImages X Server window.
The response of SAS in this environment can be a little slow.
Convert character format to numeric format
To convert a variable from character to number format, there are following ways:
rollno=roll*1;
rollno=input(roll,4.0)
Saturday, December 23, 2006
Parsing names using scan and substr functions
DATA dataset ;
INPUT obs 1 name $ 2-45 idno $ 46-58 state $ 59 ;
LENGTH obs 3 ;
cards;
6RED MURAR: DID NOT COMPLETE 09/2251/0018Â 2
7MARLA (WOHL) RANDH 08/3559/0074Â 3
8RED (KRAKAS) MOHLK 07/1452/0025Â 6
9MNIT K. RAJHU 02/1758/0069Â 1
; RUN ;
/*notice the patterns which will be used to identify different variables*/
data grit;
set dataset;
roll1= SCAN(idno,-1,'/');
roll= substr(roll1,1,4);
name1= SCAN(name,1,':');
first1= SCAN(name1,1,' ');
len_temp=indexc(name,'(');
if len_temp gt 0 then do;
last1=scan(name,-1,'(');
last2=scan(last1,1,')');
last3=compress(last2," ");
maiden=trim(first1)||" "||last3;
end;
if maiden=" " then maiden=trim(name1);
proc print;
var roll name1 first1 last3 maiden;
run;
/*Pitfalls detailed here */
/* A very useful program is available here */
Thursday, December 21, 2006
Elegant plotting without ODS
If you want to plot all the graphs in one file without using ODS, you should add the following line at the beginning of your program.
goptions reset=global DEVICE=pscolor gsfmode=append gsfname=mygraph;
**note that ‘device=PSLEPSFC’ and filename mygraph 'usrroom.eps' ; would result in a eps file with colors**;
**note that ‘device=pscolor’ would result in a graph with colors**;
**if you do not use gsfname option, then sas would name the file by default sasgraph.ps. **;
Convert ps files to enhanced post script files for inserting into word documents in unix/hsph by typing
convert file.ps file.eps
ps files can be made into a pdf file in unix/hsph by typing
ps2pdf usrroom.ps graphs.pdf
"No device name has been given--please enter device name:"
If you want to plot separate file for each graph, you should add the following line at the beginning of your program.
goptions reset=global DEVICE=html gsfmode=append gsfname=mygraph;
pdf files
If you want to plot separate pdf file for each graph, you can add the following line at the beginning of each proc gplot command.
goptions device=pdf gsfname=grafout gaccess=sasgastd gsfmode=replace;
symbol value=dot color=purple;
plot bwt*age;
run;
**this would generate 1.pdf file that contains the plots of birth weight with age****;
**However, SAS would automatically map colors to greyscale in this file**
Using WinSCP to run SAS programs
In the screenshot of WinSCP, Novell files are on the left, Unix files are on the right. You can drag and drop to transfer files between systems.
- Create a new SAS program file using WinSCP.
- From the top menu bar, choose Files/Edit new file.
- Name it your_pgm.sas. (NB: No spaces allowed in Unix file names.)
- Edit an existing file using WinSCP.
- Right click on a file. You will see some preset Unix commands (Edit, Copy, etc.)
- Choose Edit to edit the file
- 3 ways to execute SAS as a Unix command through WinSCP
- From the top menu bar, choose Commands/Terminal. Type sas your_pgm.sas
- From the top menu bar, choose Options/Command line. In the command line box that appears at the bottom of the window, type sas your_pgm.sas
- Set up a custom command, as described below
- Add a SAS command to the custom menu.
- Right click anywhere on the Unix side of the window.
- Choose Custom Commands/Customize/Add
- Enter SAS in the Description box.
- Enter sas “!” in the Custom Command box
- The command SAS will be added to the Custom Commands
- Some useful commands at HSPH. (If you set them up as Custom Commands, you won’t have to remember them.)
Description (becomes the menu choice) | Command (this format for custom commands uses “!” to refer to the file you clicked) |
SAS [Version 9] | sas “!” |
SAS8 [In case you need V8] | sas8 “!” |
mpage [multi-page print in the Microlab to printer L17a (2 pages on one side)]* | mpage –Pmlab58 -2 “!” |
lp [stream print to printer LL17a in Microlab]* | lp “!” |
SAS Sudaan | sas8 -path /usr/local2/sudaan90/sas-callable "!" |
*For best results, set “options linesize=78;” in your SAS program.
- Run the SAS program using the custom SAS command
- Click the SAS program file
- Choose SAS from Custom Commands
- Select SAS your_pgm.sas and execute
- Managing your SAS session
- Use Edit to examine .log and .lst files
- Use mpage to print .sas, .log, and .lst files
- If SAS is installed on your PC, consider using it to write your programs using the enhanced SAS editor. Drag and drop your program from the PC side of the window to the Unix side. (Be sure to save the editor contents before you do so). Execute the program using one of the methods, above. Drag and drop your .log file back to PC SAS for viewing in color.
Reading log files
We can use nl (number line) command in such situation. If the file is big, to make the file the readable through scrolling, pipe can be used.
This would be the output.
The default behavior of
nl
is to skip blank lines, as shown above. The treatment of blank lines can be modified with the -b
switch. Some -b
options are -ba
(number all lines), -bt
(number only text lines -- the default behavior)
Running SAS on Unix
The format of the command to run SAS is:
hsph% sas filename
SAS would print the log file in filename.log and the results in filename.lst.
libname new '.'; ** sets the library to the directory in which the program resides**
data ex;
infile '/usr2/users/student/mkaushik/la/tumor' TRUNCOVER;
/** reads the .txt file. Note the way file is referenced using backslash.**/
input id treat y;
proc print;
run;
proc genmod ;
class treat;
model y=treat/dist=binomial link=logit;
ESTIMATE 'thiopeta vs. Placebo' treat 1 -1 / e exp;
run;
You can download tumor from here and transfer the file to your UNIX account.
Aggregate analysis in SAS
If there are aggregate data rather than individual data, analysis in SAS is less intuitive compared to STATA.
/* the aim of this analysis is to assess if the proportion of infection is changing across different years */
data xtrct;
input yr inf mech upgr other;/*infection has number of infection patients */
cards;
2000 12 7 0 40
2001 10 7 1 52
2002 34 13 4 24
2003 30 27 6 21
;
/* We need to convert this wide data set into long dataset*/
data trite;
set xtrct;
ninf=mech+upgr+other; /*total number of non infection patients */
ARRAY a(1:2) inf ninf;
DO t = 1 to 2 ;
num= a(t) ;
indicinf=t; /*indicator if num represents infections or non infections */
OUTPUT ;
END ;
proc print;
run;
/* The data table should be as follows. Note num corresponds to number of cases with infection and without infection.*/
Obs | yr | inf | mech | upgr | other | ninf | inf | infi | num |
1 | 2000 | 12 | 7 | 0 | 40 | 47 | 12 | 1 | 12 |
2 | 2000 | 12 | 7 | 0 | 40 | 47 | 12 | 2 | 47 |
3 | 2001 | 10 | 7 | 1 | 52 | 60 | 10 | 1 | 10 |
4 | 2001 | 10 | 7 | 1 | 52 | 60 | 10 | 2 | 60 |
5 | 2002 | 34 | 13 | 4 | 24 | 41 | 34 | 1 | 34 |
6 | 2002 | 34 | 13 | 4 | 24 | 41 | 34 | 2 | 41 |
7 | 2003 | 30 | 27 | 6 | 21 | 54 | 30 | 1 | 30 |
8 | 2003 | 30 | 27 | 6 | 21 | 54 | 30 | 2 | 54 |
/* To assess the proportion of patients with infections is same over the 3 year period */
title3 '(Two one-way tables followed by a 2x2 table)';
table yr*indicinf / chisq;
weight num;
run;
******************************************************************************
The output is here.
Create a format library from SAS dataset
To create a format or informat from a dataset using the CNTLIN option of PROC FORMAT, you need atleast the following variables in the
FMTNAME – Contains name of format or informat to be created
START – The value of the variable that we want to be formatted
LABEL - The variable containing the format value we want displayed..
There are many optional variables (see online documentation for complete list), the most useful of which is
TYPE - character variable that indicates the type of format. Possible values are
C character format
I numeric informat
J character informat
N numeric format (excluding pictures)
P picture format
The CNTLIN dataset must have these characteristics:
* For both numeric and character formats, the data set must contain the variables FMTNAME, START, and LABEL. The remaining variables are not required.
* If you are creating a character format, a character informat, or a PICTURE statement format, you must specify a TYPE variable with the value that indicates the type of informat or format you are creating.
* If range values are to be noninclusive, the variables SEXCL and EEXCL must each have a value of Y. Inclusion is the default.
You can create more than one format from an input control data set if the observations for each format are grouped together.
Following is the way to create a format from a dataset:
/*************************************SAS CODE *************************************/
options nodate pageno=1 linesize=80 pagesize=60;
libname manas '/udd/n2man/tkk';
libname library '/udd/n2man/'; /* creates a format library in the location /udd/n2man/ */
/* READ THE ORIGINAL DATASET */
/* VARIABLE DIAGCO CONTAINS THE VALUES TO WHICH A FORMAT (IN VARIABLE RISK) SHOULD BE ASSIGNED */
data dummy;
infile '/udd/n2man/tkk/surgpro.txt' missover;
input diagco $ risk desc $;
KEEP diagco risk;
length diagco $8;
run;
proc sort data=sample; by risk diagco;
run;
data new;
set dummy;
start =diagco; /* CREATE NEW VARIABLES NEEDED TO MAKE THE FORMAT */
label=risk; /* CREATE NEW VARIABLES NEEDED TO MAKE THE FORMAT */
/* to make a format from a range of values include the following */
/* end= endrange; *here values from start-endrange will be assigned */
/* a value */
fmtname='$random';
/* CREATES THE FORMAT NAME. IN THIS CASE ALL VARIABLES NEED TO HAVE THE SAME FORMAT */
/* format names can also be in the dataset but in this dataset they had to be created */
PROC SORT DATA = new OUT = FMT NODUPKEY;
BY START;
/* removes duplicate values of start that should only one label. Having two labels for the same value can be dangerous */
proc sql;
create table fmt as
select distinct start, label, fmtname
from new;
quit;
data finalfmt;
set fmt;
/**** This step converts the dataset fmt to format library */
PROC FORMAT LIB=library fmtlib CNTLIN=finalFMT;
run;
/*********************************************************************************/
I have used/copied some segments from http://www.uchsc.edu/coho/SAS_Uers/The%20Many%20Ways%20to%20use%20Formats.doc.
Tuesday, December 19, 2006
Reading CSV files in SAS on UNIX
CSV files are in following format (note that commas separate different data points). This format is especially suitable if you have missing data.
12,10,34
6,9,0
78,89,46
2,3,2
To read csv files, use one of the following statements
data new;
infile '/usr2/users/student/mkaushik/shh/data1.csv' dsd;/* change the location of the file */
INPUT rake note pgg ; /* these are variable names of your choice*/
run;
OR
data new;
infile '/usr2/users/student/mkaushik/shh/data1.csv' delimiter=','; /* change the location of the file */
INPUT rake note pgg ; /* these are variable names of your choice*/
run;
Table 1 Analysis Macro
Channing Labs have their own macro for Table 1 analysis but it adjusts for age which is not required in all analysis.
Monday, December 18, 2006
Making Indicator(dummy) variables
Method 1
if state1 ne 20 & status ne 4 & sex ne .;
else if state1 in (1,3,4,19) then reg=1; /* Bread basket */
else if state1 in (8,16) then reg=2; /* maharashtraAP */
else if state1 in (9,10,12,13,17) then reg=3; /* bimaru states */
else if state1 in (5,11,21) then reg=5; /* north eastWB */
else if state1 in (2) then reg=6; /* Delhi */
else if state1 =. then reg=7; /* Missing */
ARRAY dummys {*} 3. reg_1 - reg_7;
DO i=1 TO 7;
dummys(i) = 0;
END;
dummys( reg ) = 1;
This is similar to the method given here.
Method 2
/* if variable values start from some arbitrary level */
/* following makes indicator for years */
/* Here year values can be from 1988 to 2000 */
ARRAY dummyb {1988:2000} 3. bat_1 - bat_13;
DO i=1988 TO 2000;
dummyb(i) = 0;
dummyb(i) =(year=i);
END;
Macros for creating indicator (dummy) variables are available from 1.
and SAS-L archives
Thanks for the comments Amy!
Exporting a SAS dataset to Microsoft Excel
Proc Export
PROC EXPORT DATA=datasetname
OUTFILE="/usr2/users/student/mkaushik/ names9900.csv"
DBMS=csv
REPLACE;
delimiter=','; */
RUN;
The program (say atlantis) that includes this, should be submitted with the -noterminal option to i.e.:
sas atlantis -noterminal
ODS
ODS html FILE='/usr2/users/student/mkaushik/AIIMS/becker/try2.xls';
PROC PRINT DATA =&dataset NOOBS;
RUN;
ODS html CLOSE;
Use it when a few observations need to be extracted and sent it for review, additional data entry, data confirmation.
Calculating percentiles in SAS
proc univariate data=food noprint;
var omega sat;
output out=pctl pctlpts=20 40 60 80 pctlpre=omega sat;
run;
proc print data=pctl;
run;
Output would be in the following format
omega20 omega40 omega60 omega80 sat20 sat40 sat60 sat80
(from http://www.ats.ucla.edu/STAT/sas/faq/percentiles.htm)
Directing SAS output to a pdf file
*** would direct all the output to testd.pdf file*
*ods pdf file ="%sysfunc
(reverse(%sysfunc(substr(%sysfunc(reverse(%sysfunc(reverse(%scan(%sysfunc(reverse(%sysfunc(getoption(sysin)))),1,/))))),5)))).pdf";
goptions reset=global device=gif ftitle=swissb ftext=swissb htitle=2 htext=2;
data weights;
infile '/usr2/users/student/mkaushik/ne/lbw1';
****change the above line to reflect where you have saved lbw.dat**;
input id low age lwt race smoke ptl ht ui ftv bwt;
run;
proc means;
var age lwt bwt;
run;
proc univariate plot;
var age lwt bwt;
run;
proc gplot;
plot bwt*lwt;
proc corr pearson spearman;
var bwt lwt;
proc gplot data =results;
plot bwt*lwt= '.' pred1*lwt='*' lowerm*lwt='L' upperm*lwt='U'/ overlay;
run;
quit;
ods pdf close;
Making nice plots & name ods html files after SAS files
I recommend using following statement which automatically directs the ODS output to a file with the same name and in the same directory as the program. The following code makes a html file. HTML files can be viewed in UNIX (using lynx) as well as Windows , open quickly and cut pasted easily in Document and Spreadsheet programs.
(remove spaces from following to make this work)
ods html file ="%sysfunc
(reverse(%sysfunc(substr(%sysfunc(reverse(%sysfunc(reverse
(%scan(%sysfunc(reverse(%sysfunc(getoption(sysin)))),1,/))))),5)))).html"
STYLE=MINIMAL;
gpath="/usr2/users/student/mkaushik/web/";
data=
.
.
proc anova;
.
run;
ods html close;
For an html file with frames, you can enter the following code spanning the proc statements
ods html body="/usr2/users/student/mkaushik/web/mixed_output.html"
contents="/usr2/users/student/mkaushik/web/mixed_contents.html"
frame="/usr2/users/student/mkaushik/web/mixed_frame.html";
proc
.
.
run;
ods html close;
In this ODS statement there are three files:
1. BODY which specifies the file to contain the output from the PROC.
2. CONTENTS which specifies the file to contain the table of contents.
3. FRAME which specifies the file to contain reference to the table of contents and the output from the PROC.
Reading compressed data
Sometimes large datasets are available as ASCII files. However, they are too large to be put on HSPH unix system because of limited disk space. These files can be compressed and put on Unix system and uncompressed ‘on the go’. While 7Z and gzip compressed files can be read as well, the simple compressed files are most easily read. There are two ways to compress a file,
compress filename &
Replace "filename" with the name of the data file you wish to compress. This creates a new file with the extension ".Z". For example, if you compress a file called "unit.dat," a compressed file called "unit.dat.Z" would be created, replacing the original file (unit.dat).
gzip filename &
This creates a new file with the extension ".gz". For example, if you compress a file called "unit.dat," a compressed file called "unit.dat.gz" would be created, replacing the original file (unit.dat).
To read the .Z compressed file into SAS without having to uncompress it beforehand, you should add the following to your program:
FILENAME pipedata PIPE 'zcat /usr2/users/student/mkaushik/ne/unit.z' ; /* pipedata is user choosen word */
DATA weights;
INFILE pipedata ; /* This pipedata is user choosen word */
input cogscore race hosmokin htn hochf age female charlson hrtdis pf36 mh36 pn36 enr36 sf36 nyha;
run;
Gzipped files can also be read in using following commands.
FILENAME pipedata1 PIPE 'gzcat /usr2/users/student/mkaushik/ne/page.gz' ; /* pipedata1 is user choosen word */
INFILE pipedata1;
input cogscore race hosmokin htn hochf age female charlson hrtdis pf36 mh36 pn36 enr36 sf36 nyha;
run;
/* This would only work on HSPH system and not on Channing system */
FILENAME pipedata1 PIPE 'gzip –dc /usr2/users/student/mkaushik/ne/page.gz' ; /* pipedata1 is user choosen word */
DATA weights;
INFILE pipedata1;
input cogscore race hosmokin htn hochf age female charlson hrtdis pf36 mh36 pn36 enr36 sf36 nyha;
run;
/* This would work on both HSPH and Channing system */
Reading Tab delimited files into SAS
12 10 34
6 6 9
78 89 2
45 84 84
run;
Reading CSV file
6,9,0
78,89,46
2,3,2
To read csv files, use one of the following statements
infile '/usr2/users/student/mkaushik/shh/data1.csv' dsd;
run;
or
infile '/usr2/users/student/mkaushik/shh/data1.csv' delimiter=',';
run;