Saturday, December 30, 2006

Finding the number of times an outcome has exceeded a threshold

I was approached by a student who had time series data (120 individual animals observed at 250 time points). She wanted to find the following:

1) How many times has an animal's outcome been above a threshold for 5, 6, 7 ... j consecutive time points?

2) An animal can be above a threshold for 'j' consecutive time points, then go below it for 'n' consecutive time points, and then go above it again for 'k' consecutive time points. I would prefer each such pattern to be counted as a distinct pattern, and only once.

/*this solution was offered through SAS-L */
/*Generate test data */
data test;
  do animal = 1 to 120;
    do time = 1 to 250;
      outcome = floor(10*ranuni(123));
      output;
    end;
  end;
run;

/*First step is to create a variable indicating whether the threshold (3, for example) is exceeded*/

data step1 /* / view=step1 */;
  set test;
  over = (outcome > 3);   /* 1 if above the threshold, 0 otherwise */
run;

/* Next, reduce to one observation for each series of consecutives */

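data step2(drop = outcome) /* / view=step2 */;
  /* Count up from 1 so that CONSECUTIVE ends as the length of the run */
  do consecutive = 1 by 1 until (last.over);
    set step1;
    by animal over notsorted;
  end;
  /* The implicit OUTPUT here writes one observation per run */
run;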
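
The run-length step can be checked on a series small enough to verify by hand. The sketch below is an illustration only: the data set names `toy` and `toy_runs` and the six values are hypothetical, not part of the SAS-L solution. It uses a loop that counts up from 1, so that `consecutive` ends as the length of each run.

```sas
/* Hypothetical toy data: one animal, six time points, threshold of 3 */
data toy;
  animal = 1;
  input time outcome @@;
  over = (outcome > 3);
  datalines;
1 5 2 5 3 1 4 7 5 7 6 7
;
run;

/* One output row per run of equal OVER values within an animal */
data toy_runs(drop = outcome time);
  do consecutive = 1 by 1 until (last.over);
    set toy;
    by animal over notsorted;
  end;
run;

proc print data=toy_runs noobs;
run;
```

The outcomes 5, 5, 1, 7, 7, 7 form a run of two values above the threshold, one below, and three above, so the printed data set should have three rows with `consecutive` equal to 2, 1 and 3.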

proc freq data=step2;
tables consecutive / nopercent;

where over;

run;
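
The frequency table above addresses the first question only. For the second question, one possible approach is sketched below. It assumes that `step2` holds one row per run, in animal order, with a positive run length in `consecutive`. The names `patterns`, `run_no`, `prev_len`, `prev2_len`, `j`, `n` and `k` are introduced here for illustration. It also assumes one reading of "counted only once": every above/below/above triple is written exactly once, and two triples that share an above run are both kept.

```sas
/* Sketch: one row per above(j) / below(n) / above(k) pattern */
data patterns(keep = animal j n k);
  set step2;
  by animal;

  /* LAG must be called on every row so its queue stays aligned */
  prev_len  = lag1(consecutive);   /* length of the previous run   */
  prev2_len = lag2(consecutive);   /* length of the run before that */

  /* Number the runs within each animal */
  if first.animal then run_no = 0;
  run_no + 1;

  /* Runs alternate within an animal, so an above run that has two
     earlier runs in the same animal closes an above/below/above pattern */
  if over and run_no >= 3 then do;
    j = prev2_len;
    n = prev_len;
    k = consecutive;
    output;
  end;
run;

/* Count how often each distinct (j, n, k) combination occurs */
proc freq data=patterns;
  tables j*n*k / list nopercent;
run;
```

If only long runs are of interest, a WHERE statement such as `where j >= 5 and k >= 5;` could be added to the PROC FREQ step.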

DavidAtCWRU said...

I'm sorry, but the output in the PROC FREQ provides the incorrect information to the client. The WHERE OVER; statement is insufficient.
The client needs only the counts (frequencies) of the ending run length where OVER=1 (consecutive runs above threshold).
Solution: First, add one line to data step 3 after the BY statement:
LASTOVER = last.over;
Next, change the WHERE statement in the PROC FREQ as follows:
WHERE OVER and LASTOVER;

The resulting frequency table correctly gives the frequency of the LENGTH of the overruns where OVER=1 (outcome over threshold).

This is a terrific example for my more advanced SAS students. Thanks for posting.
David Bruckman
Case Western Reserve Univ., Cleveland, OH.

Jian Zhuo said...