Detection And Estimation Of The Increasing Trend Of Cancer

Detecting and estimating trends in cancer incidence rates are important tasks for health authorities planning for future needs. For that purpose, the annual percentage change (APC) is commonly estimated by fitting a log-linear (Joinpoint) regression model to the data.
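To make the link between a log-linear fit and the APC concrete, here is a minimal sketch. It assumes a single log-linear segment fitted by ordinary least squares (it does not reproduce the Joinpoint software, which also searches for change points), and the function name `estimate_apc` is purely illustrative. If log(rate) = a + b·year, then APC = 100·(e^b − 1).

```python
import math

def estimate_apc(years, rates):
    """Fit log(rate) = a + b*year by ordinary least squares; return the APC in percent.

    Assumes every rate is strictly positive (log of zero is undefined), which is
    one reason this approach struggles with very small counts.
    """
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)

# Noise-free illustration: rates growing by exactly 3.3% per year (hypothetical baseline 0.5).
years = list(range(15))
rates = [0.5 * 1.033 ** t for t in years]
apc = estimate_apc(years, rates)
```

With noise-free input the fit recovers the growth rate used to build the series; with small, noisy counts the estimate can vary widely from one dataset to another.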

In reality, the trend may not be log-linear, and even if it is, the observed data may not comply with linearity when the number of incident cases is very small. Indeed, the regression procedure is known to be inefficient when applied to low-incidence data. Nevertheless, it is frequently applied to such data.

We evaluated the efficiency of a procedure (the CUSCORE test) that makes no assumption regarding the shape of the trend and compared it with the efficiency of regression. Since, unlike the regression method, the test does not provide an estimate of the APC, we also suggested a simple procedure for estimating the APC (to be used when an elevation is detected).

The CUSCORE test is aimed at detecting an elevation that starts at an unknown point in time and follows no specific pattern. It is based on the time interval during which one event is observed. An event in our analyses comprises 4 diagnoses, which is about the expected incidence. The observed interval is translated into the expected number of diagnoses in that time interval. That “translation” makes it possible to use the same procedure for any baseline value and to control for possible changes in the incidence during the relevant period.
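The “translation” step can be sketched as follows. This is only an illustration under stated assumptions: time is measured in years since the start of the study, the baseline is given as a hypothetical list of expected diagnoses per calendar year, and the function name `expected_in_interval` is not taken from the article.

```python
def expected_in_interval(start, end, yearly_baseline):
    """Expected number of diagnoses between `start` and `end` (years since study start).

    yearly_baseline[i] is the assumed expected number of diagnoses in study year i;
    it may differ from year to year, which is how a changing baseline is accommodated.
    """
    total = 0.0
    for year, expected in enumerate(yearly_baseline):
        # Length of the part of [start, end] that falls inside study year `year`.
        overlap = min(end, year + 1) - max(start, year)
        if overlap > 0:
            total += expected * overlap
    return total

# Same interval, translated under a constant and under a changing baseline.
constant = expected_in_interval(0.5, 2.25, [4, 4, 4])
changing = expected_in_interval(0.5, 2.25, [4, 5, 6])
```

Because the interval is expressed as an expected number of diagnoses rather than as calendar time, intervals observed under different baselines become comparable.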

Each of the two statistical techniques was applied to each of 300 datasets simulated to match the published results of a study in which regression analysis was applied to 61 cases of leukemia (CMML) diagnosed during a 15-year period in a population of about 700,000 inhabitants (Girona province, Spain). In that study, an increasing trend of APC = 3.3% was indicated but was not found to be significant. It should be noted that the size of the trend and the size of the population are plausible values in real situations.
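The article does not spell out how the datasets were generated, so the following is only one plausible sketch of such a simulation. It assumes independent Poisson counts per year, expected counts that grow by 3.3% annually, and an expected total of 61 cases over 15 years; all function names are hypothetical.

```python
import math
import random

def expected_counts(total=61.0, years=15, apc=3.3):
    """Expected yearly counts growing by `apc` percent per year and summing to `total`."""
    growth = 1.0 + apc / 100.0
    weights = [growth ** t for t in range(years)]
    scale = total / sum(weights)
    return [scale * w for w in weights]

def poisson(mean, rng):
    """Draw one Poisson variate (Knuth's multiplication method, fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def simulate_dataset(rng, **kwargs):
    """One simulated dataset: a list of yearly case counts."""
    return [poisson(mu, rng) for mu in expected_counts(**kwargs)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
expected = expected_counts()
datasets = [simulate_dataset(rng) for _ in range(300)]
```

Each simulated dataset can then be passed to whichever detection procedure is being evaluated, and the share of datasets yielding a significant result serves as an estimate of that procedure's power.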

Just 22% of the simulated datasets yielded a significant result with the regression approach, compared to 39% with the CUSCORE test. Hence, although the efficiency of the CUSCORE test was clearly better than that of the regression (by about 77% in relative terms), it is still not adequate.
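The figure of about 77% is a relative gain, not a difference in percentage points. A short check of the arithmetic, using the two rounded percentages reported above:

```python
regression_power = 0.22  # share of datasets significant by regression
cuscore_power = 0.39     # share of datasets significant by the CUSCORE test

# Relative gain of CUSCORE over regression, and the absolute gap between them.
relative_gain = cuscore_power / regression_power - 1.0
absolute_gap = cuscore_power - regression_power
```

The absolute gap is 17 percentage points, and even the better of the two procedures detects the trend in well under half of the datasets.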

In similar real-life situations, the power of the regression approach is likely to be even poorer, because in real data the trend may not be log-linear, and even if it is log-linear, it may not start at the beginning of the studied period. Also, the simulated data are free from the “noise” (e.g., missing data) involved in recording real data, which will reduce the efficiency of both procedures. Hence, it is reasonable to assume that the efficiency of both procedures is inadequate, even for considerably larger datasets.

Thus, it seems that using strict statistical rules (with respect to the significance level and/or to linearity) is inherently problematic when the number of incident cases is small. We suggest basing conclusions on a somewhat exploratory statistical approach that considers the temporal pattern of the events. That pattern may point to an increased rate when a sequence of events consistently indicates elevation. For that purpose, we suggest inspecting the cumulative q-interval curve.

The q-interval is a statistic that reflects the incidence rate between every two consecutive events. It is expected to be 0.5 when the rate is stable and larger than 0.5 when the rate is elevated. Accordingly, the slope of the q-intervals accumulated over consecutive events is expected to be 0.5 under stable conditions and steeper under elevated incidence. Larger elevations will be reflected by steeper slopes. Figure 1 presents examples of that curve under a trend of elevated rates and under a stable situation. Fig. 1A presents the observed and expected cumulative q-intervals of one of our simulated datasets, in which an APC of 3.3% was implemented. Fig. 1B presents the observed and expected cumulative q-intervals of a dataset simulated under the same conditions as that of Fig. 1A, albeit under stable incidence.
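The text above does not give a formula for the q-interval, so the sketch below rests on an assumption: q is taken to be the probability, under the baseline rate, of an interval longer than the one observed. With an event made of k diagnoses and Poisson arrivals, that probability is the chance of fewer than k diagnoses when m are expected, where m is the expected number of diagnoses in the observed interval. This choice reproduces the properties described above (mean 0.5 under a stable rate, larger values when intervals shorten), but the function names and the example values are illustrative only.

```python
import math

def q_interval(expected_diagnoses, k=4):
    """P(interval longer than observed) under baseline, for an event of k diagnoses.

    Equals P(Poisson(m) < k), where m is the expected number of diagnoses
    in the observed interval. Short intervals (small m) give q close to 1.
    """
    m = expected_diagnoses
    return math.exp(-m) * sum(m ** j / math.factorial(j) for j in range(k))

def cumulative_q(expected_per_interval, k=4):
    """Running sum of q-intervals over consecutive events."""
    total, curve = 0.0, []
    for m in expected_per_interval:
        total += q_interval(m, k)
        curve.append(total)
    return curve

# Hypothetical intervals, already translated to expected numbers of diagnoses.
observed = cumulative_q([4.1, 3.9, 2.0, 1.5, 2.5])
# Reference line under a stable rate: slope 0.5 per event.
reference = [0.5 * (i + 1) for i in range(len(observed))]
```

Plotting `observed` against `reference` gives the kind of comparison described for Figure 1: a run of consecutive steps steeper than 0.5 suggests an elevated rate.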

The cumulative curve in Fig. 1A indicates a clear elevation from event 8 on: except for one interval (that between events 13 and 14), the slope of the curve between every two consecutive events is larger than 0.5, indicating a consistent elevation in the rates. There is no indication of elevated rates during the occurrence of the first 7 events (which were observed during about the first 7 years), apparently because the rates during that period were quite close to the baseline rate. It is worth noting that neither of the two formal statistical procedures yielded a significant result for that dataset. The fluctuations of the intervals around 0.5 in Fig. 1B indicate a stable rate over the entire 15-year period.

Figure 1. Cumulative q-intervals in simulated datasets. Credit: Rina Chen

These findings are described in the article entitled “Detection and estimation of the increasing trend of cancer incidence in relatively small populations,” recently published in the journal Cancer Epidemiology. This work was conducted by Rina Chen from Bioforum Applied Knowledge Center, Ness-Ziona, Israel, and Enrique Y. Bitchatchi from University of Gerona, Spain.