In most monitoring schemes, some sites are not sampled in some years, so this is the general case rather than the exception. In technical terms, the monitoring data contain missing points. Simply discarding incomplete time series is costly: it shrinks the data set too much, at the expense of both the precision of the estimates and the representativeness of the sample. Fortunately, accounting for missing points statistically is not that complicated in most cases. This page provides some basic advice on how to do so.
The general case. Ideally, measurements are taken every year at every site. Such complete coverage of sites and years simplifies statistical analyses, but it is rarely achievable in schemes that rely on a large number of observers, sites, and years. Not all monitoring sites start and end in the same year, and some sampling visits cannot be performed for unpredictable reasons. As a consequence, some sites are not sampled in some years, and the monitoring data include missing points (Figure 1). Eliminating the incomplete time series is not a good remedy, because it reduces the data set too much, at the expense of the precision of the estimates and the representativeness of the sample.
Figure 1. Example of typical monitoring data collected for a species at two sites (A and B) that differ in average count, with some counts missing.
Solutions - background. Fortunately, accounting for missing points statistically is not that complicated in most cases. The exceptions are cases in which the data are serially or spatially autocorrelated and the missing data make it difficult to estimate the degree of autocorrelation. We do not consider such cases here.
If the data are not autocorrelated, one solution could be to work with the count per sampled site rather than the sum over sites. For instance, if 100 skylarks are counted in year 1 over 10 sites and 80 in year 2 over 5 sites, the index would be 10 skylarks per site in year 1 and 16 skylarks per site in year 2. Here a second problem arises: even with the same number of sites, the level of abundance per sampled site (i.e., the expected count at each site) is likely to vary from site to site. This is because abundance depends on habitat (via the habitat preferences of the species and differences in habitat quality), and because the probability of detecting the species may vary among habitats and observers (Figure 1; see also Recommendations regarding detection probability). Sites are therefore not interchangeable, and the index changes with the set of sites that happens to be sampled in a given year.
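The arithmetic above, and the reason why sites are not interchangeable, can be sketched in a few lines. This is only an illustration: the skylark figures come from the text, whereas sites "A" and "B", their counts, and the helper `count_per_sampled_site` are hypothetical.

```python
def count_per_sampled_site(counts):
    """Mean count over the sites actually sampled (None = not sampled)."""
    sampled = [c for c in counts.values() if c is not None]
    return sum(sampled) / len(sampled)

# Skylark example from the text: totals divided by number of sampled sites.
index_year1 = 100 / 10
index_year2 = 80 / 5

# Hypothetical data: abundance is constant at both sites,
# but the poorer site B is not visited in year 2.
year1 = {"A": 20, "B": 4}
year2 = {"A": 20, "B": None}

naive_year1 = count_per_sampled_site(year1)
naive_year2 = count_per_sampled_site(year2)

print(index_year1, index_year2)   # 10.0 16.0
print(naive_year1, naive_year2)   # 12.0 20.0
```

In the hypothetical data nothing changed at either site, yet the count per sampled site rises from 12 to 20 simply because the poorer site dropped out.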
Presence/absence data. In logistic (logit) regression models for presence/absence data, the time interval between monitoring surveys does not need to be constant, so the absence of monitoring in a particular year does not cause statistical problems. However, occupancy estimates may be biased if the number of sites monitored varies between surveys and if the missing sites are not a random selection of the previously surveyed sites. In that case, one should subdivide the monitoring data in such a way that each subsample contains only sites for which detection probability is likely to be the same. Hence, the survey must be carefully standardized or, preferably, designed such that detection probability can be estimated. Separate trend analyses can then be performed for each subsample. An overall trend can be obtained by calculating a weighted mean percentage of occupied sites, using the number of sites falling into each site class as weights. The same principle could be used to account for observer effects.
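The weighted mean described above might look as follows. This is a minimal sketch with made-up numbers; the function name `weighted_occupancy` and the two site classes are assumptions for illustration only.

```python
def weighted_occupancy(classes):
    """Weighted mean percentage of occupied sites.

    classes: list of (number_of_sites_in_class, percent_occupied) pairs.
    The number of sites in each class is used as the weight.
    """
    total_sites = sum(n for n, _ in classes)
    return sum(n * pct for n, pct in classes) / total_sites

# Hypothetical survey: 60 sites in one class (50% occupied)
# and 40 sites in another class (25% occupied).
site_classes = [(60, 50.0), (40, 25.0)]
overall = weighted_occupancy(site_classes)

print(overall)  # 40.0
```

Repeating this calculation for each survey gives one overall occupancy value per survey, from which the overall trend can be read.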
Count data - Standardization. A general solution for count data, which addresses both missing points and site-specific abundance, is first to standardise each site-specific time series by the site's average abundance over the years in which it was observed, either by dividing by that average or by subtracting it (Figure 2). It is then legitimate to average these standardised counts over sites for each year. They are year-specific deviations from the site-specific average and thus account for local nuisance effects, such as variation among habitats or observers (see What's the point about detectability - what is it and why is it important?). The average temporal trend in abundance can then be estimated as the regression line of the yearly averages of standardised counts against time (Figure 2). Alternatively, log-linear Poisson regression models can, by their structure, account for missing values directly.
Figure 2. Temporal trend of the abundance index accounting for missing counts and for among-site variations in abundance and detection probability.
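The standardisation procedure can be sketched as below, using division by the site mean and an ordinary least-squares slope. The counts, the site labels, and the function names are hypothetical. The missing years were deliberately chosen so that each site's observed mean equals its full-series mean; with real data this holds only approximately.

```python
from statistics import mean

years = [1, 2, 3, 4, 5]
# Hypothetical counts; None marks a year in which the site was not sampled.
counts = {
    "A": [16, 18, None, 22, 24],   # rich site
    "B": [4, None, 5, None, 6],    # poor site, same relative trend
}

def standardise(series):
    """Divide each observed count by the site's mean over observed years."""
    site_mean = mean(c for c in series if c is not None)
    return [None if c is None else c / site_mean for c in series]

def yearly_index(site_counts):
    """Average the standardised counts over the sites sampled each year."""
    standardised = [standardise(s) for s in site_counts.values()]
    index = []
    for values in zip(*standardised):
        present = [v for v in values if v is not None]
        index.append(mean(present) if present else None)
    return index

def ols_slope(x, y):
    """Least-squares slope of y against x, skipping years without data."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(p[0] for p in pairs)
    my = mean(p[1] for p in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    return sxy / sxx

index = yearly_index(counts)
slope = ols_slope(years, index)
print([round(v, 3) for v in index], round(slope, 3))
```

With these numbers the index rises steadily from 0.8 to 1.2, whereas the raw mean count per sampled site would jump between 18 (year 2, only A sampled) and 5 (year 3, only B sampled).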
Capture-Mark-Recapture (CMR) data. In monitoring schemes using CMR methods, missing data points in a time series are generally not a problem. When estimates of population size are missing for some locations or periods of time, one can standardize the estimates in the same way as explained above for count data.
If, in addition to abundance, survival and recruitment are also of interest, one needs to be aware that the resulting estimates of mortality/emigration and immigration/recruitment may apply to different time periods, e.g. two years instead of one. The estimates need to be converted to the same unit of time before they are compared. In addition, if too many consecutive data points are missing, the number of marked animals still in the population may become too low for adequate estimates. It may also be more complicated to construct simplified models in which parameters are kept constant over time. Such simplified models usually provide more precise estimates than models in which time-varying parameters need to be estimated separately for each time period.
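As an illustration of converting estimates to a common unit of time, the sketch below rescales a survival probability estimated over a longer interval to an annual value. It assumes that survival is constant within the interval, which is an assumption of this example and not something the CMR model guarantees; the function name and the value 0.49 are hypothetical.

```python
def annual_survival(survival, interval_years):
    """Convert survival over `interval_years` to a per-year probability.

    Assumes a constant survival rate within the interval, so that
    survival over n years equals the annual survival raised to the power n.
    """
    return survival ** (1.0 / interval_years)

# Hypothetical estimate spanning a two-year gap between capture sessions.
two_year_estimate = 0.49
per_year = annual_survival(two_year_estimate, 2)

print(round(per_year, 3))  # 0.7
```

After this conversion the estimate can be compared with estimates obtained from one-year intervals.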
- Pannekoek, J. & van Strien, A.J. (2001) TRIM 3 Manual (TRends & Indices for Monitoring data). CBS, Voorburg, The Netherlands.
- Williams, B.K., Nichols, J.D. & Conroy, M.J. (2002) Analysis and Management of Animal Populations. Academic Press, San Diego.
EuMon core team; May 2013