State Spaces and the wide open spaces

Speaker: Peter Caley (Data61, CSIRO)

Topic: Some issues of inference in abundance trends for wide-ranging wildlife species

Abstract:
Highly mobile species that form aggregations can present special challenges for inferring trends in abundance where aggregations are spatially sparse, highly localised and sometimes transient in nature. Drawing from Australian examples of waterbirds, flying foxes and cockatoos, this talk explores how practitioners have grappled with some of these issues, and appeals to the statistical brains trust to get involved.

Biography:
Peter Caley is a research scientist with CSIRO Data61. He has a background in applying quantitative methods for addressing contemporary problems in the environmental sciences. Topics have included wildlife & human disease epidemiology, vertebrate pest ecology & management, plant & insect biosecurity and extinction inference.

In the lead-up to the next Australian Statistics Conference (ASC2018, Melbourne, 26-29 August) I have been refreshing my reading around surveys, state spaces and inference. Peter Caley has been using a state space approach to measure rates of decline or growth in populations of wild waterbirds within the Murray-Darling and Lake Eyre catchments, based on 30 years of random transect surveying. The thirty years have been punctuated by two major flooding events (early 70s and 80s), so the challenge is to filter out a message on the ecological health of inland Australia from expert observations within a fixed collection design: effectively, repeated surveys.

As birds move readily in response to climate variation, and do not necessarily stay within one catchment, there is much scope for leakage. Even so, there has been some success in devising robust inference using the strengths of this ‘observational experimental’ scheme, namely: the ability to split numbers by species, with separate counts for the 50 or so waterbirds encountered; a consistent method of collection (single-wing small planes able to fly along transects at low altitude); and a now-accumulated 30-year data series split between two comparable regions with contrasting climate histories.

His conclusions were phrased in terms of the individual species, whether or not they were in decline. After the state space (SS) correction he could say that there was not a case for ‘steep decline’ across the board and that, while it is possible to rank species on vulnerability to decline, on the whole most species were holding their own.

I wonder if it may be possible to robustify this commentary further by using a multivariate filter and a new diversity index. This may be a better measure if species presence were weighted as much as breeding numbers. Routine or cyclic environmental changes will advantage some species at the expense of others; long-term trends connected with global phenomena can affect all species adversely, both in numbers and in support for diversity, with the most vulnerable disappearing first.

A further opening may be in counting ‘roosts’, that is, congregations of birds using a water resource for feeding, breeding, refuge or layover. Roost numbers may be detectable at a distance, even remotely, or by on-site observation, without disturbing the birds for the purpose of counting. Roost behaviour may be easier to monitor over time, and there may be a way of enduring identification.
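
To make the diversity index suggestion above a little more concrete, here is a minimal Python sketch. It is not an index from Peter's talk or from the literature: the function `presence_weighted_index`, its `presence_weight` parameter and the counts are all hypothetical, chosen only to show one way presence (how many species turn up at all) could be weighted equally with the evenness of their numbers (Shannon entropy scaled by its maximum).

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (natural log) of the positive counts."""
    total = sum(c for c in counts if c > 0)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def presence_weighted_index(counts, presence_weight=0.5):
    """Hypothetical index in [0, 1] blending species presence with evenness."""
    n_species = len(counts)
    n_present = sum(1 for c in counts if c > 0)
    if n_species == 0 or n_present == 0:
        return 0.0
    # Share of the monitored species that were seen at all.
    presence = n_present / n_species
    # Evenness: entropy relative to its maximum for the species present.
    evenness = shannon_entropy(counts) / math.log(n_present) if n_present > 1 else 0.0
    return presence_weight * presence + (1 - presence_weight) * evenness

# Made-up counts for five species in two survey years with equal totals.
year_a = [500, 450, 520, 480, 510]   # all present, similar numbers
year_b = [2300, 100, 60, 0, 0]       # two species absent, one dominant
print(round(presence_weighted_index(year_a), 3))
print(round(presence_weighted_index(year_b), 3))
```

Both years have the same total number of birds, so a total-abundance trend would not separate them; an index of this kind would. Whether equal weighting is the right choice is exactly the open question.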

Peter’s talk included the fate of the white-necked ibis, ubiquitous in the cities as a bin scavenger. I have heard a talk on its cousin, the straw-necked ibis of the plains, where radio tracking of individual birds demonstrated how widely they move and their ability to find their way to surface water over long distances. It is also apparent that flocks and individuals give different profiles. This dynamic structure in the populations of free-moving creatures is interesting in itself, and would inform any design; it can interfere with inference in a common design like Caley’s but may also be a useful correlate for the overall diversity question. No one pretends that the system is easily abstracted, which makes a state space model a good choice for inference.
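
For readers who have not met state space models, the sketch below shows the general idea in Python. It is a generic local linear trend Kalman filter on log counts, not Peter Caley's model: the function name `kalman_trend`, the variance settings and the counts are assumptions made purely for illustration. The hidden state is the log abundance and its annual growth rate; each survey is treated as a noisy observation of the log abundance, and years without a survey are simply skipped.

```python
import math

def kalman_trend(log_counts, obs_var=0.25, level_var=0.01, trend_var=0.0001):
    """Filter log counts with a local linear trend model.

    Returns one (level, trend) pair of filtered means per year.
    Use None for years with no survey.
    """
    # Vague start: first available observation as level, zero trend.
    level = next(y for y in log_counts if y is not None)
    trend = 0.0
    # Symmetric 2x2 state covariance [[p11, p12], [p12, p22]].
    p11, p12, p22 = 10.0, 0.0, 1.0
    out = []
    for y in log_counts:
        # Predict: the level moves by the trend, the trend persists.
        level = level + trend
        p11, p12, p22 = (p11 + 2 * p12 + p22 + level_var,
                         p12 + p22,
                         p22 + trend_var)
        if y is not None:
            # Update: only the level is observed, with survey noise obs_var.
            s = p11 + obs_var
            k1, k2 = p11 / s, p12 / s
            resid = y - level
            level += k1 * resid
            trend += k2 * resid
            p11, p12, p22 = (1 - k1) * p11, (1 - k1) * p12, p22 - k2 * p12
        out.append((level, trend))
    return out

# Hypothetical aerial counts for one species (None marks a missed year).
counts = [3000, 2600, 3400, 2500, None, 2300, 2700, 1900, 2100, 1800]
log_counts = [math.log(c) if c is not None else None for c in counts]
level, trend = kalman_trend(log_counts)[-1]
print(f"filtered abundance ~ {math.exp(level):.0f}, "
      f"annual change ~ {100 * (math.exp(trend) - 1):.1f}%")
```

The appeal for survey data is that observation noise (`obs_var`) is kept separate from genuine year-to-year movement (`level_var`, `trend_var`), so a boom after a flood year is not automatically read as a change in the underlying trend. A multivariate version would carry one such state per species with a shared covariance.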

It strikes me that there is much to be gained by working in step with the scientists monitoring the health of our natural systems subject to global climate challenge. This can only increase confidence in what we are attempting to sustain.

The omics of Official Statistics

Professor Terry Speed’s AMSI-SSAI Lecture today at the Knibbs theatre provokes the following reflection.

Nuisances crowd out the signal. This is as true in genomics (or any of the bioinformatical omics spawned therefrom: proteomics, metabolomics, transcriptomics) as it is in modern official statistics, handmaiden to policy and socio-econometric modelling.

Nuisance, however, deserves attention. In an ideal world all data provided in statistical returns is simultaneously correct and perfectly recorded and transmitted. Furthermore, the design of this ideal collection is itself perfect: the data collected is sufficient to answer the questions posed by users in their collectivity, without altering the inclination of respondents to cooperate or altering their behaviour in so doing. That is, the measurement process is dimensionless.

No one pretends that these conditions hold, or even approximately hold.

Instead, the data resulting from the collection effort is conditioned by a quality framework that allows nuisance to recede to the background. Official releases thus come with two crutches: formal rules of population inference (what can be inferred; its accuracy, centring on a true value; and its precision, the width of the interval around an estimate containing the true value with a given confidence); and adherence to the nuisance-containing practices embodied in the collection operation.

These practices comprise the design. And this explains why official statistics is stubbornly design-based, even as statistics proper has struck out into the protean world of model building and model-based inference.

Both model-based and design-based approaches have been compromised by nuisance effects, despite the loud and redundant appeals to ‘scientific method’ or ‘quality assurance’ respectively. In the one case, data richness (and sample size) and spurious replicability have obscured the real limitations of data acquisition; in the other, the drag induced by quality assurance has required a stability in underlying processes which has patently been compromised in an external context of open data borders.

Can the negative control method, elegantly applied to bioinformatics, save official statistics too? Or rather, if we take nuisance more seriously, might we be inspired to find a more solid platform for the presentation of statistics used in public discourse?
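
As a reminder of what a negative control buys you, here is a deliberately tiny Python sketch. It is a one-factor toy in the spirit of the approach, not the method presented in the lecture: the function `remove_unwanted_variation` and the data are hypothetical. The assumption is that some series (the negative controls) are known in advance to carry no signal of interest, so whatever varies in them across samples is nuisance; that pattern is estimated from the controls and then regressed out of every series.

```python
def remove_unwanted_variation(data, control_rows):
    """Subtract a single nuisance factor estimated from negative-control rows.

    data: list of rows (series), each a list of values over the same samples.
    control_rows: indices of rows assumed to carry no signal of interest.
    """
    n_cols = len(data[0])
    # Centre each row so only variation across samples remains.
    means = [sum(row) / n_cols for row in data]
    centred = [[v - m for v in row] for row, m in zip(data, means)]
    # Nuisance factor: average centred control value in each sample.
    w = [sum(centred[i][j] for i in control_rows) / len(control_rows)
         for j in range(n_cols)]
    ww = sum(x * x for x in w)
    cleaned = []
    for m, row in zip(means, centred):
        # Least-squares loading of this row on the nuisance factor.
        alpha = sum(r * x for r, x in zip(row, w)) / ww if ww > 0 else 0.0
        cleaned.append([m + r - alpha * x for r, x in zip(row, w)])
    return cleaned

# Made-up example: six samples, a step signal, and a nuisance pattern.
nuisance = [1, -1, 0, 0, 1, -1]
signal = [0, 0, 0, 1, 1, 1]
data = [
    [10 + 1.0 * n for n in nuisance],                            # control
    [5 + 2.0 * n for n in nuisance],                             # control
    [20 + 3.0 * s + 1.5 * n for s, n in zip(signal, nuisance)],  # of interest
]
for row in remove_unwanted_variation(data, control_rows=[0, 1]):
    print([round(v, 2) for v in row])
```

The catch, in this toy as in practice, is that any real signal correlated with the estimated nuisance pattern is removed along with it, so everything hangs on the controls being well chosen. What would play the role of a negative control in an official collection is the open question.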

We can restate the issue slightly differently: how do we extract a consistent, reliable and useful signal bearing on social governance from a multiplicity of data frames, where the criterion for signal quality (analogous to the deeper scientific truths underpinning bioinformatics or statistical investigation of physical or chemical phenomena) is encoded in the legislative ethos of government itself?

This not only allows nuisance but assumes it: the act of reducing an uncontrolled flow to a signal under metastatistical protocols (such as pre-existent or circumstantially imposed indicator series, or standards) is the badge of official statistics, best expressed by appeal to design. Certainly it is possible to improve on theory, most transparently by reviewing how deviations from design (for instance, dealing with overlapping discordant collections) build a core assurance mechanism.

It happens that the methods put forward by Professor Speed in bioinformatics, and the discordancy-accepting extension results that can be built from the geometric basis to sampling theory in Paul Knottnerus’ text, play similar roles in their respective contexts. In both cases a fresh appraisal of the context in which statistics is applied has led to results with immediate application as well as great generality.

References
Knottnerus, P., Sample Surveys – a Euclidean view, Springer 2003