Limitations of statistics

The purpose of this note is to look at the limits of statistics, using a sampling statistician as illustration and drawing on opening remarks by the author of a newly published treatment of sampling theory (Singh, 2004).


In setting out the theory of sampling it is useful to keep in mind the grounds on which it operates. Sampling theory is based on a few simple concepts: populations, variables, sampling units, variability, estimators and qualities of estimators, sample spaces, and Borel sets and probability measures defined on sample spaces, among others. Armed with these tools it is possible to construct procedures that fulfil the primary purpose of any branch of statistics: to extend understanding of quantifiable but inherently stochastic phenomena. The statistician (X) stands between an expert (A) who commands a theoretical apparatus, or who controls, owns or has a proprietary interest in a data-generating concern, and an experimentalist or field manager (B) who makes controlled and verifiable measures reflecting on the organisational or theoretical construct. How do these measures relate to this construct? If the measures are made of the apparatus itself, ‘without error’, there is no need for intervention. Because constructs and the ability to understand them have parted company while the imperative of governing remains, the terrain for statistical work, by (let us call them) approximaticians, has opened.


An amusing exercise may be to classify the various branches of statistics by the sociological properties of this A-X-B relationship: who owns the knowledge, who initiates the collection, who controls access to the source, who owns the resultant data, who judges the outcome, and who pays for the exercise. A commands the priors, B the evidence – the posteriors. X improves on the priors using the posteriors. We leave this particular endeavour for another occasion and look more closely at the idea of statistical knowledge – if that is not a contradiction – as justifying the science of statistics and distinguishing it from technique pure and simple: to look from the inside out.[1]
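One conventional reading of this division of labour is a Bayesian update. The sketch below is only an illustration under stated assumptions: a Beta prior standing for what A believes about a population proportion, binomial counts standing for what B reports, and a conjugate update as X's contribution. It uses the textbook vocabulary, in which the evidence enters through the likelihood and the posterior is the updated belief. The function names and all the numbers are hypothetical.

```python
from fractions import Fraction


def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial counts gives a Beta posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact as a fraction."""
    return Fraction(a, a + b)


# Hypothetical figures: A's prior belief is Beta(2, 8), centred on 1/5.
prior = (2, 8)

# Hypothetical evidence from B: 12 of 30 measured units show the attribute.
posterior = update_beta(*prior, successes=12, failures=18)

print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

The posterior mean lies between A's prior mean and B's observed proportion, which is one way of saying that X serves neither party alone.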


Statistics is founded on observations of random phenomena. The randomness is subjective: the immediate observer cannot predict the outcome of any observational episode, though an ultimate observer may. It is assumed, however, that what is observed bears on some underlying process about which it is desired to draw some inference.


What are its limits?


a) Statistics does not deal with individual measurements

The randomness may be in the selection of what is to be measured; in the measurement itself; in what is being measured. Coupled with randomness is incompleteness – we have at best limited access to what is under study. While any quantified study can be framed in these terms, there is a sensible domain restriction based on tolerance. At what point do individual measures interpreted one at a time become unreliable in predicting the behaviour of the whole?


This dealing with the collectivity of measurement marks out the domain of statistics. The specificity of measures is likewise left to psychologists, to anthropologists, to physicists, biologists, lawyers, politicians or policemen. That is not to say that cognitive or thermal or political properties may not be important in the transformations to which statisticians are party. Questionnaire design is informed by how people perceive, interpret and respond to questions; yet the ultimate purpose of a survey lies not in how individuals respond but in advancing a discrete understanding of the state of the population under study. It is of profound indifference to the statistician if a respondent is truthful provided the estimators used are efficient.


b) Statistics deals exclusively with what is quantifiable

Is the world a better place; is a model correct; is a result important? For a statistician only in so far as there is an objective measure attached, and then only in so far as it gives interesting results. Surveys are useless for gauging feelings or preferences without these being discretised, synchronised and equivalated. This divides the statistics profession from others whose investigations are guided in different ways: for an economist maximising utility is not something that requires a quantification to be implementable; for a medical researcher manifest cause may lie within a postulated chemical pathway inferred from known mechanisms rather than observation; for a lawyer a whole chapter in legal doctrine may flow from a single case. Statistics as a body of reasoning, by contrast, begins with repeated measurements.


Sampling theory institutionalises repetition in schemes for randomising and systematising repeated measures toward some predetermined informatical goal. It deals with an economy of collection under design; and an efficiency of estimation drawing on aggregated and repeated measures external to the design and from the experience of collection. Because measures vary (over individuals, time, circumstances of collection) statistical judgement is required; without repetition there is no variation.
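The closing remark, that without repetition there is no variation, can be seen directly in how sample variance is computed. The following is a minimal sketch using Python's standard `statistics` module; the helper name `sample_variance_or_none` and the measurements are hypothetical.

```python
import statistics


def sample_variance_or_none(measurements):
    """Return the sample variance, or None when there is no repetition to work with."""
    try:
        return statistics.variance(measurements)
    except statistics.StatisticsError:
        # statistics.variance needs at least two data points.
        return None


# A single measurement carries no information about variability.
print(sample_variance_or_none([4.2]))

# Repeated measurement is what makes variability estimable at all.
print(sample_variance_or_none([4.2, 4.8, 5.1]))
```

With one measurement the divisor n - 1 is zero and the estimate is undefined; it takes a second measurement before statistical judgement has anything to work on.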


Sampling theory deals with all manner of ways in which this quantification of repeated measures can be formalised toward decision procedures for which control over the collection mechanism is retained. Research goals that cannot be translated to a sample design are ipso facto outside its ambit. Thus if the goal is to rid the world of cholera a sampler would be at a loss: not so an epidemiologist. She would know what to look for to make this fundamental biological conjuncture quantifiable, to make a survey sensible.


c) Statistical results are only true on average

Are they true at all? Statistics as a means of guiding decisions are never true. Truth lies in the relation between the theory (A) and its realisations (B1, B2, B3, …). Can we talk about what causes cholera in a way that will lead to actions that, conceived with some certitude, will advance a policy of eradication? Certainly we can use exemplary statistical techniques to show cholera prevalence dropping after we reduce reporting funding, or after we do nothing and populations are wiped out by the disease, but that is through no advancement in knowledge. If we are superior statisticians, and have undergone a thorough training in Bayesian methods, we may well induce positive insight in the scientists who have engaged us.


The truth statistics shares with other branches of mathematics lies in the application of functional (in this case stochastic) relations to actual situations; or rather the translation of realistic indeterminacy to a logical calculus standing outside the observable world but with some claims to extend initial assignment of value to statements which hold in that world. The version of truth involved is ‘stochastic truth’: truth with an element of indeterminacy. Or associative truth, rather than the logically closed reductive causal truth that empiricists search for: this goes with this more often than not; which direction does the evidence pull in?


Sampling theory is built around a decision model. The information needed to make a particular decision is incomplete for the purposes of the problem; how can what I now know contribute to a ‘good’ decision? Not the right one, the one I would have made if exposed to full information and perfect judgement (not to speak of an adequate ethical construction), but the one that makes best use of what I can know or come to know using the means at my disposal. In this regard truth is neither here nor there. On average a ‘good’ decision will resemble the ‘right’ decision. That is, in expectation, over all possible samples, the data-informed good decision is the right decision, the only decision that could be made consistent with what has been observed.
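The phrase ‘in expectation, over all possible samples’ can be made concrete in the simplest case. The sketch below assumes simple random sampling without replacement and the sample mean as estimator; the note itself names no particular design or estimator, so both are illustrative choices, and the five population values are invented. It enumerates every possible sample of size two and compares the average of the estimates with the population mean.

```python
from fractions import Fraction
from itertools import combinations


def all_sample_means(population, n):
    """Sample mean for every possible sample of size n drawn without replacement."""
    return [Fraction(sum(sample), n) for sample in combinations(population, n)]


# Hypothetical population of five measured values.
population = [3, 7, 8, 12, 20]

means = all_sample_means(population, 2)
population_mean = Fraction(sum(population), len(population))

# Each sample is equally likely, so the expectation is the plain average.
average_of_estimates = sum(means) / len(means)

# How many individual samples reproduce the population mean exactly?
exact_hits = sum(1 for m in means if m == population_mean)

print("number of possible samples:", len(means))
print("population mean:", population_mean)
print("average of all sample means:", average_of_estimates)
print("samples that hit the population mean exactly:", exact_hits)
```

Only one of the ten possible samples reproduces the population mean, yet the average over all ten equals it exactly. The estimator is right in expectation while almost every individual realisation is wrong.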


d) Statistics can be misused or misinterpreted

The reason statistical data are collected is rarely disinterested. A researcher may seek an argument for advancing or dismissing a theory; a department may wish to target a given population for some action, assistance or retribution. The value of statistical intervention (for an expert or a data analyst) is in uncovering interpretable patterns in the data. Whether the data can sustain interpretation should be the first concern of a competent statistician. In the absence of such an assurance, statistics invite misuse.


Sampling theory furnishes context – ‘design’, ‘process quality’, ‘estimation’ – to the assemblage and manipulation of data into statistical form, that is, as functions of sampled data which throw light on the character of the underlying population. Misapplication of statistics results from disengagement of these design elements from the analytical knowledge pool; or more commonly from the investment of the sampled elements with the qualities of their population counterpart. We then interpret the sample not as one realisation of many possible, but as an archetype of the phenomenon under study.
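One routine guard against reading a sample as an archetype is to report the estimate together with a measure of how much it would vary across the other samples the design could have produced. The sketch below assumes simple random sampling without replacement and uses the standard estimated standard error of the mean with a finite population correction; the function name `srs_mean_with_se` and the figures are hypothetical.

```python
import math
import statistics


def srs_mean_with_se(sample, population_size):
    """Sample mean and its estimated standard error under simple random
    sampling without replacement, including the finite population correction."""
    n = len(sample)
    if not 2 <= n <= population_size:
        raise ValueError("need at least two units and no more than the population")
    estimate = statistics.fmean(sample)
    s2 = statistics.variance(sample)   # sample variance, divisor n - 1
    fpc = 1 - n / population_size      # finite population correction
    return estimate, math.sqrt(fpc * s2 / n)


# Hypothetical sample of 6 units from a population of 300.
estimate, se = srs_mean_with_se([12.0, 15.5, 9.0, 14.0, 11.5, 13.0], 300)
print("estimate:", estimate)
print("estimated standard error:", se)
```

The standard error falls to zero only when the sample is the whole population. Short of a census, the sample remains one realisation among many, and the design is what tells us how far to trust it.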


As antidote to the worst of this abuse, sampling theory informs how data is to be assembled, processed and interpreted, entirely free of what it ‘means’. Data informs a researcher or subject analyst to the extent that the collection design faithfully reflects the research design as conveyed to the statistician responsible for it and its implementation. Things go wrong – disastrously for the reputation of otherwise respectable branches of science – when the statistical artifice is mistaken for epistemic fact. Eysenck’s use of quantitative genetics to derive an organic theory comes unstuck because the quantitative givens (regression, correlates, …) are treated as the elements of a theory of heritability, whereas they take meaning irrevocably in a statistical frame. He mishandles statistics badly in the course of imparting authority to a tendentious opinion. A reappraisal of the evidence that employs the statistical constructs correctly gives strong grounds for doubting his primary hypothesis.



Singh, S., Advanced Sampling Theory with Applications, Kluwer, 2004

Matthews, R. A. J., Facts versus Factions: the use and abuse of subjectivity in scientific research, The European Science and Environment Forum, Cambridge, 1998

Velden, M., Vexed Variations, Review of Intelligence, by Hans J. Eysenck, in the Times Literary Supplement, April 16 1999.

Lindley, D. V., Seeing and doing: the Concept of Causality, International Statistical Review (2002), 70(2), 191-214

Quaresma Goncalves, S. P., A Brave New World, Q2004, Mainz, May 2004


[1] But for the other side of the picture see Quaresma: ‘These options [filtering incorrect records or their translations to other codes] must be presented to the statisticians responsible for data production and who should choose which solution to adopt. Data ownership is always respected and ensured, and the data analyst role is only to help and assist statisticians along the process.’

Why performance indicators fail

Performance indicators (PIs) fail because they succeed! They are designed to separate a normal from an abnormal state in a ‘system under management’. Yet because they are brought into the system, into the way the system is managed, and not merely used to give an outward measure, they reduce or distort the capacity of the system to adapt: perhaps in inconsequential ways, perhaps in collusive state dependency.

Examples are easy to find, whether in mechanical systems, such as the malfunctioning probe that overrides normal system adjustment, or in more diffuse systems, such as a financial system running on normal activity but with artificially maintained valuations. In such cases the detachment of the system of measurement from the nature of the system being measured is obvious after the event. Shocking perhaps, but there is in fact no guarantee that certitude in the performance measure translates to macroscopic performance: to the quality of governance. Nor that an absence of evidence of performance translates to an absence of performance.

Yet ‘high performance’ is the currency of work contracts, of individuals as of organisations. It is how we judge managers, and how managers regulate their own behaviour. Key performance indicators (KPIs) are the public face of managerial ability: how rewards are determined, how strategic pathways are mapped, and how political programs are framed. The ‘gap’ in public discourse is as real as any moral imperative, and in fact the more to be trusted because it lies outside moral or intellectual failures of the past. Closing the gap has moral urgency, because it has subsumed the debate on responsibility for the past, and the continuing failures of comprehension in the policy frameworks adopted.

The gaps – the contrast in outcomes according to social state, a form of social state determinism that is perhaps a relic of class consciousness, if one were to enter into a psychoanalytical interpretation – show up social performance in a large sense. What happens then is outside of any effort or ingenuity in the construction. The ‘gap’ so revealed can be interpreted as a managerial lever. The most effective way to repair policy shortcomings, on this reading, is to act on the elements of the indicator: reading rates, school performance scores, income poverty levels, crowding and so forth. Such actions have the quality of moving forward while keeping intact the apparatus that led to the gap.

That is the danger. But how, of course, are the citizenry or their body of servants and representatives to know whether the system is healthy? The levers of government function as legislated; proximate effects are manifest.

Enter official statistics (OS). Without fear or favour, it reflects the nation to itself. What is important, what is simply activity? OS rests on consent – on the authority resting in published measures outside performance within the programs of government – and on privileged access. OS, as expounded by national statistical offices (NSOs), labours under a cloud of irrelevance if not illegitimacy, ironically a complement of the fatal success of the KPIs on which much management theory now seems to rely.

My presentation at the forthcoming ASC-IMS conference relates this heuristic to an emerging foundational account of inference within the reality of multiple data sourcing, drawing on the ever-fecund concept of a learning organisation from the engineering literature. It is nevertheless foundationally and linguistically statistical.