- 12:00-12:45 – Sandwich lunch available in the workshop room CGU2
- 12:45-1:00 – Welcome
- 1:00-1:40 – Deborah Mayo: Multiplicity and Unification in Frequentist (Error) Statistics: Learning from D.R. Cox [chair: Nancy Cartwright]
Long-standing controversies in the foundations of frequentist statistics stem from failure to appreciate the multiplicity of frequentist methods and roles, as well as from erroneous conceptions of the unifying core that links together and directs their interpretation and justification in science. My presentation considers how the work of statistician D.R. Cox serves to explicate the multiplicity of frequentist methods and goals, as well as to identify the unification of principles that direct their valid use in realistic scientific inquiries. Moving away from the tendency of critics to focus on oversimplified caricatures of the standard methods (e.g., significance tests, confidence intervals), the tools emerge as satisfying piecemeal learning goals within a multiplicity of methods, models, and experimental designs, and as part of a series of interconnected checks, and reports, of error. The assumption that the overall unified principle for these methods must be merely controlling the long-run error probabilities of methods (the behavioristic rationale) is replaced by an epistemological principle that renders hypothetical error probabilities relevant for learning about particular phenomena. An underlying current in Cox’s work is to unify the goal of controlling frequentist error probabilities, on the one hand, with the desire to use probabilistic methods to characterize warranted inference, on the other. I propose a unification that seems to emerge (and which I favor) and contrast it with current Bayesian attempts at unifications of frequentist and Bayesian methods. Finally, I discuss some implications for the general problems of error-prone evidence and inference in philosophy of science.
Further reading: Mayo&Spanos-2006-BJPS, Mayo&Spanos-2006a, Mayo&Spanos-PS2004, Mayo-2008, Mayo-Spanos-2008, Mayo&Cox-2006
- 1:40-2:20 – Aris Spanos: Error Statistics and the Frequentist Interpretation of Probability [chair: George Gaskell]
The main objective of the paper is to revisit the frequentist interpretation of probability as it relates to the model-based inductive inference associated with Error Statistics.
It is argued that the prevailing views in philosophy of science pertaining to probability and induction are unduly influenced by the enumerative induction perspective, which differs in crucial respects from the model-based induction that dominates current practice. The most crucial difference is that the latter is anchored on probabilistic assumptions that are testable vis-à-vis the data in question – comprising the underlying statistical model – and not on a priori stipulations like the ‘uniformity of nature’ and the ‘representativeness of the sample’.
Moreover, error-statistics brings out the important role of error probabilities in model-based induction, both pre-data and post-data, rendering the traditional reliance on asymptotic arguments of secondary importance. Indeed, it is shown that the traditional enumerative induction, by ignoring the notion of the underlying statistical model, does not utilize the underlying (implicit) assumptions in enhancing the reliability and precision of the resulting inference.
The appropriateness of the frequentist interpretation stems primarily from its capacity to facilitate the task of bridging the gap between phenomena of interest and the mathematical set up of the axiomatic approach to probability, as well as to elucidate a number of issues pertaining to modeling and inference. Indeed, model-based induction is used to shed very different light on a number of charges leveled against the frequentist interpretation of probability, including: (i) the circularity of its definition, (ii) its reliance on ‘random samples’, (iii) its inability to define ‘single event’ probabilities, and (iv) the ‘reference class’ problem.
It is argued that charges (i)-(ii) are misplaced primarily because the frequentist interpretation of probability is often misleadingly identified with von Mises’s rendering, instead of the ‘stable long-run frequencies’ variant associated with the Strong Law of Large Numbers. Moreover, by defining its inductive premises explicitly in terms of a pre-specified statistical model, model-based induction demarcates the events of interest within the model’s intended scope most clearly. This renders charges (iii)-(iv) as misleading attempts to assign probabilities to events outside the particular model’s intended scope, i.e. they (indirectly) allude to the inappropriateness of the statistical model in question. In principle, this problem can be addressed by specifying a more appropriate statistical model which includes such events within its intended scope. In this sense the prevailing view that charges (iii)-(iv) constitute a variant of the initial vs. final precision problem is wide of the mark.
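The ‘stable long-run frequencies’ variant invoked above can be illustrated with a small simulation (a toy sketch, not part of the abstract; the function name and parameters are illustrative): the relative frequency of successes in repeated Bernoulli trials stabilises around the underlying probability as the number of trials grows, which is what the Strong Law of Large Numbers guarantees almost surely.

```python
import random

def relative_frequency(p, n, seed=0):
    """Simulate n Bernoulli(p) trials and return the relative
    frequency of successes -- the 'stable long-run frequency'
    that the Strong Law of Large Numbers says converges to p."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

# The relative frequency stabilises around p = 0.3 as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(0.3, n))
```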
The frequentist interpretation of probability, as viewed in the error-statistical framework, is also related to objective epistemic probability, interpreted as reflecting the degree of “reasonableness of belief”. It is argued that the evidential interpretation of inference based on severe testing renders the epistemic interpretation of probability redundant.
Further reading: Mayo&Spanos-2006-BJPS, Mayo&Spanos-2006a, Mayo&Spanos-PS2004, Mayo-Spanos-2008, Spanos-2007, Spanos-aicmodel-selection
- 2:20-3:00 – Jon Williamson: Bridges between Frequentist Statistics and Objective Bayesianism [chair: John Worrall]
Bayesianism – and objective Bayesianism in particular – is usually thought of as diametrically opposed to frequentist statistics. While there have been some attempts to forge connections, e.g., by Edwin Jaynes and by Jim Berger, the two approaches nevertheless appear to be offering competing means to the same end: statistical reasoning. In this paper I show how frequentist statistics and objective Bayesianism can be construed as complementary.
Objective Bayesian epistemology is concerned with the mapping from evidence and language to rational belief: given an agent’s stock of evidence – which I take to include her background knowledge, theory and assumptions as well as observations – and her language, objective Bayesian epistemology seeks to advise the agent as to how strongly she should believe the various propositions expressible in her language. Broadly speaking, objective Bayesian epistemology imposes three norms on this mapping. First, the strengths of the agent’s beliefs should be representable by probabilities. Second, the agent’s probability function should satisfy constraints imposed by her evidence; in particular her degrees of belief should be calibrated with physical probabilities where they are known. Third, the agent should otherwise equivocate as far as possible: she should not adopt degrees of belief that are more extreme than the evidence warrants. (It is the third norm that sets objective Bayesianism apart from subjective Bayesianism.)
The first norm is self-explanatory and the third – though controversial – is well-studied, with equivocation cashed out via the maximum entropy principle or by appealing to symmetry considerations. It is the second norm, concerning fit with evidence, that is the focus of this paper. Clearly, it is no mean feat to elucidate the constraints imposed by qualitative evidence. But even quantitative evidence poses a challenge. For example, how exactly does evidence of a sample frequency constrain degree of belief? It is here that frequentist statistics enters the picture. Frequentist statistics is concerned with determining a probability model, or a set of such models, that is appropriate given evidence. Here evidence includes, e.g., sample frequencies, modelling assumptions and qualitative knowledge about the structure of the domain. In essence, frequentist statistics concerns the impact of evidence on probability and such a theory is just what is required by the second norm of objective Bayesian epistemology. Under this construal, objective Bayesian epistemology can be implemented by first applying frequentist statistics to isolate those models that are compatible with the evidence, and then eliminating those models which are not maximally equivocal.
To make this discussion concrete I consider one possible implementation in detail. In this example Henry Kyburg’s theory of evidential probability is used to isolate probability functions compatible with evidence; the maximum entropy principle is used to eliminate those functions that are not maximally equivocal. I present an inferential framework that appeals to credal networks in order to determine the probabilities deemed appropriate by this unification of statistical inference and objective Bayesianism. I close by suggesting that such a unification not only clarifies the conceptual foundations of objective Bayesian epistemology but it also clarifies the place of frequentist statistics in scientific reasoning.
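The two-step procedure described above – use the evidence to isolate the compatible probability functions, then eliminate those that are not maximally equivocal – can be sketched in miniature for a single binary proposition (a hypothetical toy example; the interval constraint and function names are my own, not from the paper):

```python
from math import log

def entropy(p):
    """Shannon entropy of a Bernoulli(p) distribution."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log(p) + (1 - p) * log(1 - p))

def objective_bayes_degree(lower, upper, grid=1000):
    """Toy version of the two-step procedure for one proposition A:
    1. evidence (e.g. a sample frequency with its error bounds)
       constrains P(A) to the interval [lower, upper];
    2. among the compatible values, keep the maximally equivocal
       one, i.e. the value of maximum entropy (closest to 1/2)."""
    candidates = [lower + (upper - lower) * i / grid for i in range(grid + 1)]
    return max(candidates, key=entropy)

print(objective_bayes_degree(0.6, 0.8))  # -> 0.6
```

If the evidence-compatible interval contains 1/2, the equivocation norm returns 1/2 itself; otherwise it returns the endpoint nearest to 1/2, as in the example.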
Further reading: Philosophies of probability, Evidential probability and objective Bayesian epistemology
- 3:00-3:30 – Coffee
- 3:30-4:10 – David Corfield: Varieties of Justification in Machine Learning [chair: Sally Stares]
The field of machine learning has flourished over the past couple of decades. With huge amounts of data available, efficient algorithms can learn to extrapolate from their training sets to become very accurate classifiers. For example, it is straightforward now to develop classifiers which achieve accuracies of around 99% on databases of handwritten digits.
Now these algorithms have been devised by theorists who arrive at the problem of machine learning with a range of different philosophical outlooks on the subject of inductive reasoning. This has led to a wide range of theoretical rationales for their work. In this talk I shall classify the different forms of justification for inductive machine learning into four kinds, and make some comparisons between them.
Because these learning tasks draw on little by way of theoretical knowledge, the relevance of these justificatory approaches to the inductive reasoning of the natural sciences is questionable; at the same time, certain issues surrounding the presuppositions of inductive reasoning are brought sharply into focus. In particular, Frequentist, Bayesian and MDL outlooks can be compared.
- 4:10-4:50 – John Mingers: A Critique of Statistical Modelling in Management Science from a Critical Realist Perspective [chair: Damien Fennell]
Management science was historically dominated by an empiricist philosophy that saw quantitative modelling and statistical analysis as the only legitimate research method. More recently interpretive or constructivist philosophies have also developed employing a range of non-quantitative methods. This has sometimes led to divisive debates. “Critical realism” has been proposed as a philosophy of science that can potentially provide a synthesis in recognizing both the value and limitations of these approaches. This paper explores the critical realist critique of quantitative modelling, as exemplified by multivariate statistics, and argues that its grounds must be re-conceptualised within a multimethodological framework.
- 4:50-5:05 – Coffee
- 5:05-5:50 – Panel discussion on multiplicity and unification in foundations of statistics (Christian Hennig discussion leader)
- 5:50-6:35 – Panel discussion on multiplicity and unification in induction and confirmation (Federica Russo discussion leader)
- 20:00 – Dinner at The Goods Shed
- 9:00-9:40 – Nancy Cartwright: Reducing Errors in Predicting Effects: (Way) beyond statistics [chair: Deborah Mayo]
This paper might instead be subtitled: NC v EBP. Evidence-based policy has been all the rage for over a decade and there are now a vast number of advice guides, all much of a muchness, teaching us how to tell good evidence when we see it. The guides rank methods for the production of evidence for policy effectiveness, i.e. evidence that the proposed policy would produce targeted effects if implemented. In general all the methods ranked are statistical methods, and that is the problem. Indeed, even more narrowly, they are all methods that strive to be as much like randomized controlled trials as possible. But this kind of statistical evidence can at best establish what I call ‘it-works-somewhere claims’, and the somewhere is never here, where we aim to implement policy.
The usual label for this problem is ‘external validity’ and the popular fix currently pushed by philosophers and statisticians alike is invariance. This paper will argue that external validity is the wrong way to express the problem and that invariance is a poor strategy for fixing it. Statistical results are invariant under only the narrowest conditions, almost never met. What’s useful is to establish not the invariance of the statistical result but the invariance of the contribution the cause produces – i.e., to establish what, following JS Mill, I have long called a ‘tendency claim’. Tendencies are the conduit by which ‘it-works-somewhere’ claims supply support for ‘it-will-work-for-me’ claims. But, I shall argue: (1) we need far more than statistics to establish tendency claims in the first place; and (2) we need evidence quite different from what statistics provides to make tendency claims relevant to the ‘it-will-work-for-us’ claims we need to predict the effectiveness of our policies.
- 9:40-10:20 – George Gaskell: Food risk regulation: the legitimacy and discounting of different varieties of evidence [chair: Aris Spanos]
In the regulation of genetically modified (GM) food products, sound science is taken as the sole arbiter of legitimate evidence. Toxicity, genotoxicity and allergenicity are considered to be sound science and constitute the evidence. That toxicology and genotoxicity are somewhat problematic endeavours is not seen as a problem, although in a recent recommendation from the EC on nanoparticles in food products a strong steer towards precaution is advised in the absence of toxicological research.
Other types of risk that appear to exercise the public are discounted because they fall outside the remit of sound science. In this paper I will outline how the cognitive revolution led to some crucial misconceptions of the public, explain the origins of some of the non-sound scientific risks that inform public opinion, and argue that the discounting of these ‘other factors’ runs the risk of bringing regulation into disrepute.
- 10:20-10:45 – Coffee
- 10:45-11:25 – John Worrall: Evidential Reasoning in Science: One Size Fits All? [chair: Jon Williamson]
In an earlier paper (Worrall ), I compared my own views on theory-confirmation in science with those of Deborah Mayo. I argued that, while Mayo attempts to produce a ‘one size fits all’ model, there are in fact (at least) two importantly different styles of evidential reasoning or theory-confirmation in science. This presentation continues the discussion, especially in the light of Mayo’s response. It also considers how to reconcile my dualist view with the underlying unifying view that I share with Deborah Mayo (and indeed with every halfway decent account) – namely that real evidence for a theory is evidence that is not only ‘explained’/accounted for by that theory but is also inconsistent with (or at odds with) other plausible rivals to that theory.
Further reading: ErrorChapter5.pdf, MayoWorrall.pdf
- 11:25-12:15 – Panel discussion on multiplicity and unification of evidence (Damien Fennell discussion leader)
- 12:15-1:30 – Lunch
- 1:30-2:30 – Round-table discussion on multiplicity and unification in inference, testing, and confirmation, and their relations to collecting, modeling, interpreting, and using evidence in policy
- 2:30-3:15 – Plans for future work