
Evidence Integration in Systems Medicine

In an earlier post I suggested that systems medicine, a new approach to medicine that applies the 'big data' methods of bioinformatics, offers substantial promise, but also faces profound challenges, not least the question of how to integrate multifarious sources of evidence in order to discover new causal relationships.


Fun with reference classes

I’m currently working on a paper that I’m presenting at the Philosophy of Science Association biennial meeting in Chicago next month. While I’m making slow progress on the paper, I’ve discovered a couple of examples of practical reasoning about evidence that you might find interesting.

The paper itself deals with a (pretty hoary) philosophical problem known as the reference class problem. Briefly, this describes a difficulty about inferring the probability of individual events by relating those individual events to a group of similar events, or a reference class. That's definitely in philosophese, so perhaps a nice example would make things clearer. My favourite is due to Connor Cummings (who wrote an excellent BSc dissertation on reference classes), and is about house fires. Say that I want to estimate the probability of my house burning down during the next year. Which statistics should I look at? Well, I could look at those statistics that describe the number of houses that burn down each year in the UK as a whole. Or, perhaps, I could look at statistics that deal with houses that are built from bricks? Alternatively, perhaps I should look for figures that describe the chances of houses with blue front doors being consumed by flames? Or (even) statistics for brick-built houses in London that have blue front doors?

Each of these statistics is likely to provide a different probability estimate. This means that – depending on our choice of reference class – we will come up with very different estimates of the probability of my house burning down next year. This is just the kind of thing that might make an insurance agent very unhappy. Worse, though, is to come: given that each might give a different estimate of our individual probability, which should we prefer? None of them is straightforwardly wrong, because they all describe groups that are in some respects similar to my actual house. Assuming that we could generate reliable statistics for each one, the choice between them seems to be a matter of subjective preference. In other words, there doesn't seem to be an objectively correct choice of reference class.
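To make the arbitrariness vivid, here is a toy sketch in Python. Every figure below is invented purely for illustration – the point is just that each reference class hands us a different number for the very same house:

```python
# Invented yearly fire rates for reference classes my house belongs to.
# None of these figures are real statistics.
rates = {
    "all UK houses": 0.0015,
    "brick-built houses": 0.0010,
    "houses with blue front doors": 0.0018,
    "brick-built London houses with blue doors": 0.0008,
}

# Each class yields a different 'probability' for the same single house,
# and nothing in the statistics themselves says which one to use.
for ref_class, rate in rates.items():
    print(f"P(fire | {ref_class}) = {rate}")
```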

This is the reference class problem, and it has engaged philosophers of science for at least 65 years (Reichenbach 1949). My aim while putting together my PSA paper, though, is not to try and formulate some novel solution to the problem, but instead to talk about some of the solutions that have been employed in scientific practice. I was very interested to learn that a recent piece of guidance from NICE had suggested that different prescription practices should be adopted for hypertension sufferers of different ages and from different ethnic backgrounds. The usual first-line treatment for high blood pressure in people under 55 would be an ACEI or ARB:

1.6.6 Offer people aged under 55 years step 1 antihypertensive treatment with an angiotensin-converting enzyme (ACE) inhibitor or a low-cost angiotensin-II receptor blocker (ARB)…(NICE 2011: 17)

However, prescription practices should vary because of both age and ethnicity:

1.6.8 Offer step 1 antihypertensive treatment with a calcium-channel blocker (CCB) to people aged over 55 years and to black people of African or Caribbean family origin of any age. If a CCB is not suitable, for example because of oedema or intolerance, or if there is evidence of heart failure or a high risk of heart failure, offer a thiazide-like diuretic. (NICE 2011: 17)

I’d like to suggest that this difference in recommended prescribing practices reflects some interesting reference class work on the part of NICE. However, it seems hard to align this kind of thinking with the more philosophical approaches to the reference class problem that I know. Here, I’m largely thinking of Salmon’s (1971) suggestion that we should prefer homogeneous reference classes of one kind or another. However, we know that neither age nor ethnicity forms a homogeneous reference class. Yet (as far as NICE is concerned) these groups are intended to behave like homogeneous reference classes, in that a) they are intended to give unequivocal guidance as to the reference class membership of an individual, and b) membership of one of these reference classes changes individual probability estimates. So what are the grounds for this clinical guidance confidently picking out these groups?

While looking for possible solutions to this difficulty, which I’ll have to leave hanging for the time being, I ran into some very interesting work on the reference class problem in the law. That I had no idea the reference class problem was something lawyers argued about probably says more about my ignorance of the law than anything else, but I was surprised to find several different ways of resolving (or, at least, of arguing about) reference class difficulties in legal practice. One excellent introduction is the paper by Cheng (2009) in the Columbia Law Review. This also contains a brilliant example of the reference class problem as applied to international drug smuggling, which alone is worth reading the paper for. Anyway, the substance of Cheng’s argument is that inference based on reference classes is structurally very similar to regression analysis. This means, I think, that the reference class problem can be regarded as a special case of the model selection problem. In turn, this means that we can employ established techniques, developed to deal with the problem of model selection, to pick between different reference classes in a principled way. While the details of these techniques – the main one discussed in Cheng’s paper is Akaike’s Information Criterion (AIC) – are not something that I’m terribly familiar with, this approach does appear to offer practitioners (legal, in this case) the advantage of at least being able to pick between different reference classes in a consistent manner. I wonder if something similar might be developed for the medical context…
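To give a flavour of how this might work, here is a toy sketch (in Python, with entirely invented fire counts) of AIC-based selection between two candidate reference classes: a single pooled class, and a class split by front-door colour. AIC trades goodness of fit against the number of parameters a partition demands, so a finer reference class is only preferred if it improves fit enough to pay for its extra parameters:

```python
import math

# Hypothetical fire counts (made-up numbers, purely illustrative):
# (houses, fires) for brick houses with blue vs. red front doors.
groups = {"blue door": (1000, 4), "red door": (1000, 6)}

def log_lik(counts, rates):
    # Bernoulli log-likelihood of the observed fires under given rates.
    ll = 0.0
    for g, (n, k) in counts.items():
        p = rates[g]
        ll += k * math.log(p) + (n - k) * math.log(1 - p)
    return ll

# Model 1: one pooled reference class (a single fire rate, 1 parameter).
n_tot = sum(n for n, _ in groups.values())
k_tot = sum(k for _, k in groups.values())
pooled = {g: k_tot / n_tot for g in groups}
aic_pooled = 2 * 1 - 2 * log_lik(groups, pooled)

# Model 2: door colour as a reference class (one rate per group, 2 parameters).
split = {g: k / n for g, (n, k) in groups.items()}
aic_split = 2 * 2 - 2 * log_lik(groups, split)

# The lower AIC wins; with these invented counts the split model's
# extra parameter does not earn its keep against the penalty term.
print(f"AIC pooled: {aic_pooled:.2f}, AIC split: {aic_split:.2f}")
```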


Cheng, EK. 2009. A Practical Solution to the Reference Class Problem. Columbia Law Review. 109(8): 2081-2105.

Hájek, A. 2007. The reference class problem is your problem too. Synthese, 156(3): 563-585.

NICE (2011). CG127: Hypertension: clinical management of primary hypertension in adults. National Institute for Health and Clinical Excellence, London. Available from: http://www.nice.org.uk/guidance/cg127/resources/guidance-hypertension-pdf

Reichenbach, H. 1949. The Theory of Probability. University of California Press.

Salmon, W. 1971. Statistical Explanation. In Salmon, W. (Ed.), Statistical Explanation and Statistical Relevance. University of Pittsburgh Press

Seminar announcement: Donald Gillies on Causality, Propensity, and Simpson’s Paradox, 30 Sept 2014

This is part of the (excellent) seminar series on Probabilities, Propensities, and Conditionals convened by Mauricio Suárez at the Institute of Philosophy. You can find more details at their website: http://philosophy.sas.ac.uk/about/ppc-seminar-donald-gillies-30-Sept

30 September 2014, 17:15 – 19:00

Objective Probability and Conditional Reasoning Seminar: Room G34, Senate House, WC1
Causality, Propensity, and Simpson’s Paradox
Donald Gillies (UCL)

Contemporary medicine uses indeterministic causes, i.e. causes which do not always give rise to their effects.  For example, smoking causes lung cancer but only about 5% of smokers get lung cancer.  Indeterministic causes have to be linked to probabilities, but the nature of this link is problematic.  Seemingly correct principles connecting causes to probabilities turn out to be liable to counter-examples.  The present paper explores this problem by interpreting the probabilities involved as propensities.  This associates the problem of linking causality and probability closely with Simpson’s paradox, thereby suggesting a way in which the problem might be resolved.

What’s the difference between data and evidence?

This is a question that came up while I was writing a talk about the difficulties that might be encountered when translating evidence policies from one context to another for my home department’s Annual Research Day a year or so ago. You can find a copy of the slides here.

The plan was to say something about the way that EBM has influenced non-medical decision-making. The original rationale for EBM was a) to de-emphasise individual judgement, based on clinical experience, as a sufficient foundation for making care decisions and b) to instead base care decisions on evidence, particularly that arising from clinical trials. To quote perhaps the most widely-cited paper on the subject, EBM is the:

“conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients” (Sackett et al 1996)

However, a cursory glance at the topics of articles citing Sackett – all 9845 of them, at the time of writing – suggests that there is a growing interest in exporting this method of making decisions far outside the original context of medicine. These include papers on education policy, social work and – most interesting of all – architecture as a means of crime control. While an analysis of the reasons for this wide circulation would be fascinating (and hopefully the subject of a later post), it’s a bit beyond what I want to talk about here. Instead, I want to simply claim that EBM’s tools and tactics have had a really wide circulation in the last 10 years or so, with the most visible new locus of practice in the evidence-based policy (EBP) movement.

Yet this change in application poses tough questions about translation. How should EBM – a method that depends on practices that are pretty specific to medicine – be modified to give useful answers to those making decisions in other contexts? A further puzzle concerns the role of philosophers of science in all this. While there are many questions here that might benefit from a philosophical treatment of one kind or another, the contributions from philosophers have not been terribly helpful to this conversation. Given that I really believe that philosophers can and do meaningfully contribute to this kind of conversation, I will conclude by suggesting a few ways that we might provide a more useful (and more critical) contribution to the philosophy of evidence-based something. To illustrate this, I’d like to talk about one specific question thrown up by the circulation of practices from EBM to EBP. This starts with an ostensibly simple question: what’s the difference between data and evidence?

The data-evidence distinction

Why care about this distinction? Well, it appears to be one that gets made very frequently in EBP. We can find lots of examples of practitioners making distinctions between data and evidence. My quick web search this afternoon threw up examples from the UN’s Data Unity Network, the South Downs National Park Authority and the Marine Management Organisation.

But it’s not very clear from these examples exactly how this distinction gets made. Is the distinction something that comes over to EBP from EBM? Well, I think the short answer here is ‘no’. I can’t find a detailed analysis of any such data/evidence distinction in the EBM literature. However, my intuition (and perhaps one that I might be able to defend if pushed) is something like this: EBM proponents typically claim that evidence alone should be used when making decisions about healthcare (look at the Sackett quote above). Yet this evidence often depends on data gathered during, for instance, clinical trials. Here then, data and evidence can be locally distinguished. Information about individual trial subjects is data. But once aggregated via appropriate statistical work, and reported as the result of a trial, it becomes evidence, which can then be used to address a clinical question.

This local distinction isn’t very helpful outside EBM. Perhaps because EBP decisions often involve looking at processes only measurable at a group level (in economics, for instance), the EBM distinction between individual data and group evidence is unlikely to be applicable. So the data/evidence distinction that is being made in the examples above can’t just be made in the same way as it is in EBM. Can we find some more general way of distinguishing data from evidence by looking at the literature on the philosophy of evidence?

Philosophers and the data-evidence distinction

Well, at the outset, looking to philosophers of science for help with this question appears promising. There is a great deal of philosophical work on evidence, and some of it contains distinctions between data and evidence. Perhaps it might be possible to translate some of this work to the EBP context? Let’s take a closer look at some of this philosophical work. I’ve picked a pair of ways of making the data-evidence distinction that have appeared in the philosophy of probability literature:

Mayo’s error-statistical philosophy of evidence

Mayo’s idea is that evidence describes a special sub-set of our data. More precisely, when a particular hypothesis is tested using a particular set of data (arising from a clinical trial, say), that data becomes evidence in relation to that hypothesis:

data x are evidence for a hypothesis H to the extent that H passes a severe test with x. (Mayo 2004: 79)

This seems a pretty plausible way of making the data/evidence distinction that might be suitable for either EBM or EBP.
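As a toy gloss on the severity requirement (with invented numbers, and no pretence of capturing the subtleties of Mayo's account): suppose the hypothesis H, that a treatment's success rate exceeds 0.5, 'passes' a trial recording 16 successes in 20 patients. The test is severe to the extent that so good a result would have been improbable were H false:

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# If H were false (success rate only 0.5), a result this good or better
# would occur well under 1% of the time, so on this toy reading the data
# from the trial count as evidence for H.
print(round(prob_at_least(16, 20, 0.5), 4))  # prints 0.0059
```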

Subjective Bayesian view of evidence

This view distinguishes data from evidence only negatively. Here, the primitive concept is the acceptance of some evidential statement; anything that leads to the acceptance of that statement – which, for us, might well include data – is simply left undefined, or at least treated as irrelevant.

The Bayesian theory of support is a theory of how the acceptance as true of some evidential statement affects your belief in some hypothesis. How you came to accept the truth of the evidence, and whether you are correct in accepting it as true, are matters that, from the point of view of the theory, are simply irrelevant. (Howson and Urbach 1993: 419)

Here, then, the idea is that evidence is constituted by those statements that affect belief in some hypothesis. Everything that leads to these statements – data, for example – is lumped together as an irrelevance. Like Mayo’s distinction, this also seems a pretty plausible way of making the data/evidence distinction that might be suitable for either EBM or EBP.
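The mechanics here are just Bayes' theorem applied to an accepted evidential statement E. A minimal sketch, with invented numbers:

```python
# Subjective Bayesian updating on an accepted evidential statement E.
# All three input probabilities are invented for illustration.
def update(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) by Bayes' theorem."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Once E ("the trial reported a positive result", say) is accepted as
# true, only these three numbers matter to the update; how the
# underlying data produced E is, on this view, simply irrelevant.
posterior = update(prior_h=0.3, p_e_given_h=0.8, p_e_given_not_h=0.2)
print(round(posterior, 3))  # prints 0.632
```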

So what’s the problem?

Given that both ways of distinguishing data and evidence seem (at least) plausible, which should we prefer to use in practice? For the examples cited, this is where things start to get a bit tricky. As I’ve hinted above, each of these distinctions is rooted in a different theory of probability. Mayo’s distinction comes from the frequentist Neyman-Pearson tradition, while Howson and Urbach’s comes from subjective Bayesianism. Given that both methods appear to provide us with a means of making clear distinctions between data and evidence, the decision about how to make this distinction presumably follows from an earlier decision to adopt one or other general theory of probability.

But picking a general theory of probability is no small matter, either philosophically (see Gillies 2000 for background) or practically. At the very least, the choice of theory shapes the kinds of statistical methods that are appropriate, leading to all kinds of implications for experimental design and so on. And suggesting that we decide how to distinguish data from evidence by first deciding on a general theory of probability is not terribly helpful either (in any case, these kinds of discussions usually regress into ‘theory x is better than theory y‘ foot-stamping). So it is not clear to me just which way of making the distinction we should prefer. However, a more local conclusion is a bit more positive: any distinction that we draw between data and evidence should probably follow whichever general theory of probability is in use.


Gillies, D. 2000. Philosophical Theories of Probability. Routledge.

Howson, C. and Urbach, P. 1993. Scientific Reasoning: The Bayesian Approach. Open Court.

Mayo, D. 2004. “An Error-Statistical Philosophy of Evidence.” In Taper and Lele (eds.), The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. University of Chicago Press: 79-118.

Sackett, D., Rosenberg, W., Gray, J., Haynes, R., and Richardson, W. 1996. Evidence based medicine: what it is and what it isn’t. British Medical Journal, 312(7023): 71-2.