It’s not personal

This is personal. I’m a know-it-all nerd. When NHS Director for Patients and Information Tim Kelsey said about pseudonymous care.data “No one who uses this data will know who you are” there were two possibilities. Either he’d just massively misled the BBC Radio 4 audience, or I’d misunderstood something. And for me at my worst, either of those is equally bad.

So I sought to find out which it was. I wrote a piece, “Your NHS data is completely anonymous – until it isn’t”, explaining how I understood things: that even pseudonymous care.data runs a high risk of re-identification. The piece got published, tweeted hundreds of times and read thousands of times, but its central argument was never challenged. Daniel Barth-Jones educated me on low-probability re-identification scenarios (see here for an intro), but no-one told me all my re-identification scenarios were in that category. Some people said I’d failed to highlight the benefits of care.data – but most of the potential benefits are beyond dispute.
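To see why pseudonymisation alone offers weak protection, consider this toy sketch of a linkage attack. The records, fields and people are entirely made up (this is not the care.data schema): the point is only that replacing the NHS number with an opaque token does nothing about the quasi-identifiers – date of birth, postcode, sex – that can be joined against other data sources which do carry names.

```python
import hashlib

def pseudonymise(nhs_number: str) -> str:
    """Replace a direct identifier with an opaque token."""
    return hashlib.sha256(nhs_number.encode()).hexdigest()[:12]

# "Released" pseudonymous health records (hypothetical fields and data).
released = [
    {"pseudonym": pseudonymise("9434765919"),
     "dob": "1972-03-14", "postcode": "BS8 1TH", "sex": "F",
     "diagnosis": "asthma"},
    {"pseudonym": pseudonymise("9434765870"),
     "dob": "1985-11-02", "postcode": "CT2 7NF", "sex": "M",
     "diagnosis": "diabetes"},
]

# Auxiliary data an attacker might already hold, e.g. from an
# electoral roll or social media (again, entirely made up).
auxiliary = [
    {"name": "Alice Example", "dob": "1972-03-14",
     "postcode": "BS8 1TH", "sex": "F"},
]

def reidentify(released, auxiliary):
    """Linkage attack: join the two datasets on the quasi-identifiers."""
    matches = []
    for rec in released:
        for person in auxiliary:
            if all(rec[k] == person[k] for k in ("dob", "postcode", "sex")):
                matches.append((person["name"], rec["diagnosis"]))
    return matches

print(reidentify(released, auxiliary))  # [('Alice Example', 'asthma')]
```

The pseudonym never needs to be reversed: the combination of birth date, postcode and sex is unique for a large fraction of the population, so one join suffices.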

This happened in a period of lively Twitter discussion on what exactly was going to happen in the care.data programme. There was so much incomplete and contradictory information about – some of the things I heard then, from people who should know, are still being officially contradicted now, two months later. Yes, I included “@tkelsey1” in a few tweets advertising my piece, and repeated this a few times when I realised his claims of anonymity were spreading into some newspaper (and all BBC) coverage of care.data. Many inquisitive tweets about care.data, by me and others, already had “@tkelsey1 @geraintlewis” added to them, and Geraint Lewis (NHS England Chief Data Officer) often did engage. At some point I found that Tim Kelsey had blocked me on Twitter.

What? Blocked on Twitter, when I’d been using a “corporate”, “professional” account? I thought I had got flamewars out of my system on rec.music.misc in the early 1990s. I’d only been on Twitter for a few months, and I felt disappointed with myself at having failed to avoid a beginner’s mistake in this new internet context. That is, until I found out there were a few others, some no more aggressive than me, who had also been blocked by Kelsey. I then also realised: it’s not personal.

The next realisation hit me sometime during the week when all the HES data sharing scandals came out. Someone mis-briefed junior minister Jane Ellison, leading her to mislead parliament by saying that the SIAS insurance HES data had been “green” rather than “amber” pseudonymised. Whoever briefed her: ignorance, incompetence, or a calculated risk? Twitter found it out almost instantly, and Jane Ellison corrected it within a few days.

The other HES data leak stories were all about data companies: making HES data easier to manipulate, available in the cloud and on maps, and usable for social marketing. One of the best known data analysis companies in the country was of course Dr Foster, which the same Tim Kelsey co-founded. From that experience, would he have known about the re-identifiability of pseudonymised data? It seems almost insulting to suggest that he wouldn’t.

With the dubious HES sales, it appeared that pseudonymised data in particular had escaped all scrutiny and detailed approval from HSCIC and its precursor NHSIC – even though they have solid independent advisory groups in place for advice. There must have been a line to justify this. Well, here is my guess for that justification:

“It’s not personal”.

If the argument that pseudonymised information is anonymous can be sustained, then pseudonymous information cannot possibly be “personal information”. This removes it from protection through the Data Protection Act. This means no-one who holds or receives the data incurs any obligation towards the data subjects. It will allow them to process the data in any way they like, and take automated decisions on the basis of that processing that affect the data subjects – to mention a few things which are severely constrained by the DPA. Thus, avoiding the DPA makes the data much more powerful, and cheaper to deal with in financial and administrative terms.

This is an argument that Tim Kelsey cannot afford to lose on behalf of HSCIC. “Customers” of HSCIC, commercial or not, will want cheap no-strings data – even when they are not intending to abuse it by re-identifying first. The corporate risk register for HSCIC lists “failure to secure the full amount of budgeted income” as one of the highest impact risks, with a medium exposure. If this is at all considered variable, it can hardly relate to the income HSCIC gets from central government – more likely it relates to selling data. HSCIC needs its care.data customers.

I don’t think the argument is sustainable. Ben Goldacre appears to agree.

So, no matter how loudly Tim Kelsey shouts it isn’t personal, or even when HSCIC redefines anonymity “by virtue of the right controls” (page 19), we should keep insisting that all non-“green” care.data is personal, and ensure it gets the protection through the DPA and its enforcers that it warrants. I have heard no indication that the new European Data Protection Directive will make things any worse in this respect.

Sure, blocking me on Twitter wasn’t personal. Tim Kelsey may have felt a bit nagged by my tweets, but I was mainly nagging about an uncomfortable truth.

UPDATE (25 March 2014): FoI requests by Phil Booth and Neil Bhatia led to HSCIC responses that confirmed as fact much of this and my previous speculation on care.data risks. The essence of this blog post was published at The Conversation. Maybe also interesting: slides of a talk I gave on care.data at the University of Bristol crypto group.

So where are we now on care.data?

Things have moved on rather dramatically since my last blog post on the topic. care.data has been delayed by another six months, there were some scandals over shared HES data, and several Westminster meetings, the last of which produced amendments (and for once no ministerial errors of fact?). I have now also started giving lectures on care.data (to Kent students this week, to Bristol colleagues in a few weeks’ time). So it makes sense to provide an update of my views on the issues here. This is in addition to an article I wrote last week on third-party use of the data.

In my first article, “Outdated laws put your health data in jeopardy” I described the system and listed a few worries, which I revisited a few weeks ago in a blog post. Following on from there …

The legal set-up, and weakness of the DPA: the parliamentary session on Tuesday 11 March considered and voted down an amendment to increase the penalties for abuse of the data. The contrast between 20 well-informed critical MPs from both sides of the house and a few health ministers discussing the issue, and a mob vote of 500 MPs, is a bit shocking. See here for a sensible amendment and reasons why the government’s accepted amendment isn’t good enough. Of course, the new European data protection directive, once agreed by the Council of Ministers and brought into effect in the UK, will allow more serious penalties than the current DPA – up to £100M or 5% of a company’s turnover.

Intelligence services: still a risk, no progress. Had a nice time talking at a Law Society debate on Mass Surveillance last week, where I did manage to drop in care.data as maybe the story that wakes up England on privacy.

Honeypot value and security: HSCIC have declined to answer a Freedom of Information request from Julia Hippisley-Cox asking them to report on the number of past data breaches and audits. Given that even the NSA and the big tech companies have been shown unable to protect their own secrets, this worry will never go away completely.

On potential abuse by commercial companies: see my article “Time for some truth about who is feeding off our NHS data” for an overview and analysis.

Anonymity: let me write a separate post on that. I may have been naive so far.