The idea in a nutshell: Normal people (and I mean this as a compliment) don’t understand ‘Big Data’ and propensity modelling. They could, of course; they just haven’t been introduced to the concepts. The people behind the governments imposing new legislation all over the world, however, do. In establishing their right to ‘Metadata’ (in many cases without a warrant), they also establish their right to run propensity models over it. This could give them the ability to do some Minority Report guessing. And that could be bad for people who like things like privacy and freedom.

Asimov’s Foundation Series

I am currently re-reading a series of books written by Isaac Asimov, one of the world’s most famous Sci-Fi authors, in case you happen not to know. In the series, he postulates a mathematical facility which can be used to determine, from large data sets, what will happen next.

In the real world, correlating two or more data points to indicate the (hopefully statistically significant) possibility of a third is called propensity modelling.

One humorous way of internalising the process is shown on the website below. By comparing the divorce rate and the consumption of margarine across many years, a statistical correlation can be found. The graphs look the same.

Imagine you had done this in 2008 and that, in 2009, you knew the divorce rate in Maine. It would have been fair to assume that you would be able to estimate the level of margarine consumption based on the patterns you’d seen in the past.
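To make that concrete, here is a minimal sketch of the estimate described above: fit a straight line to past years, then plug in a new divorce rate. The numbers are invented for illustration and are not the real Maine figures, and the function names are my own.

```python
# Toy figures invented for illustration; these are NOT the real statistics.
divorce_rate = [5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2]    # divorces per 1,000 people
margarine_lbs = [8.2, 6.7, 6.5, 5.5, 5.2, 4.3, 4.9]   # pounds eaten per person


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(xs, ys):
    """Correlation coefficient: close to 1 means the graphs 'look the same'."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


slope, intercept = fit_line(divorce_rate, margarine_lbs)


def estimate_margarine(rate):
    """Guess margarine consumption from a divorce rate we have just observed."""
    return intercept + slope * rate


print("correlation:", round(pearson_r(divorce_rate, margarine_lbs), 3))
print("estimate for a divorce rate of 4.0:", round(estimate_margarine(4.0), 2))
```

Nothing in the fit knows or cares whether the two things are actually related, which is exactly why the example is funny.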


It’s funny to me that those things are correlated, but it’s probably not the best example to give. By trawling large data sets for associations between unrelated things, the people behind the site have trivialised the idea.

In the real world, however, businesses can use big data sets to determine things relevant to the commercial outcomes they seek to influence. There’s a lot of money in it.

A real-life example

When I worked for a phone company, for example, a brilliant woman I worked with ran a data sciences team. They had substantial data sets on all of the company’s customers. Her team had correlated many factors with individuals’ propensity (likelihood) to leave the company in the next few months. If a user was making fewer calls on the <phone company’s> network and more calls from Optus to other phone companies, and they were in the last 3 months of their tenure, they were extremely likely to leave.
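I never saw the team’s actual model, so what follows is only a sketch of the general shape such a score could take. It combines the three signals mentioned above into a number between 0 and 1. The field names, weights and bias are hypothetical; a real team would learn the weights from historical data rather than pick them by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Customer:
    on_net_calls_change: float    # e.g. -0.5 means 50% fewer calls on the company's network
    off_net_calls_change: float   # e.g. 0.6 means 60% more calls to other phone companies
    months_left_on_contract: int


# Hypothetical, hand-picked values for illustration only.
BIAS = -3.0              # most customers are not about to leave
W_FEWER_ON_NET = 3.0     # weight for a drop in on-network calls
W_MORE_OFF_NET = 3.0     # weight for a rise in off-network calls
W_CONTRACT_ENDING = 2.0  # weight for being in the last 3 months of tenure


def churn_propensity(c: Customer) -> float:
    """Return a 0..1 score; higher means more likely to leave soon."""
    z = BIAS
    z += W_FEWER_ON_NET * max(0.0, -c.on_net_calls_change)
    z += W_MORE_OFF_NET * max(0.0, c.off_net_calls_change)
    if c.months_left_on_contract <= 3:
        z += W_CONTRACT_ENDING
    # The logistic function squashes any number into the range 0..1.
    return 1.0 / (1.0 + math.exp(-z))


at_risk = Customer(on_net_calls_change=-0.5, off_net_calls_change=0.6,
                   months_left_on_contract=2)
settled = Customer(on_net_calls_change=0.1, off_net_calls_change=0.0,
                   months_left_on_contract=14)

print("at risk:", round(churn_propensity(at_risk), 2))
print("settled:", round(churn_propensity(settled), 2))
```

Sort the customer base by that score and the people at the top of the list are the ones who get the phone call.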

That’s incredibly valuable information to a company. Optus, the phone company involved, could then call these people and offer them an incentive to stay.

Make no mistake, the commercial value of these insights is enormous. Telstra has just started using this kind of modelling to target ads. Companies will fight to keep hold of their data precisely because it is so valuable.

Imagine this on a Google scale

Google holds a huge amount of data about you: age, gender, browsing history, time online and so on and so on. That adds up to thousands of data points for every day you’ve been online. They use these to (amongst other things) show you relevant search results.

Similarly, as recent press reports show, Facebook can accurately determine the political party you will vote for based on what you have ‘liked’ on their site. Remember, you haven’t told them which way you vote, you’ve just liked a bunch of brands which seem cool to you. But, historically, they’ve profiled a bunch of voters and the brands they’ve liked. Now, when they see a pattern of likes (such as yours), they know which way you are likely to vote.
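I don’t know what Facebook actually runs, so treat the following as a sketch of the idea only, using a textbook technique (a naive Bayes classifier) as a stand-in. It learns from a handful of profiled voters which brands go with which party, then guesses the party of someone who has only supplied their likes. The brands, parties and data are all invented.

```python
import math
from collections import Counter, defaultdict

# Invented examples: (brands a person liked, party they are known to vote for).
profiled_voters = [
    ({"brand_a", "brand_b"}, "party_x"),
    ({"brand_a", "brand_c"}, "party_x"),
    ({"brand_a", "brand_b"}, "party_x"),
    ({"brand_d", "brand_e"}, "party_y"),
    ({"brand_d", "brand_c"}, "party_y"),
    ({"brand_d", "brand_e"}, "party_y"),
]


def train(voters):
    """Count how often each brand is liked by voters of each party."""
    party_counts = Counter(party for _, party in voters)
    like_counts = defaultdict(Counter)
    for likes, party in voters:
        like_counts[party].update(likes)
    vocabulary = {brand for likes, _ in voters for brand in likes}
    return party_counts, like_counts, vocabulary


def predict(likes, model):
    """Return {party: probability} for a person with the given likes."""
    party_counts, like_counts, vocabulary = model
    total = sum(party_counts.values())
    log_scores = {}
    for party, n in party_counts.items():
        log_p = math.log(n / total)  # how common the party is overall
        for brand in vocabulary:
            # Chance that a voter of this party likes the brand,
            # smoothed so that no probability is ever exactly 0 or 1.
            p_like = (like_counts[party][brand] + 1) / (n + 2)
            log_p += math.log(p_like if brand in likes else 1 - p_like)
        log_scores[party] = log_p
    # Turn the log scores back into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {p: math.exp(s - top) for p, s in log_scores.items()}
    norm = sum(weights.values())
    return {p: w / norm for p, w in weights.items()}


model = train(profiled_voters)
guess = predict({"brand_a", "brand_b"}, model)
print(max(guess, key=guess.get), guess)
```

Notice that the person being scored never appears in the training data and never states a preference; the pattern of likes is enough.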

Similarly, companies can now analyse troves of data, including the times your security cards were swiped in and which files you access and how often, to determine whether you are going to slack off. Interestingly, the company offering this was put through an accelerator program by the UK snoopers, GCHQ.

These are important examples, but they are from the private sector. What about when the government does the same thing?

And here comes the government with a requirement for metadata

At the moment, the most demanding government in the world when it comes to snooping is the UK’s. Ultimately, the snooper’s charter in the UK could go here.

The UK government will have access to your browsing history, the time you spend online and a bunch of other data points. Without a warrant, they can work backwards, cross-referencing, for example, all known terrorists and the websites they visited. When they see your browsing pattern, they know there is a likelihood that you MAY be a terrorist. Equally, they could pick out people likely to have ginger hair, vote conservative, like old films, give money to their political party ... anything they like.
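Working backwards like this needs surprisingly little machinery. The sketch below is my own illustration, not a description of any government system: it measures how over-represented each site is among a flagged group compared with everyone else, then scores a new browsing history by the sites it contains. The site names and histories are invented.

```python
from collections import Counter

# Invented browsing histories (sets of sites visited).
flagged_histories = [
    {"site_a.example", "site_b.example", "news.example"},
    {"site_a.example", "site_c.example", "news.example"},
    {"site_a.example", "site_b.example"},
]
other_histories = [
    {"news.example", "shop.example"},
    {"news.example", "video.example"},
    {"shop.example", "site_b.example"},
]


def site_lift(flagged, others):
    """For each site, how much more common it is in the flagged group.

    A lift above 1 means the site is over-represented among flagged people.
    The +1 and +2 terms are smoothing, so unseen sites never divide by zero.
    """
    in_flagged = Counter(site for history in flagged for site in history)
    in_others = Counter(site for history in others for site in history)
    lift = {}
    for site in set(in_flagged) | set(in_others):
        rate_flagged = (in_flagged[site] + 1) / (len(flagged) + 2)
        rate_others = (in_others[site] + 1) / (len(others) + 2)
        lift[site] = rate_flagged / rate_others
    return lift


def match_score(history, lift):
    """Average lift of the known sites in one person's history."""
    known = [lift[site] for site in history if site in lift]
    return sum(known) / len(known) if known else 1.0


lift = site_lift(flagged_histories, other_histories)
resembles_group = match_score({"site_a.example", "site_c.example"}, lift)
looks_ordinary = match_score({"news.example", "shop.example"}, lift)
print(resembles_group, looks_ordinary)
```

A high score only says that a history resembles the flagged group; it is a likelihood, not a finding. Swap the flagged group for party donors or fans of old films and the same code picks out those people instead.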

It strikes me that this is the discussion they want us to be having. They want us to be talking about metadata and the linear connection with someone (the police, say, as part of an investigation) sifting through it to find whether we were at a place at a time or whether we visited a particular website. The truth is they could loosely pattern match the same data to determine our likelihood to have any attribute they sought.

Summing up the layers of danger

In Minority Report, Tom Cruise’s character used future technology to identify the people who were likely to commit murder. He would arrest them before it happened.

Soon, governments, starting with the UK, will have vast sets of data on people, to which browsing history (and other data points) is being added. Using propensity modelling, they will be able to identify, without a warrant, any group they want to: from people who will vote their way to people who might donate money to their party or who might become paedophiles.

Unfortunately, the use of data is not well understood by people. Some are reassured that the government ‘only’ wants metadata. But metadata is data and, with a lot of data, you can guess what people are going to do next. You can spot criminals, people about to leave phone companies and people with any other attribute you want to see.

Regrettably, even if we can understand and stop governments doing this, we almost certainly will not be able to stop the companies we work and/or interact with doing the same.