Saturday, October 19, 2013

Why Big Data Bugs Me

I've been talking to a bunch of serious social science people, and it
appears there's a groundswell of backlash against naive big data hype:

1. We did this paper to try to capture some of this, which we call

2. Kate Crawford (see also TED talk) continues a nuanced revision of
what big data's place should be limited to, and how, e.g.
Big Data Governance.
Also see Acquisti's talk; I also really like Janna Malamud Smith's nuanced book on Private Matters.

3. For any "interesting" big data (i.e. about people, not about
particles or weather:), we have major problems with the ground
truth of the data, starting with selection bias. Data mining
Twitter or any online material systematically ignores the 25% of people
who don't use the net. Of course, these are in the main, but not
exclusively, the poor, so if you're setting social policy based on
what people online say, then you're OK if you're a Tory (sorry:) but
anyone else had better worry....

But it's much more insidious than

4. On the NHS data, if you claim to be doing longitudinal, cross-population
studies, and you don't compare the NHS records to (say) the
Centre for Longitudinal Studies' data, then you are missing out
healthy people. This is a bit of an issue if you are trying to do
public health, as the healthy people are the ones that might contain
clues about what to do constructively for prevention,
whereas people that get various syndromes will have a wide variety of
factors that may or may not have contributed...

However, to track the healthy people, you'll need lots of lifestyle data,
which the GP/NHS record won't have, whereas a proper study (with random
trials of various things) would, and would do so with informed
Basically, for me, the rush to exploit data (in these ways - no
criticism of the natural scientists who use big data all the time
perfectly reasonably) is motivated by a) laziness
and b) the cheapskate attitude the above would suggest...

In government, of course, a big problem is that proper use of
longitudinal studies takes several electoral cycles to produce
results, so the "powers that be" are basically "has beens" by the time
the cool new knowledge starts to roll in, so they don't get to claim

This is, of course, the biggest challenge to liberal democracies today
(how to make evidence-based policy that has a planning horizon of 1-10

Oxford Uni have some thoughts on this last problem, although my take is a bit more radical...


The Green and Pirate parties in Germany are causing the big parties to adopt a
thing called Liquid Democracy. This is crowdsourcing engagement, and it
appears to generate long-term stable decision making/sticking capabilities
in society that may turn out to be the eventual outcome of the careful
thinking in some parts behind the (less crazy parts of the) Occupy and
Indignados and related moves (there are others in Russia, and some interesting
covert ones in China and in the Islamic countries, not just the obvious Arab
Spring ones)....

I personally think that liberal democracy in its current form is incapable
of carrying out long-term planning, but some organisations (that aren't just
Opus Dei or the Mafia) are able to do it in inclusive ways, and we should figure out how to construct
such systems as needed. Long-term planning (e.g. for reaching WHO targets
for eliminating diseases) appears to work in some cases - so why not in
others, and how does new tech help (or hinder)?

In my particular discipline, this is what is called "games on graphs" (think
Conway's Game of Life played out over complex topological/topographic spaces)
- we have some ideas why some patterns (memes) are dynamically stable, others
collapse, and some take over (endemic), so why not use this stuff....

How does a religion persist for 1000 years? Why does a clearly broken idea like free market economics based on rational choice theory last 30 years? How come liberal democracy has survived so long?

time for syndicalist anarchism based on maths:)

