Mining Data

posted in: Random | 0
Overview – through Hollerith punch cards

»Overview –
through Hollerith Punch Cards«

Motherboard has an insightful article on the history and future of Diaspora (an open source, distributed alternative to Facebook). The article is very long and alongside the Diaspora-narrative there are several other issues it focuses on. One of them deals with data mining, which – with Facebook as a regular dinner table subject – is touched upon frequently around here.

The argument I hear most often when it comes to privacy issues is »I can’t see how this bit of information could possibly be vital or interesting to a third party«. The thing with data mining is of course the three-letter-word above »argument« lacks: now.

Sooner or later someone might (or might not) come along who can see a relevance, even long after the information is in the open. And often it is not the data in itself that turns out to be explosive, but a new way in which it is connected to other harmless data. Or a new place it’s brought to.

Here’s one fun example from the Diaspora-article:

Last week, the Financial Times reported that a newly uncovered deal between Facebook and the data firm Datalogix allows the site to track whether ads seen on Facebook lead users to buy those products in stores, which is highly attractive intelligence for advertisers. (Datalogix does this by buying consumer loyalty data from retailers, and tracks in-store purchases by matching email addresses in its database to email accounts used to set up Facebook profiles, along with other account registration information.)

The future implications of the email address mix-and-match is not fully clear yet (although for a start I think it’s of nobody’s business what I buy where). But there are other examples where the consequences are very clear. For instance someone disclosed his credit card number to both Apple and Amazon. Ultimately this led to the destruction of his digital life – email account takeover, twitter account takeover, phone wiped clean, computers wiped clean: proudly brought to you by small-scale data mining with a pinch of social engineering thrown in:

Amazon tech support gave them the ability to see a piece of information — a partial credit card number — that Apple used to release information. In short, the very four digits that Amazon considers unimportant enough to display in the clear on the web are precisely the same ones that Apple considers secure enough to perform identity verification.

And finally, my historian-friend’s favourite – and at the same time the most ghastly – data mining example of them all is how the 1933 census in Germany was later used to organise the deportation of Jews:

But Jews could not hide from millions of punch cards thudding through Hollerith machines, comparing names across generations, address changes across regions, family trees and personal data across unending registries. It did not matter that the required forms or questionnaires were filled in by leaking pens and barely sharpened pencils, only that they were later tabulated and sorted by IBM’s precision technology.

Edwin Black (2009): IBM and the Holocaust. Washington DC, p. 107.


Update 4 Aug 2013: I recently learned about the Rosa Liste (pink list). This list was kept by the German empire and subsequently the Republic of Weimar to monitor male homosexuals. In 1933 the list fell into the hands of the new government which used it to go straight from monitor to murder. Case in Point: you never know what the meaning of any given datum is going to be in the future.

Leave a Reply

Your email address will not be published. Required fields are marked *


This site uses Akismet to reduce spam. Learn how your comment data is processed.