Two government officials meet at a training camp sometime in the 1950s. “What brings you here?” one asks the other. “I’m tasked with data analysis,” replies his pal. “What’s that about?” asks the gentleman. “Oh, that’s all about collecting my DA and TA.” But data, and data analysis, have come a long way since, especially with the arrival and advancement of the internet. Data is the new oil. There is a lot of digging and exploration, online and offline, as companies frantically collate information about individuals in an attempt to index, define and categorise them so that they can commercialise the stack, raising in the process myriad questions about the way these agencies — companies, governments and other similar entities — go about the exercise.

In We Are Data: Algorithms and the Making of Our Digital Selves, John Cheney-Lippold, a social scientist from the University of Michigan, looks at how our identity is defined online based on the data we generate, knowingly or unknowingly. He explores how this manufactured or customised identity often contradicts our offline self and life, putting us in danger because we are, more and more, being identified and categorised based on this ‘other’ identity.

The other self

If you are a Google user, you’d know the how of this, even if you are not so sure about the why. Suppose you ‘Google’ a product that you later decide you don’t want to buy. In a regular offline scenario, the process ends then and there. Maybe in future you may rethink, in a very organic way, and reconsider your decision. But you know this won’t happen so naturally in the online world. As you search for products, algorithms that track your activities collate data, in real time, about your search activity, your location, the kind of device you search on and many other things you may not be aware of, and build up a definition of you.

This definition includes, most importantly, your gender. When Google analyses your browsing data and assigns you an algorithmic gender, the search giant is least bothered about its authenticity because its priorities lie elsewhere — in the commercial viability of that particular gender definition (that is, whether you can be presented to an advertiser), and in whether it can be enhanced and improved upon. “Google’s gender is a gender of profitable convenience,” writes Cheney-Lippold.

And this is a ‘corrupt’ categorisation, the author elaborates, anchoring on legal scholar C Edwin Baker’s concept of corruption, which occurs when such segmentations are driven by the preferences of power or money rather than a particular group’s needs and values. When this corruption happens, what makes users truly hapless is the fact that there is no positive interactivity in the process. Which means you can’t tell Google that its definition of you is only its, and not yours. Maybe there is a way you can tell Google this, but chances are you won’t be heard.

But isn’t this a mere psychological or philosophical problem? In an age when your online and offline lives are forcefully merged, and companies and governments give more attention and importance to your online avatars, such wrong definitions can have far-reaching consequences, especially since machines and machine learning are not infallible and can make errors of judgement grounded in correlations rather than in a contextual understanding of reality or truth.

Being human

Cheney-Lippold gives the interesting example of a robot-judge, which would evaluate a case based only on plain correlations of facts. These correlations, the product of big data analysis, can often miss the sociological, psychological and even historical context in which an act (a crime) has been committed, and so result in an error of judgement. A human judge, by contrast, even when aided by such analytical tools, can bypass the force of correlations and take a stand that may seem to contradict the system of machine justice but will reflect the mores of justice.

This is not to justify the existing gaps in the judicial system, or to discard the advantages that artificial intelligence and machine learning bring. But in an era when the individual becomes a data-vending machine with no particular control over the way her data life is created and manipulated, it is important that society adopts some checks and balances to make sure there won’t be a data caste system at work, creating stacks of data untouchables. Google’s search preferences already pose this threat, as Cheney-Lippold suggests.

Here, the algorithms first define you and then rate you based on your online activities, so as to customise your searches and bring you ‘better’ results. But the better is Google’s better, not necessarily yours. In a way, the search engine re-configures your identity. This isn’t an error, argues Cheney-Lippold, but a “freshly minted algorithmic truth that cares little about being authentic but cares a lot about being an effective metric for classification.” In this world, the author explains, there is no “fidelity to notions of our individual history and self-assessment.”

In essence, the search giant or the data company is not particularly concerned about your past or even your present; what matters to it is how you are going to perform a commercial activity in the future. This is disturbing and should be controlled. As the recent data manipulation scandal involving the research firm Cambridge Analytica and Facebook indicates, if there are no transparent and stringent rules to protect how individual data is used by various agencies, as a society we are exposing ourselves to the dangers of what Frank Pasquale called ‘The Black Box Society’, where “secret algorithms control money and information” and manufacture new forms of consent, courtesy of big data manipulations.

Big data should not be allowed to become bad data. Cheney-Lippold relies on the philosopher Antoinette Rouvroy to elaborate on the perils of big data becoming toxic data. Rouvroy says the ideology of big data makes us feel that one does not have to produce knowledge about the world, but can discover it directly from the world itself. The Cambridge Analytica scandal stands testimony to this, telling us why the notion that algorithmic understanding matters more than other forms of judgement is plain wrong. We are more than the data that we are. John Cheney-Lippold’s meticulously researched and fabulously written work is a great handbook for all those who want to explore this fact. He reminds us that it is also cool and organic to humanly define data as DA and TA. Errors and gaffes make us human.

MEET THE AUTHOR

John Cheney-Lippold is an assistant professor at the University of Michigan. He writes on the relationship between digital media, identity, and the concept of privacy.
