Brief Description

Just because data can be made more accessible to broader audiences does not mean that those people are equipped to interpret what ...
When Mike Ananny downloaded the gay male dating app Grindr onto his Android phone, he was shown a few recommendations for other potentially interesting apps. To his surprise, one of the recommendations was for an app called “Sex Offender Search.” Knowing that there could be multiple explanations for the recommendation, he wrote an article in The Atlantic exploring the possibilities. Perhaps, however unlikely, people were downloading both apps together? Perhaps something about the language used in the apps produced the curious connection? Perhaps the limited Marketplace categories prompted it? Or maybe something more arcane, like the apps’ creation dates or download frequencies? Although he wrote about this publicly, Ananny never got an answer from Google; the company simply deleted the link so that future Grindr customers would not see what he had seen.
In a world of data, connections are algorithmically produced all of the time.
Sometimes, as in the case of recommendation systems, these connections are made visible to the public. The public doesn’t just brush off a connection made by an algorithm; these connections are taken seriously, assumed to be legitimate or at least accurate. This is precisely why recommendation systems are valued. Yet, what happens when algorithms produce connections that prompt people to interpret them as meaning more than simply an algorithmic connection? In the case of the “Sex Offender Search” recommendation, the reason that Ananny was rightfully upset is that there’s often an implicit homophobic assumption in American society that gay men are sex offenders, and Google’s algorithm seemed to reproduce that prejudice. As such, this recommendation seemingly reinforced a very bigoted notion. Perversely, historical data on criminal activity might indicate that gay sex was associated with sex offenders because gay sex was criminalized, or written about extensively in association with criminality. In an instance of algorithmic rational discrimination, previous data may well indicate that the two go together, even if the societal norms and laws surrounding the original collection of that data have since shifted.
How can individuals address offensive interpretations of what they are inferred to like, or what they are associated with, without having a clear source for the root of the problem?
Algorithms are never neutral; they are shaped by the decisions made by their creators, the data that is introduced, and the cases that they are tested against. At the same time, statistical correlations and algorithmic inferences are not aware of the ways in which their mathematical results will be interpreted by the person on the receiving end. What happens when the output of data results in inaccurate, if not problematic, interpretations?
Questions to Consider
• What are the major social, cultural, and ethical tensions that emerge because of data misinterpretation? Should this issue be approached differently when thinking about the diverse ways in which individuals, professionals, organizations, and algorithms misinterpret data?
• What conflicting values and tradeoffs are at stake? How do we understand relevant actors, stakeholders, and "camps"?
• How is the misinterpretation of data different in different domains (e.g., cosmology data vs. personal health records vs. cell phone location data)?
• What other salient case studies highlight the tensions, tradeoffs, and issues?
• Who should be responsible for how data is (mis)interpreted? What is the role of the government? Of data providers? Of tools that allow people to manipulate their own data? Of educational institutions? Of media? How should people and organizations be held accountable?
• Who should serve as a data caretaker? What is the role of the government in supporting, regulating, and protecting data caretakers?
• How do we balance people’s right to their own data versus the need to protect people from inappropriate uses of data?
• When and where should access to information be curtailed because of the potential for misinterpretation or abuse? Who should get to decide when information should be curtailed? (Consider the differences across and within different sectors/fields. Are there differences between personal health data and virus outbreaks? Are there differences between educational data and national security data?)
• If data is subject to lots of potential misinterpretation, how can it be framed in a way that makes its intent or purpose more publicly accessible? What can be done to minimize how often and in what contexts data is misinterpreted?
Data &Society Research Institute datasociety.net