Workshop Materials: Interpretation Gone Wrong

The Social, Cultural & Ethical Dimensions of “Big Data”

March 17, 2014 - New York, NY

Brief Description

Just because data can be made more accessible to broader audiences does not mean that those people are equipped to interpret what they see. Limited topical knowledge, statistical skills, and contextual awareness can prompt people to read inferences into, be afraid of, and otherwise misinterpret the data they are given. As more data is made more available, what other structures and procedures need to be in place to help people interpret what's available?

Detailed Topic Description:

Data is increasingly being made available for public consumption. Expressions like “information is power” and “information wants to be free” have gained enormous rhetorical traction, and concepts like “transparency” and “open access” dominate discussions of the governance of all types of data: public, private, commercially- and academically-generated, and scientific. But what are the ramifications of broad information accessibility? Who is collecting, structuring, analyzing, and distributing information? Who is interpreting what is made available, for what purpose, and to what end?

Just because data is more accessible to broader audiences does not mean that its recipients are sufficiently equipped to interpret what they receive. Even when people know that the data has a bias, they make decisions based on what it seems to represent about an item, object, or issue. They may place their trust in the institutions or organizations that disseminate it in the hopes that they have looked at the complicated data in some objective and decisive way. Most people do not have experience creating or structuring datasets. Many lack the statistical skills - or even basic fluency in probabilities - to draw meaningful inferences from the data at hand. And even when they do, only a few have sufficient knowledge or expertise to properly contextualize their findings and apply them appropriately. As a result, people can easily misinterpret data that they are given, leading to confusion, anxiety, and suboptimal decision-making.

This affects individuals in a wide range of domains, including knowing which goods to purchase, understanding personal health risks, making smart personal financial decisions, or even evaluating how best to receive, question, or concur with news items.

Organizations may also face challenges in making sense of the data they have access to--especially as information becomes available in huge quantities from a multiplicity of sources. They may not anticipate potential interpretations of the data they collect or use.

Despite a certain sense of inevitable disaster resulting from information overload, both individuals and companies regularly receive, analyze, and use large amounts of information from divergent sources to make successful decisions every day. How do we reconcile the potential (and actual) harms of informational abundance with some of these positive outcomes? What are the right structures to put into place to limit potential misinterpretations? Curtailing individual or organizational access to data does not seem to be the right approach.

People who are accustomed to access to information as a right get upset when the state intervenes to curtail it. Yet there can be serious individual and social consequences when people misinterpret information because they lack the skills, knowledge, and context to interpret it adequately. For example, Reddit users mistakenly identified a missing person, Sunil Tripathi, as the Boston bomber, Dzhokhar Tsarnaev, by comparing grainy photos of the suspect released by the FBI with photos released by the Tripathi family in their search for Sunil. The media ran with the story, despite the FBI’s assurances to the Tripathi family that Sunil was not a suspect, effectively derailing the search for the deceased Sunil by both the public and a private missing-persons agency, and upsetting his family with media attention and false accusations. The crowdsourced search was spurred on by the notion that anyone with access to the ‘big data’ of publicly available photos, municipal video feeds, and other sources of information could properly identify the Boston bomber. Crowdsourced engagement with publicly available data can have serious consequences beyond the targeted ideal outcome. Access to sensitive or highly fraught information can be problematic, as not all data interpreters are created equal, whether they are researchers or unqualified internet users. Put another way, access to information is not the same thing as access to knowledge.

The challenges of data interpretation raise issues about who should (and should not) have access to data in the first place. For example, New York state law requires professional genetic counseling for anyone who gets access to their genetic information. Are such moves valuable educational interventions or paternalistic governance? Do individuals have moral rights to their own data? To what extent, and under what circumstances? When do such rights trump society’s right to intervene in order to allay fears, preserve the public trust, or achieve desired outcomes? When, if ever, should individuals be denied access to their own data? Based on what principles, and with what limits?

As more data becomes readily available, how do we collectively address the challenges of interpretation? What other educational structures and procedures need to be put in place to help people interpret information that affects their interests? By whom? For example, to help the public have a better framework for data interpretation, Google is offering a free online course on making sense of the data. Should specialists with knowledge in particular fields—social scientists, physicians, or genetic counselors—also offer short courses so that data can be properly contextualized? Is education enough? Should collectors and purveyors of data be required to disclose facts relevant to data interpretation, such as the population sampled, characteristics measured, and the presence of any systematic bias that would skew results?

In medicine, there is a well-established principle that the lower the prevalence of a condition, the larger the share of positive results that are false positives, even for a highly accurate diagnostic test. This is one reason why physicians may hesitate to run diagnostic tests for patients who don’t meet the right criteria for them. An educated data or information mediator is considered necessary in the medical domain as a barrier between a consumer and access to services. Physicians act as ‘knowledge brokers’, and they are trusted to act and disseminate information in good faith because they subscribe to strong ethical principles as part of their regulated professional ethos. How does data accountability work in other domains? How would this model of data arbitration apply in other sectors? How would data brokers be regulated?
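The prevalence principle above follows from Bayes' rule, and a short calculation makes it concrete. The sketch below is illustrative only: the function name and the 99% sensitivity and specificity figures are assumptions chosen for the example, not values taken from any real diagnostic test.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Share of positive test results that are true positives (Bayes' rule)."""
    # Fraction of the whole population that is sick AND tests positive.
    true_positives = prevalence * sensitivity
    # Fraction of the whole population that is healthy AND still tests positive.
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: 99% sensitive and 99% specific, applied at three prevalence levels.
for prevalence in (0.10, 0.01, 0.001):
    ppv = positive_predictive_value(prevalence, sensitivity=0.99, specificity=0.99)
    print(f"prevalence {prevalence:.1%}: a positive result is correct {ppv:.1%} of the time")
```

Under these assumed figures, a positive result is correct roughly 92% of the time at 10% prevalence, 50% of the time at 1% prevalence, and only about 9% of the time at 0.1% prevalence, even though the test itself never changes.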

The philosopher Nassim Nicholas Taleb makes a similar comment about the “Big Data” phenomenon, asserting that more information results in more false information. He writes, “big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal).” How receptive are people to the notion that more information is not necessarily better, or that it can lead to wider misinterpretation? How does this notion affect how resources should be directed to making sense of the data?
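One way to see how the spurious "rises to the surface" is a small simulation. The sketch below is an illustration under stated assumptions, not Taleb's own analysis: every variable is pure random noise, so any correlation found between two of them is spurious by construction. The function names, the 30 observations per variable, and the seed are all hypothetical choices made for the example.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(num_variables, num_observations=30, seed=0):
    """Largest |correlation| found among independent noise variables."""
    rng = random.Random(seed)
    # Each column is independent Gaussian noise: there is no real signal anywhere.
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(num_observations)]
        for _ in range(num_variables)
    ]
    # Compare every pair of columns and keep the strongest apparent relationship.
    return max(abs(pearson(a, b)) for a, b in itertools.combinations(columns, 2))


for num_variables in (5, 50, 200):
    strongest = strongest_spurious_correlation(num_variables)
    print(f"{num_variables:>3} noise variables: strongest correlation = {strongest:.2f}")
```

Because the same seed is used, the smaller datasets are subsets of the larger ones, so the strongest apparent correlation can only stay the same or grow as more noise variables are added. Collecting more variables creates more opportunities to find a convincing-looking relationship that means nothing.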

Case Study 1: Personal Genetics

The company 23andMe offers direct-to-consumer genetic testing that gives personalized information on the consumer’s risk for various diseases, based on a saliva sample (presumably containing their own DNA) that they mail to the company. In return, consumers receive results that express their probabilities relative to the larger population. For example, a user might be told that she has a 30% higher probability than average of having a rare lung disease. All too often, an ill-informed individual may interpret this to mean that she has a 30% likelihood of developing the disease. Even if she understands that this is not the case, she may not realize that, for a rare disease, a 30% higher probability than average is so small in absolute terms as to be practically meaningless.
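The gap between a relative increase and an absolute risk can be checked with simple arithmetic. The sketch below assumes a hypothetical baseline risk of 1 in 10,000 for the rare disease; that figure and the function name are made up for illustration and do not come from 23andMe or from any clinical source.

```python
def elevated_risk(baseline_risk: float, relative_increase: float) -> float:
    """Absolute risk after applying a relative increase to a baseline risk."""
    return baseline_risk * (1.0 + relative_increase)


# Hypothetical baseline: 1 person in 10,000 develops the disease.
baseline = 1 / 10_000
# "30% higher probability than average" is a relative increase of 0.30.
elevated = elevated_risk(baseline, relative_increase=0.30)

print(f"baseline risk: {baseline * 10_000:.1f} in 10,000")
print(f"elevated risk: {elevated * 10_000:.1f} in 10,000")
print(f"absolute difference: {(elevated - baseline) * 10_000:.1f} in 10,000")
```

Under this assumed baseline, a 30% higher risk moves the user from 1 in 10,000 to 1.3 in 10,000, which is nowhere near a 30% chance of developing the disease.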

Unofficial diagnostic tests like 23andMe can heighten anxieties about one’s genetic risk for a range of diseases without explaining those risks properly or putting statistical information in layman's terms. Consumers may lack the proper education or tools to understand how that risk is computed, and how the information they are given might be reasonably disputed. Some ethicists and officials are concerned that women will pre-emptively seek mastectomies if they have a heightened awareness of their ‘risk factors’ for breast cancer. Indeed, the reason that New York State requires genetic counseling for all who seek genetic tests is that so few people understand how to meaningfully interpret genetic information.

The FDA recently ordered 23andMe to stop offering personalized genetic health data to consumers amidst fears that users would react unreasonably to receiving alarming medical information that wasn’t delivered or curated by a trained medical professional. In addition to other quality control issues, the logic is that medical professionals have a different type of ethical obligation to their patients than a commercial enterprise has to its consumers with regard to the information it disseminates. These developments in health tracking and health data access raise a number of questions, such as: Is health data generated through self-tracking or by commercial entities different from other kinds of health data generated by physicians or other experts? Who has the right to disseminate personalized health information? What interpretive tools should those entities provide to users? Who has the right to receive personalized health information? Who should be denied access? Could elevated health concerns from (mis)information lead patients to use health care resources unnecessarily?

Case Study 2: Obsessing Over Metrics

Any metric that is valued is gamed. An entire cottage industry exists to help with search engine optimization because companies want their sites to appear at the top of search queries. Authors and publishers try to game best seller lists by buying their own books because those lists signal quality. Each year, cheating scandals break out as teachers help students game assessment tests because their teaching is interpreted through their students’ performance.

Although gaming metrics is nothing new, the “big data” phenomenon has amplified this issue because more and more is dependent on results produced or indicated by data. This dependency reflects a certain idolatry of numbers, as though anything worth valuing can be quantified, and anything that can’t be quantified isn’t worth much. In other words, if a data-oriented solution to a problem isn’t available, people who are eager to verify their solutions with data-credibility can reframe the question because numbers are viewed as ‘scientific’ or trustworthy. To some degree, ‘the data’ has become a universally-accepted rationale for choosing one course of action over another, or for seeing meaning or patterns that appear significant because they are attached to a numerical or statistical value, regardless of how well understood that data is, or how those numbers are produced. Consider, for example, Klout scores and social media followers. These scores, intended to measure the influence that individuals have on a 1-100 scale, are now baked into search engine results. One factor in shaping Klout scores is follower count on major social media like LinkedIn, Facebook, and Twitter. The number of Twitter followers one has also shapes search engine results directly, both within Twitter and on third party sites. As data scientist Gilad Lotan learned, purchasing bots and fake followers is neither difficult nor expensive. Although services like StatusPeople exist to help people determine if an account is primarily followed by fake people, most people don’t notice; they see the follower count and presume that the followed is influential.

Services like Klout don’t seek to verify followers, so when they pick up on these signals and pass them on to search engines, they reinforce inaccurate interpretations and bake them further into systems. People with high Klout scores receive perks, including free swag and access to key opportunities. Of course, it’s not just marketing companies and search engines that rely on these numbers. When journalists cover people, they also often refer uncritically to follower counts when discussing the importance of those people. Thus, even if everyone knows that metrics are gamed, they are repeatedly used, creating the appearance of truth or fact, even where none exists.

In some sense, the more closely we are networked into a high-accessibility society, the greater our echo chamber is, even though the concepts of “open access” and “greater transparency” suggest the opposite -- that truth is within reach because our browsing ability is engineered in tech-savvy ways. However, your ability to find a good answer, or to ask a good question, is not replaced by Google’s algorithmically-generated autocomplete function. How do we remain conscientious about information in light of greater access to it? How do we analyze metrics, or contextualize the data to avoid errors of misinterpretation?

Case Study 3: When Algorithms Imply Interpretation
