«University of California Los Angeles Bridging the Gap Between Tools for Learning and for Doing Statistics A dissertation submitted in partial ...»

I agree with some aspects of the philosophies espoused by Biehler, Konold, and Friel. In particular, Konold says tools for learning statistics should not be stripped-down versions of professional tools for doing statistics. Instead, they should be developed with a bottom-up perspective, thinking about what features novices need to build their understandings (Konold, 2007). In considering how to close the gap, it is important to keep in mind how tools can build novices’ understanding from the ground up. Konold also emphasizes the importance of creating software that can grow beyond its initial static state (Konold, 2007). A tool that grows with users as they move from learning to doing is a large proposition, but I believe this goal can (and should) be met with a tool spanning the entire trajectory of experiences.

The rest of this work is concerned with methods for closing the gap. Next, we consider the existing tools on the market and how well they succeed at supporting the learning-to-doing trajectory.

Professional statistical systems are very complex and call for high cognitive entry cost. They are often not adequate for novices who need a tool that is designed from their bottom-up perspective of statistical novices and can develop in various ways into a full professional tool (not vice versa).

The current landscape of statistical tools

This chapter describes the landscape of currently available statistical tools, from the prosaic (Excel) to the bespoke (Wrangler, Lyra). Keeping in mind the gap between tools for learning and tools for doing statistics, we consider the features of each tool most relevant to closing the gap. Many positive examples are shown (the bespoke tools provide particular inspiration) but negative examples are also examined.

The tools currently available for learning and doing statistics generally break along that particular divide: those good for an introductory learner are generally not good for actually performing data analysis, and vice versa. However, since any user of software for doing statistics must necessarily begin as a novice, it is logical that there should be a coherent trajectory for learners to take.

My interest in the gap between tools for learning statistics and those for doing statistics grew out of my experience with the Mobilize project (Section 5.1) where we have iterated through a variety of tools over the years. The experience of trying many statistical programming tools with learners led me to research others that were available, in search of the ideal tool.

While there is rarely an ideal tool for anything, through examining the available software, programming languages, and packages, I began to see ways in which the gap could be filled.

3.1 Spreadsheets

Spreadsheet tools like Excel are probably the tools most commonly used for data analysis, by people across a broad swath of use-cases (see Figure 3.1 for a screenshot of Microsoft Excel for Mac 2011). Because of their common use, and the free availability of spreadsheet tools like Google Spreadsheets and Calc, the OpenOffice analogue of Excel, they can be considered an accessible and equitable tool.

However, spreadsheets lack the functionality to be a true tool for statistical programming. They typically allow for only limited scripting, which means their capabilities are limited to those built in by the development company.

The locked-in nature of the functionality means they are only able to provide a limited number of analysis and visualization methods, and cannot be flexible enough to allow for true creativity. The toolbar in Figure 3.1 shows the possible visualization methods available in this version of Excel. The categories can be counted on one hand, although each category does have further modifications that can be expanded using the disclosure widget located on the button.

Beyond this, several high-profile cases of scientific paper retraction have been based on internal errors within Excel. Because the underlying code is closed-source, Excel does not allow users to view how methods are implemented, which means it is very difficult for an individual to assess the validity of the internal code. Some dedicated researchers have tested Excel’s statistical validity over every software version Microsoft has released. Not only is every version flawed, but even with specific attention drawn to the problem, Microsoft often either fails to repair the problem or replaces one flawed version with another (McCullough and Heiser, 2008).

Figure 3.1: Microsoft Excel for Mac 2011

Additionally, spreadsheets tend not to privilege data as a complete object. Once a data file is open, modification or deletion of data values is just a click away. In this paradigm, the sanctity of data is not preserved and original data can be lost forever. In contrast, most statistical tools discourage direct manipulation of original data. In tools used by practitioners to do statistical analysis (e.g., R, SAS), data is an almost sacred object, and users are only given a copy of the data to work with.
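The idea of working only on a copy can be sketched in a few lines of Python. This is a minimal illustration, not how R or SAS implement it; the function name `clean`, the field `height_cm`, and the records themselves are hypothetical.

```python
import copy


def clean(records):
    """Return a cleaned copy of `records`; the input is never modified."""
    working = copy.deepcopy(records)  # all edits happen on this copy
    for row in working:
        if row["height_cm"] is not None and row["height_cm"] < 0:
            row["height_cm"] = None  # treat impossible values as missing
    return working


# Hypothetical raw data, standing in for a file loaded from disk
raw = [{"id": 1, "height_cm": 172.0}, {"id": 2, "height_cm": -1.0}]
cleaned = clean(raw)
```

After the call, `raw` still holds the original (invalid) value, so the cleaning step can be inspected, revised, and rerun, which is not the case when a cell is overwritten in place.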

Data does not have structural integrity in a spreadsheet. Data values sit next to blocks of text, and plots produced from the data cover up data cells. Everything is included on one canvas. These pieces may be linked together, but there is no explicit visual connection. In a true statistical tool, results from the analysis are clearly separated from the data from which they were derived, and any data cleaning tasks performed in these tools can be easily documented.

This leads to the largest challenge with spreadsheets: their results are not reproducible. Data journalists have historically done analysis using tools like Excel (Plaue and Cook, 2015). Journalists must be careful about the analysis they publish, as it must be as well verified as any other ‘source’ they might interview. Spreadsheets do not offer any inherent documentation. As a result, journalists developed their own reproducibility documentation, typically in the form of a document written in parallel with the analysis describing all the steps taken. This supplementary document is produced separately, either by hand or in word processing software like Microsoft Word.

Because each stage of analysis in a spreadsheet is done by clicking and dragging, there is no way to fully record all the actions taken. Reproducibility is discussed more fully in Section 4.8, but one of its central tenets is that it should be possible to perform the same analysis on slightly different data (e.g., from a different year). Spreadsheets do not make this possible, so they are not effective tools for data analysis.
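To make the tenet concrete, here is a minimal Python sketch of an analysis written as a script rather than as a sequence of clicks. The function `summarize` and the yearly measurements are hypothetical; the point is only that the same recorded steps can be applied unchanged to a second data set.

```python
import statistics


def summarize(values):
    """Every analysis step is written down here, so it can be rerun as-is."""
    cleaned = [v for v in values if v is not None]  # step 1: drop missing values
    return {
        "n": len(cleaned),                          # step 2: count observations
        "mean": statistics.mean(cleaned),           # step 3: compute summaries
        "median": statistics.median(cleaned),
    }


# Hypothetical measurements from two different years
data_by_year = {
    2014: [3.0, 4.0, None, 5.0],
    2015: [2.0, 6.0, 7.0, None, 9.0],
}
results = {year: summarize(values) for year, values in data_by_year.items()}
```

The script doubles as the documentation that journalists otherwise keep by hand: anyone reading `summarize` can see exactly which steps were taken and in what order.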

However, the spreadsheet paradigm is not inherently troublesome. In fact, Alan Kay believes computer operating systems should essentially be spreadsheets, all the way down (Kay, 1984). The distinction is that when Kay refers to spreadsheets, he is thinking of a reactive programming environment that can be built up into responsive tools to perform a wide variety of tasks. In this paradigm, objects can be linked together in a dependent structure, and whenever an input is changed, all the downstream elements are updated accordingly.
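The dependent structure described here can be sketched as a toy reactive sheet in Python. This is an illustrative assumption about how such a system might work, not Kay's design or any real spreadsheet engine; the class `Sheet` and its methods are hypothetical, and the sketch assumes there are no circular dependencies and that a formula's inputs are defined before the formula.

```python
class Sheet:
    """A toy reactive sheet: formula cells are recomputed when inputs change."""

    def __init__(self):
        self.values = {}    # cell name -> current value
        self.formulas = {}  # cell name -> (function, names of input cells)

    def set_value(self, name, value):
        """Set an input cell and update everything downstream of it."""
        self.values[name] = value
        self._update_dependents(name)

    def set_formula(self, name, func, inputs):
        """Define a cell whose value is computed from other cells."""
        self.formulas[name] = (func, inputs)
        self._recompute(name)

    def _recompute(self, name):
        func, inputs = self.formulas[name]
        self.values[name] = func(*(self.values[i] for i in inputs))
        self._update_dependents(name)

    def _update_dependents(self, changed):
        # Recompute every formula that reads the cell that just changed.
        for name, (_, inputs) in self.formulas.items():
            if changed in inputs:
                self._recompute(name)


sheet = Sheet()
sheet.set_value("a", 2)
sheet.set_value("b", 3)
sheet.set_formula("total", lambda a, b: a + b, ["a", "b"])
sheet.set_formula("doubled", lambda t: 2 * t, ["total"])

before = sheet.values["doubled"]  # computed from a = 2, b = 3
sheet.set_value("a", 10)          # changing an input updates total, then doubled
after = sheet.values["doubled"]
```

Because the dependencies are stored explicitly in `formulas`, a tool built this way could also draw them, which is the piece conventional spreadsheets leave invisible.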

Of course, the reactive possibilities in spreadsheets can also lead to unintended consequences. In a study of spreadsheets used by Enron, researchers found 24% of spreadsheets with a formula included an error (Hermans and Murphy-Hill, 2015). This is likely because while spreadsheets allow for reactive linking of cells, they do not visualize the reactive connections, and it can be easy to double a formula or include unintended cells. The reactive programming environment Shiny (Section …) showcases some of the capabilities of this paradigm in a more reproducible data analysis environment.

3.2 Interactive data visualizations

If the average person has interacted with interesting data sets, it was likely in the context of an interactive data visualization. The New York Times produces some especially salient examples. The Times makes a point to create visualizations for the web that are much more than electronic versions of print graphics.

Instead of static graphics, their visualizations allow readers to manipulate representations of data themselves. For example, the Times has produced graphics allowing readers to balance the federal budget, predict which way states will vote in the presidential election, or assess whether they would save more money by buying or renting their housing (Carter et al., 2010; New York Times, 2012; Bostock et al., 2014). The graphic companion to the article on buying versus renting, Figure 3.2, allows users to drag sliders on a variety of graphs to determine whether it is better for them to rent or buy, given the particular set of parameters.

Figure 3.2: “Is it better to rent or buy?” from The Upshot

The paradigm of interactive data visualizations on the web is beginning to approach the concept of the “active essay” (Yamamiya et al., 2009). Like active essays, interactive articles on the web can be updated dynamically depending on parameter manipulations, and some news outlets have begun including contextual information in their articles. Rather than hyperlinks taking the user away from the page they were reading, New York Magazine includes ephemeral popovers, and Medium has instituted in-context comments linked to a particular paragraph or sentence.

Interactive data visualizations of the type done by the Times are helping us move toward broader statistical literacy, but they are lacking on a number of levels. First, many people do not have the quantitative skills to interpret statistical graphics, despite this literacy becoming more crucial (Meirelles, 2011).

Then, even if a reader is able to interpret the graphics, visualizations tend to be highly scripted. This scripted quality is actually valued in data visualization, because visualizations should provide some context and storytelling for the data, rather than simply leaving users to explore (Cairo, 2013). But the script can also be limiting. If a reader is looking at a graphic about renting and buying in the United States, she cannot easily compare it with data from other countries. She also cannot access the demographic data to determine how many people fell into her particular demographic category. And there is no way to explicitly validate the authors’ analysis, other than completely reproducing it. In short, a reader is limited to exploring the story the journalists have provided. For true democratic data access, citizens need to be able to analyze raw data sources.

Some interactive data visualizations have been opening the hood on the analysis process, allowing readers to critique the creation process or algorithmic decisions. One notable example is the 2014 IEEE Spectrum rating of programming languages, shown in Figure 3.3. The article provides a default ranking (shown in Figure 3.3a), but it allows readers to create a custom ranking by adjusting the weights of all the data inputs (Figure 3.3b) (Cass et al., 2014).
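The mechanism behind such an adjustable ranking can be sketched as a weighted sum over data sources. The Python below is only an illustration of that idea: the function `rank`, the source names, the item names, and all scores are hypothetical, and none of it is the IEEE Spectrum data or method.

```python
def rank(scores, weights):
    """Rank items by a weighted sum of their per-source scores."""
    totals = {
        item: sum(weights[src] * value for src, value in per_source.items())
        for item, per_source in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical scores in [0, 1] for three unnamed languages
scores = {
    "language_a": {"search": 0.9, "jobs": 0.4, "open_source": 0.8},
    "language_b": {"search": 0.5, "jobs": 0.9, "open_source": 0.4},
    "language_c": {"search": 0.6, "jobs": 0.5, "open_source": 0.9},
}

# The publisher's default weighting versus a reader who cares mostly about jobs
default_weights = {"search": 1.0, "jobs": 1.0, "open_source": 1.0}
custom_weights = {"search": 0.2, "jobs": 1.0, "open_source": 0.2}

default_ranking = rank(scores, default_weights)
custom_ranking = rank(scores, custom_weights)
```

With these made-up numbers the two weightings put different items first, which is exactly the kind of algorithmic decision an auditable graphic lets a reader examine.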

It is possible to imagine a future where all journalistic products based on data are accompanied by this type of auditable representation of the process used to create them.

Figure 3.3: IEEE Spectrum ranking of programming languages

3.3 iPython notebook/Project Jupyter

The iPython notebook is a movement toward both more reproducible research and interactive analysis output. The iPython notebook environment (Figure 3.4) allows a user to combine text and Python code, and to immediately see the output from the code chunks (Pérez and Granger, 2007).

When authoring an iPython notebook, it is possible to execute each of the code chunks separately, allowing the author to perform interactive manipulation during the creation process. The system also provides the capability to create interactive graphics, so if the author has decided to include them, readers can interact with selected graphics.

However, once the notebook is published or shared, the ability to execute code is removed, and the only interactive elements are those programmed in by the author. Instead, all code and output is presented as a static file, which prevents readers from manipulating it. If a reader wants to modify the code, they must download the source code, edit it, and then re-share the results.

The iPython notebook project is currently under transition to become part of Project Jupyter, a larger umbrella project focused on scientific computation more generally. Just recently, GitHub¹ announced that Jupyter notebooks will render directly on their site. This frees notebook authors from the necessity of hosting their notebooks elsewhere in order to share them.

The iPython notebook and Project Jupyter represent some movement toward the goals of interactive and reproducible analysis (like the scenarios discussed in Sections 3.2 and 4.8), but will require more work toward fully interactive notebooks that can be manipulated by readers.

3.4 TinkerPlots and Fathom

Specifically designed for learning statistics, TinkerPlots and Fathom are essentially sibling software packages. The two have similar functionality and slightly different intended users. TinkerPlots is described as being appropriate for students from 4th grade up to university, and Fathom is directed at secondary school and introductory college levels.

¹ A code-sharing website supporting collaborative coding projects and the version control software git.

Both TinkerPlots and Fathom are excellent tools for novices to use when learning statistics. They comply with nearly all the specifications outlined by Biehler (1997), allowing for flexible plotting, providing a low threshold, and encouraging play and re-randomization. They allow students to jump right in, to perform exploratory data analysis and to move through a data analytic cycle (e.g., asking questions, trying to answer them, re-forming questions), and have been shown to enhance student understanding (Watson and Donne, 2009).
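For readers unfamiliar with re-randomization, the following Python sketch shows the underlying computation as a simple permutation test. It is not how TinkerPlots or Fathom implement it (those tools expose the idea graphically); the function `permutation_test`, its parameters, and the two groups of measurements are hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often shuffled group labels give a difference in means
    at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # re-randomize which values belong to which group
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        diff = statistics.mean(shuffled_a) - statistics.mean(shuffled_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_shuffles


# Hypothetical measurements for two groups
group_a = [12.0, 15.0, 14.0, 16.0, 13.0]
group_b = [10.0, 11.0, 9.0, 12.0, 10.0]
observed, p_value = permutation_test(group_a, group_b)
```

A small `p_value` means few shuffles produced a difference as large as the observed one, which is the intuition these tools let students build by dragging and re-shuffling rather than by writing code.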

TinkerPlots and Fathom are both standalone software, which means they must be installed directly on a user’s computer. They both offer versions for Macintosh and Windows computers. Clicking on the application icon will launch a blank canvas, with buttons and menus to support loading data, plotting, resampling, and more.
