
«University of California Los Angeles Bridging the Gap Between Tools for Learning and for Doing Statistics A dissertation submitted in partial ...»


Learners tend to move through a standard sequence of levels of understanding as they begin to map data to visual representations. Typically, their first representations are idiosyncratic (Watson and Fitzallen, 2010). That is, they have some relation to the data at hand, but typically do not explicitly represent it. For example, when students are asked to represent a set of random draws from a collection, an idiosyncratic representation might be a drawing of a person taking a slip out of a bowl.

Students move on to a case-value plot. In case-value plots, every data value is explicitly shown on the plot. In other words, there is no abstraction away from the raw data, simply a visual encoding. Visualizations like dot plots and scatterplots are barely abstractions. They allow a reader to easily retrieve the exact values of the data. Studies show students start with case-value plots and gradually move toward more abstract representations (Kader and Mamer, 2008).

Students will often initially think a graph representing speed is actually representing location, or a graph of growth is a graph of height. They have difficulty with the abstractions. Instead, they think the graph is representing the most ‘real’ thing possible (Shah and Hoeffner, 2002).

The next level of abstraction is classifying (e.g., stacking similar cases together and then abstracting them as a bar, as in bar charts and histograms) (Konold et al., 2014). These visualizations add a count abstraction. The original data are still retrievable, but the reader must understand the height of the bar has encoded how many times a particular value has been seen. Even more abstract is the histogram. A reader could generate data that would produce the histogram (randomly choosing n values between a and a + x for all y bins in the histogram, where n is the height of the bar in the histogram, and (a, a + x] is the range of the bin), but the resulting data would not necessarily be the same as the original. It would preserve some qualities of the data, but not necessarily even the true summary statistics of center and spread.
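The histogram regeneration procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values, the bin start, and the bin width are made up, and the function names are hypothetical. It draws n uniform values inside each bin (a, a + x], where n is the bar height, and prints the means so the reader can see that the regenerated data need not share the original summary statistics.

```python
import random
import statistics


def histogram_counts(data, start, width, n_bins):
    """Count the values falling in each bin (a, a + width]."""
    counts = [0] * n_bins
    for value in data:
        for i in range(n_bins):
            a = start + i * width
            if a < value <= a + width:
                counts[i] += 1
                break
    return counts


def regenerate_from_histogram(counts, start, width, rng):
    """Draw counts[i] uniform values inside bin i, for every bin."""
    regenerated = []
    for i, n in enumerate(counts):
        a = start + i * width
        for _ in range(n):
            # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
            regenerated.append(a + width * (1.0 - rng.random()))
    return regenerated


# Made-up data, used only for illustration.
original = [1.2, 1.9, 2.1, 2.2, 2.8, 3.5, 3.6, 4.9, 5.0, 7.7]
counts = histogram_counts(original, start=0.0, width=2.0, n_bins=4)
regenerated = regenerate_from_histogram(counts, 0.0, 2.0, random.Random(1))

print(counts)
print(statistics.mean(original), statistics.mean(regenerated))
```

Both data sets produce the same histogram, yet their individual values, and usually their means, differ.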

The final level of abstraction is where data are fully abstracted to summary values, as in a box plot (Konold et al., 2014). In these plots, there is no way to map back to the original data. The reader could generate data that would produce the same boxplot, but the data might have almost no correspondence to the original data. Modifications to the boxplot such as the violin and vase plots reduce the abstraction level slightly, but still often leave the size of the dataset abstract (Wickham and Stryjewski, 2011). Some examples of common data representations and their abstraction level are shown in Table 4.2.
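To make the loss of information concrete, the following Python sketch builds three made-up data sets that share one five-number summary (minimum, quartiles, maximum) and would therefore be drawn as the same basic box plot. The data and the function name are assumptions for illustration, and the quartile rule used is the "inclusive" method of Python's statistics module; other quartile conventions could give slightly different boxes.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))


# Three made-up data sets with different shapes and sizes.
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
clumped = [1, 3, 3, 3, 5, 7, 7, 7, 9]
larger = [1, 1, 1, 1, 3, 3, 3, 3, 5, 5, 5, 5, 7, 7, 7, 7, 9]

for name, data in [("evenly_spread", evenly_spread),
                   ("clumped", clumped),
                   ("larger", larger)]:
    print(name, len(data), five_number_summary(data))
```

The third data set has nearly twice as many observations as the first two, which a plain box plot gives no way to see.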

By providing a tool that grows as a user learns, we can support learners on their path toward more abstraction. Rolf Biehler emphasized the importance of “co-evolution” of user and tool (Biehler, 1997). TinkerPlots intentionally supports the natural trajectory of building from less abstract plots, as its default plot of a single variable is simply a set of dots displayed at random positions in the plot window. However, while TinkerPlots supports the natural sequence of building graph understandings, it is characterized as a landscape-type tool (Bakker, 2002), because it does not prescribe this particular trajectory. If a student had a different natural inclination, she could build representations that felt natural to her. However, even with a landscape-type tool, the affordances of a tool such as TinkerPlots impact the sorts of plots and reasoning users develop (Hammerman and Rubin, 2004).

[Table 4.2: Levels of abstraction in standard statistical plots]

Again, the standard (named) plot types mentioned in Table 4.2 do not represent an exhaustive list of possible visualization methods, and users should have the opportunity to build their own encodings.

In fact, being able to create unique visual data representations may help users to better understand nonstandard visualizations they encounter in the wild. That is, learning to encode data visually will help them decode other visuals (Meirelles, 2011).

Most of the plot types described here are simple visualizations of one or two variables. But again, we want computational tools to do more than amplify human abilities (Pea, 1985). Methods like the Grand Tour (Cook et al., 1995; Buja and Asimov, 1986; Asimov, 1985), or the generalized pairs plot (Emerson et al., 2013) can allow humans to look for patterns in more dimensions.

Beyond providing an interface to flexibly create novel plot types, the tool should support graphs as an interface to the data (Biehler, 1997). Behaviors like brushing and linking should do dynamic subsetting (Few, 2010).

As Biehler suggests, the tool should provide functionality for formatting, as well as interacting with, and enhancing graphics (Biehler, 1997). Formatting consists of tasks like the zoom, scale, data symbols, and graph elements. Interaction should allow for actions like select, group, modify, and measure. Enhancing should allow for labeling, the inclusion of statistical information, and other variables (Biehler, 1997).

Some current tools for learning statistics allow users to draw pictures on top of their data, circling interesting features and providing annotations. Drawing functionality could be enhanced to become another method for interacting with the system. Researchers in the Communications Design Group are thinking about the ways in which drawing could become another level of vocabulary for humans to use as they interact with the computer. Ken Perlin has developed a tool he calls Chalktalk, which allows him to draw from a vocabulary of simple drawings to create the impression of interaction in his presentations (Perlin, 2015). For example, he might draw a crude pendulum and set it swinging. The behavior of these gestures has been coded behind the scenes, but they allow him to quickly and fluidly show examples almost like sketches. There is also work being done to add ‘smart’ drawing features to Lively Web (Section 5.2). This work is so new there is not yet a good pointer to it, but Calmez et al. (2013) describe the initial work making it possible.

The system should also make it possible to see multiple coordinated views of everything in the user’s environment. Rolf Biehler suggests a multiple window environment to allow for easy comparisons (Biehler, 1997). The importance of a coordinated view is supported by researchers who suggest allowing for multiple views of the same data may help students gain a more intuitive understanding (Shah and Hoeffner, 2002; Bakker, 2002). In many systems, this is supported by brushing and linking (Wilkinson, 2005).

4.5 Support for randomization throughout

Requirement 5: Support for randomization throughout.

Computers have made it possible to use randomization and bootstrap methods where approximating formulas would once have been the only recourse. These methods are not only more flexible than traditional statistical tests, but can also be more intuitive for novices to understand (Pfannkuch et al., 2014; Tintle et al., 2012).
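As a rough illustration of why these methods can feel intuitive, a randomization (permutation) test for a difference in group means needs only shuffling and counting. The sketch below is an assumption-laden example rather than any particular tool's method: the function name, the number of shuffles, the add-one adjustment to the p-value, and the two small groups are all made up for illustration.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_shuffles=2000, seed=0):
    """Two-sided randomization test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    # The add-one adjustment counts the observed arrangement itself.
    p_value = (at_least_as_extreme + 1) / (n_shuffles + 1)
    return observed, p_value


# Made-up measurements for two groups.
treatment = [10, 11, 12, 13, 14]
control = [1, 2, 3, 4, 5]
print(permutation_test(treatment, control))
```

No distributional formula appears anywhere: the p-value is simply the share of reshuffled data sets that look at least as extreme as the real one.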

Randomization and bootstrap methods can help make inferences from data, even if those data are from small sample sizes or non-random collection methods (Efron and Tibshirani, 1986; Lunneborg, 1999; Ernst, 2004). While these methods are not new, they have recently been extended into a field of visual inference. Statisticians at Iowa State University have been working on methods to use randomized or null data to provide graphical inference protocols (Wickham et al., 2010; Majumder et al., 2013; Buja et al., 2009).

Humans are very adept at finding visual patterns, whether the patterns are real or artifacts. Graphical inference helps people train their eyes to more accurately judge whether a pattern is real or not. In these protocols, one plot created with the original data is displayed in a matrix of n decoy plots. If the user sets n = 19, the chance of picking the true plot at random is 1/(n + 1) = 1/20 = 0.05, which is the traditional boundary for statistical significance (Wickham et al., 2010).

The method for generating the null plots can vary. Sometimes it is as simple as randomizing one of the variables to break any linear relationship that might have existed between the two. However, this does not work on data where the relationship is more complex, e.g., a quadratic relationship. In those cases, data are generated from the null model and compared to the true data.

Using the null data, plots analogous to the one filled with real data are generated. If the user creates a matrix of decoy plots with the true plot randomly placed within the matrix, identifying the true plot means it is somehow different from randomness. These methods have been extended for use in validating models (Majumder et al., 2013; Buja et al., 2009; Gelman, 2004).
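The simplest version of this protocol, in which one variable is permuted to produce the null data, might be sketched as follows. The function name and the example data are hypothetical, and the sketch only assembles the data behind each panel; drawing the plots is left to whatever plotting tool is at hand.

```python
import random


def make_lineup(x, y, n_decoys=19, seed=None):
    """Build a lineup of (x, y) data sets: n_decoys null panels plus the real one.

    Null panels are made by shuffling y, which breaks any association
    between x and y. Returns the panels and the index of the real data.
    """
    rng = random.Random(seed)
    panels = []
    for _ in range(n_decoys):
        shuffled_y = list(y)
        rng.shuffle(shuffled_y)
        panels.append((list(x), shuffled_y))
    # Hide the real data at a random position among the decoys.
    true_position = rng.randrange(n_decoys + 1)
    panels.insert(true_position, (list(x), list(y)))
    return panels, true_position


# Made-up data with an obvious increasing trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.3]
panels, true_position = make_lineup(x, y, seed=42)
print(len(panels), "panels in the lineup")
```

With the default of 19 decoys the lineup holds 20 panels, matching the 1/20 = 0.05 chance of picking the real plot by guessing. As the text notes, permuting one variable is only appropriate when the null hypothesis is "no association"; more complex null models require simulating from the model instead.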

Of course, protocols must be followed so as not to introduce bias. For example, if the user has been working with the data or doing exploratory data analysis, familiarity will make it easier to recognize the true plot in the matrix.

However, even with prior knowledge, picking the real data out of a lineup of plots is a very compelling exercise. I have introduced the idea to groups of high school teachers and students, and the ‘game’ of picking the right plot has proven to be very engaging. In fact, showing randomized plots can be a great extension to the ‘making the call’ activity of trying to determine the difference between groups – in this case, between real and randomized versions (Pfannkuch, 2006; Wild et al., 2011).

The application of randomization and the bootstrap is another place where tools for teaching statistics shine. All the popular applet collections provide functionality for simply randomizing or bootstrapping data (Chance and Rossman, 2006; Morgan et al., 2014). TinkerPlots and Fathom also provide interfaces for this (Finzer, 2002a; Konold and Miller, 2005). However, the tools for doing statistics have lagged behind. R provides the most complete functionality, but it has not been simple to use. Tim Hesterberg has prepared a document explaining how bootstrap methods could be integrated into the undergraduate curriculum (Hesterberg, 2014), as well as an R package called resample providing a simpler syntax.
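For readers unfamiliar with the bootstrap, the core idea fits in a short sketch. The example below is written in plain Python and is not the syntax of the resample package or of any tool named above; the function names, the percentile interval rule, and the data are assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Resample the data with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval: cut off equal tails of the sorted values."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1 - tail) * (len(ordered) - 1))]
    return lower, upper


# Made-up sample.
sample = [12, 15, 9, 21, 14, 18, 11, 16, 13, 19]
means = bootstrap_means(sample)
print(statistics.mean(sample), percentile_interval(means))
```

The percentile interval is the most basic bootstrap interval; more refined variants exist, but the resample-and-summarize loop stays the same.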

Because of their intuitive nature and generalizability, randomization and bootstrap methods are ideal for novices. They can be used in a variety of contexts, including graphical inference methods bridging the gap between exploratory and confirmatory analysis.

4.6 Interactivity at every level

Requirement 6: Interactivity at every level.

Interactivity is becoming the standard for the web, and data analysis should be no different. It should be possible for users to interactively develop an analysis, e.g., building up a plot by using drag-and-drop elements. The results of this analysis in the session should themselves be interactive. All graphs should be zoomable, it should be easy to change the data cleaning methods and see how that change is reflected in the analysis afterward, and parameters should be easily manipulable. This type of simple parameter manipulation will further support exploratory data analysis.

Finally, the product from the tool should also be interactive. Interactivity in published analysis would be of particular use for data journalism and academic publishing. As reproducibility becomes more valued in the academic community, data products are more often accompanied with fully reproducible code, and if the code were interactive, the audience – even if they do not know much about statistics – could play with the parameters and convince themselves the data were not doctored.

With a tool that made it simple to publish fully interactive results of data analysis, it would be easy to imagine data-driven newspaper articles accompanied by the reproducible code that produced them, allowing readers to audit the story. As noted in Chapter 3, bespoke projects such as the IEEE programming language ratings (Cass et al., 2014) provide readers access to the process used to create an analysis.

The draw of interactivity was also clear to Rolf Biehler, who inspirationally wrote,

“The concept of slider pushes the tool a step in the direction of a method construction tool where one can operate with general parameters. [...] It may be considered a weakness of systems like Data Desk that the linkage structure is not explicitly documented as it is the case with explicit programming or if we had written the list of commands in an editor. An improvement would be if a list of commands or another representation of the linkage structure would be generated automatically.” (Biehler, 1997)

The implementation of this vision may require something akin to the Shiny reactive environment to allow the system to keep track of all the downstream elements depending on the one above.

The power and usefulness of this type of functionality is easy to imagine, and likely the possibilities are even greater than can be imagined at present.

For example, if a user had used a cut point to create a categorical variable from a continuous variable, and then fed that categorical variable into a regression model, the system would allow them to manipulate the cut point to see the effect on regression parameters, interaction effects, etc. This example is examined further in Section 5.3.2.
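A non-interactive approximation of the cut point example can be sketched by refitting the model for each candidate cut point. This is only a stand-in for the reactive behavior described here: the function name and the data are hypothetical, the model is a one-predictor least squares fit on a 0/1 indicator, and a real system would recompute automatically when a slider moved rather than loop over values.

```python
def fit_with_cut_point(x, y, cut):
    """Dichotomize x at `cut`, then fit y = intercept + slope * indicator by least squares."""
    indicator = [1.0 if xi > cut else 0.0 for xi in x]
    n = len(y)
    mean_i = sum(indicator) / n
    mean_y = sum(y) / n
    sxx = sum((g - mean_i) ** 2 for g in indicator)
    if sxx == 0:
        raise ValueError("cut point puts every case in the same category")
    sxy = sum((g - mean_i) * (yi - mean_y) for g, yi in zip(indicator, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_i
    return intercept, slope


# Made-up data: the response jumps once x exceeds 3.
x = [1, 2, 3, 4, 5, 6]
y = [2, 2, 2, 8, 8, 8]

# Moving the cut point changes the downstream regression estimates.
for cut in [1, 2, 3, 4, 5]:
    intercept, slope = fit_with_cut_point(x, y, cut)
    print(cut, round(intercept, 2), round(slope, 2))
```

With a single 0/1 predictor the slope equals the difference between the two group means, so the loop shows directly how the estimated effect depends on where the cut is placed.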

Many other possibilities would be available in the world opened up by this type of functionality. All plots would be resizable, zoomable, and pan-able.

Clicking on an element in a plot would highlight the associated element in the data representation, while clicking on a non-data element (e.g., an axis, tick line or model line) would offer information about the element, and that information would be manipulable as well.
