
University of California, Los Angeles

Bridging the Gap Between Tools for Learning and for Doing Statistics

A dissertation submitted in partial ...


Not only does the history make the interactive system reproducible (see Section 4.8 for more about this requirement), it also allows users to move forward and backward through their history. In Figure 5.8, the cursor indicates a move ‘back in time’ to the state of the interface when the bin width was set to 0.6. The interface also allows the user to rewrite history and see what the past would have looked like with a slight parameter tweak. Open issues remain about how to handle these alternate histories, but other researchers in the Communications Design Group are experimenting with novel methods to address them (Warth et al., 2011).
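A branching history of this kind can be sketched as a tree of interface states rather than a linear undo stack. The following is a minimal Python sketch under stated assumptions; it is not LivelyR's implementation, and the class names and the `bin_width` parameter (borrowed from the Figure 5.8 example) are hypothetical. Moving back never discards states, and changing a past parameter records a new branch, so an alternate history sits alongside the original one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class HistoryNode:
    """One recorded interface state, e.g. {"bin_width": 0.6}."""
    params: dict
    parent: Optional["HistoryNode"] = field(default=None, repr=False)
    children: List["HistoryNode"] = field(default_factory=list, repr=False)


class InteractionHistory:
    """Hypothetical branching history: going back never deletes states,
    and editing a past state starts an alternate branch."""

    def __init__(self, initial_params):
        self.current = HistoryNode(dict(initial_params))

    def record(self, **changes):
        # New state = current state with the changed parameters applied.
        node = HistoryNode({**self.current.params, **changes}, parent=self.current)
        self.current.children.append(node)
        self.current = node
        return node.params

    def back(self):
        if self.current.parent is not None:
            self.current = self.current.parent
        return self.current.params

    def forward(self):
        # Policy choice: follow the most recently created branch.
        if self.current.children:
            self.current = self.current.children[-1]
        return self.current.params

    def branches_here(self):
        return len(self.current.children)

    def path(self):
        # States from the start to the current one: a replayable record.
        states, node = [], self.current
        while node is not None:
            states.append(node.params)
            node = node.parent
        return states[::-1]


h = InteractionHistory({"bin_width": 0.5})
h.record(bin_width=0.6)
h.record(bin_width=0.8)
h.back()                 # 'back in time' to bin width 0.6
h.record(bin_width=0.7)  # rewrite history: a new branch from 0.6
print(h.path())  # [{'bin_width': 0.5}, {'bin_width': 0.6}, {'bin_width': 0.7}]
```

Following the most recent branch in `forward()` is one arbitrary policy; how to present and choose among alternate branches is exactly the open issue mentioned in the paragraph on alternate histories.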

5.3 Additional Shiny experiments

The goal of ‘interaction all the way down’ (Section 4.6) in a statistical programming tool does not always immediately present itself as a useful feature. However, drawing on my experience teaching high school students and undergraduates and leading professional development for high school teachers, I can see many possible use cases. Because of the effort it took to create LivelyR (again, mostly on the part of Lunzer), it was clear we needed a simpler way to mock up interface possibilities.

As a result, I have been creating a few experiments in Shiny to show some of the features that could be available in a full system complying with the requirements listed in Chapter 4.

5.3.1 Conditional visualizations

A very simple Shiny app I developed for a data visualization course I taught is shown in Figure 5.9. In the class, we were looking at a visualization of the saving habits of men and women, made by Wells Fargo (Harris Poll, 2014). In the original visualization, a donut chart showed the overall percentages of the surveyed population who saved a particular percentage of their income. For example, 18% of those surveyed saved more than 10% of their income. In addition, conditional percentages were given by gender: for that same slice of the donut, the chart said “men - 26%” and “women - 9%”. Those numbers describe each gender separately (26% of men and 9% of women saved more than 10% of their income), but at first glance the graphic seemed to suggest that 26% of the 18% were men and 9% were women.

To help my students understand the difference between the two breakdowns, I created a pair of graphics: one with the data broken down first by saving category and then by gender (Figure 5.9b), and the other broken down by gender first (Figure 5.9a). While this app will not win any awards for visualization style, it conveyed the point easily and allowed my students to flip back and forth between the versions.
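The distinction between the two breakdowns is the distinction between conditioning on gender and conditioning on saving category. The Python sketch below computes both from a two-way table of counts. The counts are hypothetical (100 men and 100 women, collapsed into two saving categories chosen to echo the 26% and 9% in the chart), not the Harris Poll data, and the function names are illustrative.

```python
def row_percentages(table):
    """P(column | row) as percentages: each row sums to 100."""
    out = {}
    for row, counts in table.items():
        total = sum(counts.values())
        out[row] = {col: 100 * n / total for col, n in counts.items()}
    return out


def column_percentages(table):
    """P(row | column) as percentages: each column sums to 100."""
    cols = list(next(iter(table.values())))
    totals = {c: sum(table[r][c] for r in table) for c in cols}
    return {c: {r: 100 * table[r][c] / totals[c] for r in table} for c in cols}


# Hypothetical counts (not the survey's data): 100 men, 100 women.
counts = {
    "men": {"more than 10%": 26, "10% or less": 74},
    "women": {"more than 10%": 9, "10% or less": 91},
}

by_gender = row_percentages(counts)     # split on gender first
by_saving = column_percentages(counts)  # split on saving category first
print(by_gender["men"]["more than 10%"])            # 26.0
print(round(by_saving["more than 10%"]["men"], 1))  # 74.3
```

With these made-up counts, 26% of men save more than 10% of their income, yet men make up about 74% of that slice, so “26%” is a statement about men, not about the composition of the slice.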

Interactive version available at https://ameliamn.shinyapps.io/ConditionalPercens


Figure 5.9: Conditional percentages. Panel (b): Split on saving method first.

5.3.2 Interaction plot with manipulable data cut point

Another example stood out from a modeling course for which I was the teaching assistant. In the course, students were asked to build a linear model to predict API scores from a number of other factors about schools. The assignment asked students to discuss interaction effects in their final paper.

Because the data had many numeric variables and the students preferred the clarity of studying interaction plots when both variables were categorical, many groups chose to split a numeric variable into a categorical variable with two classes. Of course, the choice of cut point was somewhat arbitrary. Some groups chose the mean of their variable, others the median, while others chose a cutoff that they believed to be significant given some contextual knowledge.

However, because they were programming in R and the choice of cut point came so early in their analysis, many groups did not realize how sensitive their results were to that early parameter choice.
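The sensitivity is easy to see in the simplest case. For a linear model with two binary predictors and their interaction (y ~ high * b), the fitted interaction coefficient equals the effect of b among high cases minus the effect of b among low cases, computed from the four cell means. The Python sketch below uses a hypothetical helper, not the students' R code or the app's; it splits a numeric variable at a cut point and reports that coefficient along with the group sizes. The data are constructed by hand so that the sign flips, and models with additional covariates, like the students' API models, do not reduce to this formula.

```python
from statistics import mean


def interaction_at_cut(x, b, y, cut):
    """Split numeric x at `cut` (high = x > cut) and return
    (interaction, n_high, n_low) for the saturated model y ~ high * b,
    or None if any of the four cells is empty."""
    cells = {(h, g): [] for h in (0, 1) for g in (0, 1)}
    for xi, bi, yi in zip(x, b, y):
        cells[(int(xi > cut), bi)].append(yi)
    if any(len(v) == 0 for v in cells.values()):
        return None  # interaction not estimable
    m = {k: mean(v) for k, v in cells.items()}
    # Effect of b among high-x cases minus effect of b among low-x cases.
    interaction = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
    n_high = len(cells[(1, 0)]) + len(cells[(1, 1)])
    return interaction, n_high, len(x) - n_high


# Hypothetical data, built by hand so that the interaction flips sign.
x = list(range(1, 9)) * 2
b = [0] * 8 + [1] * 8
y = [10] * 8 + [10, 14, 14, 14, 14, 14, 14, 10]

for cut in (1, 4, 7):
    est, n_high, n_low = interaction_at_cut(x, b, y, cut)
    print(f"cut={cut}: interaction={est:+.2f}, n_high={n_high}, n_low={n_low}")
# cut=1: interaction=+3.43, n_high=14, n_low=2
# cut=4: interaction=+0.00, n_high=8, n_low=8
# cut=7: interaction=-3.43, n_high=2, n_low=14
```

Moving the cut from 1 to 7 reverses the sign of the interaction, and at both extremes one group holds only 2 of the 16 observations, the kind of imbalance that makes such estimates fragile.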

Difficulty understanding the impact of cut points is not unique to my students. Other researchers have observed similar behavior in students and teachers (Hammerman and Rubin, 2004; Rubin and Hammerman, 2006).

To show how a completely interactive system could help even intermediate and advanced analysts, I created a Shiny app that allows a user to pick a variable in the dataset to convert to a categorical variable and then to manipulate the cut point. Because Shiny uses a reactive framework, every time the cut point is manipulated, the model output, interaction plot, and coefficient interpretation change. A view of this Shiny app is shown in Figure 5.10.
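The reactive idea behind this behavior can be sketched in a few lines: an input notifies whatever depends on it, cached intermediate results are marked stale, and outputs re-run. The Python below is a simplified illustration with hypothetical class names, not Shiny's implementation; Shiny detects dependencies automatically when values are read inside a reactive context, whereas this sketch declares dependencies by hand and re-runs outputs immediately.

```python
class ReactiveValue:
    """An input, such as a cut-point slider."""

    def __init__(self, value):
        self._value = value
        self.dependents = []

    def get(self):
        return self._value

    def set(self, value):
        self._value = value
        for dep in self.dependents:
            dep.invalidate()


class ReactiveExpr:
    """A cached computation, recomputed only after an input changes."""

    def __init__(self, func, *inputs):
        self._func, self._inputs = func, inputs
        self._valid, self._cache = False, None
        self.dependents = []
        self.runs = 0  # how many times func has actually executed
        for inp in inputs:
            inp.dependents.append(self)

    def invalidate(self):
        self._valid = False
        for dep in self.dependents:
            dep.invalidate()

    def get(self):
        if not self._valid:
            self._cache = self._func(*(i.get() for i in self._inputs))
            self._valid = True
            self.runs += 1
        return self._cache


class Observer:
    """An output (plot, table, text) that re-runs as soon as it is invalidated."""

    def __init__(self, func, *inputs):
        self._func, self._inputs = func, inputs
        for inp in inputs:
            inp.dependents.append(self)
        self.invalidate()  # run once at start-up

    def invalidate(self):
        self._func(*(i.get() for i in self._inputs))


xs = [1, 2, 3, 4, 5, 6, 7, 8]
cut = ReactiveValue(4)
n_high = ReactiveExpr(lambda c: sum(xi > c for xi in xs), cut)
shown = []
Observer(lambda n: shown.append(f"{n} of {len(xs)} above cut"), n_high)
cut.set(6)  # moving the slider re-runs n_high and then the output
print(shown)  # ['4 of 8 above cut', '2 of 8 above cut']
```

Here the slider value is the only input; in the app, the same pattern would chain the cut point to the fitted model, the interaction plot, and the coefficient interpretation.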

The app makes it very simple to see where the choice of cut point makes the interaction effect flip, and it even provides a visual cue as to why that might be happening: the vastly different sizes of the two groups. However, I had to hand-code this app using the Shiny server/UI framework, so it is not something an introductory student could develop on their own.

Figure 5.10: Shiny app demonstrating fragility of interaction based on cutpoint

In this context, API means Academic Performance Index, not to be confused with Application Programming Interfaces.

Interactive version available at https://ameliamn.shinyapps.io/InteractionPlot/

5.4 Discussion of first forays

The work discussed in this chapter has helped to ease some of the transition between learning and doing, particularly for high school students associated with the Mobilize Project. However, the tools developed only hint at the richer interfaces that are possible. The experience of creating LivelyR prompted Lunzer and me to reconsider our choice of target technology, because the combination of R and Lively Web locked us into particular interface choices too early. Instead, we should be creating as many distinct possibilities as possible (as in a charrette) and using user studies to learn which are most successful. These experiences motivate my future work.


Conclusions, recommendations, and future work

Nearly 20 years after Rolf Biehler’s 1997 paper, “Software for learning and for doing statistics,” much of Biehler’s vision has been realized through the development of TinkerPlots and Fathom. These landscape-type tools for learning statistics and data analysis allow novices to jump into ‘doing’ statistics and experience the playful nature of the cycle of exploratory analysis, moving from questioning to analysis and back to questioning. However, new developments in computation and data analysis are beginning to reach introductory material, and they need better support. In particular, the drive for reproducible research has trickled down from science (Buckheit and Donoho, 1995) to introductory statistics (Baumer et al., 2014), and tools for learning need to support it.

However, best practices do not flow only from professional tools to tools for learning. In fact, tools like Fathom and TinkerPlots have strengths that professional tools sorely lack, such as support for interactive graphics, parameter manipulation, building new plot types, and integrated randomization. We can imagine a tool combining the strengths of both paradigms, eliminating the gap between tools for learning and tools for doing statistics.

In Chapter 3 we considered the strengths and weaknesses of tools currently on the market. While R has been gaining followers because of its strengths, such as its status as a free and open-source programming language, the R programming paradigm is very stilted and does not support exploratory analysis as well as it could. Projects like Shiny, R Markdown, and the IPython notebook make it possible to combine textual programming languages with interactive and publishable results, but they typically provide either dynamic or interactive graphics, never graphics that are both dynamic and interactive.

Conversely, TinkerPlots and Fathom make it simple for everyone, including novices, to interact with their data. However, this interactivity comes with tradeoffs, particularly in terms of sharing results (proprietary file formats make it hard to share interactive results with others) and reproducibility (as there is no linear documentation of the analysis process).

In Chapter 4, we discussed the attributes necessary for a modern statistical programming tool bridging the gap between being a tool for learning and a tool for doing. These attributes include easy entry for novice users, data as a first-order persistent object, support for a cycle of exploratory and confirmatory analysis, flexible plot creation, full support for randomization, interactivity at every level, inherent visual documentation, simple support for narrative, publishing, and reproducibility, and the flexibility to build extensions. While there are efforts to move toward this ideal tool, no existing products satisfy all the requirements.

Chapter 5 describes a set of experiments I have undertaken toward closing the gap between tools for learning and tools for doing statistics. One component of this work is curricular: high school material developed through the NSF-funded Mobilize project, including data science units to be added to courses in computer science, Algebra I, and Biology, and a freestanding course called Introduction to Data Science. As part of the Mobilize project I have also considered appropriate computational tools (R, Deducer, and R within RStudio) and developed additional functionality in the MobilizeSimple package.

In joint work with Aran Lunzer, we considered ways in which interactive tools could provide capabilities not possible with pen and paper, including an interaction history that makes using an interface more reproducible. Finally, I created a few illustrative Shiny apps to demonstrate the microworld capabilities such a system could offer.

6.1 Current best practices

Because no current tool has all the attributes discussed in Chapter 4, it makes sense to consider best practices using existing technology. As we have seen repeatedly throughout this work, there is a contrast between tools for learning and tools for doing. On the tools-for-learning side, Fathom appears to have the most features complying with the requirements. It has a low threshold, provides plenty of interactivity for analysis authors, and has a low price point. However, if Fathom is used in an introductory class, care must be taken to scaffold students toward the next tool. For example, the end of the semester could include a basic exploratory data analysis project in R.

For those who prefer to start students on a professional tool while providing a better ‘on-ramp’ to it, I recommend using R within RStudio. In addition, scoping decisions should be made to introduce students to only a small set of R commands and one unified syntax. In the Mobilize project, we have followed the lead of Project MOSAIC and used the formula syntax, with mosaic, lattice graphics, and the additional tools available in the MobilizeSimple package. This package includes the integrated lab exercises described in Section, which allow students to move through structured activities without leaving the RStudio interface. This approach is similar to that taken by tools like swirl and DataCamp, but the Mobilize labs offer more space for creativity and inquiry by not locking students into a particular trajectory. R is a landscape-type tool, which does not specify any particular trajectory; the Mobilize labs provide more of a route through the material while allowing for exploration around the edges.

As shown in Section 5.3, Shiny has potential as a tool for creating microworlds or minitools, allowing novices to explore within an environment built in a target language. However, while Shiny apps allow users to interact with data, they still suffer from a hard gap between using the interactive tool and using the target language (i.e., R).

6.2 Future work

None of the work presented in this dissertation is ‘the’ system, so my goal for the future is to begin building larger experimental prototypes in order to explore the possibilities. At present, I imagine a blocks-programming environment along the lines of Scratch, allowing novices to begin doing statistics and data analysis. However, underlying this environment would be a textual programming environment more like the target language (e.g., R).

The challenge is that there should be a tight coupling between the visual representation in each block and the language underneath it. It should also be possible to build additional visual blocks, add them to the system, and share them with others. In other words, we want a bijection between visual blocks and textual programs, rather than an injection. If the blocks programming system is fixed and the only way to move forward is to write in the textual language, then the mapping from blocks to text is merely injective: every block has a textual equivalent, but not every program has a block.
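The difference between an injection and a bijection can be made concrete with a toy block language. In the Python sketch below, a block is a tuple of a verb, a variable, and a dataset, rendered as formula-style R text such as `mean(~height, data = survey)`. The block vocabulary, the variable `height`, and the dataset `survey` are hypothetical. Every block in the vocabulary round-trips through its text, but text outside the vocabulary has no block until a user adds one.

```python
import re

# Hypothetical block vocabulary; blocks render as formula-style R text.
BLOCK_VERBS = {"mean", "median", "histogram"}
PATTERN = re.compile(r"^(\w+)\(~(\w+), data = (\w+)\)$")


def to_text(block):
    """Blocks -> text: defined for every block."""
    verb, var, data = block
    return f"{verb}(~{var}, data = {data})"


def from_text(text):
    """Text -> blocks: returns None when no block represents the text."""
    m = PATTERN.match(text)
    if m is None or m.group(1) not in BLOCK_VERBS:
        return None
    return m.group(1), m.group(2), m.group(3)


block = ("mean", "height", "survey")
assert from_text(to_text(block)) == block  # blocks in the vocabulary round-trip
print(from_text("sd(~height, data = survey)"))  # None: text with no block
BLOCK_VERBS.add("sd")  # a user-built block extends the vocabulary
print(from_text("sd(~height, data = survey)"))  # ('sd', 'height', 'survey')
```

Letting users register new blocks is one way to push the mapping toward a bijection; a real system would need a far richer grammar on both sides.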

There are several components to be developed for this sort of system to work properly. The first is the domain-specific language underlying the visual component. The challenge in developing its primitives is to make them descriptive enough to capture all the basic tasks necessary, while still making it possible to create new functionality. The language should be expressive enough to be used in many circumstances, but limited enough that it can be held in the (small) working memory of humans: either 7 ± 2 or 4 items, depending on whom you consult (Miller, 1955; Cowan, 2000; Shah and Hoeffner, 2002).

The second is the visual system itself. The interfaces of tools like TinkerPlots, Fathom, Data Desk, and JMP provide some inspiration, but because they do not capture a reproducible workflow or encourage integrated narrative, changes are needed. My future work will focus on paper prototyping in order to “get the design right and the right design,” as Bill Buxton says (Buxton, 2007).

Once the design has been solidified and the underlying language is clear, implementation poses its own challenge. As reactive programming in R gains support through Shiny, I am hopeful implementation in R will be possible. However, additional computational components will likely be needed to support all the functionality. For example, research and anecdotal experience suggest that packages used through the browser are best for novices, because they remove many technical difficulties. However, in-browser support for data is very limited, and running on a centralized server can lead to delays.

6.3 Final words


Similar works:

«MODULATING IMMUNE RESPONSE INSIDE BIOMATERIAL-BASED NERVE CONDUITS TO STIMULATE ENDOGENOUS PERIPHERAL NERVE REGENERATION A Dissertation Presented to The Academic Faculty by Nassir Mokarram-Dorri In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Materials Science and Engineering Georgia Institute of Technology May 2015 COPYRIGHT© 2015 BY NASSIR MOKARRAM-DORRI MODULATING IMMUNE RESPONSE INSIDE BIOMATERIAL-BASED NERVE CONDUITS TO STIMULATE ENDOGENOUS...»

«Constructing Musical Associations through Instruments: The Role of the Instrument Maker in the Maker-Instrument-Player Network within the Neo-Medievalist Gothic Music Scene William Klugh Connor, III Royal Holloway University of London Department of Music This thesis is submitted to partially fulfill the requirements for the degree of Doctorate of Philosophy in Ethnomusicology. Declaration of Authorship I, William Klugh Connor, III, hereby declare that this thesis and the work presented within...»

«Masaryk University Faculty of Arts Department of English and American Studies English Language and Literature (Teaching English Language and Literature for Secondary Schools) Bc. Miroslav Kohut Superheroes: The Philosophy Behind the Modern Myth Masters’ Diploma Thesis Supervisor: doc. PhDr. Tomáš Pospíšil, Dr. 2014 I declare that I have worked on this thesis independently, using only the primary and secondary sources listed in the bibliography... Author’s signature 2 I want to thank...»

«Vagueness and Borderline Cases Item type Electronic Dissertation; text Authors Daly, Helen Publisher The University of Arizona. Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. Downloaded 16-Oct-2016 19:27:00 Link to item...»

«Learning of Protein Interaction Networks Yanjun Qi May 2008 Language Technologies Institute School of Computer Science Carnegie Mellon University qyj@cs.cmu.edu A dissertation submitted to Carnegie Mellon University in partial fulfillment of the requirements for the degree of Doctor Of Philosophy Thesis Committee: Ziv Bar-Joseph (Carnegie Mellon University, Chair) Judith Klein-Seetharaman (University of Pittsburgh & Carnegie Mellon University, Chair) Christos Faloutsos (Carnegie Mellon...»

«Environment and Planning D: Society and Space 2014, volume 32, pages 739 – 752 doi:10.1068/d13111p Organismic spatiality: toward a metaphysic of composition Tano S Posteraro Department of Philosophy, Pennsylvania State University, University Park, State College, PA 16801, USA; e-mail: tano.sage@gmail.com Received 29 August 2013; in revised form 7 December 2013 Abstract. The task of this paper is the construction of a theory of organismic spatiality. I take as a starting point Gilles...»


«Copyright by Rebecca Marie Doran Eaton The Dissertation Committee for Rebecca Marie Doran Eaton certifies that this is the approved version of the following dissertation: Unheard Minimalisms: The Functions of the Minimalist Technique in Film Scores Committee: Eric Drott, Supervisor Byron Almén James Buhler Edward Pearsall Charles Ramírez Berg Unheard Minimalisms: The Functions of the Minimalist Technique in Film Scores by Rebecca Marie Doran Eaton, B.A.; M.M. Dissertation Presented to...»

«CHOICE THEORY: AN INVESTIGATION OF THE TREATMENT EFFECTS OF A CHOICE THEORY PROTOCOL ON STUDENTS IDENTIFIED AS HAVING A BEHAVIORAL OR EMOTIONAL DISABILITY ON MEASURES OF ANXIETY, DEPRESSION, LOCUS OF CONTROL AND SELF-ESTEEM by Scott D. Reeder A dissertation submitted to the faculty of The University of North Carolina at Charlotte in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Counseling Charlotte 2011 Approved by: Dr. John R. Culbreth Dr. Edward A....»

«MEGILL’S MULTIVERSE META-ARGUMENT Klaas J. Kraay Ryerson University This paper appears in the International Journal for Philosophy of Religion 73: 235-241. The published version can be found online at: http://link.springer.com/article/10.1007%2Fs11153-011-9324-3.ABSTRACT In a recent paper in THIS JOURNAL, Jason Megill (2011) offers an innovative meta-argument which deploys considerations about multiple universes in an effort to block all arguments from evil. In what follows, I contend that...»

«A LOCATION MODEL FOR WEB SERVICES INTERMEDIARIES By YI SUN A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA Copyright 2003 by Yi Sun I would like to dedicate this work to my family, especially my wife Ziya and our daughter Emily. Their love has supported me throughout all the hesitations and frustrations of this dissertation. ACKNOWLEDGMENTS This dissertation could...»

«HOW METHODISTS WERE MADE: THE ARMINIAN MAGAZINE AND SPIRITUAL TRANSFORMATION IN THE TRANSATLANTIC WORLD, 1778-1803 by LIAM IWIG-O’BYRNE Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY THE UNIVERSITY OF TEXAS AT ARLINGTON May 2008 Copyright © by Liam Iwig-O’Byrne 2008 All Rights Reserved ACKNOWLEDGEMENTS I would like to thank my committee for their work on my behalf,...»
