
University of California, Los Angeles. Bridging the Gap Between Tools for Learning and for Doing Statistics. A dissertation submitted in partial ...


The experience of developing and deploying the MobilizeSimple package reinforced what I had already come to understand: users need support as they learn a language. The salient question is exactly how to support them.

5.2 LivelyR

Through my work with the Communications Design Group, I was able to collaborate with Aran Lunzer to develop a tool we are calling LivelyR. The Communications Design Group is an independent research lab headed by Alan Kay. It draws together researchers from Kay’s non-profit, Viewpoints Research Institute, as well as employees from SAP.

The product of this work, LivelyR,[8] is a bespoke interface (see Section 3.10 for other examples of such systems).

An R server runs in the background, either locally on the user’s computer or on a centralized server. Results return much faster (and interaction is therefore smoother) when R runs locally, but that of course requires a local installation.

As discussed in Section 5.1.4, relying on a local installation is a barrier to accessibility. However, because LivelyR is more provocation than prototype, we were less concerned with the practicalities of real-world use.

R interacts with a JavaScript programming environment called Lively Web.

Lively is an open source tool that makes it easy to develop applications in the web browser (Ingalls et al., 2013). Each component of a Lively page is ‘live,’ so a user can give it behavior using code, resize or reshape it, move it, delete it, and copy it. The results can be shared as completely ‘live’ products (Ingalls et al., 2013). When a user visits a page produced with Lively Web (even the project’s main web page), every element is fully movable, transformable, and programmable. Programming the elements does require knowing JavaScript, but edits can be made on the fly, without downloading the source code and editing it ‘offline.’ This stands in direct contrast to projects like R Markdown and the IPython notebook, where the results may be interactive (if the author made them that way), but editing requires the user to move away from the live implementation.

The Lively team also maintains a “parts bin” where users can share components they have developed, either by dragging and dropping pieces together or by writing code (Lincke et al., 2012). There was no existing connection between the Lively server and a server running R, but Robert Krahn created a WebSocket connection. Additionally, the interaction available off the shelf in Lively was not sufficient for the types of behavior we wanted to implement, so Lunzer wrote additional code that hacks Shiny and ggvis to provide reactive behavior. In contrast to the standard ggvis functionality, LivelyR uses JavaScript to initiate and control all behavior. This work is discussed in more depth in Lunzer and McNamara (2014) and Lunzer et al. (2014).

[8] Lunzer is the primary author on this work, as he did the coding, but we worked through the conceptual ideas together. Our joint work has been supported by thoughtful input from many of our colleagues, most notably Robert Krahn, Dan Ingalls, Alan Kay, Bret Victor, Alex Warth, and Ted Kaehler. The written descriptions included here are based on Lunzer and McNamara (2014) and Lunzer et al. (2014).

The behavior of LivelyR was based on my commitment to interactive statistical programming tools and on Lunzer’s longstanding work on subjunctive interfaces (Lunzer and Hornbæk, 2008). For an overview of the functionality of LivelyR as of May 2014, see Lunzer (2014).[9]

5.2.1 Histogram cloud

One feature that has garnered a lot of attention when we have shown this work is what I call the ‘histogram cloud.’ The data used in the example shown in Figure 5.5 are from the R sample dataset mtcars.

In the screenshot, a scatterplot of weight (wt) versus miles per gallon (mpg) is shown in the center of the plot window. At the top of the image, a few simple summary statistics are displayed: the number of data points currently in use (which changes as subsets are applied), the mean and standard deviation of both variables, and the Pearson correlation coefficient. Green arrows indicate the range of the data included. The arrows can be modified interactively to subset the data, but the view shown uses the complete data set.
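The summary panel is simple to reproduce. The Python sketch below is an illustration only, not LivelyR’s code (which drives R from JavaScript): it assumes the data arrive as (x, y) pairs, treats the interactive arrows as inclusive value ranges, and computes the statistics the panel lists, using sample (n - 1) standard deviations as R’s sd() does. The function name `summarize` and its parameters are hypothetical.

```python
import statistics


def summarize(points, x_range=None, y_range=None):
    """Count, means, standard deviations and Pearson r for the (x, y) points
    whose values fall inside the given inclusive ranges (None = no limit).
    Needs at least two remaining points, each variable with nonzero spread."""
    def inside(value, bounds):
        return bounds is None or bounds[0] <= value <= bounds[1]

    subset = [(x, y) for x, y in points
              if inside(x, x_range) and inside(y, y_range)]
    xs = [x for x, _ in subset]
    ys = [y for _, y in subset]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Pearson r = sample covariance / (sx * sy), all with n - 1 denominators.
    cov = sum((x - mx) * (y - my) for x, y in subset) / (len(subset) - 1)
    return {"n": len(subset), "mean_x": mx, "mean_y": my,
            "sd_x": sx, "sd_y": sy, "r": cov / (sx * sy)}
```

Narrowing `x_range` plays the role of dragging a green arrow: `n` and every other statistic are recomputed on the points that remain.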

At the bottom of the screen is a rectangular display of the data set, with the x- and y-variables noted and a range for each variable. The 0 and 100 indicate that each variable is included from its 0th to its 100th percentile, that is, all of the data. Again, these ranges could be modified interactively to include only a particular percentile range of the data.
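Expressing a subset in percentiles rather than raw values, as the 0 to 100 ranges do, only requires converting each percentile to a data value first. A minimal Python sketch, assuming linear interpolation between sorted data points (the rule behind R’s default quantile(), type 7); whether LivelyR uses this rule is not stated, and the function names are hypothetical:

```python
import math


def percentile(values, p):
    """Value at percentile p (0-100), interpolating linearly between sorted
    data points (the same rule as R's default quantile(), type 7)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_window(points, x_pct=(0, 100), y_pct=(0, 100)):
    """Keep the (x, y) points whose x and y both lie inside the given
    percentile ranges; the defaults (0, 100) keep every point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, x_hi = percentile(xs, x_pct[0]), percentile(xs, x_pct[1])
    y_lo, y_hi = percentile(ys, y_pct[0]), percentile(ys, y_pct[1])
    return [(x, y) for x, y in points
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]
```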

[9] https://vimeo.com/93535802

For this version of LivelyR, we made it possible to draw several different types of plot in the same plotting window. Therefore, the left vertical axis is the wt axis, but the right vertical axis is the count for the histogram(s) of mpg.

Researchers have found that this type of layering of multiple plots is difficult for users to understand, so this functionality would ideally not be included in a production tool (Isenberg et al., 2011; Few, 2008). However, providing histograms along the margins of scatterplots (typically outside the plot region, rather than inside as shown here) is a common visualization feature (Emerson et al., 2013).

Typically, statistical packages provide default bin widths and bin offsets, and the affordances of the system provide a disincentive to modify the parameters.

For example, in base R, the hist() command chooses its default bins using Sturges’ rule (Sturges, 1926). The geom_bar() command in the R package ggplot2 uses a bin width of range/30, and though it does provide a warning (“stat_bin: binwidth defaulted to range/30. Use ‘binwidth = x’ to adjust this.”), many people stick with what is given.
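To make the two defaults concrete: Sturges’ rule picks a number of bins, k = ceil(log2(n) + 1), and the bin width follows from dividing the data range by k, while the ggplot2 default divides the range by 30. The Python sketch below computes both. It reproduces the rules only; R’s hist() additionally passes Sturges’ count through pretty() to get round break points, so its actual breaks can differ. The function names are hypothetical.

```python
import math


def sturges_bins(n):
    """Number of bins from Sturges' rule: ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n) + 1)


def sturges_width(values):
    """Bin width implied by Sturges' rule, before any rounding of break points."""
    return (max(values) - min(values)) / sturges_bins(len(values))


def range_over_30_width(values):
    """The range/30 bin width that ggplot2's stat_bin warning refers to."""
    return (max(values) - min(values)) / 30


# mtcars has 32 rows, so Sturges' rule asks for ceil(5 + 1) = 6 bins.
print(sturges_bins(32))  # 6
```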

There are several other accepted algorithms for choosing an optimal histogram bin width (Wand, 1997), but generally the parameters should be tuned to a particular data set. Choosing appropriate bin widths is one of the parts of data science that remains more art than science.
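Two widely used alternatives, chosen here for illustration (the text does not say which rules Wand surveys), are Scott’s rule, h = 3.49 s n^(-1/3), and the Freedman-Diaconis rule, h = 2 IQR n^(-1/3). Both shrink the bin width as n grows, yet both still return a single answer rather than one tuned to the data set. A Python sketch (statistics.quantiles needs Python 3.8 or later; the function names are hypothetical):

```python
import statistics


def scott_width(values):
    """Scott's rule: h = 3.49 * s * n^(-1/3), s = sample standard deviation."""
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)


def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: h = 2 * IQR * n^(-1/3)."""
    # 'inclusive' quartiles interpolate like R's default quantile() (type 7).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 2 * (q3 - q1) * len(values) ** (-1 / 3)
```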

Therefore, we sought to make modifying the defaults so easy that there would be no reason not to. This is in line with Biehler’s idea of a “stretchy” histogram, which allows users to pull the heights of bars up or down (Biehler, 1997). In this interface the bars cannot be manipulated directly, but their parameters are easy to adjust.

Figure 5.5: LivelyR interface showing a histogram cloud of miles per gallon

Outside the scope of the screenshot in Figure 5.5, but visible in Figure 5.7, are the slider controls for bin width and bin offset. Users can manipulate these sliders independently to see a series of static histograms, each with a particular parameter value. However, Lunzer’s work often gives users the ability to modify several parameters together, in order to see more ‘what if?’ possibilities. In this case, it is possible to make a ‘sweep’ of one of the parameters. In Figure 5.5 the sweep is of the bin offset parameter, which sets where the bins begin. Therefore, each histogram in the cloud has the same bin width, but each has a slightly different bin offset. Once this sweep is in place, the user can use the slider to modify the bin width of the cloud, which produces a new set of histograms with the same (new) bin width but a variety of bin offsets.
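The sweep behind the histogram cloud can be sketched as a loop over bin offsets at a fixed width. Offsets only need to cover one bin width, because shifting the offset by a whole width reproduces the same bin edges. This Python sketch assumes half-open bins [offset + k·width, offset + (k+1)·width) and an evenly spaced sweep; LivelyR’s own sweep spacing is not specified, and the names are hypothetical.

```python
import math


def histogram(values, width, offset):
    """Counts for half-open bins [offset + k*width, offset + (k+1)*width),
    returned as (left edge, count) pairs for the non-empty bins."""
    counts = {}
    for v in values:
        k = math.floor((v - offset) / width)
        counts[k] = counts.get(k, 0) + 1
    return [(offset + k * width, counts[k]) for k in sorted(counts)]


def histogram_cloud(values, width, n_offsets=10):
    """One histogram per offset, with offsets evenly spaced across one bin
    width; every histogram in the cloud shares the same width."""
    return [histogram(values, width, i * width / n_offsets)
            for i in range(n_offsets)]
```

Moving the bin-width slider then amounts to calling `histogram_cloud` again with the new width, which regenerates every member of the cloud.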

When people are presented with the histogram cloud for the first time, they typically express a sense of wonder. Because the full range of possible histograms for a particular data set has never been accessible to them, they find it fascinating how many variations are possible. The cloud also gives a sense of the ‘true’ shape of the distribution. Kernel density estimation can provide similar information about the true distribution of the data, but understanding kernel densities requires another layer of abstraction. Understanding histograms is a complicated task for novices (Watson and Fitzallen, 2010; Friel, 2008), but the histogram cloud requires only a small additional cognitive step to be understood.

Inspired by the popularity of the histogram cloud when showing LivelyR, I have begun to think about how the concept could be extended to a 2-D setting.

Mapmaking is often appealing to novices because it is very grounded in reality.

As Tom MacWright noted, “The problem with maps is that the world looks like a map. We don’t have that problem with other visualizations.” One of the challenges with mapmaking, particularly of choropleth maps, is that the areal units used often have little meaning in terms of the variable being mapped. For example, mapping incidences of traffic accidents by ZIP code or Census block is typically not useful, because traffic accidents tend to happen along streets. Because data are often measured in standard areal units, there is typically not much that can be done.

In geography, this is called the Modifiable Areal Unit Problem (MAUP). Geographers and geostatisticians have developed some methods for dealing with such data, usually described in terms of ‘scaling.’ Upscaling is the easiest task: it involves making a map at a less detailed spatial resolution than the data collection method (e.g., taking a map aggregated at the county level and turning it into one aggregated at the state level). Side-scaling is somewhat more complex, as it involves translating between two similarly sized areal units (e.g., moving from ZIP codes to Census tracts). The most complex is downscaling, which involves taking data from a less detailed level down to a more detailed level (Atkinson, 2013).
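Upscaling is easy because it needs only a lookup from each fine unit to the coarse unit that contains it. Side-scaling needs more: one simple approach, areal weighting (used here for illustration, and not necessarily among the methods Atkinson describes), splits each source unit’s count among target units in proportion to the overlapping area. That assumes the count is spread evenly across each source unit, which is exactly what fails for data like street-bound traffic accidents. Unit names and overlap fractions in this Python sketch are hypothetical.

```python
from collections import defaultdict


def upscale(counts, parent_of):
    """Sum counts from fine units (e.g. counties) into coarse units (e.g. states)."""
    totals = defaultdict(float)
    for unit, value in counts.items():
        totals[parent_of[unit]] += value
    return dict(totals)


def side_scale(counts, overlap):
    """Areal weighting: overlap[(src, dst)] is the fraction of src's area lying
    in dst. Assumes each count is spread uniformly over its source unit."""
    totals = defaultdict(float)
    for (src, dst), fraction in overlap.items():
        totals[dst] += counts[src] * fraction
    return dict(totals)
```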

There are a variety of methods for dealing with this problem, including data fusion and area-to-point kriging. Also relevant are efforts using data augmentation, much like those discussed in Section 5.1.1, which use auxiliary information to help with disaggregation. For example, the Disser project helps disaggregate Census data by bringing in zoning information to determine housing density (Martin-Anderson, 2014).
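Disaggregation with auxiliary information can be illustrated with the simplest version of the idea: split an aggregate total among smaller units in proportion to an auxiliary weight. This Python sketch shows the general principle only, not Disser’s actual method; the weights stand in for something like zoning-based housing density, and all names are hypothetical.

```python
def disaggregate(total, weights):
    """Split `total` across sub-units in proportion to non-negative auxiliary
    weights; a unit with weight 0 (e.g. zoned non-residential) gets nothing."""
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        raise ValueError("auxiliary weights must not all be zero")
    return {unit: total * w / weight_sum for unit, w in weights.items()}
```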

Because viewers found it so compelling to move the bins in the histogram cloud and see the resulting changes in a 1-D distribution, it seemed likely that an analogous action would be appealing in 2-D. However, this extension is still a work in progress.

5.2.2 Regression guess line

Another feature built into LivelyR is the ability to create a ‘regression guess’ line. This feature was also suggested by Biehler, who argues that “eye-fitted lines with residual analysis should precede the method of least squares” (Biehler, 1997). As with the histogram cloud, a video of the interaction is available to display the functionality more clearly (Lunzer, 2014).

Similar to the histogram cloud discussed above, this feature allows users to manipulate one or multiple parameters in order to find the best-fitting line. In this case, the parameters are the start and end points of the line the user is guessing. The best line can be chosen by eye, by adjusting first one end of the line and then the other and judging how well the line fits the scatterplot of points. The interface also displays the residual sum of squares (RSS), which a user can use as a guide, manually attempting to minimize it.
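The quantity the interface reports can be computed directly from the two endpoints the user drags: they define a slope and intercept, and RSS sums the squared vertical distances from each data point to that line. A Python sketch with hypothetical names, assuming the two endpoints have different x coordinates:

```python
def line_through(p0, p1):
    """Slope and intercept of the line through two points with distinct x."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0


def guess_rss(points, p0, p1):
    """Residual sum of squares of the data around the user's guessed line."""
    slope, intercept = line_through(p0, p1)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)
```

For points lying exactly on y = 2x + 1, endpoints (0, 1) and (2, 5) give an RSS of 0, while endpoints (0, 0) and (2, 4) miss every point by 1 and give an RSS of 3.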

Because LivelyR conforms to Lunzer’s conception of a subjunctive interface (Lunzer and Hornbæk, 2008), it also allows users to try a ‘sweep’ of parameter values. In this case, the user defines a sweep of point locations for one end of the line, then moves the other end by hand. This allows the user to visually compare a selection of lines with a more dynamic set of end points. Additionally, the interface provides an ephemeral plot of the RSS value, which allows the user to attempt to optimize the value more directly by aiming for the minimum on the plot. A screenshot of this functionality is shown in Figure 5.6. Again, this falls into the category of allowing learners to discover things by themselves, as Biehler suggests.
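A sweep over one endpoint can be sketched the same way: fix one endpoint, try a set of heights for the other, and record the RSS for each. Because the fitted values change linearly with the free endpoint’s height, the RSS traces a parabola with a single lowest point, and no line at all can beat the least-squares line’s RSS. This Python sketch assumes the sweep varies only the y coordinate of the free endpoint; the names are hypothetical.

```python
def rss(points, slope, intercept):
    """Residual sum of squares around the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)


def sweep_free_end(points, fixed, x_free, y_candidates):
    """Fix one endpoint, vary the other endpoint's height, and return the
    (height, RSS) curve together with the pair that has the lowest RSS."""
    x0, y0 = fixed
    curve = []
    for y in y_candidates:
        slope = (y - y0) / (x_free - x0)
        curve.append((y, rss(points, slope, y0 - slope * x0)))
    best = min(curve, key=lambda pair: pair[1])
    return curve, best


def least_squares(points):
    """Ordinary least-squares slope and intercept, for comparison."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, my - slope * mx
```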

Of course, even the LivelyR implementation does not fully support discoverability. The interface provides RSS as a measure to optimize without ever explaining why one might want to minimize it, and there is no visual support to suggest why moving the line increases or decreases the RSS. A second generation of this type of tool should include a more self-articulating version of the feature, in which the residuals or their squares are represented visually on the screen. More work should be done to study how novices conceptualize the sum of squares, and to determine the most effective way of conveying the concept visually.


5.2.3 Small multiple callout plots

In all the plot scenarios involving sweeps of parameters, LivelyR makes it possible to see each of the possible scenarios broken out into small multiple plots (Tufte, 2001). An example of this is shown in Figure 5.7, where each of the histograms from the histogram cloud in Section 5.2.1 is broken out into an individual plot.

Once again, this feature allows for exploration over multiple parameters. In this case, although the small multiples each display one of the histograms from the histogram cloud, the user can use a second input device (in Lunzer’s experiments, an iPad configured as a left-hand input) to select a histogram to compare with all the others. In Figure 5.7, a light ghosted histogram is shown in the background of all the small multiples. This is the histogram being used for comparison. In this example, it is the beginning of the sweep (notice that the small multiple in the upper left does not have a ghost histogram), but the tool is more general.

5.2.4 Documentation of interaction history

Many of the features discussed above are available in other software packages, particularly those for learning statistics (see Section 2.2 for more on these tools).

However, LivelyR offers a feature we have not seen in any common interactive tool: a history. Each time a user performs an action in the LivelyR interface, whether changing the subset of the data, modifying the histogram bin width, or sweeping values for a regression line guess, a line is added to the history list. The history list is visible on the right side of the screenshot in Figure 5.7 and is shown in more detail in Figure 5.8.
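An interaction history of this kind is essentially an append-only log: each interface action records its name and parameters, and the list is rendered one line per entry. The Python sketch below is a hypothetical illustration of that data structure, not LivelyR’s implementation; the class names, action names, and parameters are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    params: dict


@dataclass
class History:
    entries: list = field(default_factory=list)

    def record(self, name, **params):
        """Append one entry per user action; earlier entries are never changed."""
        self.entries.append(Action(name, params))

    def lines(self):
        """Render the history list, one numbered line per action."""
        return [
            f"{i}. {a.name}: " + ", ".join(f"{k}={v}" for k, v in a.params.items())
            for i, a in enumerate(self.entries, start=1)
        ]
```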
