
«University of California Los Angeles Bridging the Gap Between Tools for Learning and for Doing Statistics A dissertation submitted in partial ...»

RStudio

On the other hand, RStudio is an Integrated Development Environment (IDE) for R (RStudio Team, 2014). It was initially developed by J.J. Allaire, who is now supported by a team of expert R programmers, including several who have been mentioned previously in this document (Yihui Xie and Hadley Wickham in particular).

RStudio offers many useful features, such as code completion, file management, and comprehensive code history. For introductory college users, many professors have found the support RStudio provides makes it much easier to pick up R (Baumer et al., 2014; Muller and Kidd, 2014; Pruim et al., 2014; Horton et al., 2014a). It is also easier for high school teachers and students, as discovered during the Mobilize project and in McNamara and Hansen (2014).

RStudio can be run as a desktop application for Mac, Windows, or Linux, but it is also available as a server install. By using a server version, instructors can manage package installations from a central location and provide quick bug fixes to all students at once. Users go to a particular web address, log in, and find their R session just how they left it. All files can be hosted on a central server, which means students can do their work from any computer without having to worry about moving data from place to place.

Because of how simple it makes access, many colleges use RStudio servers for their students. In particular, Smith College, Mount Holyoke College, Duke University, and Macalester College all use this arrangement. For Mobilize, a server version was used for our high school teachers and students (discussed further in Section 5.1).

When a user opens RStudio, whether in the desktop version of the application or through a server, the initial screen will look much like what is seen in Figure 3.22. In fact, in the book “Start Teaching with R,” the authors warn the server version and desktop version “look so similar that if you run them both, you will have to pay attention to make sure you are working in the one you intend to be working in.” (Pruim et al., 2014).

The RStudio screen has four panes, which are shown in their default arrangement in Figure 3.22. RStudio does allow the user to move panes in the options menu, but this layout is the default when a user first launches the program, so we will act as though it is standard. (Footnote 3: The following text has been modified from McNamara (2013a).)

The first pane allows users to view files and data. This is a useful feature because it allows users to view a spreadsheet-like representation of their data, which they can scroll through (Figure 3.23). In standard R, views of data are much more piecemeal, and users must type commands like head(data) and tail(data) to view the first (or last) few rows. In contrast, RStudio helps smooth the transition from a spreadsheet tool. This is also the pane where users can write documentation, such as .R files or RMarkdown documents (discussed at more length in Section 3.8.3).

The second pane allows users to view a list of all the objects loaded into the working environment, as in Figure 3.24a. For example, all datasets loaded will appear here, along with any other objects that have been created (special text or spatial formats, vectors, etc.). Again, this helps make using R more concrete.

Every time a user creates a new variable, they see its name appear in the environment tab, and can click on its name to inspect it more closely. The other tab in the second pane is a code history, which includes the complete history of all code typed, over all R sessions, as in Figure 3.24b. The history is searchable, so the user can use the search box at the upper right of the pane to search through their code history.


Figure 3.24: Second pane in RStudio: environment and history

The third pane provides integrated views of files, plots, packages and help.

When a user runs code in the Console that creates a plot, the Plots tab will be automatically selected, as in Figure 3.25b. The automatic plot tab selection makes it very simple for users to know when the code they have run created a plot. In contrast with the standard R GUI, which has a floating window for plots that can easily get ‘lost,’ this integrated tab keeps everything together.

The packages tab (Figure 3.25c) gives a visual summary of all the packages the user has installed and which are loaded into the current working session.

Again, the ability to visually keep track of packages supports users. In the standard R GUI, users have to type installed.packages() to see the list of packages they have installed, and this command is often not taught. By making it simple to see the list of packages, RStudio is encouraging users, even novices, to view their installed packages. Similar to the plots tab, the help tab will be automatically selected whenever the user runs help code in the Console, like help(plot) or ?plot.

The fourth pane is the console, and provides the command line for users to enter R code. The console pane is where the majority of work takes place – the other panes provide support for the user, but no code interpretation. The console looks similar to the standard R GUI, but it also provides support for users.

For example, if a user begins to type a function and then hits the ‘tab’ key on their keyboard, RStudio will do code completion and provide a hovering hint of the documentation of the function (Figure 3.26).

RStudio provides many support features, in particular a unified interface where windows cannot get ‘lost.’ It also provides visual cues to objects in the working environment, to installed packages, and to files in the working directory. The data preview functionality helps ease the transition from spreadsheet programs. And even in the most programming-oriented area, the Console, RStudio provides coding support features like tab completion and code hints. RStudio has been used successfully in many introductory college statistics classes (Baumer et al., 2014; Muller and Kidd, 2014; Pruim et al., 2014; Horton et al., 2014a) and with high school teachers and students through the Mobilize Project (Section 5.1). However, even though it lowers the barrier to entry for R, RStudio still requires users to code, so there is a startup cost associated with using it.


Other commonly used tools for doing statistical analysis are Stata, SAS, and SPSS. All three tools are stand-alone software, and all combine elements of graphical user interfaces with command-line tools. They are used in a variety of disciplinary contexts, so the argument for teaching them is ‘students will need to use this in the future.’ They are often popular in industry, because they come with guarantees of validity and technical support.

Stata (Figure 3.27) is often the tool of choice for introductory statistics courses taught in an economics department, as it is used routinely by economists. Stata does support users writing their own routines, so it is extensible. It also includes a command-line language, so it can be used to create reproducible research, in the sense that analyses can be re-run to get the same results.

However, Stata does not integrate with tools like the IPython notebook or knitr, so there is no easy way to produce reproducible reports (Rising, 2014).

Another major drawback to Stata is its price. As of 2015, educational pricing was $445 for an annual license or $895 for a perpetual license. A single business license costs $845 per year or $1,695 for a perpetual license. The company does offer group discounts, but these are also expensive (for example, a 10-user lab costs $1,850 plus $160 each, unless purchasing the “small” version which only deals with data up to 1,200 observations).

Figure 3.27: Stata interface

SAS has similar benefits and drawbacks to Stata. It is often used in pharmaceutical and business applications because it comes with a guarantee of accuracy. SAS is also hugely expensive for corporate use, in part because of the guarantee of accuracy and included support. For an individual it costs $9,000, and enterprise and government licenses require users to submit a request for a quote.

However, the company makes the software free for educational use, both as desktop software and via the cloud, so students can access it via a web browser, much like RStudio. This is a shrewd business decision, because it grooms students to become experts at their software. A screenshot of SAS is shown in Figure 3.28.


The other product from SAS often used in an educational setting is JMP.

JMP is a drag-and-drop, menu-driven graphical user interface. Some features of JMP are shown in Figure 3.29. The backbone of JMP is SAS, but JMP provides a simple visual interface. Again, JMP is expensive ($1,540 for an individual) but they offer academic discounts: $50 for a yearly license for undergraduate and graduate students.

JMP provides many features useful for novices, like interactive brushing and linking, generalizable data cleaning, and visual model support.

Like TinkerPlots and Fathom, while JMP does produce interactive graphics within an individual session, these interactive results cannot be exported. Instead, a work session can be printed or pasted into a document. The student version of JMP does not support exporting graphics, but individual licenses do.

SPSS (Figure 3.30) is typically used by social scientists and is focused on a menu-driven interface, although it does have a proprietary command-line syntax allowing for reproducibility. However, the syntax is hard to understand, and code is generally created only by copying and pasting rather than by users writing it themselves (Academic Technology Services, 2013). SPSS is also very expensive: $5,760 for a 12-month individual license, or $95.75 for a one-year student license.

Although Stata, SAS, and SPSS are commonly used in industry, none of them seem to be supportive of learners. They all provide specific types of graphics, and most work is done using menus and wizards, so they do not make clear what the tool is actually doing. Using these tools creates ‘users’ not ‘creators’ of statistics (see Section 1.5 for more on the distinction). All three tools obscure the underlying computational processes and reduce statistical procedures to button clicks. Although they all provide some capability of extending the software with scripting, none of them have the community of statisticians sharing work that R has. Further, they all suffer from a lack of transparency about how internal routines were coded, they do not produce reproducible reports, and their pricing is prohibitive for the secondary school use-case.

[Figure 3.29, panel (a): JMP dynamic querying]

The most inspirational is JMP (Figure 3.29), which makes data analysis visual and interactive, providing many of the features of software for learning statistics with the power of a tool for really doing statistics.

3.10 Bespoke tools

In addition to the tools discussed above, there are a number of ‘bespoke’ tools for doing particular things with data. The most salient examples are Data Wrangler, Open Refine, Tableau, and Lyra.

Data Wrangler (Figure 3.31) began as a project from the Stanford Visualization Group in 2011 (Kandel et al., 2011b). Their goal was to provide a visual representation of data transforms, as well as a reproducible history of those transforms. For example, a user could select an empty row and indicate it should be deleted, at which point the Wrangler interface would suggest a variety of generalizable transformations that could be built from that one ‘rule’ (e.g., delete all empty rows, or always delete the 7th row). Once the user specifies a transform, it is applied to the data and added to the interaction history. The interaction history can be exported as a data transformation script in a variety of languages.
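The idea of a generalizable transform paired with a replayable interaction history can be sketched in a few lines of Python. This is only an illustration of the concept described above, not Wrangler's actual implementation or export format: the names Transform, History, delete_empty_rows, and delete_row are hypothetical, and the table is assumed to be a plain list of rows of strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Row = List[str]
Table = List[Row]


@dataclass
class Transform:
    """One generalizable rule: a description plus a function over the whole table."""
    description: str
    func: Callable[[Table], Table]


def delete_empty_rows() -> Transform:
    # Generalizes "the user deleted one empty row" into
    # "drop every row whose cells are all blank".
    def run(table: Table) -> Table:
        return [row for row in table if any(cell.strip() for cell in row)]
    return Transform("delete all empty rows", run)


def delete_row(position: int) -> Transform:
    # Alternative generalization: always drop the row at a fixed
    # 1-based position (for example, the 7th row).
    def run(table: Table) -> Table:
        return [row for i, row in enumerate(table, start=1) if i != position]
    return Transform(f"delete row {position}", run)


@dataclass
class History:
    """Ordered record of the transforms the user has accepted."""
    steps: List[Transform] = field(default_factory=list)

    def record(self, table: Table, transform: Transform) -> Table:
        # Apply the transform and remember it so it can be replayed later.
        self.steps.append(transform)
        return transform.func(table)

    def replay(self, table: Table) -> Table:
        # Re-run every recorded step, in order, on a new table.
        for step in self.steps:
            table = step.func(table)
        return table

    def export(self) -> str:
        # Human-readable script of the recorded steps (hypothetical format).
        return "\n".join(f"{i}. {s.description}" for i, s in enumerate(self.steps, 1))


raw = [["name", "score"], ["", ""], ["Ana", "7"], ["  ", ""], ["Ben", "9"]]
history = History()
clean = history.record(raw, delete_empty_rows())
print(clean)
print(history.export())
```

Because the history stores rules rather than one-off edits, the same steps can be replayed on a different file with history.replay(other_table), which is what makes the cleaning process reproducible.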


Wrangler can also perform simple database manipulations, in the same way dplyr manipulates data in R. The tools Wrangler provided were so useful the authors were able to convert their academic research project into a corporate venture, which is now known as Trifacta.

Very similar to Data Wrangler is Open Refine (Verborgh and Wilde, 2013).

The project was initially called Google Refine, but has since been turned into an open source package. Like Wrangler, Open Refine can help clean data and document the data cleaning process. It can also be used for data exploration and data matching, including geocoding. Open Refine is shown in Figure 3.32.

Again, the results of the refining process are available as a re-useable script.

Both Data Wrangler and Open Refine provide great alternatives to the spreadsheet paradigm. They privilege data as a complete object, and document all modifications. By suggesting methods of generalizing data transformations, they remove much of the grunge work of spreadsheet analysis. The other benefit of generalized data transformations is they encourage the user to think computationally. Instead of just doing ‘whatever works,’ there is user incentive to find a way to describe the data cleaning rule in a way that works generally.

Tableau (Figure 3.33) is a bespoke system for data visualization. As such, it does not provide much support for data cleaning. Tableau makes it simple for users to create interactive graphics that can be easily published on the web.

Tableau will suggest the ‘best’ plot for particular data, which is both a blessing and a curse (Mackinlay et al., 2007). It can lead to much more appropriate uses of standard plots, but it also does not support novices’ learning trajectory.

A user can make a plot without having any idea of what it means. Similarly, Tableau makes it possible to fit models to data, but again does not make it clear what these models mean or how appropriate they may be. Like the tools discussed in Section 3.9, Tableau is expensive: $999 for an individual license or $1,999 for an individual professional license. However, as with SAS, they make the tool free to students.

(Footnote 4, on Figure 3.33: Screenshot from Lin (2012).)

Figure 3.33: Tableau
