
«University of California Los Angeles Bridging the Gap Between Tools for Learning and for Doing Statistics A dissertation submitted in partial ...»


Lexical scoping is a quality either lauded or critiqued in R. In general, the term means variables are only available in particular environments. In R, it means the language looks first in the local environment (for example, within the function being evaluated); if it cannot find a particular variable there, it continues to the enclosing environment, the one in which that function was defined, and so on outward until it reaches the global environment (and, beyond that, the attached packages). Again, this type of scoping can be good or bad. It makes R less restrictive than some other languages, but it can also lead to unintended side effects, since a function may silently pick up a variable defined somewhere else.
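A minimal sketch of this lookup rule follows; the names x, make_adder, add, f, and g are hypothetical. It shows that a function finds variables in the environment where it was defined, not in the environment from which it is called.

```r
x <- 10  # defined in the global environment

make_adder <- function() {
  y <- 1  # local to make_adder's environment
  # The returned function has no x or y of its own, so R looks them up in
  # the environment where it was defined (make_adder's), then globally
  function(z) x + y + z
}
add <- make_adder()
add(5)  # 16: z = 5 is local, y = 1 comes from make_adder, x = 10 from global

f <- function() x
g <- function() {
  x <- 100  # local to g; not visible to f
  f()
}
g()  # 10, because f was defined in the global environment, not inside g
```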

R work typically takes place at a command-line interface (CLI). In fact, R can be used directly at the command line, as seen in Figure 3.11. A slightly friendlier interface is the regular R graphical user interface (GUI) that comes with the language when it is downloaded (Figure 3.12). GUIs are discussed in more depth in Section 3.8.5, but essentially a GUI lets a user carry out some tasks through menus rather than by typing commands. The regular R GUI provides a file menu, a red stop button for when code gets out of hand, and not much else. Section 3.8.5 also examines more advanced GUIs for R, as well as the integrated development environment RStudio. All these approaches add support for users, but they still require syntactic knowledge.

Figure 3.11: R at the command line

Like many programming languages, R has both a language base and additional libraries that allow users to extend its functionality. The additional libraries are called 'packages', and most are hosted on a centralized server called the Comprehensive R Archive Network (CRAN) (R Core Team, 2015). Having CRAN makes it simple for users to install new packages, as they do not have to go hunting for where a particular package is hosted. Additionally, R provides strong library management support. The command install.packages() pulls a particular package from CRAN, unpacks it, and installs it in precisely the right location. This simplifies the installation process and makes it much more straightforward than library management in Python, for example.

Figure 3.12: Regular R graphical user interface
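As a hedged illustration of this workflow, the helper below (ensure_package() is a hypothetical name, not part of base R) installs a package from CRAN only when it is not already present and then loads it. The CRAN mirror URL is an assumption; any mirror would work.

```r
# Hypothetical helper: install a package from CRAN only if it is missing,
# then attach it to the current session
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads, unpacks, and installs into the default library location
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
}

# Example usage (requires network access if the package is not installed):
# ensure_package("mosaic")
```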

Because the statistical community is invested in R, and because it is open-source and easy to modify, there are many additional packages for it. As of this writing, CRAN hosts over 6,500 packages. However, there has been a recent movement away from hosting packages on CRAN toward hosting them on GitHub. Using Hadley Wickham's devtools package, installing a package from GitHub is as seamless as installing from CRAN, although packages receive fewer checks before installation. The reduced checks could allow unscrupulous package writers to distribute packages with nefarious aims (a worst-case scenario would be some type of trojan horse virus). However, because CRAN sets a high bar for inclusion, the relaxed checking on GitHub also promotes more creative packages and allows users to try out packages still in development. For many users, the flexibility of installing packages from GitHub has outweighed the risks.

One good affordance of R is that it makes it very difficult to modify the original data. Instead, R loads a copy of the data into the work session, and the user works with that copy. Once the data are loaded, it is possible to 'clean' them, i.e., standardize some of the fields so labels are consistent, convert 0s to NAs, etc. None of these actions are taken on the data file itself; they take place on the loaded copy. Because R is a language, it is possible to follow the trail of actions taking the original data file to the cleaned version. Code can be saved and verified by another party if there are questions of reproducibility. Of course, it is still possible to save the modified data over the original file, but working on an in-memory copy makes accidental overwriting much less likely.
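The sketch below illustrates this copy-on-load behavior with a small, made-up data file; the file contents and the convention that 0 marks a missing score are hypothetical. The cleaning step changes the in-memory copy, while the file on disk keeps its original values, and the script itself records how one became the other.

```r
# Write a small hypothetical raw data file to a temporary location
raw_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:4, score = c(12, 0, 15, 0)),
          raw_path, row.names = FALSE)

# read.csv() returns an in-memory copy; the file on disk is untouched
scores <- read.csv(raw_path)

# Cleaning step, assuming (hypothetically) that 0 encodes a missing score
scores$score[scores$score == 0] <- NA

sum(is.na(scores$score))              # 2 in the cleaned copy
sum(is.na(read.csv(raw_path)$score))  # 0 in the original file
```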

There are many other affordances of data analysis systems that shape the way users think about and work with data. For example, while many languages rely on control statements such as 'for' loops and 'while' loops, other languages, like R, support vectorized operations. Instead of explicitly stating that an operation should be done for every entry in a vector, the user applies the operation to the vector itself, and R carries it out element by element. R includes control statements like 'for' loops, but they are used less often than in other similar languages.
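A minimal comparison of the two styles, using a made-up vector x: both produce the same result, but the vectorized version states the operation once for the whole vector.

```r
x <- c(2, 4, 6, 8)

# Explicit loop: state the operation for every entry
squared_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squared_loop[i] <- x[i]^2
}

# Vectorized: apply the operation to the whole vector at once
squared_vec <- x^2

identical(squared_loop, squared_vec)  # TRUE
```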

As with any tool, R has its shortcomings. The main drawback of R is its status as a programming language. Many of the other tools discussed here lean toward graphical user interfaces, while R is a language, meaning users need to provide the correct function calls with appropriate syntax and arguments. Adding to this is R's inconsistent syntax, which makes it hard to learn for novices and programming experts alike. This is discussed below in Section 3.8.1.

There have been efforts to simplify the coding aspects of R over the years. Some of these efforts are curricular, reducing the number of commands to which novices are exposed, or providing more consistent syntax (Verzani, 2005; Kaplan and Shoop, 2013). Others are GUIs like Deducer (Fellows, 2012) and RCommander (Fox, 2004), discussed further in Section 3.8.5. However, none of these efforts have truly solved the problem.






One complex aspect of R is the multitude of syntaxes it supports. Where most programming languages have one standard syntax, R has many.

The two main syntaxes most users encounter are the dollar sign syntax and the formula syntax. The dollar sign syntax uses the $ operator to denote that one object is within another. For example, mtcars$wt indicates the wt variable within the mtcars dataset. The formula syntax is so named because it is most commonly found in modeling functions, which take a formula specification. This syntax uses the ~ operator, but the differences go beyond a single operator: instead of referring to variables within datasets, the user refers to the variables directly and then names the dataset separately, usually through a data argument.

The following, more thorough example makes three plots comparing the weight (wt) and miles per gallon (mpg) of cars with different numbers of cylinders (cyl).

First, the dollar sign syntax:

par(mfrow=c(1,3))
plot(mtcars$wt[mtcars$cyl == 4], mtcars$mpg[mtcars$cyl == 4])
plot(mtcars$wt[mtcars$cyl == 6], mtcars$mpg[mtcars$cyl == 6])
plot(mtcars$wt[mtcars$cyl == 8], mtcars$mpg[mtcars$cyl == 8])

Then, the formula syntax:

library(lattice)  # xyplot() comes from the lattice package
xyplot(mpg~wt | as.factor(cyl), data=mtcars)

The plot from the dollar sign example is seen in Figure 3.13 and the plot from the formula example is seen in Figure 3.14. Perhaps the most obvious observation about these two plots is that they do not appear to be 'the same.' This is because the formula syntax example automatically uses common axes across the three panels, while the dollar sign syntax example chooses the best axes for each plot individually.

[Figure 3.13 not reproduced: three scatterplots from the dollar sign syntax example; the first panel's y-axis is labeled mtcars$mpg[mtcars$cyl == 4]]

The plotting example makes the formula syntax appear far superior, but some tasks are much easier in the dollar sign syntax. For example, creating a new variable in the formula style requires something like mtcars = mutate(mtcars, avgWT = mean(wt)), rather than mtcars$avgWT = mean(mtcars$wt) in the dollar sign syntax.
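The contrast can be sketched in base R alone. Here transform() stands in for mutate() so no additional package is needed; this substitution is an assumption for illustration, and the object names cars_dollar and cars_data are hypothetical.

```r
# Dollar sign syntax: name the dataset on every reference
cars_dollar <- mtcars
cars_dollar$avgWT <- mean(cars_dollar$wt)

# Data-argument style: name the dataset once, then refer to wt directly.
# transform() evaluates mean(wt) inside mtcars and recycles the result.
cars_data <- transform(mtcars, avgWT = mean(wt))

all.equal(cars_dollar$avgWT, cars_data$avgWT)  # TRUE
```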

One method for simplifying R for novices is to expose them to only one syntax. Projects that limit exposure to a single syntax (Project MOSAIC and Mobilize) have chosen the formula syntax (Pruim et al., 2015a; Gould et al., 2015), although either choice would have been effective.

In order to focus on the formula syntax, graphics are made using the lattice package (Sarkar, 2008), summary statistics are computed with the mosaic package (Pruim et al., 2015b), and modeling can continue to be done in base R.
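A sketch of what a single-syntax workflow looks like for the three kinds of task named above, assuming lattice is available (it is a recommended package included with standard R installations). To avoid depending on mosaic, the summary step uses base R's aggregate(), which also accepts a formula; mosaic offers the same style through calls such as mean(mpg ~ cyl, data = mtcars).

```r
library(lattice)

# Graphics: lattice, formula interface
p <- xyplot(mpg ~ wt, data = mtcars)

# Summary statistics: base aggregate() as a stand-in for mosaic's
# formula-aware summaries; one mean mpg per cylinder group
group_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Modeling: base R lm() already uses the formula interface
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)
```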


Again, one of the strengths of R is the large quantity of additional packages available to extend its functionality.

3.8.2.1 mosaic

Project MOSAIC and its associated R package, mosaic, have advanced the state of R for education (Pruim et al., 2015a,b). By making summary statistics available in the formula-based syntax, the mosaic package allows introductory curricula to standardize on a single syntax. Using the mosaic package along with lattice graphics (Sarkar, 2008), students can stay firmly within the formula-based syntax for an entire introductory statistics course.

In addition, the project has been improving RMarkdown templates (discussed further in Section 3.8.3) and creating interactive widgets (what Biehler might call microworlds) that allow users to interact with R more dynamically (Biehler, 1997). However, even with these advantages, students still need to type syntactically correct code and cannot easily create their own interactive graphics.

3.8.2.2 The Hadleyverse

In recent years, simpler and more flexible R packages have become more plentiful. A main driver of this trend is Hadley Wickham, the Chief Scientist at RStudio (RStudio is discussed more in Section 3.8.5.3). Wickham developed the flexible graphics package ggplot2 as his doctoral dissertation (Wickham, 2009).

ggplot2 is an implementation of The Grammar of Graphics (Wilkinson, 2005), which means it supports the creation of novel plot types within a structured syntax.
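A brief sketch of what the grammar looks like in practice, assuming ggplot2 is installed. The plot is assembled from separate components (data, aesthetic mappings, a geometric object, and faceting), so adding a layer changes the plot without starting over. It revisits the weight-versus-mpg comparison from the earlier mtcars example.

```r
library(ggplot2)

# Data + aesthetic mapping + geometric object + faceting, each a component
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)

# Adding one more component (a fitted-line layer) yields a new plot type
p_smooth <- p + geom_smooth(method = "lm", se = FALSE)
```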

Wickham has also developed packages to help users with data manipulation, such as plyr, reshape2, and dplyr (Wickham, 2011, 2007; Wickham and Francois, 2015). Wickham has said he wants to build tools that allow users to easily express 90% of what they want to do while losing only 10% of the flexibility. He acknowledges there will always be edge cases that fall outside the capabilities of his packages, but he is not trying to create a complete language, simply domain-specific languages for data analysis tasks.
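As a sketch of this domain-specific style, assuming dplyr is installed, the pipeline below filters, groups, and summarizes mtcars with one verb per step; the threshold hp > 100 and the name mpg_by_cyl are arbitrary choices for illustration.

```r
library(dplyr)

mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%            # keep only higher-horsepower cars
  group_by(cyl) %>%               # one group per cylinder count
  summarise(mean_mpg = mean(mpg), # mean fuel economy in each group
            n = n())              # number of cars in each group
```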

3.8.2.3 Shiny and manipulate

Shiny is an R package developed by the RStudio team that enables R programmers to create interactive visualizations for the web (Chang et al., 2015).

To develop a Shiny app, an author typically creates two R files, one called ui.R and one called server.R. These two files must be developed in parallel, taking particular care to match variable names between the processes happening on the back end (in the server file) and those visible in the app (in the UI file).

In the server file, authors can define reactive expressions. The reactive expressions can be used to build a system that responds to user input, updating only those values that depend on the modifications made by the user.
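A minimal sketch of the pieces described above, assuming the shiny package is installed. For brevity it uses the single-file form, passing ui and server objects to shinyApp(); split into ui.R and server.R, the same code follows the two-file structure. The input and output names (bins, hist) and the breaks reactive expression are hypothetical. Note that output$hist in the server must match plotOutput("hist") in the UI.

```r
library(shiny)

# UI side: the ids "bins" and "hist" are the names the server must match
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server side: breaks() is a reactive expression that is recomputed only
# when input$bins changes; renderPlot() re-runs when breaks() changes
server <- function(input, output) {
  breaks <- reactive({
    seq(min(mtcars$mpg), max(mtcars$mpg), length.out = input$bins + 1)
  })
  output$hist <- renderPlot({
    hist(mtcars$mpg, breaks = breaks())
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # launches the app locally in a browser
```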

Shiny supports interface features like sliders, radio buttons, check boxes, and even text input. Typically, though, the resulting visualizations are themselves static. The user cannot zoom into them naturally, the way they would with a D3 web graphic. Instead, the designer has to incorporate sliders for the x- and y-ranges, and the user manipulates those to control the zoom.

Shiny also supports simple publishing: local Shiny apps can be deployed to RStudio's shinyapps.io hosting site with one button click.

However, authoring Shiny apps is not a task for novices. In order to create an interactive graphic, the user first needs to understand R syntax and which parameters she wants to manipulate. The user must also have a basic understanding of reactive programming, and must be able to match up variable names and outputs in the paired server/UI paradigm Shiny uses. Much more useful would be a tool where users could create interactive graphics using direct manipulation of objects on the screen, without needing to know R syntax.

Its challenges notwithstanding, Shiny is a very powerful tool, and it has enabled R programmers to build interactive tools that have gained viral success, such as the dialect map (Katz and Andrews, 2013) published by the New York Times, which eventually received more views than any other article in the history of the paper (Leonhardt, D. (@DLeonhardt), 2014).

As it stands, Shiny can be useful as what Biehler calls a "meta-tool," enabling teachers to "adapt and modify material and software for their students" (Biehler, 1997). There are many nice examples of these types of tools, including the gallery being curated by Mine Çetinkaya-Rundel (Çetinkaya-Rundel, 2014) and tools for exploring database joins (Ribeiro, 2014). Two Shiny apps I developed are discussed in Section ??.

A simpler package with a similar idea is manipulate, which was also developed at RStudio (Allaire, 2014). manipulate is easier for novices to use, although it still requires some knowledge of R. Instead of the server/UI paradigm Shiny requires (which is useful for publishable web documents), manipulate works directly from the RStudio console, producing interaction that cannot be published but can be used for educational purposes.

3.8.3 knitr/RMarkdown

While not specifically for doing statistical analysis, several projects by Yihui Xie are extending the capabilities for reproducible research, both with R and more generally.

As part of his doctoral work, Xie wrote the knitr package, which expanded on the capabilities of Sweave, earlier functionality from base R (Xie, 2013). knitr makes it possible to combine text, code, and the results from the code. This is much like the IPython notebooks discussed in Section 3.3, but it is more flexible in terms of languages and output formats. The most canonical examples include R code in LaTeX or Markdown text, but the package is much more flexible (Xie, 2013). In fact, knitr allows users to combine code in many other languages (Python, C++, etc.) with a variety of textual formats.

Users write text and code (delimited as such by particular syntax depending on the textual format they are using), then 'knit' the source document to create a fully formatted HTML or PDF document.

Figure 3.15: Code and associated output from RMarkdown
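A minimal sketch of the knitting step, assuming knitr is installed. The R Markdown source is hypothetical, and knit() is used only to produce Markdown; producing the final HTML or PDF would typically go through rmarkdown::render() and pandoc, which are not shown. The chunk delimiter is assembled with strrep() only so the example can be printed inside a code block.

```r
library(knitr)

fence <- strrep("`", 3)  # builds the chunk delimiter without nesting fences

# A hypothetical minimal R Markdown source: prose plus one R chunk
rmd_lines <- c(
  "# A tiny report",
  "",
  "The value below is computed when the document is knit.",
  "",
  paste0(fence, "{r}"),
  "1 + 2",
  fence
)

rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# knit() runs the chunk and writes Markdown containing both code and output
md_path <- knit(rmd_path, output = tempfile(fileext = ".md"), quiet = TRUE)
knitted <- readLines(md_path)
```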


