«University of California Los Angeles Bridging the Gap Between Tools for Learning and for Doing Statistics A dissertation submitted in partial ...»
The output document has nicely formatted text, code with syntax highlighting, and all the results from the code, including numeric summaries and plots. If the user changes something in the source document, they have only to re-knit the document to see the updated results in their output document. For an example of the RMarkdown code and corresponding HTML output, see Figure 3.15.
This paradigm supports reproducible research. It makes it very simple to share analysis results in such a way that the reader can easily audit them, and the same analysis can be easily run on new data, changing only one line in the source code and re-knitting the report to see the results.
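As a minimal sketch of what such a source document might look like (an illustration only, not the document shown in Figure 3.15), the following RMarkdown file uses R's built-in cars data. The title, the chunk labels, and the variable name dat are assumptions made for this example. Pointing the report at new data would mean changing only the line that assigns dat and then re-knitting.

````markdown
---
title: "A minimal report"
output: html_document
---

```{r load-data}
# Changing only this line points the whole report at a different data set.
dat <- cars
```

The data set has `r nrow(dat)` rows.

```{r numeric-summary}
# Numeric summary of every column; it appears in the knitted output.
summary(dat)
```

```{r plot}
# Default plot of the data frame (a scatterplot when there are two columns).
plot(dat)
```
````

Knitting a file like this, for example with the Knit button in RStudio, should produce an HTML page containing the text, the highlighted code, the numeric summary, and the plot.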
knitr functionality is available through any R session, but the most embedded support is through RStudio, where the file menu gives users the choice of RMarkdown document and R Sweave (LaTeX), among others. Users can also choose the output format they would like from their RMarkdown code. For a document, the choices are HTML, PDF, or Word.
Once a document type has been chosen, RStudio provides a new document, as shown in Figure 3.16. However, rather than presenting the user with a blank page (as might be seen in word processing software, for example), there is some minimal code already there. This template demonstrates the most important features of the syntax, and it provides a working example using sample datasets that should run on any system.
Providing a minimal working example has been shown to support novices, particularly through research on the acquisition of Java. Michael Kölling calls this the “recipe” format (Kölling, 2010), and suggests a student should never start with a blank page. Instead, they should always be presented with something that ‘works’ (meaning it either runs or compiles, depending on the type of language) that they can modify. In Java, sometimes these recipes are in the form of the layout for a function, and other times more like a documentation scheme.
In the realm of statistical computing, RStudio provides document templates for all of their document types. Starting with a recipe provides two supports for learners. First, it is always easier to edit than it is to write (the “just put something down” philosophy), and second, when given a working example, students can verify their system is set up correctly. If the minimal example does not run, they know they need to get more help from their instructor in terms of installation.
Because of all the support they provide for reproducible research, RMarkdown and knitr are clearly moving toward the future of statistical programming.
However, the re-knitting step that must take place between every change and its appearance in the compiled document is a stumbling block. And because of its connection to textual languages, it is not clear how knitr could extend in a visual setting.
Figure 3.16: New RMarkdown document with template
3.8.4 Tools for learning R
There have been efforts to provide tools specifically to guide students through the process of learning R. Two of the most promising are swirl and DataCamp.
3.8.4.1 swirl
swirl is an R package intended to teach R at the command line (Carchedi et al., 2015). It was developed by Nick Carchedi, a graduate student at Johns Hopkins University.
swirl has a number of advantages, the biggest being that students do not need to leave their R environment to see the curricular materials to which they are referring. Other approaches for learning R might involve looking at a book next to a computer and then trying exercises on the screen, or switching back and forth between a browser window with some documentation and an R session. By combining everything into one window, swirl makes the experience of learning R more focused and less likely to be sidelined by other issues (e.g., the book falling closed, losing the browser window).
Another advantage of swirl is its status as an open-source R package. The code is hosted on GitHub, and any instructor can develop their own swirl ‘module’ to cover a particular lesson or set of lessons. These modules are available on GitHub and can be easily loaded in through the swirl interface, so students can pull in modules on whichever topic they want to learn. Currently, there are very few modules, which makes it hard to assess the usefulness of the platform.
However, as more R experts port their material to swirl modules it will become easier to assess.
The authors of swirl suggest using it within RStudio, which means users can see their plots in the plot pane and the items they add to their workspace in the environment pane. These features are discussed more extensively in Section 184.108.40.206. However, beyond encouraging students to use RStudio, the authors have not taken full advantage of the features available in RStudio (e.g., presentations, markdown documents, etc).
Figure 3.17: swirl introducing R
A view of a swirl session is shown in Figure 3.17. Much like the Mobilize labs discussed in Section 220.127.116.11, swirl will prompt the user to try running a piece of code to see what it does. Once the task is completed, swirl will give more information and then ask another question. Because the system is set up to wait for a correct response, it is a very directed form of learning. Unlike the inquiry-based model upon which Mobilize was based, swirl is a linear trajectory of learning. Of course, this form of learning was determined by the format the authors chose to use for swirl. Because it uses a command-line tool and does not have any fancy machine learning behind it, support of very ﬂuid learning trajectories could not be expected.
In the ﬁrst version of swirl, the questions were quite vague and the required answers were very speciﬁc. The authors had not built in an escape function, so if a user got stuck, their only recourse was to close their R session. Additionally, if a user wanted to play around with a function to see what it did, they had to endure endless error messages from the system. There was no capability to go back, which made it feel limiting. The user could only enter code in one particular instance, and once they did it right the system would move on completely.
However, the second version has corrected for all these problems and more.
The questions have been simpliﬁed and it is much easier to escape. For example, if a user is in a code prompt and types info(), they will be shown a list of other commands they can use for utility functions, like skip() to skip the question, play() to try things out without swirl interpreting the code as an answer to a question, or bye() to end the swirl session.
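The utility commands described above can be sketched as a short console session. This is an illustration only, assuming the swirl package is already installed. The lines after swirl() are shown as comments because they are typed interactively at swirl's prompt rather than run as a script. nxt(), which returns control to swirl after play(), is not mentioned above and is included here on the basis of swirl's own help text.

```r
# Load the package and start the interactive lesson menu in the console.
library(swirl)
swirl()

# At a swirl code prompt, the following utility commands can be typed:
# info()   # list the available utility commands
# skip()   # skip the current question
# play()   # experiment freely; swirl does not treat the input as an answer
# nxt()    # return to the lesson after play()
# bye()    # end the swirl session
```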
The info() command helps ease frustration, but still requires the cognitive load of remembering the command. A quick reference of some sort would be very useful – taking advantage of RStudio’s support of presentations open in the viewer pane throughout an entire session, or providing custom documentation ﬁles to be loaded in the help pane. Again, the authors have not taken advantage of all the capabilities of RStudio yet.
One particularly ill-advised design choice is the use of red font for all instructions. As we found when implementing Deducer (Section 3.8.5.2) with teenage users in the Mobilize project, users associate red with errors, so red text can be very anxiety-producing (McNamara and Hansen, 2014; Elliot et al., 2007).
Overall, swirl’s main drawback is its rigidity. Even with the improvements in version two, the material feels linear and the question and answer format does not seem structured to provide for long-lasting learning.
3.8.4.2 DataCamp
A competitor to swirl is the website DataCamp (Figure 3.18) (Cornelissen et al., 2014). DataCamp uses the format seen on popular sites such as Codecademy and applies it to R. Each element of learning R is defined as a module, and as a user moves through the modules they earn points. The interface asks questions that the user must answer in order to move on, although it will also provide hints.
DataCamp sidesteps some of the issues from swirl by situating the learning on the web. Users do not have to install R, RStudio, or any supporting packages on their computer. Instead, everything takes place in the browser. However, working in the browser is a somewhat artificial experience, as real R work will take place in another interface. The other disadvantage to DataCamp is that it is not free. The site allows users access to some introductory material without a subscription, but to see all the modules in their entirety, users must pay $25 per month or $50 per year.
3.8.5 GUIs and IDEs for R
In contrast to the Command Line Interface (CLI) that characterizes much of programming, Graphical User Interfaces (GUIs) and Integrated Development Environments (IDEs) are used by programmers to reduce cognitive load in a variety of languages. IDEs are support tools for textual coding, which often provide a source code editor with debugging support, code completion, and sometimes automated code refactoring. A common example is Eclipse, which is used by Java programmers. There have been eﬀorts to simplify Eclipse for novices, like the Gild project (Storey et al., 2003).
The term GUI can be used more broadly. In computer science, the term was initially used to suggest a graphical system allowing a user to interact with a (typically text-only) programming language without having to understand the underlying syntax. However, as personal computers have become more graphical, the term GUI has begun to be used more broadly to mean any computer interface using menus and buttons for interaction. Under this deﬁnition, Word, Skype, and Google Chrome are all GUIs.
Both GUIs and IDEs are attempts at simplifying the use of computers, and there have been eﬀorts to use them to simplify and improve R. As with the history of statistical programming languages, there are too many to exhaustively mention, so we will consider a small selection.
3.8.5.1 R Commander
R Commander (Figure 3.19) is a GUI for R. It was developed by John Fox as a way to use R in introductory statistics classes without students having to learn syntax (Fox, 2004). Much like the tools for learning discussed in Section 3.4, R Commander provides a limited set of possible tasks and provides a graphical user interface. While the interface is much more route-type than the landscape-type TinkerPlots and Fathom (Bakker, 2002), R Commander also makes it possible to do summary statistics, graphics, and simple models.
R Commander is a button- and menu-driven GUI for R, and any action taken in the GUI results in the corresponding R code appearing in the Script Window (top pane) and output appearing in the Output Window (bottom pane).
Any messages appear at the bottom of the GUI.
While this method allows students to use the tool for introductory statistics courses, it does not facilitate play (as Fathom does) or develop computational thinking (as learning R does). The connection between actions taken with the menus and the resulting code is implicit rather than explicit, and there is little reason for users to manipulate the code.
R Commander offers one of the same benefits as Fathom and TinkerPlots (namely, the user does not need to know how to code in order to use it) but does not offer the rest (flexible and creative exploration). It is not clear R Commander would help bridge the gap better than other tools for learning, so there is little incentive to use it.
Figure 3.19: R Commander GUI
3.8.5.2 Deducer
Deducer is another GUI for R which, much like R Commander, provides access to some of R’s functionality through menus and wizards².
Deducer also preserves command-line access to R’s full potential and displays source code as the result of each action in the GUI. It allows novices to perform basic data analyses and produce exploratory plots, maps, and word clouds.
Deducer was developed by a graduate student named Ian Fellows, who saw a disconnect between the way psychologists at UCSD worked with data and the possibilities available in R (Ohri, 2013; Fellows, 2012). Fellows continued work on Deducer at the Center for Embedded Network Sensing (CENS) at UCLA, speciﬁcally considering the Mobilize project (Section 5.1) as a use-case. Deducer’s primary advantage is that it is free, which makes it possible to use in educational settings where funding is tight. It was used in a live educational setting during the second year of the Mobilize project. While Deducer was slightly easier for teachers to learn than R, the disadvantages of the system easily outweighed this slight advantage. See Section 5.1.4 for more on Deducer in the Mobilize project.
Deducer suﬀers from one of the same problems as the standard R GUI: there are many free-ﬂoating windows, and it is easy to lose one.
Another problematic detail is that the automatically created R code, which should be nonthreatening if we want users to make the association between the actions they take in wizards (for example, the plot-creation wizard) and the resulting code, is bright red. In many users’ minds, red signifies ‘error’ (Elliot et al., 2007), so many users initially believed they had done something wrong.
² The term wizard refers to pop-up windows guiding the user through particular tasks. One commonly-encountered wizard is the letter writing wizard in Microsoft Word, which guides users through the salutation and closing of a formal letter. In Deducer, the wizards were for tasks like graph creation or modeling.
Figure 3.20: The initial Deducer screen
Additionally, the plot menus produce beautiful ggplot2 graphics (Wickham, 2009), which is ideal for users because they are inclined to feel pride for having created something appealing. Unfortunately, the resulting ggplot2 code printed into the console is much too diﬃcult for users to parse. It is the diﬃculty discussed with R Commander taken to the extreme. It is highly unlikely a novice will make the implicit connection between their actions and the code. Much like R Commander, the simplicity of the system is not coupled with any inherent playfulness, and it does not seem to actually teach R.