«University of California Los Angeles Bridging the Gap Between Tools for Learning and for Doing Statistics A dissertation submitted in partial ...»
For students enrolled in IDS, the labs appear to be a coherent piece of the RStudio implementation. When students log in to the server version of RStudio, they can call up whichever lab they are completing by using the load_labs() function. Whichever lab they select is then displayed in the Viewer pane of RStudio (as described in Section 126.96.36.199). Because of the embedded slide viewing functionality in RStudio, students can page through the lab at their own speed and answer questions in their ‘journal,’ either on paper or in an RMarkdown document.
This essentially provides the type of functionality offered by swirl (Section 188.8.131.52), but without getting locked into answering questions. All the curricular materials are in the same browser window, so students do not need to flip back and forth between screens, but they are still free to play with code.
To see the full text and installation instructions for all the labs, see https://github.
5.1.4 Technology
Over the years I was involved with the Mobilize project, we iterated through a variety of technological tools⁷. In the ‘pre-pilot’ stage of the grant, immediately preceding my appointment, the platform of choice was R, using the base R GUI, as described in Section 3.8. Although we were working with computer science teachers, this proved to be very difficult for them, as they had little prep time and struggled to pick up a new programming language on the fly.
Our next endeavor was Deducer, a graphical user interface (GUI) for R. For more information about Deducer, see Section 184.108.40.206. We used Deducer in the 2011-2012 school year. Deducer was slightly easier for the teachers to learn, but its status as a menu-driven standalone application presented a number of logistical challenges.
⁷ This work is adapted from McNamara and Hansen (2014).
Because Deducer is completely menu driven, it is very hard to document, and thus difficult to use in an educational context, where supporting materials are always needed. Rather than providing a reproducible code example, each step of our documentation needed to be a screenshot of the particular configuration of the system.
Teachers often wanted features not available in the current version, and even though Deducer was developed in conjunction with the grant, long development times kept those features from being added. The development time was the result of Deducer being implemented in Java, which required the grant to hire an expert developer and pass them design specifications before changes could be made.
Finally, Deducer had to be installed on each computer individually. This was problematic for two reasons. First, any bugs present in the installed version were essentially set in stone until another version was available. Second, because many teachers lacked the time and resources to do installations themselves, we sent grant employees into the field to do installations. Sometimes this was very complicated, as installation required administrator passwords to which teachers did not have access, or which had been lost to time. As a result, some features did not work properly.
Learning from the experience of Deducer, we chose to move back toward R in the 2012-2013 school year. However, instead of using the standard R GUI as discussed in Section 3.8, we chose to deploy using the server-side version of RStudio. The use of the RStudio server version allowed us to sidestep the installation problems associated with Deducer. Because students could access RStudio through the browser, they could access the tool from any computer with internet access, and did not have to worry about user privileges. Files were stored on the server, so they could not be lost to nightly hard drive wipes.
Simply making the switch to RStudio made a huge difference. One of the challenges of the base R GUI and of Deducer is that there are many disconnected windows that float seemingly at random on the screen. Users often minimize windows (e.g., the plot or help window), and then struggle the next time they make a plot because they cannot find their results. One of the simple solutions RStudio offers is that it ‘sticks’ all windows together, so the plot window cannot be lost, since it is stuck to the others.
Another improvement to getting started using R was the development of an R package called MobilizeSimple. The aim of MobilizeSimple is to provide simplifying functions to reduce the number of lines of code necessary to perform tasks such as making maps and word clouds. I was the original author of MobilizeSimple, with subsequent revisions and iterations developed by James Molyneux. This package is discussed in more depth in Section 220.127.116.11.
Finally, we provided as much documentation as possible to support teachers as they began implementing the curriculum in their own classrooms. First, package documentation for the MobilizeSimple package and all the other packages used by the project is accessible through the standard R help() function and the Help tab in RStudio.
Teachers were also provided with a one-page summary, or ‘cheatsheet,’ of all functions necessary for completion of the unit, including minimal working examples, and a pdf document describing how to put together functions to accomplish tasks in the curriculum (e.g., text analysis and map creation).
The project’s YouTube page hosted videos showing the basics of logging in to RStudio, the tabs and features of RStudio, and tasks like plotting, mapping, and working with text. A wiki containing the same material as the pdf document and cheatsheet – updated on a daily basis as I received questions or comments from teachers – was accessible on the web. Finally, I offered – and the current team still offers – support via an email ticketing queue, answered within 24 hours (often, within 2-3 hours).
However, even with all these support materials, ECS teachers ran into snags.
Without access to the research and evaluation results from the grant team, it is impossible to make any formal statements. However, my anecdotal experience answering email and hearing from people who observed classroom behavior suggests many teachers, perhaps insecure about their skill in R, relied heavily on my videos and wiki as curricular materials. Although there is a full, day-by-day curriculum guide for ECS, replete with classroom activities and questions for the students to work through, teachers rarely used this. Instead, they had students work through examples I had posted on the wiki, copying and pasting commands into the console.
However, the modifications have helped R become usable enough that teachers were able to deploy successfully in IDS, and the project settled on R within RStudio for the remainder of the life of the grant.
Another interesting observation from the early years of Mobilize was that no matter which tool the teachers were first exposed to, they disliked changing away from it. Even if they acknowledged the new tool was objectively better, they were opposed to the change. This is supported by research in other areas.
In a route-type software trajectory, Bakker observed participants were hesitant to make the switch to the next minitool (Bakker, 2002).
Running in parallel to our search for the right tool for data science at the high school level was a conversation about whether teachers and students needed to truly learn to code. Because there was still such a high startup cost to learning R, it became clear we could not expect teachers to try to introduce it into their class if students were only going to see it for a day or two. However, we still wanted those students to have the experience of asking and answering questions with data, so the technology team on the grant developed a few tools to allow students to view their data without having to code.
18.104.22.168 Dashboard tools
Jeroen Ooms developed several tools for simple visualization of the student-collected data. The first is an embedded visualization tool (Figure 5.2) within our data access platform, ohmage (Tangmunarunkit et al., 2013). It allows students to select one or two variables to analyze, and then produces an appropriate type of plot.
Figure 5.2: ohmage in-browser data visualization engine
Ooms has also produced a standalone data visualization tool (Figure 5.3) that is currently custom-tailored to the data students are producing.
These tools have been invaluable for teachers and students participating in the Mobilize project. They allow students to begin working with data without first having to learn a lot of computational skills. For this reason, they’re an excellent first step into the world of data analysis, particularly in short curricular excursions like the math and science units. However, much like the visualizations from the New York Times, they prescribe what a person can do with them. There is limited support for subsetting, and the raw data cannot be worked upon, only browsed.
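The idea of producing "an appropriate type of plot" from one or two selected variables can be sketched as a simple dispatch on variable type. This is only an illustration under my own assumptions: the function name choose_plot_type() and the particular plot choices are hypothetical, and nothing here describes how the ohmage visualization engine is actually implemented.

```r
# Hypothetical helper (not part of ohmage): pick a plot type from the
# types of one or two selected variables.
choose_plot_type <- function(x, y = NULL) {
  x_numeric <- is.numeric(x)
  if (is.null(y)) {
    # One variable: distribution of a number, or counts of categories.
    return(if (x_numeric) "histogram" else "bar chart")
  }
  y_numeric <- is.numeric(y)
  if (x_numeric && y_numeric) {
    "scatterplot"            # two numeric variables
  } else if (!x_numeric && !y_numeric) {
    "mosaic plot"            # two categorical variables
  } else {
    "side-by-side boxplots"  # one numeric, one categorical
  }
}

# mtcars and iris are built-in R data sets, used here only for illustration.
choose_plot_type(mtcars$mpg)
choose_plot_type(iris$Species)
choose_plot_type(mtcars$mpg, mtcars$wt)
choose_plot_type(factor(mtcars$cyl), mtcars$mpg)
```

A tool built this way is easy to start with, but it also makes the limitation described above concrete: the user chooses variables, and the tool chooses everything else.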
22.214.171.124 R package
The Mobilize project limits what learners see of R, particularly during their first semester of a course, to the formula syntax. In order to do this, we have used the mosaic package, lattice graphics, and a few additional functions in the MobilizeSimple package.
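As a minimal sketch of what the formula syntax looks like in practice, the example below uses lattice together with base R's aggregate(), which accepts the same formula shape. The built-in mtcars data set is my own choice for illustration and is not drawn from the Mobilize curriculum; mosaic is omitted only to keep the example free of extra dependencies.

```r
library(lattice)  # distributed with R as a recommended package

# Formula syntax: response ~ explanatory, with the data frame named separately.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg_by_cyl

# The same formula shape drives lattice graphics.
scatter <- xyplot(mpg ~ wt, data = mtcars)          # numeric vs. numeric
boxes <- bwplot(mpg ~ factor(cyl), data = mtcars)   # numeric vs. categorical

# Printing a lattice object draws the plot.
print(scatter)
```

The appeal for novices is that one pattern, a formula plus a data argument, covers both summaries and graphics.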
The motivation for the creation of MobilizeSimple was a one-day professional development session for ECS teachers in 2011, after which they were expected to be proficient enough with R to teach it to their students. California does not have a certification for computer science teachers (Lang et al., 2013), so most of the participating teachers had degrees in mathematics or business, and were only nominally computer scientists. As a result, they were novice programmers and novice statisticians.
Using the Exploring Computer Science Mobilize unit discussed in Section 126.96.36.199, I enumerated the tasks teachers needed to be able to accomplish in R, noting how many lines of code it would take to do them. My heuristic was if a task could be done using one function call or one line of code, I would leave the existing R functionality as it was. However, there were several tasks requiring many lines of code, such as:
• Creating a map, using a base map pulled from Google Maps or OpenStreetMap.
• Plotting points on top of the map and adjusting the sizes of points.
• Creating a map with the same base map, but with points scaled by some other variable in the data set.
• Initializing and processing textual data, including removing stop words, making all words lowercase, and stemming common words.
• Creating and modifying a word cloud based on absolute frequencies of word appearance as well as by percentages (e.g., show only the top 1%).
• Creating and modifying a bar chart showing the same types of frequencies and percentages.
• Performing basic regular expression tasks, like pulling 4 out of a string that says “4 hours of sleep.”
For tasks requiring more than one line of code, I wrapped up the code into a helper function. For a concrete example, let us examine text analysis. It is possible to do text analysis in R using the package tm (Feinerer and Hornik, 2008). However, to get a basic word cloud (Figure 5.4) of some text data, the following code is necessary:
    library(tm)
    library(RColorBrewer)
    library(wordcloud)

    # Read the tweets and build a corpus from the message column
    twitter = read.csv("twitterwithdate.csv")
    textCorpus = Corpus(VectorSource(twitter$message))

    # Standardize the text; content_transformer() wraps base functions
    # such as tolower so they can be used with tm_map in current tm versions
    processedText = tm_map(textCorpus, content_transformer(tolower))
    processedText = tm_map(processedText, removePunctuation)
    processedText = tm_map(processedText, removeNumbers)
    processedText = tm_map(processedText, removeWords, stopwords("english"))

    # Count how often each word appears, most frequent first
    tdm = TermDocumentMatrix(processedText)
    m = as.matrix(tdm)
    v = sort(rowSums(m), decreasing = TRUE)
    d = data.frame(word = names(v), freq = v, row.names = NULL)

    # Drop the four lightest shades of the palette, then draw the cloud
    pal = brewer.pal(9, "BuGn")
    pal = pal[-(1:4)]
    wordcloud(d$word, d$freq, random.color = TRUE, min.freq = 2, colors = pal)

I did not want to completely hide the process when creating the wrapper functions for MobilizeSimple, but the goal was to reduce the number of lines of code while maintaining the computational process. In MobilizeSimple, the same task is wrapped into three functions:
    text = InitializeText(twitter$message)
    processedText = ProcessText(text, removestopwords = TRUE)
    MakeWordCloud(processedText)

The plot is identical to the one seen in Figure 5.4. The simplified flow in MobilizeSimple still captures the workflow of the text analysis process. A user has to initialize the text so that it is stored in a format R treats as a textual corpus, process it to standardize capitalization and remove stop words, and finally plot the word cloud. The package abstracts away the details students do not need to know. Note the ProcessText function defaults to making every word lowercase and removing numbers and punctuation, but if the user wants the function to remove stop words they must pass that argument. It was a conscious decision to make the simplest processes the default, but to require extra arguments for other tasks as a form of motivation. For example, if a student is confused or frustrated when they see “the” as the largest word in their word cloud, they will hopefully be motivated to study the documentation to learn how to remove words like this.
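The design choice described above, sensible defaults plus an opt-in argument, can be sketched in base R. This is not the MobilizeSimple source: process_text_sketch(), its stopword_list argument, and the tiny stop-word vector are hypothetical names I introduce for illustration, and the sketch works on plain character vectors rather than a tm corpus.

```r
# Hypothetical stand-in for a simplifying wrapper; NOT the MobilizeSimple code.
small_stopwords <- c("the", "a", "an", "and", "of", "to", "is", "in")

process_text_sketch <- function(text, removestopwords = FALSE,
                                stopword_list = small_stopwords) {
  # Defaults: lowercase everything and strip digits and punctuation.
  cleaned <- tolower(text)
  cleaned <- gsub("[[:digit:][:punct:]]+", " ", cleaned)
  # Split into individual words and drop empty strings.
  words <- unlist(strsplit(cleaned, "[[:space:]]+"))
  words <- words[nzchar(words)]
  # Opt-in step: the user must ask for stop-word removal.
  if (removestopwords) {
    words <- words[!(words %in% stopword_list)]
  }
  words
}

tweets <- c("The cat sat in the hat!", "The 2 dogs and the cat.")

# With the defaults, "the" dominates the counts.
sort(table(process_text_sketch(tweets)), decreasing = TRUE)

# Passing the extra argument removes it.
sort(table(process_text_sketch(tweets, removestopwords = TRUE)), decreasing = TRUE)
```

Running both calls side by side shows the intended nudge: the default output is dominated by a stop word, and the fix is a single documented argument.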
This workflow provides a simple algorithm for getting word clouds to work, and requires users to pass arguments to functions. Thus, it is capturing some components of computational thinking, but it does not require the burden of many lines of code, like the tm package would, or totally remove the computational experience as GUIs so often do.
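The regular expression task listed earlier, pulling 4 out of a string that says “4 hours of sleep,” shows why a helper is useful even for something small. The sketch below uses only base R; extract_first_number() is a hypothetical name of my own and is not claimed to match any function in MobilizeSimple.

```r
# Hypothetical helper: return the first run of digits in each string as a
# number, or NA when a string contains no digits.
extract_first_number <- function(x) {
  found <- regexpr("[0-9]+", x)          # position of first match, -1 if none
  hit <- !is.na(found) & found > 0
  result <- rep(NA_real_, length(x))
  # regmatches() returns only the matched pieces, in order.
  result[hit] <- as.numeric(regmatches(x, found))
  result
}

extract_first_number("4 hours of sleep")
extract_first_number(c("10 hrs", "none", "about 7"))
```

Without the wrapper, a novice has to combine regexpr(), regmatches(), and as.numeric() and also handle strings with no match, which is more than one line of reasoning for a one-line idea.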
Critics of this package have cited its function naming and syntax, and as the package authors we think these concerns are legitimate. Future versions of MobilizeSimple will use more standard R naming conventions, such as using lowercase function names (Wickham, 2014a). It will also be modified to use the so-called ‘formula syntax’ discussed further in Section 3.8.1. While the MobilizeSimple package has not yet been added to CRAN, it can be installed from GitHub using the devtools package (McNamara, 2013b).