«University of California Los Angeles Bridging the Gap Between Tools for Learning and for Doing Statistics A dissertation submitted in partial ...»
Many of the tools currently used for teaching statistics are guilty of not providing a high enough ceiling. For example, TinkerPlots does not even provide simple statistical modeling functionality.
Most tools for doing statistics provide the ceiling, but not the low threshold.
For example, in R it is easy to create new components of the system using old components, and then share them through a centralized repository where other users can easily find and import others’ work.
Again, Konold argues a high ceiling is not necessary for a statistical tool for students (Konold, 2007), but in order to close the gap between tools for learning and tools for doing statistics, extensibility is a requirement. Even Biehler argues that “adaptability (including extensibility) is a central requirement for data analysis systems to cope with the variety of needs and users” (Biehler, 1997).
Making sure a system can be used to extend itself is crucial, because a system author can never think of all the features a priori.
4.10 Summary of attributes
Again, the overall structure of a tool complying with these requirements could take a variety of forms. Currently, I imagine a multi-layered system, with a visual drag-and-drop blocks programming environment providing easy entry for novices. Visual representations can help novices understand the underlying workings of a data system (Shneiderman, 1994), which suggests this is the appropriate direction. Once users have reached a certain point, they will likely want to extend their capabilities, and could move on to the next ‘layer’ down, a domain-specific programming language (DSL). The DSL would provide simple primitives, allowing users to develop their own elements to extend their analysis, and create their own visual blocks to add to the layer above. If users reached the limits of that layer, they could move down again, to the standard statistical programming tool (e.g., R) in which the DSL was implemented. The underlying domain-specific language could be implemented in any target language (e.g., R, Python or Julia), but because the community of statisticians is currently using R, that is what I am currently imagining.
This falls in line with Biehler’s belief that “a co-evolution of tool and user should be facilitated” (Biehler, 1997). In other words, as users top out in the capabilities of one aspect of the system, they can move on to a more complex version allowing for more functionality and flexibility. A solution Biehler proposes is “embedded microworlds” (Biehler, 1997). In an embedded microworld, a larger system is used to make a microworld, which can then be extended in the larger system. However, Biehler warns that embedding sometimes brings along language artifacts from whatever system the microworld is embedded in.
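The layered structure described above can be illustrated with a minimal sketch. It is not an implementation of the proposed system: plain Python stands in for the host language (the text imagines R), and the names `PRIMITIVES`, `primitive`, and `run_block`, along with the example data, are hypothetical choices made only for illustration.

```python
from statistics import mean, median

# Bottom layer: the host language itself (plain Python here, standing in for R).

# Middle layer: a tiny DSL, modeled as a registry of named primitives.
PRIMITIVES = {}


def primitive(name):
    """Register a function as a DSL primitive under the given name."""
    def register(func):
        PRIMITIVES[name] = func
        return func
    return register


@primitive("mean")
def dsl_mean(values):
    return mean(values)


@primitive("median")
def dsl_median(values):
    return median(values)


# Top layer: a "block" is only data. It names a primitive and a column,
# which is roughly what a visual drag-and-drop block would encode.
def run_block(block, data):
    func = PRIMITIVES[block["primitive"]]
    return func(data[block["column"]])


# A user who outgrows the built-in blocks moves down one layer, writes a new
# primitive in the host language, and gets a new block for free.
@primitive("range")
def dsl_range(values):
    return max(values) - min(values)


snacks = {"calories": [100, 250, 160]}  # made-up example data
mean_block = {"primitive": "mean", "column": "calories"}
range_block = {"primitive": "range", "column": "calories"}

print(run_block(mean_block, snacks))
print(run_block(range_block, snacks))
```

The point of the sketch is that each layer is expressed in terms of the one beneath it, so extending the system never requires leaving it.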
As Clifford Konold urges, the system should be built from the bottom up (Konold, 2007), so the first step is to consider the tasks a novice would need to accomplish and work from there. Some examples of tasks are discussed in the next chapter.
The best way to predict the future is to invent it.
Given the current state of tools for doing and teaching statistics (Chapter 3), and my vision of future statistical programming tools (Chapter 4), it is clear there is work to be done. Since 2011, I have produced a variety of initial attempts toward this vision of the future. Most of my work can be categorized either under the umbrella of Mobilize (described in Section 5.1) or the Communications Design Group (Section 5.2), although I also describe several Shiny widgets I created as microworlds for courses I taught (Section 5.3).
Mobilize is a six-year, $12.5 million National Science Foundation grant focused on integrating data science and computational thinking into a multiplicity of STEM (science, technology, engineering and math) disciplines. The grant has eight “principal investigators” (PIs) and six institutional partners, the most important of which is the Los Angeles Unified School District (LAUSD). The LAUSD is the second-largest school district in the United States, after the City School District of the City of New York. For three years, I was a graduate student researcher on the grant, and the experience inspired and motivated my work on statistical programming tools for novices.
The Mobilize team has developed curriculum and supporting technology to allow high school teachers to integrate data science into their classes. In addition, we hold regular professional development¹ sessions for in-service teachers², and our research and evaluation teams study what makes elements of the curriculum work (or not work).
Currently, our data-centric curriculum exists as insertable units in computer science, math, and science, and as a stand-alone, year-long data science class, called Introduction to Data Science (IDS). IDS was piloted in the 2015-2016 school year in the LAUSD. For more detail about these courses, see Section 5.1.3.
All the curricula are based on the idea of engaging students with data through the process of participatory sensing, whereby they learn to collect data about themselves and their communities, often using mobile technology. By grounding the analysis in data with a direct contextual relationship with the students, we hope to make our lessons more compelling. Once data have been collected, students begin a process of Exploratory Data Analysis (EDA).
The Mobilize curricula all attempt to follow current best practices in statistics education. In particular, students participate in almost all the activities
suggested by Biehler et al. (2013):
“1. Students can practise graphical and numerical data analysis by developing an exploratory working style.
2. Students can construct models for random experiments and use computer simulation to study them.
3. Students can participate in ‘research in statistics,’ that is to say they participate in constructing, analyzing and comparing statistical methods.
4. Students can use, modify and create ‘embedded’ microworlds in the software for exploring statistical concepts.” (Biehler et al., 2013)
¹ Professional development is additional training for teachers.
² In-service teachers are those currently certified as teachers and actively teaching.
All of these activities are woven throughout the Mobilize curricula, with the possible exception of 4. Students do not construct microworlds, but they do ask and answer questions using data. In IDS in particular, students learn to code in R within RStudio.
5.1.1 Participatory sensing
Participatory sensing is a data collection method in which people participate in data collection using sensors. The term was developed by researchers at UCLA’s Center for Embedded Networked Sensing, an institutional partner and source of many PIs for both the Exploring Computer Science and Mobilize grants (Burke et al., 2006).
The term participatory sensing tends to defy definition, so it is most easily explained through negative examples, or by breaking it into its component words (‘participatory’ and ‘sensing’).
‘Sensing’ means using sensors, and sensors could be anything from thermometers, scales, and pressure gauges to electronic sensors like the GPS and accelerometer in a modern smartphone. Often when we think of sensor data, we imagine it coming from electronic sensors programmed to collect data in a particular way (Kanhere, 2011), such as a water clarity sensor programmed to move along a trajectory in a pond, taking readings every 3 feet and storing the results. In the context of participatory sensing, sensors are not as autonomous, although there may be automatic data collection modes accompanying the human elements.
‘Participatory’ means humans participate in the data collection process. So, the water clarity sensor from above is not an example of participatory sensing. It is certainly sensing, and, like all data collection methods, does involve humans in some way, but it does not directly involve participants in the act of data collection. The Audubon bird count, which has been taking place since 1900, is a participatory data collection exercise without sensors. Each year, thousands of participants across the country report the species of birds they have seen in their area (Silvertown, 2009).
Since 2002, the Audubon Society has supplemented its traditional Christmas Bird Count with a program called eBird, which “engages a vast network of human observers (citizen-scientists) to report bird observations using standardized protocols” (Sullivan et al., 2009). More than 500,000 users have visited the site and have gathered 21 million bird records (Sullivan et al., 2009).
This activity more closely approximates participatory sensing because it uses electronic data recording to standardize records, but it is still light on the sensing component.
For an example that is both participatory and sensing, consider the Mobilize snack campaign. For this data collection exercise, students use a smartphone app to record information about every snack they eat. It is participatory because students are doing the data collection and are also the trigger for when data collection occurs. Unlike the sensor in the pond, programmed to record data every 3 feet, students are ‘programmed’ to record data every time they eat a snack. It is sensing because they are using smartphones to collect the data.
Although they are entering reasonably qualitative survey data, the phone is recording precise GPS coordinates and time stamps.
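To make the split between the two components concrete, a single snack observation could be sketched as a record combining participant-entered answers with phone-recorded context. This is only an illustration: the field names, the `record_snack` function, and the `fake_gps` stand-in are hypothetical and do not reflect the actual Mobilize app or its survey questions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SnackObservation:
    # Entered by the student (the 'participatory' part); hypothetical fields.
    what: str
    why: str
    # Recorded by the phone (the 'sensing' part).
    timestamp: datetime
    latitude: float
    longitude: float


def record_snack(what, why, read_gps, now=None):
    """Build one observation. The student triggers this by reporting a snack."""
    latitude, longitude = read_gps()
    timestamp = now if now is not None else datetime.now(timezone.utc)
    return SnackObservation(what, why, timestamp, latitude, longitude)


def fake_gps():
    # Stand-in for a real location sensor; the coordinates are arbitrary.
    return (34.0689, -118.4452)


observation = record_snack(
    "apple",
    "hungry after class",
    fake_gps,
    now=datetime(2015, 9, 1, 15, 30, tzinfo=timezone.utc),
)
print(observation)
```

Passing the GPS reader and the clock in as arguments keeps the sketch testable without a phone; a real app would read both from the device.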
Another element characteristic of participatory sensing is the idea that data need to be collected regularly over space and time to get an idea of how the phenomenon varies. So, for example, measuring the heights of students in a class does not count as participatory sensing, because although students are participating in the data collection, and the ruler/yardstick could count as a sensor, there is no variation over space or time. It would not make sense to continue taking repeated measurements of the students’ heights throughout the week. If there were variation in the measurements, it would be due to random error rather than to some true variation.
Participatory sensing is an inherently democratic process. It involves people collecting data about themselves, for themselves. The data do not get sent to some higher authority and disappear. Instead, the participants should be able to see and participate in the data analysis process, or at the very least, receive information from the final analysis. It is an effort to give back access to data to those who create it, and balance the scales of information power, if only slightly.
5.1.1.1 Teenagers and participatory sensing
In 1993, George Cobb exhorted instructors to expose students to real-world data and involve them in collecting data. If both could be achieved at once, he called it a “best of two worlds” situation (Cobb, 1993). Statistics courses have gradually moved toward using more relevant data to engage students (Friel, 2008; Hall, 2011). Particularly since the explosion of data on the internet, data are being collected about students constantly, whether they know it or not (Meirelles, 2011). Whether they are tracking runs with Nike+ or simply giving away their location by posting on Instagram with location services enabled, students – and people in general – are producing digital data streams constantly.
Many statistics courses attempt to focus on data relevant to students’ lives (particularly data they may already be aware of producing), such as social media data from Facebook or Twitter, data about music, or physical activity data (Gould, 2010; Lee and DuMont, 2010). With the rise of the “data natives” (young people who have grown up in the big data era where services are continuously predicting what an individual will do next), there will only be more data to use (Rogati, 2014).
Involving students in participatory sensing is one more step along this trajectory. Teenagers are especially likely to be involved in participatory cultures (Jenkins et al., 2009). They are drawn to cultures with “relatively low barriers to artistic expression and civic engagement, strong support for creating and sharing one’s creations, and some type of informal mentorship whereby what is known by the most experienced is passed along to novices” (Jenkins et al., 2009). Some examples of such cultures include blogs and Facebook. Obviously, there are access issues if participatory cultures are dependent on computer or internet access, and young adults often have trouble seeing the way media shape our culture. But urban youth are actually more likely to be online content creators (Jenkins et al., 2009). Recognizing young adults as entrenched in participatory cultures suggests that introducing data analysis within the context of participatory sensing may be highly appropriate.
In this context, the use of participatory sensing data in Mobilize is a natural next step. It is a “best of two worlds” situation, where students are handed back the power over their own data. The Mobilize technical team has developed an app making it easy for students to deploy ‘campaigns’ on issues they care about, and collect data directly on their phones (Tangmunarunkit et al., 2013). Each piece of Mobilize curriculum uses at least one participatory sensing campaign to ground the rest of the unit. So far, our campaigns have been less science-focused and more sociological (e.g., sleep patterns, advertising in neighborhoods, snacking habits).
5.1.1.2 User rewards for participatory sensing
The eBird project has produced research suggesting that data collection participants submitted more data to the project when the program instituted more user rewards (Sullivan et al., 2009). For example, users have profiles on the site and there is a competitive component where prolific contributors are featured on a leaderboard. Users are also rewarded with data visualizations including the data they helped collect (Sullivan et al., 2009). Showing the results of the data collection process is an important piece of citizen science (Bonney et al., 2009).
However, not all citizen science initiatives include this type of user incentive.