University of California, Los Angeles

Bridging the Gap Between Tools for Learning and for Doing Statistics

A dissertation submitted in partial ...
Even the distinctions in the histories given by De Leeuw and Biehler suggest a dichotomy between tools for teaching and learning statistics, and those for legitimately doing statistics. De Leeuw explicitly states he is not concerned with “software for statistics” (De Leeuw, 2009), while Biehler is only interested in novices’ ability to grasp a tool with minimal instruction (Biehler, 1997). Whatever your perspective, it is clear there is a gap between these two types of tools.
There are two main approaches to this gap: either basing techniques for learning statistics on techniques for doing professional statistics, or using technology simply to illustrate statistical concepts (Biehler et al., 2013).
In fact, the perspectives on this issue are so distinct that it is useful to visualize their relationship, as seen in Figure 2.1. This figure shows the spectrum of opinions on the use of computers in education: from learning without a computer, to learning with a tool specifically designed for students, to learning on a professional tool used by practitioners.
Interestingly, this spectrum is paralleled almost completely in the computer science education community, which has been studying novices’ use of computers much longer than statisticians have. The question in computer science has been whether it is better for learners to begin on a simpler programming language or to start directly in a language used by experts.
In statistics, some educators believe the mathematical underpinnings of statistics are sufficient to provide novices with an intuitive understanding of the discipline. Because of this, they do not find it imperative that statistics education be accompanied by computation. In this paradigm, students learn basic concepts about sampling, distributions, and variability, and work through formulas by hand. At most, they use the computer to assist with their arithmetic calculations (statistical computation). This argument has two roots: one is that some statisticians do not believe computers are necessary to do statistics; the other is that even if doing real statistics requires a computer, students do not need one when they are first starting out. This is an unfortunate line of reasoning, however, because it prevents students from seeing real applications of statistics (i.e., the interesting stuff) until they have progressed to a second or third course.
One example to consider is the Advanced Placement Statistics course and associated exam. The AP Statistics teacher guide states that “students are expected to use technological tools throughout the course,” but goes on to say that, in this context, technological tools are primarily graphing calculators (Legacy, 2008).
Instead of building in computer work, the guide suggests exposing students to computer output so they can learn to interpret it.
Paralleling this argument in computer science, some experts believe students should begin learning about programming without a computer. For example, the website “CS Unplugged” promises “computer science... without a computer!” (Bell et al., 2010). It provides lessons that get students working with physical materials to understand data representations, algorithms (e.g., sorting, ranking), and other basic topics. Even at more advanced levels, there is an argument for beginning programming projects away from the computer (Rongas et al., 2004). Computing great Edsger Dijkstra argued that we should not teach students how to program in the current language du jour. Instead, we should require novices to write formal proofs of their programs so they understand the logic underpinning the computation (Dijkstra, 1989).
On the other side of the spectrum, much of the research on computer science education (particularly in the context of language acquisition) has been done on Java (Fincher and Utting, 2010; Kölling, 2010; Utting et al., 2010; Storey et al., 2003; Hsia et al., 2005). This is because Java has been the language of choice in AP Computer Science since 2003, following Pascal and C++, which were used previously. The choice to move from C++ to Java for the AP exam was based on the popularity of object-oriented languages (which both C++ and Java are) and the fact that Java is simpler to learn than C++.
All three of the languages that have been used for the AP exam are compiled: programmers write code and pass it through a ‘compiler’ to produce a packaged piece of code that can be run by the computer. In contrast, ‘interpreted’ or ‘scripting’ languages allow dynamic work: typing code and running it piece by piece to see intermediate results. These two paradigms (compiled and interpreted) describe nearly any programming language, and they differ substantially in approach.
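The difference between the two workflows can be sketched in a few lines. The following is purely illustrative: it uses Python's own `compile()` and `exec()` built-ins to mimic both styles (Python itself is an interpreted language, so the "compiled" half only imitates the build-everything-first-then-run workflow).

```python
# Interpreted style: type a statement, run it, inspect the result, repeat.
namespace = {}
for snippet in ["x = [1, 2, 3]", "total = sum(x)", "mean = total / len(x)"]:
    exec(snippet, namespace)  # each piece runs immediately; state accumulates

# Compiled style: assemble the whole program first, build it in one step,
# then run the finished unit.
program = "x = [1, 2, 3]\ntotal = sum(x)\nmean = total / len(x)"
code = compile(program, "<program>", "exec")  # the one-time 'build' step
result = {}
exec(code, result)  # nothing runs until the whole unit has been built

assert namespace["mean"] == result["mean"] == 2.0
```

Both routes reach the same answer; the difference is that the interpreted loop exposes intermediate state after every statement, which is exactly the piece-by-piece exploration described above.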
While both compiled and interpreted languages are ‘real’ programming languages, and the use of any real programming language in introductory computer science classes can be seen as an argument for the right end of the spectrum, compiled languages are generally considered tougher to learn. So, it is interesting that Java is used in AP Computer Science, while AP Statistics shies away from using a computer tool at all.
Additionally, recent efforts to increase diversity in computer science have found that interpreted languages can be more accessible because they allow novices to get results more quickly (Alvarado et al., 2012). Almost all languages used for statistical computing are interpreted, which suggests they may be more accessible for learners than the alternative, but it also makes it hard to apply the body of computer science education research (generally on compiled languages) to the problem.
The argument for teaching students a real programming language is that they will have a skill they can apply in the workforce or in college. This echoes the argument that students should learn to use statistical programming tools used by professionals.
Over time, there has been a movement toward students as true ‘creators’ of computational statistical work. Deb Nolan and Duncan Temple Lang argue well for this in their paper, “Computing in the statistics curriculum,” where they suggest students should “compute with data in the practice of statistics” (Nolan and Temple Lang, 2010). They are promoting R, which is discussed in more detail in Section 3.8. Many universities and colleges are modifying their statistics courses to fall in line with Nolan and Temple Lang’s recommendations, but to date, modifications have happened primarily at the graduate level and are still trickling down to the undergraduate level or below.
However, in both statistics and computer science, there is a middling perspective on the spectrum. This perspective holds that students should be using computers to learn, but that specialized tools should be developed specifically for learners.
In computer science, there is a field of programming languages developed particularly for novices. One of the earliest and most famous examples is Seymour Papert’s LOGO language (Papert, 1979). Papert was a student of Piaget, the psychologist responsible for much of the current theory of childhood development. Using his knowledge of how children learn, Papert developed LOGO in the 1970s. It used ‘turtle graphics,’ so called because an icon of a turtle served as the cursor on the screen, and some implementations used a robotic turtle on the classroom floor. This was to allow students to ‘see’ themselves in the cursor, and all directionality was relative to the turtle (meaning ‘up’ was not necessarily the top of the screen, but rather whichever direction the turtle was pointing).
This and many other features were carefully constructed based on child psychology.
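The turtle-relative directionality can be made concrete with a small sketch. This is a hypothetical miniature, not Papert's implementation: a `Turtle` class whose `forward` command moves relative to the turtle's current heading, so the same command moves in different screen directions depending on which way the turtle faces. (Python's standard `turtle` module, itself a descendant of LOGO, works the same way but requires a graphical display.)

```python
import math

class Turtle:
    """Minimal LOGO-style turtle state: a position and a heading."""

    def __init__(self):
        self.x, self.y = 0.0, 0.0
        self.heading = 90.0  # degrees; 90 = facing the top of the screen

    def forward(self, dist):
        # Movement is relative to the turtle, not to the screen.
        rad = math.radians(self.heading)
        self.x += dist * math.cos(rad)
        self.y += dist * math.sin(rad)

    def right(self, angle):
        self.heading -= angle

t = Turtle()
t.forward(10)  # moves 'up' the screen, to roughly (0, 10)
t.right(90)    # now facing screen-right
t.forward(10)  # the same command now moves right, to roughly (10, 10)
```

The identical `forward(10)` call produces different screen motion before and after the turn, which is the body-centered frame of reference Papert designed the turtle around.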
More accessible and modern, Scratch has often been used as a first foray into programming (Resnick et al., 2009). It was implemented in Squeak, which is a dialect of Smalltalk. It is a blocks programming environment where students drag and drop elements together to create games, animations, or simple drawings. In the Exploring Computer Science curriculum (Section 5.1.3), Scratch is the most ‘programming-like’ element remaining.
After working with the language for a period of time, students often find tasks they want to do that are not possible in Scratch. This realization can prompt them to dig into the underlying implementation in Squeak to modify the Scratch interface itself. However, this is not a natural transition, and something like the gap between tools for doing and teaching statistics can be seen here. The Scratch team is not interested in raising the ceiling of the tool. Instead, they are focused on lowering the threshold (making it easier to get started) and “widening the walls” (supporting broader applications within the same simple framework) (Resnick et al., 2009).
Another middling approach from computer science is the concept of language levels. In this paradigm, experts identify the pieces of a programming language most crucial for beginners to understand, and fence off just those parts of the language. One version of this is DrJava, which includes three levels (Elementary, Intermediate, Advanced) mimicking the Dreyfus and Dreyfus hierarchy of programming ability (novice, advanced beginner, competence, proficiency, expert) (Hsia et al., 2005). Students can move from one level to the next, learning additional features and concepts of the language. Usually, these levels are nested, meaning all the features learned in level one are also available in level two, and so on.
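The nested-levels idea can be sketched as a whitelist of syntax features per level, where each level inherits everything below it. The level names and feature sets below are hypothetical, chosen only to mirror DrJava's three-tier structure, not its actual definitions; the sketch uses Python's `ast` module to check whether a snippet stays inside its level's fence.

```python
import ast

# Hypothetical nested feature sets: each level fences off a subset of syntax.
LEVELS = {
    "elementary": {ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Store,
                   ast.Load, ast.Constant, ast.BinOp, ast.Add, ast.Mult},
}
# Levels nest: each one inherits everything from the level below it.
LEVELS["intermediate"] = LEVELS["elementary"] | {ast.If, ast.Compare,
                                                 ast.Lt, ast.Gt}
LEVELS["advanced"] = LEVELS["intermediate"] | {ast.FunctionDef, ast.arguments,
                                               ast.arg, ast.Return, ast.Call}

def allowed_at(source, level):
    """True if every syntax node in `source` is inside the level's fence."""
    tree = ast.parse(source)
    return all(type(node) in LEVELS[level] for node in ast.walk(tree))

allowed_at("x = 1 + 2", "elementary")        # True: only basic features
allowed_at("if x > 0: y = 1", "elementary")  # False: 'if' not yet unlocked
allowed_at("if x > 0: y = 1", "intermediate")  # True: inherited + 'if'
```

Because the sets are built by union, anything legal at a lower level remains legal at every higher one, which is exactly the nesting property described above.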
In the context of tools for learning statistics, a distinction is made between route-type and landscape-type tools (Bakker, 2002). Route-type tools drive the trajectory of learning, determining the tools available to students and the skills they should be using. Most applets are route-type tools, because they only allow for one concept to be explored. Landscape-type tools are less directive, and allow users to explore the field of possibilities themselves. TinkerPlots and Fathom are both landscape-type tools.
The route/landscape dichotomy is not paralleled well in computer science, as most programming tools are, by nature, landscape-type tools. However, while programming languages can be used to accomplish many tasks, they tend to obscure the range of tasks possible; users only consider the functions to which they have been exposed. Fathom and TinkerPlots make the landscape of possibilities more visible by arranging them in a palette of tools. While route-type tools for learning feel more restrictive, they can be appealing to teachers because they make it clear what students should be doing at any time, and they limit the prior knowledge the teacher must bring to the classroom.
Novices who begin with a tool designed for learning – whether it is an applet or a full software package – encounter a lower barrier to entry, but there is still a cognitive task associated with learning how to use the interface. When they move on to additional statistical tasks, either in higher-level coursework or in the corporate world, they need more complex tools, and are forced to learn yet another interface. Unfortunately, because researchers have not studied this transition, at least in the context of statistics, there is little scaffolding to make the transition easier.
Instead, users are essentially returned to the novice state. Their experience with the learning tool presumably does not transfer well to their use of the tool for truly doing statistics. So in a sense, the effort to learn that initial tool is wasted. Alternatively, novices could be started immediately with a tool for doing statistics, which would allow them to skip the ‘wasted’ effort of learning to use an introductory tool, but would still incur the high startup costs of learning the tool. Whether beginning with a professional tool immediately or transitioning to it after a learning tool, the threshold for entry can be so high as to make users believe they are not capable of learning it.
There are many approaches that could be used to close this gap. For example, explicit curricular resources making reference to the prior tool and couching tasks in terminology from the previous system might make the transition easier. Likewise, providing some sort of ‘ramping up’ as users reach the end of the abilities of the learning tool and ‘ramping down’ at the beginning of the tool for doing could make the gap feel less abrupt.
2.4 Removing the gap

My argument is that the gap between learning and doing statistics should be removed entirely by creating a new type of tool: one that bridges from a supportive tool for learning to an expressive tool for doing.
The belief that there should be no gap between tools for learning and tools for doing statistics is not shared by everyone. In particular, Biehler’s 1997 paper urges statisticians to take responsibility for developing appropriate tools for use in education, and Cliff Konold argues for the need for separate tools for these two cognitive tasks (Biehler, 1997; Konold, 2007). These arguments are summarized and expanded on by Susan Friel, who also believes students should not be using industry-standard tools (Friel, 2008). Instead, all three authors contend that separate computer tools should be created with the specific goal of teaching students statistics.
I take issue with the idea of separate tools for teaching, as it removes novices from the experience of being true ‘creators’ of statistical products. As we will see in the more detailed analysis of tools, products like Fathom and TinkerPlots only allow users to work with data up to a certain size, and do not provide methods for sharing the results of analysis.